{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T21:41:43Z","timestamp":1777498903891,"version":"3.51.4"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2022,8,11]],"date-time":"2022-08-11T00:00:00Z","timestamp":1660176000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,8,11]],"date-time":"2022-08-11T00:00:00Z","timestamp":1660176000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"AIDOART project","award":["101007350"],"award-info":[{"award-number":["101007350"]}]},{"DOI":"10.13039\/501100006256","name":"Universit\u00e0 degli Studi dell\u2019Aquila","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006256","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Software repositories are increasingly essential to support the management of typical artifacts building up projects, including source code, documentation, and bug reports. GitHub is at the forefront of this kind of platform, providing developers with a reservoir of code contained in more than 28M repositories. To help developers find the right artifacts, GitHub uses topics, which are short texts assigned to the stored artifacts. However, assigning inappropriate topics to a repository might hamper its popularity and reachability. In our previous work, we implemented MNBN and TopFilter to recommend GitHub topics. MNBN exploits a stochastic network to predict topics, while TopFilter relies on a syntactic-based function to recommend topics. 
In this paper, we extend our work by building HybridRec, a recommender system based on stochastic and collaborative-filtering techniques to generate more relevant topics. To deal with unbalanced datasets, we employ a Complement Na\u00efve Bayesian Network (CNBN). Furthermore, we apply a preprocessing phase to clean and refine the input data before feeding the recommendation engine. An empirical evaluation demonstrates that HybridRec outperforms three state-of-the-art baselines, obtaining a better performance with respect to various metrics. We conclude that the conceived framework can be used to help developers increase their projects\u2019 visibility.<\/jats:p>","DOI":"10.1007\/s10489-022-03864-y","type":"journal-article","created":{"date-parts":[[2022,8,11]],"date-time":"2022-08-11T06:04:52Z","timestamp":1660197892000},"page":"9708-9730","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["HybridRec: A recommender system for tagging GitHub repositories"],"prefix":"10.1007","volume":"53","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7909-3902","authenticated-orcid":false,"given":"Juri","family":"Di Rocco","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5077-6793","authenticated-orcid":false,"given":"Davide","family":"Di Ruscio","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9872-9542","authenticated-orcid":false,"given":"Claudio","family":"Di Sipio","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3666-4162","authenticated-orcid":false,"given":"Phuong 
T.","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9622-5949","authenticated-orcid":false,"given":"Riccardo","family":"Rubei","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,8,11]]},"reference":[{"key":"3864_CR1","doi-asserted-by":"publisher","unstructured":"Al-Shamri MYH Similarity modifiers for enhancing the recommender system performance. Applied Intelligence. https:\/\/doi.org\/10.1007\/s10489-021-02900-7","DOI":"10.1007\/s10489-021-02900-7"},{"key":"3864_CR2","doi-asserted-by":"publisher","unstructured":"Altarawy D, Shahin H, Mohammed A, Meng N (2018) Lascad: Language-agnostic software categorization and similar application detection. J Syst Softw, 142. https:\/\/doi.org\/10.1016\/j.jss.2018.04.018","DOI":"10.1016\/j.jss.2018.04.018"},{"key":"3864_CR3","doi-asserted-by":"publisher","unstructured":"Borges H, Hora AC, Valente MT (2016) Understanding the factors that impact the popularity of GitHub repositories. In: 2016 IEEE International conference on software maintenance and evolution, ICSME 2016, Raleigh, NC, USA, October 2-7, 2016, pp 334\u2013344. IEEE Computer Society. https:\/\/doi.org\/10.1109\/ICSME.2016.31","DOI":"10.1109\/ICSME.2016.31"},{"key":"3864_CR4","doi-asserted-by":"publisher","unstructured":"Cai X, Zhu J, Shen B, Chen Y (2016) Greta: graph-based tag assignment for github repositories. In: 2016 IEEE 40th Annual computer software and applications conference (compsac), vol 1, pp 63\u201372. 
https:\/\/doi.org\/10.1109\/COMPSAC.2016.124","DOI":"10.1109\/COMPSAC.2016.124"},{"key":"3864_CR5","doi-asserted-by":"publisher","unstructured":"Cosentino V, Luis J, Cabot J (2016) Findings from github: methods, datasets and limitations. In: Proceedings of the 13th international conference on mining software repositories, MSR \u201916. https:\/\/doi.org\/10.1145\/2901739.2901776. Association for Computing Machinery, New York, pp 137\u2013141","DOI":"10.1145\/2901739.2901776"},{"key":"3864_CR6","doi-asserted-by":"publisher","unstructured":"Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on machine learning, ICML \u201906. https:\/\/doi.org\/10.1145\/1143844.1143874. ACM, New York, pp 233\u2013240","DOI":"10.1145\/1143844.1143874"},{"key":"3864_CR7","doi-asserted-by":"publisher","unstructured":"Di Rocco J, Di Ruscio D, Di Sipio C, Nguyen P, Rubei R (2020) Topfilter: an approach to recommend relevant github topics. In: Proceedings of the 14th ACM \/ IEEE international symposium on empirical software engineering and measurement (ESEM), ESEM \u201920. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/3382494.3410690","DOI":"10.1145\/3382494.3410690"},{"key":"3864_CR8","doi-asserted-by":"publisher","unstructured":"Di Sipio C, Rubei R, Di Ruscio D, Nguyen PT (2020) A multinomial na\u00efve bayesian (mnb) network to automatically recommend topics for github repositories. In: Proceedings of the evaluation and assessment in software engineering, EASE \u201920. https:\/\/doi.org\/10.1145\/3383219.3383227. Association for Computing Machinery, New York, pp 71\u201380","DOI":"10.1145\/3383219.3383227"},{"key":"3864_CR9","doi-asserted-by":"publisher","unstructured":"Fan H, Zhong Y, Zeng G, Ge C Improving recommender system via knowledge graph based exploring user preference. Applied Intelligence. 
https:\/\/doi.org\/10.1007\/s10489-021-02872-8","DOI":"10.1007\/s10489-021-02872-8"},{"key":"3864_CR10","unstructured":"Ganesan K Topic suggestions for millions of repositories - the GitHub Blog (2017). https:\/\/github.blog\/2017-07-31-topics\/"},{"key":"3864_CR11","doi-asserted-by":"crossref","unstructured":"Gousios G, Spinellis D (2012) Ghtorrent: Github\u2019s data from a firehose. In: 2012 9th IEEE Working conference on mining software repositories (MSR), pp 12\u201321. IEEE","DOI":"10.1109\/MSR.2012.6224294"},{"key":"3864_CR12","unstructured":"Grave E, Bojanowski P, Gupta P, Joulin A, Mikolov T (2018) Learning word vectors for 157 languages. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki. https:\/\/www.aclweb.org\/anthology\/L18-1550"},{"issue":"5","key":"3864_CR13","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1007\/s10664-021-09976-2","volume":"26","author":"M Izadi","year":"2021","unstructured":"Izadi M, Heydarnoori A, Gousios G (2021) Topic recommendation for software repositories using multi-label classification algorithms. Empir Softw Eng 26(5):93. https:\/\/doi.org\/10.1007\/s10664-021-09976-2","journal-title":"Empir Softw Eng"},{"issue":"1","key":"3864_CR14","doi-asserted-by":"publisher","first-page":"547","DOI":"10.1007\/s10664-016-9436-6","volume":"22","author":"J Jiang","year":"2017","unstructured":"Jiang J, Lo D, He J, Xia X, Kochhar PS, Zhang L (2017) Why and how developers fork what from whom in GitHub? Empir Softw Eng 22(1):547\u2013578. https:\/\/doi.org\/10.1007\/s10664-016-9436-6","journal-title":"Empir Softw Eng"},{"key":"3864_CR15","doi-asserted-by":"publisher","unstructured":"Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining GitHub. 
In: Proceedings of the 11th working conference on mining software repositories - MSR 2014. https:\/\/doi.org\/10.1145\/2597073.2597074. ACM Press, Hyderabad, India, pp 92\u2013101","DOI":"10.1145\/2597073.2597074"},{"key":"3864_CR16","doi-asserted-by":"crossref","unstructured":"Kibriya AM, Frank E, Pfahringer B, Holmes G (2005) Multinomial naive bayes for text categorization revisited. In: Webb GI, Yu X (eds) AI 2004: advances in artificial intelligence. Springer, Berlin, pp 488\u2013499","DOI":"10.1007\/978-3-540-30549-1_43"},{"key":"3864_CR17","unstructured":"Kohavi R, et al. (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Ijcai, vol 14, Montreal, pp 1137\u20131145"},{"key":"3864_CR18","doi-asserted-by":"crossref","unstructured":"Li X, Wang H, Yin G, Wang T, Yang C, Yu Y, Tang D (2012) Inducing taxonomy from tags: an agglomerative hierarchical clustering framework. In: Zhou S, Zhang S, Karypis G (eds) Advanced data mining and applications. Springer, Berlin, pp 64\u201377","DOI":"10.1007\/978-3-642-35527-1_6"},{"issue":"3","key":"3864_CR19","doi-asserted-by":"publisher","first-page":"582","DOI":"10.1007\/s10664-012-9230-z","volume":"19","author":"M Linares-V\u00e1squez","year":"2014","unstructured":"Linares-V\u00e1squez M, Mcmillan C, Poshyvanyk D, Grechanik M (2014) On using machine learning to automatically classify software applications into domain categories. Empir Softw Eng 19(3):582\u2013618. https:\/\/doi.org\/10.1007\/s10664-012-9230-z","journal-title":"Empir Softw Eng"},{"key":"3864_CR20","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th international conference on neural information processing systems - volume 2, NIPS\u201913. 
Curran Associates Inc., Red Hook, pp 3111\u20133119"},{"key":"3864_CR21","doi-asserted-by":"publisher","first-page":"110,460","DOI":"10.1016\/j.jss.2019.110460","volume":"161","author":"PT Nguyen","year":"2020","unstructured":"Nguyen PT, Di Rocco J, Di Ruscio D, Di Penta M (2020) CrossRec: supporting software developers by recommending third-party libraries. J Syst Softw 161:110,460. https:\/\/doi.org\/10.1016\/j.jss.2019.110460, http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0164121219302341","journal-title":"J Syst Softw"},{"key":"3864_CR22","doi-asserted-by":"publisher","unstructured":"Pennington J, Socher R, Manning C (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). https:\/\/doi.org\/10.3115\/v1\/D14-1162, https:\/\/www.aclweb.org\/anthology\/D14-1162. Association for Computational Linguistics, Doha, pp 1532\u20131543","DOI":"10.3115\/v1\/D14-1162"},{"key":"3864_CR23","unstructured":"Rennie JDM, Shih L, Teevan J, Karger DR (2003) Tackling the poor assumptions of naive bayes text classifiers. In: Proceedings of the twentieth international conference on international conference on machine learning, ICML\u201903, pp 616\u2013623. AAAI Press"},{"issue":"4","key":"3864_CR24","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1109\/MS.2009.161","volume":"27","author":"M Robillard","year":"2010","unstructured":"Robillard M, Walker R, Zimmermann T (2010) Recommendation systems for software engineering. IEEE Softw 27(4):80\u201386. 
https:\/\/doi.org\/10.1109\/MS.2009.161","journal-title":"IEEE Softw"},{"issue":"4","key":"3864_CR25","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1007\/s10664-021-09963-7","volume":"26","author":"J Di Rocco","year":"2021","unstructured":"Di Rocco J, Di Ruscio D, Di Sipio C, Nguyen PT, Rubei R (2021) Development of recommendation systems for software engineering: the CROSSMINER experience. Empir Softw Eng 26(4):69","journal-title":"Empir Softw Eng"},{"key":"3864_CR26","unstructured":"Sas C, Capiluppi A. (2021) Labelgit: a dataset for software repositories classification using attributed dependency graphs"},{"key":"3864_CR27","first-page":"291","volume-title":"The adaptive web. chap. Collaborative filtering recommender systems","author":"JB Schafer","year":"2007","unstructured":"Schafer JB, Frankowski D, Herlocker J, Sen S (2007) The adaptive web. chap. Collaborative filtering recommender systems. Springer, Berlin, pp 291\u2013324. http:\/\/dl.acm.org\/citation.cfm?id=1768197.1768208"},{"key":"3864_CR28","doi-asserted-by":"publisher","unstructured":"Soll M, Vosgerau M (2017) Classifyhub: an algorithm to classify github repositories, pp 373\u2013379. https:\/\/doi.org\/10.1007\/978-3-319-67190-1_34","DOI":"10.1007\/978-3-319-67190-1_34"},{"issue":"5","key":"3864_CR29","doi-asserted-by":"publisher","first-page":"672","DOI":"10.1108\/OIR-09-2012-0152","volume":"37","author":"B Taraghi","year":"2013","unstructured":"Taraghi B, Grossegger M, Ebner M, Holzinger A (2013) Web analytics of user path tracing and a novel algorithm for generating recommendations in open journal systems 37(5):672\u2013691. 
https:\/\/doi.org\/10.1108\/OIR-09-2012-0152, Publisher: Emerald Group Publishing Limited","journal-title":"Web analytics of user path tracing and a novel algorithm for generating recommendations in open journal systems"},{"issue":"1","key":"3864_CR30","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/s10844-020-00633-6","volume":"57","author":"TNT Tran","year":"2020","unstructured":"Tran TNT, Felfernig A, Trattner C, Holzinger A (2020) Recommender systems in the healthcare domain: state-of-the-art and research issues 57(1):171\u2013201. https:\/\/doi.org\/10.1007\/s10844-020-00633-6","journal-title":"Recommender systems in the healthcare domain: state-of-the-art and research issues"},{"key":"3864_CR31","doi-asserted-by":"publisher","unstructured":"Vargas-Baldrich S, Linares-V\u00e1squez M, Poshyvanyk D (2015) Automated tagging of software projects using bytecode and dependencies. In: 2015 30th IEEE\/ACM international conference on automated software engineering (ASE), pp 289\u2013294. https:\/\/doi.org\/10.1109\/ASE.2015.38","DOI":"10.1109\/ASE.2015.38"},{"key":"3864_CR32","doi-asserted-by":"publisher","unstructured":"Vel\u00e1zquez-Rodr\u00edguez C, Roover CD (2020) MUTAMA: an automated multi-label tagging approach for software libraries on maven. In: 2020 IEEE 20th international working conference on source code analysis and manipulation (SCAM), pp 254\u2013258. https:\/\/doi.org\/10.1109\/SCAM51674.2020.00034, ISSN: 2470-6892","DOI":"10.1109\/SCAM51674.2020.00034"},{"issue":"1","key":"3864_CR33","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1007\/s11704-013-2394-x","volume":"8","author":"T Wang","year":"2014","unstructured":"Wang T, Wang H, Yin G, Ling CX, Li X, Zou P (2014) Tag recommendation for open source software. Front Comput Sci 8(1):69\u201382. 
https:\/\/doi.org\/10.1007\/s11704-013-2394-x","journal-title":"Front Comput Sci"},{"key":"3864_CR34","doi-asserted-by":"crossref","unstructured":"Zhang Y, Xu F, Li S, Meng Y, Wang X, Li Q, Han J (2019) Higitclass: keyword-driven hierarchical classification of github repositories","DOI":"10.1109\/ICDM.2019.00098"},{"key":"3864_CR35","doi-asserted-by":"publisher","unstructured":"Zhao ZD, Shang Ms (2010) User-based collaborative-filtering recommendation algorithms on hadoop. In: Proceedings of the 2010 third international conference on knowledge discovery and data mining, WKDD \u201910. https:\/\/doi.org\/10.1109\/WKDD.2010.54. IEEE Computer Society, Washington, DC, pp 478\u2013481","DOI":"10.1109\/WKDD.2010.54"},{"key":"3864_CR36","doi-asserted-by":"publisher","unstructured":"Zhou Y, Wu J, Sun Y (2021) Ghtrec: a personalized service to recommend github trending repositories for developers. In: 2021 IEEE International conference on web services (ICWS), pp 314\u2013323. https:\/\/doi.org\/10.1109\/ICWS53863.2021.00049","DOI":"10.1109\/ICWS53863.2021.00049"}],"container-title":["Applied 
Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-03864-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-022-03864-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-03864-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,30]],"date-time":"2023-04-30T09:30:19Z","timestamp":1682847019000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-022-03864-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,11]]},"references-count":36,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["3864"],"URL":"https:\/\/doi.org\/10.1007\/s10489-022-03864-y","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"value":"0924-669X","type":"print"},{"value":"1573-7497","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,11]]},"assertion":[{"value":"7 June 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 August 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}