{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T21:31:52Z","timestamp":1740173512034,"version":"3.37.3"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,8,4]],"date-time":"2022-08-04T00:00:00Z","timestamp":1659571200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,8,4]],"date-time":"2022-08-04T00:00:00Z","timestamp":1659571200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Sci Data"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The recommendation of items based on the sequential past users\u2019 preferences has evolved in the last few years, mostly due to deep learning approaches, such as BERT4Rec. However, in scientific fields, recommender systems for recommending the next best item are not widely used. The main goal of this work is to improve the results for the recommendation of the next best item in scientific domains using sequence aware datasets and algorithms. In the first part of this work, we present the adaptation of a previous method (LIBRETTI) for creating sequential recommendation datasets for scientific fields. The results were assessed in Astronomy and Chemistry. In the second part of this work, we propose a new approach to improve the datasets, not the algorithms, to obtain better recommendations. The new hybrid approach is called sequential enrichment (SeEn), which consists of adding to a sequence of items the n most similar items after each original item. The results show that the enriched sequences obtained better results than the original ones. The Chemistry dataset improved by approximately seven percentage points and the Astronomy dataset by 16 percentage points for Hit Ratio and Normalized Discounted Cumulative Gain.<\/jats:p>","DOI":"10.1038\/s41597-022-01598-7","type":"journal-article","created":{"date-parts":[[2022,8,4]],"date-time":"2022-08-04T15:05:42Z","timestamp":1659625542000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["SeEn: Sequential enriched datasets for sequence-aware recommendations"],"prefix":"10.1038","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9728-9618","authenticated-orcid":false,"given":"Marcia","family":"Barros","sequence":"first","affiliation":[]},{"given":"Andr\u00e9","family":"Moitinho","sequence":"additional","affiliation":[]},{"given":"Francisco M.","family":"Couto","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,8,4]]},"reference":[{"key":"1598_CR1","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1109\/TCBB.2018.2864739","volume":"17","author":"H Wang","year":"2018","unstructured":"Wang, H., Xi, J., Wang, M. & Li, A. Dual-layer strengthened collaborative topic regression modeling for predicting drug sensitivity. IEEE\/ACM transactions on computational biology and bioinformatics 17, 587\u2013598 (2018).","journal-title":"IEEE\/ACM transactions on computational biology and bioinformatics"},{"key":"1598_CR2","doi-asserted-by":"crossref","first-page":"1352","DOI":"10.1109\/TCBB.2019.2913855","volume":"17","author":"C Lan","year":"2019","unstructured":"Lan, C., Chandrasekaran, S. N. & Huan, J. On the unreported-profile-is-negative assumption for predictive cheminformatics. IEEE\/ACM transactions on computational biology and bioinformatics 17, 1352\u20131363 (2019).","journal-title":"IEEE\/ACM transactions on computational biology and bioinformatics"},{"key":"1598_CR3","doi-asserted-by":"publisher","first-page":"75","DOI":"10.3389\/fgene.2020.00075","volume":"11","author":"A Emdadi","year":"2020","unstructured":"Emdadi, A. & Eslahchi, C. Dsplmf: a method for cancer drug sensitivity prediction using a novel regularization approach in logistic matrix factorization. Frontiers in genetics 11, 75 (2020).","journal-title":"Frontiers in genetics"},{"key":"1598_CR4","doi-asserted-by":"publisher","first-page":"126","DOI":"10.1109\/TCBB.2020.2968442","volume":"18","author":"H Lim","year":"2020","unstructured":"Lim, H. & Xie, L. A new weighted imputed neighborhood-regularized tri-factorization one-class collaborative filtering algorithm: Application to target gene prediction of transcription factors. IEEE\/ACM transactions on computational biology and bioinformatics 18, 126\u2013137 (2020).","journal-title":"IEEE\/ACM transactions on computational biology and bioinformatics"},{"key":"1598_CR5","doi-asserted-by":"crossref","unstructured":"Garg, S. Drug recommendation system based on sentiment analysis of drug reviews using machine learning. In 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 175\u2013181 (IEEE, 2021).","DOI":"10.1109\/Confluence51648.2021.9377188"},{"key":"1598_CR6","doi-asserted-by":"publisher","first-page":"e18035","DOI":"10.2196\/18035","volume":"23","author":"R De Croon","year":"2021","unstructured":"De Croon, R. et al. Health recommender systems: Systematic review. Journal of Medical Internet Research 23, e18035 (2021).","journal-title":"Journal of Medical Internet Research"},{"key":"1598_CR7","doi-asserted-by":"publisher","first-page":"22","DOI":"10.3847\/1538-4365\/aaadb2","volume":"235","author":"N Mukund","year":"2018","unstructured":"Mukund, N. et al. An information retrieval and recommendation system for astronomical observatories. The Astrophysical Journal Supplement Series 235, 22 (2018).","journal-title":"The Astrophysical Journal Supplement Series"},{"key":"1598_CR8","doi-asserted-by":"publisher","first-page":"49","DOI":"10.3847\/1538-4357\/ab27c0","volume":"880","author":"NR Hinkel","year":"2019","unstructured":"Hinkel, N. R., Unterborn, C., Kane, S. R., Somers, G. & Galvez, R. A recommendation algorithm to predict giant exoplanet host stars using stellar elemental abundances. The Astrophysical Journal 880, 49 (2019).","journal-title":"The Astrophysical Journal"},{"key":"1598_CR9","doi-asserted-by":"publisher","first-page":"5147","DOI":"10.1093\/mnras\/stab316","volume":"502","author":"K Malanchev","year":"2021","unstructured":"Malanchev, K. et al. Anomaly detection in the zwicky transient facility dr3. Monthly Notices of the Royal Astronomical Society 502, 5147\u20135175 (2021).","journal-title":"Monthly Notices of the Royal Astronomical Society"},{"key":"1598_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2018.23","volume":"5","author":"D Torre","year":"2018","unstructured":"Torre, D. et al. Datasets2tools, repository and search engine for bioinformatics datasets, tools and canned analyses. Scientific data 5, 1\u201310 (2018).","journal-title":"Scientific data"},{"key":"1598_CR11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3190616","volume":"51","author":"M Quadrana","year":"2018","unstructured":"Quadrana, M., Cremonesi, P. & Jannach, D. Sequence-aware recommender systems. ACM Computing Surveys (CSUR) 51, 1\u201336 (2018).","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"1598_CR12","doi-asserted-by":"crossref","unstructured":"Ricci, F., Rokach, L. & Shapira, B. Recommender systems: introduction and challenges. In Recommender systems handbook, 1\u201334 (Springer, Boston, MA, 2015).","DOI":"10.1007\/978-1-4899-7637-6_1"},{"key":"1598_CR13","doi-asserted-by":"publisher","first-page":"178","DOI":"10.15265\/IY-2016-022","volume":"25","author":"M Barros","year":"2016","unstructured":"Barros, M. & Couto, F. M. Knowledge representation and management: a linked data perspective. Yearbook of medical informatics 25, 178\u2013183 (2016).","journal-title":"Yearbook of medical informatics"},{"key":"1598_CR14","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1007\/s10462-017-9539-5","volume":"50","author":"JK Tarus","year":"2018","unstructured":"Tarus, J. K., Niu, Z. & Mustafa, G. Knowledge-based recommendation: a review of ontology-based recommender systems for e-learning. Artificial Intelligence Review 50, 21\u201348 (2018).","journal-title":"Artificial Intelligence Review"},{"key":"1598_CR15","doi-asserted-by":"publisher","first-page":"D1214","DOI":"10.1093\/nar\/gkv1031","volume":"44","author":"J Hastings","year":"2015","unstructured":"Hastings, J. et al. Chebi in 2016: Improved services and an expanding collection of metabolites. Nucleic acids research 44, D1214\u2013D1219 (2015).","journal-title":"Nucleic acids research"},{"key":"1598_CR16","first-page":"D330","volume":"47","author":"GO Consortium","year":"2018","unstructured":"Consortium, G. O. The gene ontology resource: 20 years and still going strong. Nucleic acids research 47, D330\u2013D338 (2018).","journal-title":"Nucleic acids research"},{"key":"1598_CR17","doi-asserted-by":"publisher","first-page":"D955","DOI":"10.1093\/nar\/gky1032","volume":"47","author":"LM Schriml","year":"2018","unstructured":"Schriml, L. M. et al. Human disease ontology 2018 update: classification, content and workflow expansion. Nucleic acids research 47, D955\u2013D962 (2018).","journal-title":"Nucleic acids research"},{"key":"1598_CR18","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1016\/j.csi.2017.11.005","volume":"57","author":"U Liji","year":"2018","unstructured":"Liji, U., Chai, Y. & Chen, J. Improved personalized recommendation based on user attributes clustering and score matrix filling. Computer Standards & Interfaces 57, 59\u201367 (2018).","journal-title":"Computer Standards & Interfaces"},{"key":"1598_CR19","first-page":"1","volume":"5","author":"FM Harper","year":"2015","unstructured":"Harper, F. M. & Konstan, J. A. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5, 1\u201319 (2015).","journal-title":"Acm transactions on interactive intelligent systems (tiis)"},{"key":"1598_CR20","doi-asserted-by":"crossref","unstructured":"Bennett, J., S. et al. The netflix prize. In Proceedings of KDD cup and workshop, 2007, 35 (Citeseer, 2007).","DOI":"10.1145\/1345448.1345459"},{"key":"1598_CR21","unstructured":"Spotify datasets. https:\/\/research.atspotify.com\/datasets\/ [Online; accessed 17-May-2022] (2022)."},{"key":"1598_CR22","unstructured":"Amazon datasets. http:\/\/snap.stanford.edu\/data\/web-Amazon.html [Online; accessed 17-May-2022] (2022)."},{"key":"1598_CR23","doi-asserted-by":"publisher","first-page":"176668","DOI":"10.1109\/ACCESS.2019.2958002","volume":"7","author":"M Barros","year":"2019","unstructured":"Barros, M., Moitinho, A. & Couto, F. M. Using research literature to generate datasets of implicit feedback for recommending scientific items. IEEE Access 7, 176668\u2013176680 (2019).","journal-title":"IEEE Access"},{"key":"1598_CR24","unstructured":"Pubmed results for paracetamol. https:\/\/pubmed.ncbi.nlm.nih.gov\/?term=paracetamol [Online; accessed 17-May-2022] (2022)."},{"key":"1598_CR25","unstructured":"ChEBI entity paracetamol. https:\/\/www.ebi.ac.uk\/chebi\/searchId.do?chebiId=CHEBI:46195 [Online; accessed 17-May-2022] (2022)."},{"key":"1598_CR26","unstructured":"Shani, G., Heckerman, D., Brafman, R. I. & Boutilier, C. An mdp-based recommender system. Journal of Machine Learning Research 6 (2005)."},{"key":"1598_CR27","doi-asserted-by":"crossref","unstructured":"Hidasi, B. & Karatzoglou, A. Recurrent neural networks with top-k gains for session-based recommendations. In Proceedings of the 27th ACM international conference on information and knowledge management, 843\u2013852 (2018).","DOI":"10.1145\/3269206.3271761"},{"key":"1598_CR28","doi-asserted-by":"crossref","unstructured":"Tang, J. & Wang, K. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 565\u2013573 (2018).","DOI":"10.1145\/3159652.3159656"},{"key":"1598_CR29","doi-asserted-by":"crossref","unstructured":"Kang, W.-C. & McAuley, J. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM), 197\u2013206 (IEEE, 2018).","DOI":"10.1109\/ICDM.2018.00035"},{"key":"1598_CR30","doi-asserted-by":"crossref","unstructured":"Sun, F. et al. Bert4rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM international conference on information and knowledge management, 1441\u20131450 (2019).","DOI":"10.1145\/3357384.3357895"},{"key":"1598_CR31","unstructured":"Kenton, J. D. M.-W. C. & Toutanova, L. K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of naacL-HLT, 4171\u20134186 (2019)."},{"key":"1598_CR32","doi-asserted-by":"publisher","DOI":"10.6084\/m9.figshare.18857549.v1","author":"M Barros","year":"2022","unstructured":"Barros, M. seen_datasets, figshare, https:\/\/doi.org\/10.6084\/m9.figshare.18857549.v1 (2022)."},{"key":"1598_CR33","doi-asserted-by":"crossref","unstructured":"Cola\u00e7o, F., Barros, M. & Couto, F. M. Drecpy: A python framework for developing deep learning-based recommenders. In Fourteenth ACM Conference on Recommender Systems, 675\u2013680 (2020).","DOI":"10.1145\/3383313.3418483"},{"key":"1598_CR34","doi-asserted-by":"publisher","first-page":"2174","DOI":"10.21105\/joss.02174","volume":"5","author":"N Hug","year":"2020","unstructured":"Hug, N. Surprise: A python library for recommender systems. Journal of Open Source Software 5, 2174 (2020).","journal-title":"Journal of Open Source Software"},{"key":"1598_CR35","doi-asserted-by":"crossref","unstructured":"Barros, M., Moitinho, A. & Couto, F. M. Hybrid semantic recommender system for chemical compounds. In European Conference on Information Retrieval, 94\u2013101 (Springer, 2020).","DOI":"10.1007\/978-3-030-45442-5_12"},{"key":"1598_CR36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00495-2","volume":"13","author":"M Barros","year":"2021","unstructured":"Barros, M., Moitinho, A. & Couto, F. M. Hybrid semantic recommender system for chemical compounds in large-scale datasets. Journal of cheminformatics 13, 1\u201318 (2021).","journal-title":"Journal of cheminformatics"},{"key":"1598_CR37","doi-asserted-by":"publisher","first-page":"356","DOI":"10.1093\/mnras\/stab770","volume":"504","author":"WS Dias","year":"2021","unstructured":"Dias, W. S. et al. Updated parameters of 1743 open clusters based on gaia dr2. Monthly Notices of the Royal Astronomical Society 504, 356\u2013371 (2021).","journal-title":"Monthly Notices of the Royal Astronomical Society"},{"key":"1598_CR38","doi-asserted-by":"crossref","unstructured":"Pazzani, M. J. & Billsus, D. Content-based recommendation systems. In The adaptive web, 325\u2013341 (Springer, 2007).","DOI":"10.1007\/978-3-540-72079-9_10"},{"key":"1598_CR39","doi-asserted-by":"crossref","unstructured":"Couto, F. & Lamurias, A. Semantic similarity definition. Encyclopedia of bioinformatics and computational biology 1 (2019).","DOI":"10.1016\/B978-0-12-809633-8.20401-9"},{"key":"1598_CR40","unstructured":"Resnik, P. Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg\/9511007 (1995)."},{"key":"1598_CR41","unstructured":"Lin, D. et al. An information-theoretic definition of similarity. In Icml, 98, 296\u2013304 (Citeseer, 1998)."},{"key":"1598_CR42","unstructured":"Jiang, J. J. & Conrath, D. W. Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint cmp-lg\/9709008 (1997)."},{"key":"1598_CR43","doi-asserted-by":"crossref","unstructured":"Lee-Thorp, J., Ainslie, J., Eckstein, I. & Ontanon, S. Fnet: Mixing tokens with fourier transforms. arXiv preprint arXiv:2105.03824 (2021).","DOI":"10.18653\/v1\/2022.naacl-main.319"},{"key":"1598_CR44","doi-asserted-by":"publisher","first-page":"e1000937","DOI":"10.1371\/journal.pcbi.1000937","volume":"6","author":"JD Ferreira","year":"2010","unstructured":"Ferreira, J. D. & Couto, F. M. Semantic similarity for automatic classification of chemical compounds. PLoS Comput Biol 6, e1000937 (2010).","journal-title":"PLoS Comput Biol"},{"key":"1598_CR45","unstructured":"Lamurias, A., Grego, T. & Couto, F. M. Chemical compound and drug name recognition using crfs and semantic similarity based on chebi. In BioCreative Challenge Evaluation Workshop, 2, 75 (Citeseer, 2013)."},{"key":"1598_CR46","doi-asserted-by":"publisher","first-page":"306","DOI":"10.3389\/fbioe.2019.00306","volume":"7","author":"X Wang","year":"2019","unstructured":"Wang, X. et al. Sts-nlsp: a network-based label space partition method for predicting the specificity of membrane transporter substrates using a hybrid feature of structural and semantic similarity. Frontiers in Bioengineering and Biotechnology 7, 306 (2019).","journal-title":"Frontiers in Bioengineering and Biotechnology"},{"key":"1598_CR47","unstructured":"DiShIn: Semantic Similarity Measures using Disjunctive Shared Information. https:\/\/github.com\/lasigeBioTM\/DiShIn [Online; accessed 17-May-2022] (2022)."},{"key":"1598_CR48","doi-asserted-by":"publisher","first-page":"A1","DOI":"10.1051\/0004-6361\/201629272","volume":"595","author":"T Prusti","year":"2016","unstructured":"Prusti, T. et al. The gaia mission. Astronomy & Astrophysics 595, A1 (2016).","journal-title":"Astronomy & Astrophysics"},{"key":"1598_CR49","doi-asserted-by":"publisher","unstructured":"Dias, W. S. et al. Updated parameters of 1743 open clusters based on Gaia DR2. 504, 356\u2013371, https:\/\/doi.org\/10.1093\/mnras\/stab770 (2021).","DOI":"10.1093\/mnras\/stab770"},{"key":"1598_CR50","unstructured":"ChEBI entity (R)-noradrenaline. https:\/\/www.ebi.ac.uk\/chebi\/searchId.do?chebiId=CHEBI:18357 [Online; accessed 17-May-2022] (2022)."}],"container-title":["Scientific Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41597-022-01598-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41597-022-01598-7","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41597-022-01598-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T19:07:50Z","timestamp":1727723270000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41597-022-01598-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,4]]},"references-count":50,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["1598"],"URL":"https:\/\/doi.org\/10.1038\/s41597-022-01598-7","relation":{"references":[{"id-type":"doi","id":"10.6084\/m9.figshare.18857549.v1","asserted-by":"subject"}]},"ISSN":["2052-4463"],"issn-type":[{"type":"electronic","value":"2052-4463"}],"subject":[],"published":{"date-parts":[[2022,8,4]]},"assertion":[{"value":"25 January 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 July 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 August 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"478"}}