{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T02:14:50Z","timestamp":1772590490861,"version":"3.50.1"},"reference-count":71,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T00:00:00Z","timestamp":1647561600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T00:00:00Z","timestamp":1647561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000083","name":"Directorate for Computer and Information Science and Engineering","doi-asserted-by":"publisher","award":["1901386"],"award-info":[{"award-number":["1901386"]}],"id":[{"id":"10.13039\/100000083","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["EPJ Data Sci."],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process, identifying analytical best practices, and providing insights to the builders of scientific toolkits. However, large corpora have remained unanalyzed in depth, as descriptive labels are absent and require expert domain knowledge to generate. We propose a novel weakly supervised transformer-based architecture for computing joint representations of code from both abstract syntax trees and surrounding natural language comments. We then evaluate the model on a new classification task for labeling computational notebook cells as stages in the data analysis process from data import to wrangling, exploration, modeling, and evaluation. 
We show that our model, leveraging only easily-available weak supervision, achieves a 38% increase in accuracy over expert-supplied heuristics and outperforms a suite of baselines. Our model enables us to examine a set of 118,000 Jupyter Notebooks to uncover common data analysis patterns. Focusing on notebooks with relationships to academic articles, we conduct the largest study of scientific code to date and find that notebooks which devote a higher fraction of code to the typically labor-intensive process of wrangling data in expectation exhibit decreased citation counts for corresponding papers. We also show significant differences between academic and non-academic notebooks, including that academic notebooks devote substantially more code to wrangling and exploring data, and less on modeling.<\/jats:p>","DOI":"10.1140\/epjds\/s13688-022-00327-9","type":"journal-article","created":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T11:09:45Z","timestamp":1647601785000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["CORAL: COde RepresentAtion learning with weakly-supervised transformers for analyzing data analysis"],"prefix":"10.1140","volume":"11","author":[{"given":"Ge","family":"Zhang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2994-0058","authenticated-orcid":false,"given":"Mike A.","family":"Merrill","sequence":"additional","affiliation":[]},{"given":"Yang","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Jeffrey","family":"Heer","sequence":"additional","affiliation":[]},{"given":"Tim","family":"Althoff","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,18]]},"reference":[{"key":"327_CR1","volume-title":"Positioning and power in academic publishing: players, agents and agendas","author":"T Kluyver","year":"2016","unstructured":"Kluyver T, Ragan-Kelley B, P\u00e9rez F, Granger 
BE, Bussonnier M, Frederic J, Kelley K, Hamrick JB, Grout J, Corlay S et al. (2016) Jupyter notebooks-a publishing format for reproducible computational workflows. In: Positioning and power in academic publishing: players, agents and agendas"},{"issue":"2","key":"327_CR2","doi-asserted-by":"crossref","first-page":"203","DOI":"10.5195\/jmla.2017.88","volume":"105","author":"ED Foster","year":"2017","unstructured":"Foster ED, Deardorff A (2017) Open science framework (OSF). J. Med. Libr. Assoc. 105(2):203\u2013206","journal-title":"J. Med. Libr. Assoc."},{"key":"327_CR3","unstructured":"Ayers P LibGuides: Citing & publishing software: Publishing research software"},{"issue":"5","key":"327_CR4","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1016\/j.jocs.2013.08.001","volume":"4","author":"C Pradal","year":"2013","unstructured":"Pradal C, Varoquaux G, Langtangen HP (2013) Publishing scientific software matters. J Comput Sci 4(5):311\u2013312","journal-title":"J Comput Sci"},{"key":"327_CR5","volume-title":"CHI","author":"Y Liu","year":"2020","unstructured":"Liu Y, Althoff T, Heer J (2020) Paths explored, paths omitted, paths obscured: decision points & selective reporting in end-to-end data analysis. In: CHI"},{"key":"327_CR6","volume-title":"CHI","author":"A Rule","year":"2018","unstructured":"Rule A, Tabard A, Hollan JD (2018) Exploration and explanation in computational notebooks. In: CHI"},{"key":"327_CR7","volume-title":"SIGMOD","author":"MS Rehman","year":"2019","unstructured":"Rehman MS (2019) Towards understanding data analysis workflows using a large notebook corpus. In: SIGMOD"},{"key":"327_CR8","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1145\/3377816.3381724","volume-title":"ICSE","author":"J Wang","year":"2020","unstructured":"Wang J, Li L, Zeller A (2020) Better code, better sharing: on the need of analyzing Jupyter notebooks. 
In: ICSE, pp 53\u201356"},{"key":"327_CR9","volume-title":"ICSE","author":"J Wang","year":"2020","unstructured":"Wang J, Li L, Zeller A (2020) Better code, better sharing: on the need of analyzing Jupyter notebooks. In: ICSE"},{"key":"327_CR10","volume-title":"CHI","author":"MB Kery","year":"2018","unstructured":"Kery MB, Radensky M, Arya M, John BE, Myers BA (2018) The story in the notebook: exploratory data science using a literate programming tool. In: CHI"},{"key":"327_CR11","volume-title":"Enterprise data analysis and visualization: an interview study. TVCG","author":"S Kandel","year":"2012","unstructured":"Kandel S, Paepcke A, Hellerstein JM, Heer J (2012) Enterprise data analysis and visualization: an interview study. TVCG"},{"key":"327_CR12","unstructured":"Wongsuphasawat K, Liu Y, Heer J (2019) Goals, process, and challenges of exploratory data analysis: an interview study. 1911.00568"},{"issue":"1","key":"327_CR13","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/TVCG.2018.2865040","volume":"25","author":"S Alspaugh","year":"2018","unstructured":"Alspaugh S, Zokaei N, Liu A, Jin C, Hearst MA (2018) Futzing and moseying: interviews with professional data analysts on exploration practices. IEEE Trans Vis Comput Graph 25(1):22\u201331","journal-title":"IEEE Trans Vis Comput Graph"},{"key":"327_CR14","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/MCSE.2018.021651343","volume":"20","author":"A Johanson","year":"2018","unstructured":"Johanson A, Hasselbring W (2018) Software engineering for computational science: past, present, future. Comput Sci Eng 20:90\u2013109","journal-title":"Comput Sci Eng"},{"key":"327_CR15","volume-title":"CHI","author":"MB Kery","year":"2017","unstructured":"Kery MB, Horvath A, Myers B (2017) Variolite: supporting exploratory programming by data scientists. 
In: CHI"},{"key":"327_CR16","first-page":"162","volume-title":"VL\/HCC","author":"C Hill","year":"2016","unstructured":"Hill C, Bellamy R, Erickson T, Burnett M (2016) Trials and tribulations of developers of intelligent systems: A field study. In: VL\/HCC. IEEE, Los Alamitos, pp 162\u2013170"},{"key":"327_CR17","volume-title":"NeurIPS","author":"A Vaswani","year":"2017","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: NeurIPS"},{"key":"327_CR18","unstructured":"Devlin J, Chang M, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. 1810.04805"},{"key":"327_CR19","volume-title":"NeurIPS","author":"T Mikolov","year":"2013","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: NeurIPS"},{"key":"327_CR20","doi-asserted-by":"crossref","unstructured":"Tai KS, Socher R, Manning CD (2015) Improved semantic representations from tree-structured long short-term memory networks. CoRR. 1503.00075","DOI":"10.3115\/v1\/P15-1150"},{"key":"327_CR21","unstructured":"Mou L, Li G, Jin Z, Zhang L, Wang T (2014) TBCNN: A tree-based convolutional neural network for programming language processing. CoRR. 1409.5718"},{"key":"327_CR22","volume-title":"ICSE","author":"A Hindle","year":"2012","unstructured":"Hindle A, Barr ET, Su Z, Gabel M, Devanbu P (2012) On the naturalness of software. In: ICSE"},{"key":"327_CR23","volume-title":"FSE","author":"Z Tu","year":"2014","unstructured":"Tu Z, Su Z, Devanbu P (2014) On the localness of software. In: FSE"},{"key":"327_CR24","volume-title":"ESEC\/FSE","author":"TT Nguyen","year":"2013","unstructured":"Nguyen TT, Nguyen AT, Nguyen HA, Nguyen TN (2013) A statistical semantic language model for source code. 
In: ESEC\/FSE"},{"key":"327_CR25","volume-title":"2013 10th working conference on mining software repositories (MSR)","author":"M Allamanis","year":"2013","unstructured":"Allamanis M, Sutton C (2013) Mining source code repositories at massive scale using language modeling. In: 2013 10th working conference on mining software repositories (MSR)"},{"key":"327_CR26","volume-title":"SIGPLAN","author":"V Raychev","year":"2014","unstructured":"Raychev V, Vechev M, Yahav E (2014) Code completion with statistical language models. In: SIGPLAN"},{"key":"327_CR27","volume-title":"ICDM","author":"T Kwon","year":"2011","unstructured":"Kwon T, Su Z (2011) Modeling high-level behavior patterns for precise similarity analysis of software. In: ICDM"},{"key":"327_CR28","volume-title":"ACL","author":"D Movshovitz-Attias","year":"2013","unstructured":"Movshovitz-Attias D, Cohen W (2013) Natural language models for predicting programming comments. In: ACL"},{"key":"327_CR29","volume-title":"ICML","author":"P Bielik","year":"2016","unstructured":"Bielik P, Raychev V, Vechev M (2016) Phog: probabilistic model for code. In: ICML"},{"key":"327_CR30","first-page":"404","volume-title":"SIGPLAN","author":"U Alon","year":"2018","unstructured":"Alon U, Zilberstein M, Levy O, Yahav E (2018) A general path-based representation for predicting program properties. In: SIGPLAN, pp 404\u2013419"},{"key":"327_CR31","doi-asserted-by":"crossref","unstructured":"Li J, Wang Y, Lyu MR, King I (2017) Code completion with neural attention and pointer networks. 1711.09573","DOI":"10.24963\/ijcai.2018\/578"},{"key":"327_CR32","unstructured":"Allamanis M, Brockschmidt M, Khademi M (2017) Learning to represent programs with graphs. 1711.00740"},{"key":"327_CR33","first-page":"876","volume-title":"ICDM","author":"Y Zhang","year":"2019","unstructured":"Zhang Y, Xu FF, Li S, Meng Y, Wang X, Li Q, Han J (2019) Higitclass: keyword-driven hierarchical classification of github repositories. 
In: ICDM, pp 876\u2013885"},{"key":"327_CR34","doi-asserted-by":"crossref","unstructured":"Shetty M, Bansal C, Kumar S, Rao N, Nagappan N, Zimmermann T (2020) Neural knowledge extraction from cloud service incidents. 2007.05505","DOI":"10.1109\/ICSE-SEIP52600.2021.00031"},{"key":"327_CR35","volume":"3","author":"U Alon","year":"2019","unstructured":"Alon U, Zilberstein M, Levy O, Yahav E (2019) code2vec: learning distributed representations of code. POPL 3:40","journal-title":"POPL"},{"key":"327_CR36","volume-title":"FSE","author":"M Allamanis","year":"2014","unstructured":"Allamanis M, Barr E, Bird C, Sutton C (2014) Learning natural coding conventions. In: FSE"},{"key":"327_CR37","volume-title":"ESEC\/FSE","author":"M Acharya","year":"2007","unstructured":"Acharya M, Xie T, Pei J, Xu J (2007) Mining api patterns as partial orders from source code: from usage scenarios to specifications. In: ESEC\/FSE"},{"key":"327_CR38","volume-title":"ICSE","author":"TD Nguyen","year":"2017","unstructured":"Nguyen TD, Nguyen AT, Phan HD, Nguyen TN (2017) Exploring api embedding for api usages and applications. In: ICSE"},{"key":"327_CR39","doi-asserted-by":"crossref","unstructured":"Ratner A, Bach SH, Ehrenberg H, Fries J, Wu S, R\u00e9 C (2019) Snorkel: rapid training data creation with weak supervision. VLDB","DOI":"10.1007\/s00778-019-00552-1"},{"key":"327_CR40","volume-title":"NeurIPS","author":"W Hamilton","year":"2017","unstructured":"Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. In: NeurIPS"},{"key":"327_CR41","unstructured":"Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. 1609.02907"},{"key":"327_CR42","volume-title":"ICDM","author":"M Wu","year":"2019","unstructured":"Wu M, Pan S, Zhu X, Zhou C, Pan L (2019) Domain-adversarial graph neural networks for text classification. 
In: ICDM"},{"key":"327_CR43","volume-title":"European semantic web conference","author":"M Schlichtkrull","year":"2018","unstructured":"Schlichtkrull M, Kipf TN, Bloem P, Van Den Berg R, Titov I, Welling M (2018) Modeling relational data with graph convolutional networks. In: European semantic web conference"},{"key":"327_CR44","volume-title":"NeurIPS","author":"M Zhang","year":"2018","unstructured":"Zhang M, Chen Y (2018) Link prediction based on graph neural networks. In: NeurIPS"},{"key":"327_CR45","volume-title":"NeurIPS","author":"M Defferrard","year":"2016","unstructured":"Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: NeurIPS"},{"key":"327_CR46","volume-title":"NeurIPS","author":"Z Ying","year":"2018","unstructured":"Ying Z, You J, Morris C, Ren X, Hamilton W, Leskovec J (2018) Hierarchical graph representation learning with differentiable pooling. In: NeurIPS"},{"key":"327_CR47","volume-title":"ICML","author":"H Dai","year":"2016","unstructured":"Dai H, Dai B, Song L (2016) Discriminative embeddings of latent variable models for structured data. In: ICML"},{"key":"327_CR48","volume-title":"NeurIPS","author":"DK Duvenaud","year":"2015","unstructured":"Duvenaud DK, Maclaurin D, Iparraguirre J, Bombarell R, Hirzel T, Aspuru-Guzik A, Adams RP (2015) Convolutional networks on graphs for learning molecular fingerprints. In: NeurIPS"},{"issue":"1","key":"327_CR49","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","volume":"20","author":"F Scarselli","year":"2008","unstructured":"Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model. 
IEEE Trans Neural Netw 20(1):61\u201380","journal-title":"IEEE Trans Neural Netw"},{"key":"327_CR50","unstructured":"Battaglia PW, Hamrick JB, Bapst V, Sanchez-Gonzalez A, Zambaldi V, Malinowski M, Tacchetti A, Raposo D, Santoro A, Faulkner R et al (2018) Relational inductive biases, deep learning, and graph networks. 1806.01261"},{"key":"327_CR51","unstructured":"Fernandes P, Allamanis M, Brockschmidt M (2018) Structured neural summarization. 1811.01824"},{"key":"327_CR52","unstructured":"Brockschmidt M, Allamanis M, Gaunt AL, Polozov O (2018) Generative code modeling with graphs. 1805.08490"},{"key":"327_CR53","unstructured":"Veli\u010dkovi\u0107 P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. 1710.10903"},{"key":"327_CR54","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1109\/MS.2012.174","volume":"30","author":"L Chen","year":"2013","unstructured":"Chen L, Ali Babar M, Nuseibeh B (2013) Characterizing architecturally significant requirements. IEEE Softw 30:38\u201345","journal-title":"IEEE Softw"},{"key":"327_CR55","unstructured":"Anonymous CORAL: COde RepresentAtion Learning with Weakly-Supervised Transformers for Analyzing Data Analysis. https:\/\/bit.ly\/3hl1PUX"},{"key":"327_CR56","doi-asserted-by":"publisher","first-page":"5435","DOI":"10.18653\/v1\/D19-1546","volume-title":"Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP)","author":"R Agashe","year":"2019","unstructured":"Agashe R, Iyer S, Zettlemoyer L (2019) JuICe: a large scale distantly supervised dataset for open domain context-based code generation. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). Assoc. Comput. Linguistics, Hong Kong, pp 5435\u20135445. https:\/\/doi.org\/10.18653\/v1\/D19-1546. 
https:\/\/www.aclweb.org\/anthology\/D19-1546. Accessed 2019-12-03"},{"key":"327_CR57","doi-asserted-by":"publisher","first-page":"507","DOI":"10.1109\/MSR.2019.00077","volume-title":"2019 IEEE\/ACM 16th international conference on mining software repositories (MSR)","author":"JF Pimentel","year":"2019","unstructured":"Pimentel JF, Murta L, Braganholo V, Freire J (2019) A large-scale study about quality and reproducibility of Jupyter notebooks. In: 2019 IEEE\/ACM 16th international conference on mining software repositories (MSR), pp 507\u2013517. https:\/\/doi.org\/10.1109\/MSR.2019.00077. ISSN: 2574-3864"},{"key":"327_CR58","unstructured":"JetBrains Data Science in 2018 (2018). https:\/\/www.jetbrains.com\/research\/data-science-2018\/"},{"key":"327_CR59","unstructured":"Kelley K, Granger B (2017) Jupyter frontends: from the classic jupyter notebook to jupyterlab, nteract, and beyond. JupyterCon"},{"issue":"1","key":"327_CR60","doi-asserted-by":"crossref","first-page":"159","DOI":"10.2307\/2529310","volume":"33","author":"J Landis","year":"1977","unstructured":"Landis J, Koch G (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159\u2013174","journal-title":"Biometrics"},{"key":"327_CR61","volume-title":"Content analysis: an introduction to its methodology","author":"K Krippendorff","year":"2004","unstructured":"Krippendorff K (2004) Content analysis: an introduction to its methodology, 2nd edn. Sage, Thousand Oaks","edition":"2"},{"key":"327_CR62","volume-title":"ICML","author":"J Gilmer","year":"2017","unstructured":"Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. In: ICML"},{"key":"327_CR63","unstructured":"Ba JL, Kiros JR, Hinton GE (2016). Layer normalization. 1607.06450"},{"key":"327_CR64","volume-title":"IJCAI","author":"J Weston","year":"2011","unstructured":"Weston J, Bengio S, Usunier N (2011) Wsabie: scaling up to large vocabulary image annotation. 
In: IJCAI"},{"key":"327_CR65","doi-asserted-by":"crossref","unstructured":"Socher R, Karpathy A, Le QV, Manning CD, Ng AY (2014) Grounded compositional semantics for finding and describing images with sentences. TACL","DOI":"10.1162\/tacl_a_00177"},{"key":"327_CR66","volume-title":"NAACL-HLT","author":"M Iyyer","year":"2016","unstructured":"Iyyer M, Guha A, Chaturvedi S, Boyd-Graber J, Daum\u00e9 H III (2016) Feuding families and former friends: unsupervised learning for dynamic fictional relationships. In: NAACL-HLT"},{"key":"327_CR67","volume-title":"ACL","author":"R He","year":"2017","unstructured":"He R, Lee WS, Ng HT, Dahlmeier D (2017) An unsupervised neural attention model for aspect extraction. In: ACL"},{"key":"327_CR68","first-page":"993","volume":"3","author":"DM Blei","year":"2003","unstructured":"Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993\u20131022","journal-title":"J Mach Learn Res"},{"key":"327_CR69","doi-asserted-by":"publisher","first-page":"1212","DOI":"10.1145\/3447548.3467455","volume-title":"Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining","author":"MA Merrill","year":"2021","unstructured":"Merrill MA, Zhang G, Althoff T (2021) MULTIVERSE: mining collective data science knowledge from code on the web to suggest alternative analysis approaches. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. ACM, Singapore, pp 1212\u20131222. https:\/\/doi.org\/10.1145\/3447548.3467455. 
Accessed 2021-08-30"},{"key":"327_CR70","unstructured":"Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J, Edwards H, Burda Y, Joseph N, Brockman G, Ray A, Puri R, Krueger G, Petrov M, Khlaaf H, Sastry G, Mishkin P, Chan B, Gray S, Ryder N, Pavlov M, Power A, Kaiser L, Bavarian M, Winter C, Tillet P, Such FP, Cummings D, Plappert M, Chantzis F, Barnes E, Herbert-Voss A, Guss WH, Nichol A, Paino A, Tezak N, Tang J, Babuschkin I, Balaji S, Jain S, Saunders W, Hesse C, Carr AN, Leike J, Achiam J, Misra V, Morikawa E, Radford A, Knight M, Brundage M, Murati M, Mayer K, Welinder P, McGrew B, Amodei D, McCandlish S, Sutskever I, Zaremba W (2021) Evaluating Large Language Models Trained on Code. 2107.03374 [cs]. Accessed 2021-08-30"},{"key":"327_CR71","volume-title":"ACL","author":"K Lo","year":"2020","unstructured":"Lo K, Wang LL, Neumann M, Kinney R, Weld DS (2020) S2ORC: the semantic scholar open research corpus. In: ACL"}],"container-title":["EPJ Data Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-022-00327-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1140\/epjds\/s13688-022-00327-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1140\/epjds\/s13688-022-00327-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T11:11:46Z","timestamp":1647601906000},"score":1,"resource":{"primary":{"URL":"https:\/\/epjdatascience.springeropen.com\/articles\/10.1140\/epjds\/s13688-022-00327-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,18]]},"references-count":71,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternati
ve-id":["327"],"URL":"https:\/\/doi.org\/10.1140\/epjds\/s13688-022-00327-9","relation":{},"ISSN":["2193-1127"],"issn-type":[{"value":"2193-1127","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,18]]},"assertion":[{"value":"9 December 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 March 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 March 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"14"}}