{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T12:19:54Z","timestamp":1761394794234},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2016,10,28]],"date-time":"2016-10-28T00:00:00Z","timestamp":1477612800000},"content-version":"vor","delay-in-days":221,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: While the size and number of biobanks, patient registries and other data collections are increasing, biomedical researchers still often need to pool data for statistical power, a task that requires time-intensive retrospective integration.<\/jats:p>\n               <jats:p>Results: To address this challenge, we developed MOLGENIS\/connect, a semi-automatic system to find, match and pool data from different sources. The system shortlists relevant source attributes from thousands of candidates using ontology-based query expansion to overcome variations in terminology. Then it generates algorithms that transform source attributes to a common target DataSchema. These include unit conversion, categorical value matching and complex conversion patterns (e.g. calculation of BMI). In comparison to human-experts, MOLGENIS\/connect was able to auto-generate 27% of the algorithms perfectly, with an additional 46% needing only minor editing, representing a reduction in the human effort and expertise needed to pool data.<\/jats:p>\n               <jats:p>Availability and Implementation: Source code, binaries and documentation are available as open-source under LGPLv3 from http:\/\/github.com\/molgenis\/molgenis and www.molgenis.org\/connect.<\/jats:p>\n               <jats:p>Contact: m.a.swertz@rug.nl<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw155","type":"journal-article","created":{"date-parts":[[2016,3,22]],"date-time":"2016-03-22T02:05:11Z","timestamp":1458612311000},"page":"2176-2183","source":"Crossref","is-referenced-by-count":12,"title":["MOLGENIS\/connect: a system for semi-automatic integration of heterogeneous phenotype data with applications in biobanks"],"prefix":"10.1093","volume":"32","author":[{"given":"Chao","family":"Pang","sequence":"first","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"},{"name":"2 Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"van Enckevort","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mark","family":"de Haan","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fleur","family":"Kelpin","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan","family":"Jetten","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dennis","family":"Hendriksen","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tommy","family":"de Boer","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bart","family":"Charbon","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Erwin","family":"Winder","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"K. Joeri","family":"van der Velde","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dany","family":"Doiron","sequence":"additional","affiliation":[{"name":"3 Research Institute of the McGill University Health Centre and"},{"name":"4 Department of Medicine, McGill University, Montreal, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Isabel","family":"Fortier","sequence":"additional","affiliation":[{"name":"3 Research Institute of the McGill University Health Centre and"},{"name":"4 Department of Medicine, McGill University, Montreal, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hans","family":"Hillege","sequence":"additional","affiliation":[{"name":"2 Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Morris A.","family":"Swertz","sequence":"additional","affiliation":[{"name":"1 Department of Genetics, University Medical Center Groningen, Genomics Coordination Center, University of Groningen, Groningen, The Netherlands"},{"name":"2 Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2016,3,21]]},"reference":[{"key":"2023020112470712100_btw155-B201","doi-asserted-by":"crossref","first-page":"867","DOI":"10.1002\/humu.22070","article-title":"Observ-OM and Observ-TAB: universal syntax solutions for the integration, search and exchange of phenotype and genotype information","volume":"33","author":"Adamusiak","year":"2012","journal-title":"Hum. Mutat."},{"key":"2023020112470712100_btw155-B202","doi-asserted-by":"crossref","first-page":"866","DOI":"10.1016\/j.ipm.2006.09.003","article-title":"A review of ontology based query expansion","volume":"43","author":"Bhogal","year":"2007","journal-title":"Inf. Process. Manage."},{"key":"2023020112470712100_btw155-B203","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/1742-7622-10-12","article-title":"Data harmonization and federated analysis of population-based studies: the BioSHaRE project","volume":"10","author":"Doiron","year":"2013","journal-title":"Emerg. Themes. Epidemiol."},{"key":"2023020112470712100_btw155-B204","doi-asserted-by":"crossref","first-page":"1314","DOI":"10.1093\/ije\/dyr106","article-title":"Is rigorous retrospective harmonization possible? Application of the DataSHaPER approach across 53 large studies","volume":"40","author":"Fortier","year":"2011","journal-title":"Int. J. Epidemiol."},{"key":"2023020112470712100_btw155-B1","year":"2012"},{"key":"2023020112470712100_btw155-B2","year":"2014"},{"key":"2023020112470712100_btw155-B100","year":"2015"},{"key":"2023020112470712100_btw155-B3","year":"2014"},{"key":"2023020112470712100_btw155-B4","year":"2014"},{"key":"2023020112470712100_btw155-B5","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1136\/jamia.2009.000893","article-title":"Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2)","volume":"17","author":"Murphy","year":"2010","journal-title":"J. Am. Med. Inf. Assoc.: JAMIA"},{"key":"2023020112470712100_btw155-B6","year":"2011"},{"key":"2023020112470712100_btw155-B7","first-page":"65","article-title":"BiobankConnect: Software to Rapidly Connect Data Elements for Pooled Analysis across Biobanks Using Ontological and Lexical Indexing","volume-title":"J. Am. Med. Inform. Assoc.","author":"Pang","year":"2015"},{"key":"2023020112470712100_btw155-B8","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bav089","article-title":"SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data","volume":"2015","author":"Pang","year":"2015","journal-title":"Database"},{"key":"2023020112470712100_btw155-B9","volume-title":"The Unified Code for Units of Measure (UCUM)","author":"Schadow","year":"2005"},{"key":"2023020112470712100_btw155-B10","doi-asserted-by":"crossref","first-page":"1172","DOI":"10.1093\/ije\/dyu229","article-title":"Cohort Profile: LifeLines, a three-generation cohort study and biobank","volume":"44","author":"Scholtens","year":"2015","journal-title":"Int. J. Epidemiol"},{"key":"2023020112470712100_btw155-B11","author":"Shima","year":"2011"},{"key":"2023020112470712100_btw155-B12","doi-asserted-by":"crossref","first-page":"S12","DOI":"10.1186\/1471-2105-11-S12-S12","article-title":"The MOLGENIS toolkit: rapid prototyping of biosoftware at the push of a button","volume":"11","author":"Swertz","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020112470712100_btw155-B13","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1186\/1479-5876-8-68","article-title":"Effective knowledge management in translational medicine","volume":"8","author":"Szalma","year":"2010","journal-title":"J. Transl. Med"},{"key":"2023020112470712100_btw155-B205","year":"2006"},{"key":"2023020112470712100_btw155-B14","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/1472-6823-14-9","article-title":"The prevalence of Metabolic Syndrome and metabolically healthy obesity in Europe: a collaborative analysis of ten large cohort studies","volume":"14","author":"Van Vliet-Ostaptchouk","year":"2014","journal-title":"BMC Endocrine Disorders"},{"key":"2023020112470712100_btw155-B15","first-page":"6","article-title":"Verb Semantics and Lexical Selection","author":"Wu","year":"1994","journal-title":"32nd Annual Meeting on Association for Computational Linguistics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/14\/2176\/49020190\/bioinformatics_32_14_2176.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/14\/2176\/49020190\/bioinformatics_32_14_2176.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:50:49Z","timestamp":1675291849000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/14\/2176\/1743056"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,3,21]]},"references-count":21,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2016,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw155","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,7,15]]},"published":{"date-parts":[[2016,3,21]]}}}