{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:07:14Z","timestamp":1750306034961,"version":"3.41.0"},"reference-count":22,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,12,31]],"date-time":"2017-12-31T00:00:00Z","timestamp":1514678400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000957","name":"Alzheimer's Association","doi-asserted-by":"crossref","award":["003278-0001"],"award-info":[{"award-number":["003278-0001"]}],"id":[{"id":"10.13039\/100000957","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Laboratory of Neuro Imaging Resource (LONIR) NIH","award":["P41EB015922"],"award-info":[{"award-number":["P41EB015922"]}]},{"name":"National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health","award":["U54EB020406"],"award-info":[{"award-number":["U54EB020406"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2017,12,31]]},"abstract":"<jats:p>This article describes an approach for the automated reading of biomedical data dictionaries. Automated reading is the process of extracting element details for each of the data elements from a data dictionary in a document format (such as PDF) to a completely structured representation. A structured representation is essential if the data dictionary metadata are to be used in applications such as data integration and also in evaluating the quality of the associated data. We present an approach and implemented solution for the problem, considering different formats of data dictionaries. 
We have a particular focus on the most challenging format with a machine-learning classification solution to the problem using conditional random field classifiers. We present an evaluation using several actual data dictionaries, demonstrating the effectiveness of our approach.<\/jats:p>","DOI":"10.1145\/3177874","type":"journal-article","created":{"date-parts":[[2018,5,11]],"date-time":"2018-05-11T12:15:27Z","timestamp":1526040927000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Machine Reading of Biomedical Data Dictionaries"],"prefix":"10.1145","volume":"9","author":[{"given":"Naveen","family":"Ashish","sequence":"first","affiliation":[{"name":"Hutch Data Commonwealth, Fred Hutchinson Cancer Research Center, Seattle WA"}]},{"given":"Arihant","family":"Patawari","sequence":"additional","affiliation":[{"name":"City of Hope, National Medical Center, Duarte, CA"}]}],"member":"320","published-online":{"date-parts":[[2018,5,11]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"C. C. Aggarwal and C. Zhai. 2012. A survey of text classification algorithms. In Mining Text Data. Springer 163--222.","DOI":"10.1007\/978-1-4614-3223-4_6"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/271074.271078"},{"key":"e_1_2_1_3_1","unstructured":"N. Ashish and A. Patawari. 2017. Data Dictionary Reader Code. Retrieved from https:\/\/github.com\/nashish100\/DDReading."},{"key":"e_1_2_1_4_1","volume-title":"GEM: The GAAIN entity mapper. In Proceedings of the 11th International Conference on Data Integration in Life Sciences","author":"Ashish N.","year":"2015","unstructured":"N. Ashish, P. Dewan, J. 
Ambite, and A. Toga. 2015. GEM: The GAAIN entity mapper. In Proceedings of the 11th International Conference on Data Integration in Life Sciences. Springer, 13--27."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-006-0014-x"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/263661.263675"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"H. Chao and J. Fan. 2004. Layout and content extraction for PDF documents. In Document Analysis Systems VI. Springer Berlin.","DOI":"10.1007\/978-3-540-28640-0_20"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/511446.511477"},{"key":"e_1_2_1_9_1","volume-title":"IBM Dictionary of Computing","author":"Dictionary Data","unstructured":"Data Dictionary. IBM Dictionary of Computing (10th ed.). ACM.","edition":"10"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2014.07.007"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242583"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"volume-title":"Conditional. In Proceedings of the International Conference on Machine Learning (ICML\u201901)","author":"Lafferty J.","key":"e_1_2_1_14_1","unstructured":"J. Lafferty, A. McCallum, and P. Pereira. 2001. Conditional. 
In Proceedings of the International Conference on Machine Learning (ICML\u201901)."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007584"},{"key":"e_1_2_1_16_1","unstructured":"A. McCallum. 2002. MALLET: A Machine Learning for Language Toolkit. Retrieved from http:\/\/mallet.cs.umass.edu\/."},{"key":"e_1_2_1_17_1","unstructured":"PDFBox. 2015. Apache PDFBox. Retrieved from https:\/\/pdfbox.apache.org\/."},{"key":"e_1_2_1_18_1","unstructured":"PDFTables. 2015. Accurately extract tables from PDF. Retrieved from https:\/\/pdftables.com\/."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860479"},{"key":"e_1_2_1_20_1","unstructured":"Tabula. 2015. Extract tables from PDFs. 
Retrieved from http:\/\/tabula.technology\/."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220497"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1532-5415.1992.tb01992.x"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060745.1060761"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3177874","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3177874","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:02:55Z","timestamp":1750215775000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3177874"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,31]]},"references-count":22,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,12,31]]}},"alternative-id":["10.1145\/3177874"],"URL":"https:\/\/doi.org\/10.1145\/3177874","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"type":"print","value":"1936-1955"},{"type":"electronic","value":"1936-1963"}],"subject":[],"published":{"date-parts":[[2017,12,31]]},"assertion":[{"value":"2015-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-05-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}