{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T10:09:26Z","timestamp":1774519766153,"version":"3.50.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2017,11,16]],"date-time":"2017-11-16T00:00:00Z","timestamp":1510790400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"},{"start":{"date-parts":[[2017,11,16]],"date-time":"2017-11-16T00:00:00Z","timestamp":1510790400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006227","name":"Lawrence Livermore National Laboratory","doi-asserted-by":"publisher","award":["DE-AC52-07NA27344"],"award-info":[{"award-number":["DE-AC52-07NA27344"]}],"id":[{"id":"10.13039\/100006227","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100008902","name":"Los Alamos National Laboratory","doi-asserted-by":"publisher","award":["DE-AC5206NA25396"],"award-info":[{"award-number":["DE-AC5206NA25396"]}],"id":[{"id":"10.13039\/100008902","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006228","name":"Oak Ridge National Laboratory","doi-asserted-by":"publisher","award":["DE-AC05-00OR22725"],"award-info":[{"award-number":["DE-AC05-00OR22725"]}],"id":[{"id":"10.13039\/100006228","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Objective<\/jats:title>\n                    <jats:p>We explored how a deep learning (DL) approach based on 
hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks from unstructured cancer pathology reports compared to conventional methods that do not sufficiently capture syntactic and semantic contexts from free-text documents.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Materials and Methods<\/jats:title>\n                    <jats:p>Data for our analyses were obtained from 942 deidentified pathology reports collected by the National Cancer Institute Surveillance, Epidemiology, and End Results program. The HAN was implemented for 2 information extraction tasks: (1) primary site, matched to 12 International Classification of Diseases for Oncology topography codes (7 breast, 5 lung primary sites), and (2) histological grade classification, matched to G1\u2013G4. Model performance metrics were compared to conventional machine learning (ML) approaches including naive Bayes, logistic regression, support vector machine, random forest, and extreme gradient boosting, and other DL models, including a recurrent neural network (RNN), a recurrent neural network with attention (RNN w\/A), and a convolutional neural network.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Our results demonstrate that for both information extraction tasks, the HAN performed significantly better than the conventional ML and other DL techniques. 
In particular, across the 2 tasks, the mean micro and macro F-scores for the HAN with pretraining were (0.852, 0.708), compared to naive Bayes (0.518, 0.213), logistic regression (0.682, 0.453), support vector machine (0.634, 0.434), random forest (0.698, 0.508), extreme gradient boosting (0.696, 0.522), RNN (0.505, 0.301), RNN w\/A (0.637, 0.471), and convolutional neural network (0.714, 0.460).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>HAN-based DL models show promise in information abstraction tasks within unstructured clinical pathology reports.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/jamia\/ocx131","type":"journal-article","created":{"date-parts":[[2017,10,17]],"date-time":"2017-10-17T15:11:15Z","timestamp":1508253075000},"page":"321-330","source":"Crossref","is-referenced-by-count":108,"title":["Hierarchical attention networks for information extraction from cancer pathology reports"],"prefix":"10.1093","volume":"25","author":[{"given":"Shang","family":"Gao","sequence":"first","affiliation":[{"name":"Computational Science and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA"}]},{"given":"Michael T","family":"Young","sequence":"additional","affiliation":[{"name":"Computational Science and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA"}]},{"given":"John X","family":"Qiu","sequence":"additional","affiliation":[{"name":"Computational Science and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA"}]},{"given":"Hong-Jun","family":"Yoon","sequence":"additional","affiliation":[{"name":"Computational Science and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA"}]},{"given":"James B","family":"Christian","sequence":"additional","affiliation":[{"name":"Computational Science and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, 
USA"}]},{"given":"Paul A","family":"Fearn","sequence":"additional","affiliation":[{"name":"Surveillance Informatics Branch, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA"}]},{"given":"Georgia D","family":"Tourassi","sequence":"additional","affiliation":[{"name":"Computational Science and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA"}]},{"given":"Arvind","family":"Ramanthan","sequence":"additional","affiliation":[{"name":"Computational Science and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,11,16]]},"reference":[{"issue":"20","key":"2020110612371561100_ocx131-B1","doi-asserted-by":"crossref","first-page":"1901","DOI":"10.1056\/NEJMp1600894","article-title":"Aiming high\u2014changing the trajectory for cancer","volume":"374","author":"Lowy","year":"2016","journal-title":"New Engl J Med."},{"key":"2020110612371561100_ocx131-B2","volume-title":"Overview of the SEER Program","author":"National Cancer Institute","year":"2017"},{"key":"2020110612371561100_ocx131-B3","first-page":"1378","article-title":"Ask me anything: dynamic memory networks for natural language processing","author":"Kumar","year":"2016","journal-title":"Proc Int Conf Mach Learn."},{"key":"2020110612371561100_ocx131-B4","article-title":"Convolutional neural networks for sentence classification","author":"Kim","year":"2014","journal-title":"arXiv preprint arXiv:1408.5882."},{"key":"2020110612371561100_ocx131-B5","article-title":"A critical review of recurrent neural networks for sequence learning","author":"Lipton","year":"2015","journal-title":"arXiv preprint arXiv:1506.00019."},{"issue":"8","key":"2020110612371561100_ocx131-B6","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural 
Comput."},{"key":"2020110612371561100_ocx131-B7","article-title":"Empirical evaluation of gated recurrent neural networks on sequence modeling","author":"Chung","year":"2014","journal-title":"arXiv preprint arXiv:1412.3555."},{"key":"2020110612371561100_ocx131-B8","article-title":"Generating sequences with recurrent neural networks","author":"Graves","year":"2013","journal-title":"arXiv preprint arXiv:1308.0850."},{"key":"2020110612371561100_ocx131-B9","first-page":"1480","article-title":"Hierarchical attention networks for document classification","author":"Yang","year":"2016","journal-title":"Proceedings of NAACL-HLT."},{"issue":"6","key":"2020110612371561100_ocx131-B10","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1093\/aje\/kwt441","article-title":"Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence","volume":"179","author":"Carrell","year":"2014","journal-title":"Am J Epidemiol."},{"key":"2020110612371561100_ocx131-B11","first-page":"1877","article-title":"Information extraction from pathology reports in a hospital setting","author":"Martinez","year":"2011","journal-title":"Proc ACM Int Conf Inf Knowl Manag."},{"key":"2020110612371561100_ocx131-B12","article-title":"Clinical information extraction via convolutional neural network","author":"Li","year":"2016","journal-title":"arXiv preprint arXiv:1603.09381."},{"issue":"3","key":"2020110612371561100_ocx131-B13","doi-asserted-by":"crossref","first-page":"242","DOI":"10.3414\/ME11-01-0005","article-title":"Automated classification of free-text pathology reports for registration of incident cases of cancer","volume":"51","author":"Jouhet","year":"2012","journal-title":"Methods Inf Med."},{"key":"2020110612371561100_ocx131-B14","article-title":"Bidirectional RNN for medical event detection in electronic health records","volume":"473","author":"Jagannatha","year":"2016","journal-title":"Proceedings of 
NAACL-HLT."},{"key":"2020110612371561100_ocx131-B15","article-title":"Deep learning for automated extraction of primary sites from cancer pathology reports","author":"Qiu","year":"2017","journal-title":"IEEE J Biomed Health Inform."},{"key":"2020110612371561100_ocx131-B16","volume-title":"Coding Guidelines Breast C500\u2013C509","author":"National Cancer Institute","year":"2016"},{"key":"2020110612371561100_ocx131-B17","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","volume":"2","author":"Mikolov","year":"2013","journal-title":"Proc 26th Intl Conf Neural Inf Process Syst."},{"key":"2020110612371561100_ocx131-B18","article-title":"Efficient estimation of word representations in vector space","author":"Mikolov","year":"2013","journal-title":"arXiv preprint arXiv:1301.3781."},{"key":"2020110612371561100_ocx131-B19","first-page":"1532","article-title":"GloVe: global vectors for word representation","volume":"14","author":"Pennington","year":"2014","journal-title":"Proc Conf Empir Methods Nat Lang Process."},{"key":"2020110612371561100_ocx131-B20","article-title":"LSTM: A search space odyssey","author":"Greff","year":"2016","journal-title":"IEEE Trans Neural Netw Learn Syst."},{"key":"2020110612371561100_ocx131-B21","volume-title":"Optimizing the Hyperparameter of Which Hyperparameter Optimizer to Use","author":"Bernstein","year":"2017"},{"key":"2020110612371561100_ocx131-B22","first-page":"625","article-title":"Why does unsupervised pre-training help deep learning?","volume":"11","author":"Erhan","year":"2010","journal-title":"J Mach Learn Res."},{"key":"2020110612371561100_ocx131-B23","first-page":"1106","article-title":"A hierarchical neural autoencoder for paragraphs and documents","author":"Li","year":"2015","journal-title":"Proc 53rd Annu Mtg Assoc Comput 
Linguist."},{"issue":"3","key":"2020110612371561100_ocx131-B24","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1214\/ss\/1032280214","article-title":"Bootstrap confidence intervals","volume":"11","author":"DiCiccio","year":"1996","journal-title":"Stat Sci."},{"key":"2020110612371561100_ocx131-B25","first-page":"577","article-title":"Attention-based models for speech recognition","author":"Chorowski","year":"2015","journal-title":"Adv Neural Inf Process Syst."},{"key":"2020110612371561100_ocx131-B26","first-page":"919","article-title":"Semi-supervised convolutional neural networks for text categorization via region embedding","volume-title":"Adv Neural Inf Process Syst NIPS \u201915","author":"Johnson","year":"2015"},{"key":"2020110612371561100_ocx131-B27","first-page":"526","article-title":"Supervised and semi-supervised text categorization using LSTM for region embeddings","author":"Johnson","year":"2016","journal-title":"Proc Int Conf Mach Learn. ICML \u201916."},{"issue":"7","key":"2020110612371561100_ocx131-B28","doi-asserted-by":"crossref","first-page":"1040","DOI":"10.5858\/2000-124-1040-CAFMAP","article-title":"Clinicians are from Mars and pathologists are from Venus: clinician interpretation of pathology reports","volume":"124","author":"Powsner","year":"2000","journal-title":"Arch Pathol Lab Med."},{"key":"2020110612371561100_ocx131-B29","first-page":"195","article-title":"Multi-task deep neural networks for automated extraction of primary site and laterality information from cancer pathology reports","volume-title":"Advances in Big Data: Proceedings of the INNS Conference on Big Data","author":"Yoon","year":"2016"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/25\/3\/321\/34150614\/ocx131.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/25\/3\/321\/34150614\/ocx131.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,4]],"date-time":"2022-08-04T16:17:37Z","timestamp":1659629857000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/25\/3\/321\/4636780"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,11,16]]},"references-count":29,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2017,11,16]]},"published-print":{"date-parts":[[2018,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocx131","relation":{"has-review":[{"id-type":"doi","id":"10.3410\/f.732138389.793552898","asserted-by":"object"}]},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,3]]},"published":{"date-parts":[[2017,11,16]]}}}