{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T18:15:15Z","timestamp":1754158515566,"version":"3.41.2"},"reference-count":48,"publisher":"Emerald","issue":"3","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["EL"],"published-print":{"date-parts":[[2021,11,4]]},"abstract":"<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title>\n<jats:p>The output of academic literature has increased significantly due to digital technology, presenting researchers with a challenge across every discipline, including materials science, as it is impossible to manually read and extract knowledge from millions of published works. The purpose of this study is to address this challenge by exploring knowledge extraction in materials science, as applied to digital scholarship. An overriding goal is to help inform readers about the status of knowledge extraction in materials science.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title>\n<jats:p>The authors conducted a two-part analysis, comparing knowledge extraction methods applied to materials science scholarship, across a sample of 22 articles; followed by a comparison of HIVE-4-MAT, an ontology-based knowledge extraction application, and MatScholar, a named entity recognition (NER) application. 
This paper covers contextual background, and a review of three tiers of knowledge extraction (ontology-based, NER and relation extraction), followed by the research goals and approach.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Findings<\/jats:title>\n<jats:p>The results indicate three key needs for researchers to consider for advancing knowledge extraction: the need for materials science focused corpora; the need for researchers to define the scope of the research being pursued, and the need to understand the tradeoffs among different knowledge extraction methods. This paper also points to future material science research potential with relation extraction and increased availability of ontologies.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title>\n<jats:p>To the best of the authors\u2019 knowledge, there are very few studies examining knowledge extraction in materials science. 
This work makes an important contribution to this underexplored research area.<\/jats:p>\n<\/jats:sec>","DOI":"10.1108\/el-11-2020-0320","type":"journal-article","created":{"date-parts":[[2021,8,10]],"date-time":"2021-08-10T23:21:47Z","timestamp":1628637707000},"page":"469-485","source":"Crossref","is-referenced-by-count":3,"title":["An exploratory analysis: extracting materials science knowledge from unstructured scholarly data"],"prefix":"10.1108","volume":"39","author":[{"given":"Xintong","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Jane","family":"Greenberg","sequence":"additional","affiliation":[]},{"given":"Vanessa","family":"Meschke","sequence":"additional","affiliation":[]},{"given":"Eric","family":"Toberer","sequence":"additional","affiliation":[]},{"given":"Xiaohua","family":"Hu","sequence":"additional","affiliation":[]}],"member":"140","published-online":{"date-parts":[[2021,8,9]]},"reference":[{"key":"key2021110310070602100_ref001","first-page":"71","article-title":"A trainable summarizer with knowledge acquired from robust NLP techniques","volume-title":"Advances in Automatic Text Summarization","year":"1999"},{"first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS metathesaurus: the MetaMap program","year":"2001","key":"key2021110310070602100_ref002"},{"issue":"4","key":"key2021110310070602100_ref003","doi-asserted-by":"crossref","first-page":"1163","DOI":"10.1007\/s12274-016-1347-8","article-title":"High-performance oxygen reduction and evolution carbon catalysis: from mechanistic studies to device integration","volume":"10","year":"2017","journal-title":"Nano Research"},{"year":"2014","key":"key2021110310070602100_ref004","article-title":"Question answering with subgraph embeddings"},{"key":"key2021110310070602100_ref005","unstructured":"Chinchor, N.A. 
(1998), \u201cOverview of MUC-7\/MET-2\u201d, Science Applications International Corp., San Diego, CA."},{"issue":"7","key":"key2021110310070602100_ref006","doi-asserted-by":"crossref","first-page":"G403","DOI":"10.1149\/1.1481532","article-title":"Reliability characteristics of W\/WN\/TaO x Ny\/SiO2\/Si metal oxide semiconductor capacitors","volume":"149","year":"2002","journal-title":"Journal of the Electrochemical Society"},{"key":"key2021110310070602100_ref007","first-page":"14","article-title":"Advancing the DFC semantic technology platform via HIVE innovation","volume-title":"Research Conference on Metadata and Semantic Research","year":"2013"},{"issue":"1","key":"key2021110310070602100_ref008","first-page":"1","article-title":"Auto-generated materials database of Curie and N\u00e9el temperatures via semi-supervised relationship extraction","volume":"5","year":"2018","journal-title":"Scientific Data"},{"issue":"2","key":"key2021110310070602100_ref009","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1039\/c3ee42926a","article-title":"Enhancing SOFC cathode performance by surface modification through infiltration","volume":"7","year":"2014","journal-title":"Energy and Environmental Science"},{"issue":"1","key":"key2021110310070602100_ref010","first-page":"837","article-title":"The automatic content extraction (ace) program-tasks, data, and evaluation","volume":"2","year":"2004","journal-title":"LREC"},{"key":"key2021110310070602100_ref011","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI disease corpus: a resource for disease name recognition and concept normalization","volume":"47","year":"2014","journal-title":"Journal of Biomedical Informatics"},{"issue":"1","key":"key2021110310070602100_ref012","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.artint.2005.03.001","article-title":"Unsupervised named-entity extraction from the web: an experimental 
study","volume":"165","year":"2005","journal-title":"Artificial Intelligence"},{"key":"key2021110310070602100_ref013","first-page":"466","article-title":"Message understanding conference-6: a brief history","volume-title":"COLING \u201896 Volume 1: The 16th International Conference on Computational Linguistics","year":"1996"},{"year":"2020","key":"key2021110310070602100_ref014","article-title":"More data, more relations, more context and more openness: a review and outlook for relation extraction"},{"key":"key2021110310070602100_ref015","article-title":"A shortest dependency path based convolutional neural network for protein-protein relation extraction","volume":"2016","year":"2016","journal-title":"BioMed Research International"},{"issue":"1","key":"key2021110310070602100_ref016","first-page":"1","article-title":"A database of battery materials auto-generated using ChemDataExtractor","volume":"7","year":"2020","journal-title":"Scientific Data"},{"key":"key2021110310070602100_ref017","first-page":"246","article-title":"Learning information extraction patterns from examples","volume-title":"International Joint Conference on Artificial Intelligence","year":"1995"},{"year":"2019","key":"key2021110310070602100_ref018","article-title":"Document-Level N-ary relation extraction with multiscale representation learning"},{"first-page":"178","article-title":"Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction","year":"2004","key":"key2021110310070602100_ref019"},{"issue":"5","key":"key2021110310070602100_ref020","doi-asserted-by":"crossref","first-page":"e93949","DOI":"10.1371\/journal.pone.0093949","article-title":"The number of scholarly documents on the public web","volume":"9","year":"2014","journal-title":"PLoS One"},{"issue":"1","key":"key2021110310070602100_ref022","first-page":"1","article-title":"Virtual screening of inorganic materials synthesis parameters with deep 
learning","volume":"3","year":"2017","journal-title":"NPJ Computational Materials"},{"issue":"21","key":"key2021110310070602100_ref021","doi-asserted-by":"crossref","first-page":"9436","DOI":"10.1021\/acs.chemmater.7b03500","article-title":"Materials synthesis insights from scientific literature via text extraction and machine learning","volume":"29","year":"2017","journal-title":"Chemistry of Materials"},{"issue":"1","key":"key2021110310070602100_ref024","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1016\/j.matt.2019.05.011","article-title":"Distilling a materials synthesis ontology","volume":"1","year":"2019","journal-title":"Matter"},{"issue":"1","key":"key2021110310070602100_ref023","first-page":"1","article-title":"Machine-learned and codified synthesis parameters of oxide materials","volume":"4","year":"2017","journal-title":"Scientific Data"},{"issue":"Suppl. 1","key":"key2021110310070602100_ref025","first-page":"i180","article-title":"GENIA corpus \u2013 a semantically annotated corpus for bio-text mining","volume":"19","year":"2003","journal-title":"Bioinformatics"},{"key":"key2021110310070602100_ref026","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","year":"2016","journal-title":"Database"},{"issue":"1","key":"key2021110310070602100_ref027","first-page":"1","article-title":"A neural joint model for entity and relation extraction from biomedical text","volume":"18","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"key2021110310070602100_ref028","article-title":"Drug-drug interaction extraction via convolutional neural networks","volume":"2016","year":"2016","journal-title":"Computational and Mathematical Methods in Medicine"},{"year":"2016","key":"key2021110310070602100_ref029","article-title":"End-to-end relation extraction using LSTMs on sequences and tree structures"},{"first-page":"51","article-title":"Named entity recognition for question 
answering","year":"2006","key":"key2021110310070602100_ref030"},{"year":"2017","key":"key2021110310070602100_ref031","article-title":"Automatically extracting action graphs from materials science synthesis procedures"},{"key":"key2021110310070602100_ref032","first-page":"731","article-title":"Proximity-based document representation for named entity retrieval","volume-title":"Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management","year":"2007"},{"issue":"2","key":"key2021110310070602100_ref033","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1007\/s12525-017-0250-9","article-title":"The transformation of the academic publishing market: multiple perspectives on innovation","volume":"27","year":"2017","journal-title":"Electronic Markets"},{"key":"key2021110310070602100_ref034","article-title":"Multichannel convolutional neural network for biological relation extraction","volume":"2016","year":"2016","journal-title":"BioMed Research International"},{"issue":"4","key":"key2021110310070602100_ref035","doi-asserted-by":"crossref","first-page":"eaaq1566","DOI":"10.1126\/sciadv.aaq1566","article-title":"Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments","volume":"4","year":"2018","journal-title":"Science Advances"},{"key":"key2021110310070602100_ref036","first-page":"1","article-title":"Automatic keyword extraction from individual documents","volume":"1","year":"2010","journal-title":"Text Mining: Applications and Theory"},{"year":"2003","key":"key2021110310070602100_ref037","article-title":"Introduction to the CoNLL-2003 shared task: language-independent named entity recognition"},{"issue":"1","key":"key2021110310070602100_ref038","first-page":"1","article-title":"SemaTyP: a knowledge graph based literature mining method for drug discovery","volume":"19","year":"2018","journal-title":"BMC 
Bioinformatics"},{"issue":"7698","key":"key2021110310070602100_ref039","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1038\/nature25978","article-title":"Planning chemical syntheses with deep neural networks and symbolic AI","volume":"555","year":"2018","journal-title":"Nature"},{"year":"2020","key":"key2021110310070602100_ref040","article-title":"Evaluating the relevance of UMLS concepts for public health informatics during disasters using MetaMap"},{"key":"key2021110310070602100_ref041","first-page":"1","article-title":"Changes in scientific publishing: a heuristic for analysis","volume-title":"The Future of Scholarly Publishing: Open Access and the Economics of Digitisation","year":"2017"},{"issue":"7763","key":"key2021110310070602100_ref042","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1038\/s41586-019-1335-8","article-title":"Unsupervised word embeddings capture latent knowledge from materials science literature","volume":"571","year":"2019","journal-title":"Nature"},{"issue":"1","key":"key2021110310070602100_ref043","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1109\/TBDATA.2016.2546302","article-title":"AlgorithmSeer: a system for extracting and searching for algorithms in scholarly big data","volume":"2","year":"2016","journal-title":"IEEE Transactions on Big Data"},{"issue":"1","key":"key2021110310070602100_ref044","first-page":"1112","article-title":"Knowledge graph embedding by translating on hyperplanes","volume":"28","year":"2014","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"issue":"9","key":"key2021110310070602100_ref045","doi-asserted-by":"crossref","first-page":"3692","DOI":"10.1021\/acs.jcim.9b00470","article-title":"Named entity recognition and normalization applied to large-scale information extraction from the materials science literature","volume":"59","year":"2019","journal-title":"Journal of Chemical Information and Modeling"},{"article-title":"Scholarly big data: 
computational approaches to semantic labeling in materials science","volume-title":"presented at ACM\/IEEE Joint Conference on Digital Libraries Workshop 4: Organizing Big Data, Information, and Knowledge","year":"2020","key":"key2021110310070602100_ref046"},{"year":"2021","key":"key2021110310070602100_ref047","article-title":"HIVE-4-MAT: advancing the ontology infrastructure for materials science"},{"issue":"3\/4","key":"key2021110310070602100_ref048","first-page":"139","article-title":"Ontological realism: a methodology for coordinated evolution of scientific ontologies","volume":"5","year":"2010","journal-title":"Applied Ontology"}],"container-title":["The Electronic Library"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/EL-11-2020-0320\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/EL-11-2020-0320\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T01:08:24Z","timestamp":1753405704000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/el\/article\/39\/3\/469-485\/99144"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":48,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,8,9]]},"published-print":{"date-parts":[[2021,11,4]]}},"alternative-id":["10.1108\/EL-11-2020-0320"],"URL":"https:\/\/doi.org\/10.1108\/el-11-2020-0320","relation":{},"ISSN":["0264-0473","0264-0473"],"issn-type":[{"type":"print","value":"0264-0473"},{"type":"print","value":"0264-0473"}],"subject":[],"published":{"date-parts":[[2021,8,9]]}}}