{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T21:08:40Z","timestamp":1769634520944,"version":"3.49.0"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"S5","license":[{"start":{"date-parts":[[2019,12,1]],"date-time":"2019-12-01T00:00:00Z","timestamp":1575158400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2019,12,5]],"date-time":"2019-12-05T00:00:00Z","timestamp":1575504000000},"content-version":"vor","delay-in-days":4,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Accurately recognizing rare diseases based on symptom description is an important task in patient triage, early risk stratification, and target therapies. However, due to the very nature of rare diseases, the lack of historical data poses a great challenge to machine learning-based approaches. On the other hand, medical knowledge in automatically constructed knowledge graphs (KGs) has the potential to compensate the lack of labeled training examples. 
This work aims to develop a rare disease classification algorithm that makes effective use of a knowledge graph, even when the graph is imperfect.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Method<\/jats:title>\n                <jats:p>We develop a text classification algorithm that represents a document as a combination of a \u201cbag of words\u201d and a \u201cbag of knowledge terms,\u201d where a \u201cknowledge term\u201d is a term shared between the document and the subgraph of KG relevant to the disease classification task. We use two Chinese disease diagnosis corpora to evaluate the algorithm. The first one, HaoDaiFu, contains 51,374 chief complaints categorized into 805 diseases. The second data set, ChinaRe, contains 86,663 patient descriptions categorized into 44 disease categories.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>On the two evaluation data sets, the proposed algorithm delivers robust performance and outperforms a wide range of baselines, including resampling, deep learning, and feature selection approaches. 
Both classification-based metric (macro-averaged <jats:italic>F<\/jats:italic><jats:sub>1<\/jats:sub> score) and ranking-based metric (mean reciprocal rank) are used in evaluation.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>Medical knowledge in large-scale knowledge graphs can be effectively leveraged to improve rare diseases classification models, even when the knowledge graph is incomplete.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12911-019-0938-1","type":"journal-article","created":{"date-parts":[[2019,12,5]],"date-time":"2019-12-05T01:02:23Z","timestamp":1575507743000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":29,"title":["Improving rare disease classification using imperfect knowledge graph"],"prefix":"10.1186","volume":"19","author":[{"given":"Xuedong","family":"Li","sequence":"first","affiliation":[]},{"given":"Yue","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Dongwu","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Walter","family":"Yuan","sequence":"additional","affiliation":[]},{"given":"Dezhong","family":"Peng","sequence":"additional","affiliation":[]},{"given":"Qiaozhu","family":"Mei","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,12,5]]},"reference":[{"key":"938_CR1","unstructured":"European Commission. Rare Diseases. https:\/\/ec.europa.eu\/health\/non_communicable_diseases\/rare_diseases_en. Accessed 26 Mar 2019."},{"key":"938_CR2","unstructured":"United States Department of Health and Human Services. National Organization for Rare Disorders (NORD). https:\/\/www.nidcd.nih.gov\/directory\/national-organization-rare-disorders-nord. 
Accessed 26 Mar 2019."},{"issue":"2","key":"938_CR3","doi-asserted-by":"publisher","first-page":"145","DOI":"10.5582\/irdr.2018.01056","volume":"7","author":"J He","year":"2018","unstructured":"He J, Kang Q, Hu J, Song P, Jin C. China has officially released its first national list of rare diseases. Intractable Rare Dis Res. 2018; 7(2):145\u20137.","journal-title":"Intractable Rare Dis Res"},{"key":"938_CR4","unstructured":"Orphanet. The portal for rare diseases and orphan drugs. https:\/\/www.orpha.net\/consor\/cgi-bin\/index.php. Accessed 26 Mar 2019."},{"issue":"1","key":"938_CR5","doi-asserted-by":"publisher","first-page":"e1083145","DOI":"10.1080\/21675511.2015.1083145","volume":"3","author":"D Svenstrup","year":"2015","unstructured":"Svenstrup D, J\u00f8rgensen HL, Winther O. Rare disease diagnosis: a review of web search, social media and large-scale data-mining approaches. Rare Dis. 2015; 3(1):e1083145.","journal-title":"Rare Dis"},{"key":"938_CR6","volume-title":"Proceedings of the 2008 ACM SIGMOD international conference on Management of data","author":"K Bollacker","year":"2008","unstructured":"Bollacker K, Evans C, Paritosh P, Sturge T, Taylor J. Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data. New York: ACM: 2008. p. 1247\u201350."},{"issue":"3","key":"938_CR7","doi-asserted-by":"publisher","first-page":"269","DOI":"10.14778\/3157794.3157797","volume":"11","author":"A Ratner","year":"2017","unstructured":"Ratner A, Bach SH, Ehrenberg H, Fries J, Wu S, R\u00e9 C. Snorkel: Rapid training data creation with weak supervision. Proc VLDB Endowment. 
2017; 11(3):269\u201382.","journal-title":"Proc VLDB Endowment"},{"issue":"1","key":"938_CR8","doi-asserted-by":"publisher","first-page":"5994","DOI":"10.1038\/s41598-017-05778-z","volume":"7","author":"M Rotmensch","year":"2017","unstructured":"Rotmensch M, Halpern Y, Tlimat A, Horng S, Sontag D. Learning a health knowledge graph from electronic medical records. Sci Rep. 2017; 7(1):5994.","journal-title":"Sci Rep"},{"issue":"6","key":"938_CR9","doi-asserted-by":"publisher","first-page":"528","DOI":"10.1016\/j.ijmedinf.2013.01.005","volume":"82","author":"R Dragusin","year":"2013","unstructured":"Dragusin R, Petcu P, Lioma C, Larsen B, J\u00f8rgensen HL, Cox IJ, et al. FindZebra: a search engine for rare diseases. Int J Med Inf. 2013; 82(6):528\u201338.","journal-title":"Int J Med Inf"},{"key":"938_CR10","unstructured":"Shen F, Liu S, Wang Y, Wang L, Afzal N, Liu H. Leveraging collaborative filtering to accelerate rare disease diagnosis. In: AMIA Annual Symposium Proceedings. vol. 2017. American Medical Informatics Association: 2017. p. 1554."},{"issue":"4","key":"938_CR11","doi-asserted-by":"publisher","first-page":"e11301","DOI":"10.2196\/11301","volume":"6","author":"F Shen","year":"2018","unstructured":"Shen F, Liu S, Wang Y, Wen A, Wang L, Liu H. Utilization of Electronic Medical Records and Biomedical Literature to Support the Diagnosis of Rare Diseases Using Data Fusion and Collaborative Filtering Approaches. JMIR Med Inf. 2018; 6(4):e11301.","journal-title":"JMIR Med Inf"},{"key":"938_CR12","first-page":"1505","volume":"2018","author":"F Shen","year":"2018","unstructured":"Shen F, Liu H. Incorporating Knowledge-Driven Insights into a Collaborative Filtering Model to Facilitate the Differential Diagnosis of Rare Diseases. AMIA Annu Symp Proc. 
2018; 2018:1505\u20131514.","journal-title":"AMIA Annu Symp Proc"},{"key":"938_CR13","volume-title":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","author":"R Babbar","year":"2017","unstructured":"Babbar R, Sch\u00f6lkopf B. Dismec: Distributed sparse machines for extreme multi-label classification. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. New York: ACM: 2017. p. 721\u20139."},{"key":"938_CR14","volume-title":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","author":"H Jain","year":"2019","unstructured":"Jain H, Balasubramanian V, Chunduri B, Varma M. Slice: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. New York: ACM: 2019. p. 528\u201336."},{"issue":"9","key":"938_CR15","first-page":"1263","volume":"21","author":"H He","year":"2008","unstructured":"He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2008; 21(9):1263\u201384.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"938_CR16","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002; 16:321\u201357.","journal-title":"J Artif Intell Res"},{"issue":"4","key":"938_CR17","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s13748-016-0094-0","volume":"5","author":"B Krawczyk","year":"2016","unstructured":"Krawczyk B. Learning from imbalanced data: open challenges and future directions. Prog Artif Intell. 
2016; 5(4):221\u201332.","journal-title":"Prog Artif Intell"},{"issue":"6","key":"938_CR18","doi-asserted-by":"publisher","first-page":"1367","DOI":"10.1109\/TPAMI.2018.2832629","volume":"41","author":"Q Dong","year":"2018","unstructured":"Dong Q, Gong S, Zhu X. Imbalanced deep learning by minority class incremental rectification. IEEE Trans Pattern Anal Mach Intell. 2018; 41(6):1367\u20131381.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"938_CR19","volume-title":"Icml. vol. 97","author":"Y Yang","year":"1997","unstructured":"Yang Y, Pedersen JO. A comparative study on feature selection in text categorization. In: Icml. vol. 97. San Francisco: Morgan Kaufmann Publishers Inc.: 1997. p. 35."},{"key":"938_CR20","volume-title":"IJCAI. vol. 5","author":"E Gabrilovich","year":"2005","unstructured":"Gabrilovich E, Markovitch S. Feature generation for text categorization using world knowledge. In: IJCAI. vol. 5. San Francisco: Morgan Kaufmann Publishers Inc.: 2005. p. 1048\u201353."},{"key":"938_CR21","volume-title":"Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics","author":"B Settles","year":"2011","unstructured":"Settles B. Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances. In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics: 2011. p. 1467\u201378."},{"key":"938_CR22","volume-title":"Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval","author":"G Druck","year":"2008","unstructured":"Druck G, Mann G, McCallum A. Learning from labeled features using generalized expectation criteria. In: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. New York: ACM: 2008. p. 
595\u2013602."},{"issue":"Aug","key":"938_CR23","first-page":"1655","volume":"7","author":"H Raghavan","year":"2006","unstructured":"Raghavan H, Madani O, Jones R. Active learning with feedback on features and instances. J Mach Learn Res. 2006; 7(Aug):1655\u201386.","journal-title":"J Mach Learn Res"},{"key":"938_CR24","unstructured":"Jieba Chinese text segmentation. https:\/\/github.com\/fxsjy\/jieba. Accessed 26 Mar 2019."},{"key":"938_CR25","doi-asserted-by":"crossref","unstructured":"Xu B, Xu Y, Liang J, Xie C, Liang B, Cui W, et al. CN-DBpedia: a never-ending Chinese knowledge extraction system. In: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Cham: Springer: 2017. p. 428\u2013438.","DOI":"10.1007\/978-3-319-60045-1_44"},{"key":"938_CR26","unstructured":"Knowledge Works. http:\/\/kw.fudan.edu.cn\/. Accessed 26 Mar 2019."},{"key":"938_CR27","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to information retrieval. 1st ed","author":"C Manning","year":"2008","unstructured":"Manning C, Raghavan P, Sch\u00fctze H. Term frequency and weighting. In: Introduction to information retrieval. 1st ed. New York: Cambridge university press: 2008. p. 117\u20139."},{"issue":"2","key":"938_CR28","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1109\/TPAMI.2018.2798607","volume":"41","author":"T Baltru\u0161aitis","year":"2019","unstructured":"Baltru\u0161aitis T, Ahuja C, Morency LP. Multimodal machine learning: A survey and taxonomy. IEEE Trans Pattern Anal Mach Intell. 2019; 41(2):423\u201343.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"938_CR29","volume-title":"Proceedings of the 10th international conference on World Wide Web","author":"C Dwork","year":"2001","unstructured":"Dwork C, Kumar R, Naor M, Sivakumar D. Rank aggregation methods for the web. In: Proceedings of the 10th international conference on World Wide Web. 
New York: ACM: 2001. p. 613\u2013622."},{"key":"938_CR30","unstructured":"Zaidan OF, Eisner J, Piatko C. Machine learning with annotator rationales to reduce annotation cost. In: Proceedings of the NIPS* 2008 workshop on cost sensitive learning. Neural Information Processing Systems Foundation, Inc.: 2008. p. 260\u2013267."},{"key":"938_CR31","unstructured":"Wikipedia. F1 Score. https:\/\/en.wikipedia.org\/wiki\/F1_score. Accessed 26 Mar 2019."},{"key":"938_CR32","doi-asserted-by":"crossref","unstructured":"Craswell N. Mean reciprocal rank. Encycl Database Syst. 2009:1703.","DOI":"10.1007\/978-0-387-39940-9_488"},{"issue":"1","key":"938_CR33","doi-asserted-by":"publisher","first-page":"119","DOI":"10.2307\/2984124","volume":"4","author":"EJ Pitman","year":"1937","unstructured":"Pitman EJ. Significance tests which may be applied to samples from any populations. Suppl J R Stat Soc. 1937; 4(1):119\u201330.","journal-title":"Suppl J R Stat Soc"},{"key":"938_CR34","volume-title":"European conference on machine learning","author":"T Joachims","year":"1998","unstructured":"Joachims T. Text categorization with support vector machines: Learning with many relevant features. In: European conference on machine learning. Berlin: Springer: 1998. p. 137\u2013142."},{"key":"938_CR35","volume-title":"Interactive Machine Learning with Applications in Health Informatics. Doctoral dissertation","author":"Y Wang","year":"2018","unstructured":"Wang Y. Interactive Machine Learning with Applications in Health Informatics. Doctoral dissertation. Ann Arbor: University of Michigan; 2018."},{"issue":"1-2","key":"938_CR36","doi-asserted-by":"publisher","first-page":"39","DOI":"10.3233\/DS-170007","volume":"1","author":"X Wilcke","year":"2017","unstructured":"Wilcke X, Bloem P, De Boer V. The knowledge graph as the default data model for learning on heterogeneous knowledge. Data Sci. 
2017; 1(1-2):39\u201357.","journal-title":"Data Sci"}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-019-0938-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12911-019-0938-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-019-0938-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T00:18:35Z","timestamp":1607041115000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-019-0938-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12]]},"references-count":36,"journal-issue":{"issue":"S5","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["938"],"URL":"https:\/\/doi.org\/10.1186\/s12911-019-0938-1","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12]]},"assertion":[{"value":"5 December 2019","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"238"}}