{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T18:01:29Z","timestamp":1775066489151,"version":"3.50.1"},"reference-count":46,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,1,21]],"date-time":"2025-01-21T00:00:00Z","timestamp":1737417600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Big Data"],"abstract":"<jats:p>Advances in high-throughput genome sequencing have enabled large-scale genome sequencing in clinical practice and research studies. By analyzing genomic variants of humans, scientists can gain better understanding of the risk factors of complex diseases such as cancer and COVID-19. To model and analyze the rich genomic data, knowledge graphs (KGs) and graph machine learning (GML) can be regarded as enabling technologies. In this article, we present a scalable tool called VariantKG for analyzing genomic variants of humans modeled using KGs and GML. Specifically, we used publicly available genome sequencing data from patients with COVID-19. VariantKG extracts variant-level genetic information output by a variant calling pipeline, annotates the variant data with additional metadata, and converts the annotated variant information into a KG represented using the Resource Description Framework (RDF). The resulting KG is further enhanced with patient metadata and stored in a scalable graph database that enables efficient RDF indexing and query processing. VariantKG employs the Deep Graph Library (DGL) to perform GML tasks such as node classification. A user can extract a subset of the KG and perform inference tasks using DGL. The user can monitor the training and testing performance and hardware utilization. 
We tested VariantKG for KG construction by using 1,508 genome sequences, leading to 4 billion RDF statements. We evaluated GML tasks using VariantKG by selecting a subset of 500 sequences from the KG and performing node classification using well-known GML techniques such as GraphSAGE, Graph Convolutional Network (GCN) and Graph Transformer. VariantKG has intuitive user interfaces and features enabling a low barrier to entry for KG construction, model inference, and model interpretation on genomic variants of humans.<\/jats:p>","DOI":"10.3389\/fdata.2024.1466391","type":"journal-article","created":{"date-parts":[[2025,1,21]],"date-time":"2025-01-21T08:37:41Z","timestamp":1737448661000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["A scalable tool for analyzing genomic variants of humans using knowledge graphs and graph machine learning"],"prefix":"10.3389","volume":"7","author":[{"given":"Shivika","family":"Prasanna","sequence":"first","affiliation":[]},{"given":"Ajay","family":"Kumar","sequence":"additional","affiliation":[]},{"given":"Deepthi","family":"Rao","sequence":"additional","affiliation":[]},{"given":"Eduardo J.","family":"Simoes","sequence":"additional","affiliation":[]},{"given":"Praveen","family":"Rao","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,1,21]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2401.03390","article-title":"Global prediction of covid-19 variant emergence using dynamics-informed graph neural networks","author":"Aawar","year":"2024","journal-title":"arXiv"},{"key":"B2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1906.02569","article-title":"Gradio: Hassle-free sharing and testing of ml models in the 
wild","author":"Abid","year":"2019","journal-title":"arXiv"},{"key":"B3","doi-asserted-by":"crossref","DOI":"10.1109\/SNAMS52053.2020.9336541","article-title":"\u201cCone-KG: A semantic knowledge graph with news content and social context for studying covid-19 news articles on social media,\u201d","volume-title":"2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS)","author":"Al-Obeidat","year":"2020"},{"key":"B4","unstructured":"Blazegraph DB. 2024"},{"key":"B5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13326-016-0067-z","article-title":"Faldo: a semantic standard for describing the location of nucleotide and protein feature annotation","volume":"7","author":"Bolleman","year":"2016","journal-title":"J. Biomed. Semant"},{"key":"B6","doi-asserted-by":"publisher","first-page":"4597","DOI":"10.1093\/bioinformatics\/btab694","article-title":"Covid-19 knowledge graph from semantic integration of biomedical literature and databases","volume":"37","author":"Chen","year":"2021","journal-title":"Bioinformatics"},{"key":"B7","doi-asserted-by":"publisher","first-page":"80","DOI":"10.4161\/fly.19695","article-title":"A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: Snps in the genome of drosophila melanogaster strain w1118; iso-2; iso-3","volume":"6","author":"Cingolani","year":"2012","journal-title":"Fly"},{"key":"B8","doi-asserted-by":"publisher","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and vcftools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"B9","unstructured":"Graph Transformer in a Nutshell. 2024"},{"key":"B10","first-page":"64","article-title":"\u201csparqling: painlessly drawing SPARQL queries over graphol 
ontologies,\u201d","volume-title":"Proc. of the Fourth International Workshop on Visualization and Interaction for Ontologies and Linked Data","author":"Di Bartolomeo","year":"2018"},{"key":"B11","doi-asserted-by":"publisher","first-page":"1332","DOI":"10.1093\/bioinformatics\/btaa834","article-title":"Covid-19 knowledge graph: a computable, multi-modal, cause-and-effect knowledge model of covid-19 pathophysiology","volume":"37","author":"Domingo-Fern\u00e1ndez","year":"2021","journal-title":"Bioinformatics"},{"key":"B12","doi-asserted-by":"crossref","first-page":"2869","DOI":"10.1145\/3219819.3219938","article-title":"\u201cChallenges and innovations in building a product knowledge graph,\u201d","volume-title":"Proc. of the 24th ACM SIGKDD International Conference On Knowledge Discovery & Data Mining","author":"Dong","year":"2018"},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkac957","article-title":"Genomickb: a knowledge graph for the human genome","author":"Feng","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"B14","doi-asserted-by":"publisher","first-page":"1999-LB","DOI":"10.2337\/db24-1999-LB","article-title":"1999-lb: GenomicKB\u2014a knowledge graph for human genomic data to advance understanding of diabetes","volume":"73","author":"Feng","year":"2024","journal-title":"Diabetes"},{"key":"B15","doi-asserted-by":"publisher","first-page":"12","DOI":"10.3389\/fdata.2020.00012","article-title":"FoodKG: A tool to enrich knowledge graphs using machine learning techniques","volume":"3","author":"Gharibi","year":"2020","journal-title":"Front. Big Data"},{"key":"B16","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1145\/2844544","article-title":"Schema. org: evolution of structured data on the web","volume":"59","author":"Guha","year":"2016","journal-title":"Commun. 
ACM"},{"key":"B17","article-title":"\u201cInductive representation learning on large graphs,\u201d","author":"Hamilton","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B18","doi-asserted-by":"publisher","first-page":"100042","DOI":"10.1016\/j.cmpbup.2021.100042","article-title":"Bert based clinical knowledge extraction for biomedical knowledge graph construction and analysis","volume":"1","author":"Harnoune","year":"2021","journal-title":"Comp. Meth. Prog. Biomed. Update"},{"key":"B19","article-title":"\u201cA tool for efficiently processing SPARQL queries on RDF quads,\u201d","volume-title":"Proc. of the International Semantic Web Conference Posters & Demonstrations and Industry Tracks (ISWC 2017)","author":"Katib","year":"2017"},{"key":"B20","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1016\/j.websem.2016.03.005","article-title":"RIQ: Fast processing of SPARQL queries on RDF quadruples","volume":"38","author":"Katib","year":"2016","journal-title":"J. 
Web Semant"},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1412.6980","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2014","journal-title":"arXiv"},{"key":"B22","doi-asserted-by":"publisher","first-page":"i659","DOI":"10.1101\/840173","article-title":"Graph convolutional networks for epigenetic state prediction using both sequence and 3D genome data","volume":"36","author":"Lanchantin","year":"2019","journal-title":"Bioinformatics"},{"key":"B23","doi-asserted-by":"publisher","first-page":"167","DOI":"10.3233\/SW-140134","article-title":"Dbpedia-a large-scale, multilingual knowledge base extracted from wikipedia","volume":"6","author":"Lehmann","year":"2015","journal-title":"Semantic Web"},{"key":"B24","doi-asserted-by":"publisher","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with burrows-wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"B25","doi-asserted-by":"publisher","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short dna sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res"},{"key":"B26","doi-asserted-by":"publisher","first-page":"i911","DOI":"10.1093\/bioinformatics\/btaa822","article-title":"Deepcdr: a hybrid graph convolutional network for predicting cancer drug response","volume":"36","author":"Liu","year":"2020","journal-title":"Bioinformatics"},{"key":"B27","article-title":"\u201cYAGO3: a knowledge base from multilingual wikipedias,\u201d","volume-title":"7th Biennial Conference on Innovative Data Systems Research (CIDR 2015)","author":"Mahdisoltani","year":"2013"},{"key":"B28","doi-asserted-by":"publisher","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a mapreduce framework for analyzing next-generation dna 
sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"B29","doi-asserted-by":"publisher","first-page":"4338","DOI":"10.1073\/pnas.90.10.4338","article-title":"The human genome project","volume":"90","author":"Olson","year":"1993","journal-title":"Proc. Nat. Acad. Sci"},{"key":"B30","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1186\/s12911-022-01848-z","article-title":"Expediting knowledge acquisition by a web framework for knowledge graph exploration and visualization (kgev): case studies on covid-19 and human phenotype ontology","volume":"22","author":"Peng","year":"2022","journal-title":"BMC Med. Inform. Decis. Mak"},{"key":"B31","doi-asserted-by":"publisher","first-page":"baad006","DOI":"10.1093\/database\/baad006","article-title":"Aimedgraph: a comprehensive multi-relational knowledge graph for precision medicine","volume":"2023","author":"Quan","year":"2023","journal-title":"Database"},{"key":"B32","doi-asserted-by":"publisher","DOI":"10.21227\/b0ph-s175","author":"Rao","year":"2021","journal-title":"Variant Analysis Of Human Genome Sequences For COVID-19 Research."},{"key":"B33","doi-asserted-by":"publisher","first-page":"100155","DOI":"10.1016\/j.patter.2020.100155","article-title":"Kg-covid-19: a framework to produce customized knowledge graphs for covid-19 response","volume":"2","author":"Reese","year":"2021","journal-title":"Patterns"},{"key":"B34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13073-021-00835-9","article-title":"Cadd-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores","volume":"13","author":"Rentzsch","year":"2021","journal-title":"Genome Med"},{"key":"B35","doi-asserted-by":"publisher","first-page":"D886","DOI":"10.1093\/nar\/gky1016","article-title":"Cadd: predicting the deleteriousness of variants throughout the human genome","volume":"47","author":"Rentzsch","year":"2019","journal-title":"Nucleic Acids 
Res"},{"key":"B36","first-page":"36","volume-title":"Introducing Cloudlab: Scientific Infrastructure for Advancing Cloud Architectures and Applications","author":"Ricci","year":"2014"},{"key":"B37","doi-asserted-by":"publisher","first-page":"100760","DOI":"10.1016\/j.websem.2022.100760","article-title":"Knowledge4covid-19: a semantic-based approach for constructing a covid-19 related knowledge graph from various sources and analyzing treatments' toxicities","volume":"75","author":"Sakor","year":"2023","journal-title":"J. Web Semant"},{"key":"B38","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1506.01333","article-title":"Fast processing of SPARQL queries on RDF quadruples","author":"Slavov","year":"2015","journal-title":"arXiv"},{"key":"B39","unstructured":"SnpEff. 2024"},{"key":"B40","doi-asserted-by":"publisher","first-page":"11469","DOI":"10.1038\/s41598-023-38314-3","article-title":"Covid-19 infection inference with graph neural networks","volume":"13","author":"Song","year":"2023","journal-title":"Sci. Rep"},{"key":"B41","doi-asserted-by":"publisher","first-page":"e1002195","DOI":"10.1371\/journal.pbio.1002195","article-title":"Big data: astronomical or genomical?","volume":"13","author":"Stephens","year":"2015","journal-title":"PLoS Biol"},{"key":"B42","doi-asserted-by":"crossref","first-page":"1484","DOI":"10.1109\/BigData.2018.8622484","article-title":"\u201cKnowledge-guided bayesian support vector machine for high-dimensional data with application to analysis of genomics data,\u201d","volume-title":"2018 IEEE International Conference on Big Data (Big Data)","author":"Sun","year":"2018"},{"key":"B43","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1145\/2629489","article-title":"Wikidata: a free collaborative knowledgebase","volume":"57","author":"Vrande\u010di\u0107","year":"2014","journal-title":"Commun. 
ACM"},{"key":"B44","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1909.01315","article-title":"Deep graph library: a graph-centric, highly-performant package for graph neural networks","author":"Wang","year":"2019","journal-title":"arXiv"},{"key":"B45","doi-asserted-by":"publisher","first-page":"9240","DOI":"10.48550\/arXiv.1903.03894","article-title":"GNNexplainer: Generating explanations for graph neural networks","volume":"32","author":"Ying","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"B46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40649-019-0069-y","article-title":"Graph convolutional networks: a comprehensive review","volume":"6","author":"Zhang","year":"2019","journal-title":"Comp. Soc. Netw"}],"container-title":["Frontiers in Big Data"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2024.1466391\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,21]],"date-time":"2025-01-21T08:37:48Z","timestamp":1737448668000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2024.1466391\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,21]]},"references-count":46,"alternative-id":["10.3389\/fdata.2024.1466391"],"URL":"https:\/\/doi.org\/10.3389\/fdata.2024.1466391","relation":{},"ISSN":["2624-909X"],"issn-type":[{"value":"2624-909X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,21]]},"article-number":"1466391"}}