{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T22:18:18Z","timestamp":1769638698683,"version":"3.49.0"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T00:00:00Z","timestamp":1729641600000},"content-version":"vor","delay-in-days":30,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Sequences derived from organisms sharing common evolutionary origins exhibit similarity, while unique sequences, absent in related organisms, act as good diagnostic marker candidates. However, the approach focused on identifying dissimilar regions among closely-related organisms poses challenges as it requires complex multiple sequence alignments, making computation and parsing difficult. To address this, we have developed a biologically inspired universal NAUniSeq algorithm to find the unique sequences for microorganism diagnosis by traveling through the phylogeny of life. Mapping through a phylogenetic tree ensures a low number of cross-contamination and false positives. We have downloaded complete taxonomy data from Taxadb database and sequence data from National Center for Biotechnology Information Reference Sequence Database (NCBI-Refseq) and, with the help of NetworkX, created a phylogenetic tree. Sequences were assigned over the graph nodes, k-mers were created for target and non-target nodes and search was performed over the graph using the depth first search algorithm. In a memory efficient alternative NoSQL approach, we created a collection of Refseq sequences in MongoDB database using tax-id and path of FASTA files. We queried the MongoDB collection for the target and non-target sequences. In both the approaches, we used an alignment free sliding window k-mer\u2013based procedure that quickly compares k-mers of target and non-target sequences and returns unique sequences that are not present in the non-target. We have validated our algorithm with target nodes Mycobacterium tuberculosis, Neisseria gonorrhoeae, and Monkeypox and generated unique sequences. This universal algorithm is a powerful tool for generating diagnostic sequences, enabling the accurate identification of microbial strains with high phylogenetic precision.<\/jats:p>","DOI":"10.1093\/bib\/bbae545","type":"journal-article","created":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T14:10:18Z","timestamp":1729692618000},"source":"Crossref","is-referenced-by-count":3,"title":["Advancing microbial diagnostics: a universal phylogeny guided computational algorithm to find unique sequences for precise microorganism detection"],"prefix":"10.1093","volume":"25","author":[{"given":"Gulshan Kumar","family":"Sharma","sequence":"first","affiliation":[{"name":"Malaviya National Institute of Technology , Jawahar Lal Nehru Marg, Jhalana Gram, Malviya Nagar, Jaipur, Rajasthan 302017 ,","place":["India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rakesh","family":"Sharma","sequence":"additional","affiliation":[{"name":"Centre for Converging Technologies, University of Rajasthan , Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004 ,","place":["India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kavita","family":"Joshi","sequence":"additional","affiliation":[{"name":"Department of Zoology, University of Rajasthan , Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004 ,","place":["India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sameer","family":"Qureshi","sequence":"additional","affiliation":[{"name":"Department of Zoology, University of Rajasthan , Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004 ,","place":["India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shubhita","family":"Mathur","sequence":"additional","affiliation":[{"name":"Department of Zoology, University of Rajasthan , Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004 ,","place":["India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sharad","family":"Sinha","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of Rajasthan , Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004 ,","place":["India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Samit","family":"Chatterjee","sequence":"additional","affiliation":[{"name":"Department of Zoology, University of Rajasthan , Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004 ,","place":["India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0988-3359","authenticated-orcid":false,"given":"Vandana","family":"Nunia","sequence":"additional","affiliation":[{"name":"Department of Zoology, University of Rajasthan , Jawahar Lal Nehru Marg, Talvandi, Jaipur, Rajasthan 302004 ,","place":["India"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,10,23]]},"reference":[{"key":"2024102314100357100_ref1","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1038\/nrg3433","article-title":"Computational solutions for omics data","volume":"14","author":"Berger","year":"2013","journal-title":"Nat Rev Genet"},{"key":"2024102314100357100_ref2","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2024102314100357100_ref3","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2024102314100357100_ref4","doi-asserted-by":"publisher","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024102314100357100_ref5","first-page":"346","article-title":"What is bioinformatics? A proposed definition and overview of the field methods","volume":"40","author":"Luscombe","year":"2001","journal-title":"Inf Med"},{"key":"2024102314100357100_ref6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.5897\/IJBC2013.0086","article-title":"Bioinformatics with basic local alignment search tool (BLAST) and fast alignment (FASTA)","volume":"6","author":"Donkor","year":"2014","journal-title":"J Bioinf Seq Anal"},{"key":"2024102314100357100_ref7","doi-asserted-by":"publisher","first-page":"3687","DOI":"10.1093\/bioinformatics\/btaa222","article-title":"GenMap: ultra-fast computation of genome mappability","volume":"36","author":"Pockrandt","year":"2020","journal-title":"Bioinformatics"},{"key":"2024102314100357100_ref8","doi-asserted-by":"publisher","first-page":"2081","DOI":"10.1093\/bioinformatics\/btab059","article-title":"Fur: Find unique genomic regions for diagnostic PCR","volume":"37","author":"Haubold","year":"2021","journal-title":"Bioinformatics"},{"key":"2024102314100357100_ref9","doi-asserted-by":"publisher","first-page":"2789","DOI":"10.12688\/f1000research.10225.2","article-title":"Recapitulating phylogenies using k-mers: from trees to networks","volume":"5","author":"Bernard","year":"2016","journal-title":"F1000Res"},{"key":"2024102314100357100_ref10","doi-asserted-by":"crossref","DOI":"10.5962\/bhl.title.82303","volume-title":"On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life","author":"Darwin","year":"1859","edition":"1st"},{"key":"2024102314100357100_ref11","doi-asserted-by":"publisher","first-page":"2090","DOI":"10.1093\/molbev\/msl080","article-title":"Exploring the relationship between sequence similarity and accurate phylogenetic trees","volume":"23","author":"Cantarel","year":"2006","journal-title":"Mol Biol Evol"},{"key":"2024102314100357100_ref12","doi-asserted-by":"publisher","first-page":"1795","DOI":"10.1093\/molbev\/msn104","article-title":"Confirming the phylogeny of mammals by use of large comparative sequence data sets","volume":"25","author":"Prasad","year":"2008","journal-title":"Mol Biol Evol"},{"key":"2024102314100357100_ref13","first-page":"2","article-title":"Visualizing bacteriophage evolution through sequence and structural phylogeny of lysin a and terminase proteins: an analysis of protein structure across phage clusters","volume":"11","author":"Maansi","year":"2021","journal-title":"J Purdue Undergrad Res"},{"key":"2024102314100357100_ref14","doi-asserted-by":"publisher","first-page":"S8","DOI":"10.1186\/1471-2164-12-S3-S8","article-title":"Perfect hamming code with a hash table for faster genome mapping","volume":"12","author":"Takenaka","year":"2011","journal-title":"BMC Genomics"},{"key":"2024102314100357100_ref15","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1093\/nargab\/lqad004","article-title":"A fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis","volume":"5","author":"Firtina","year":"2023","journal-title":"NAR Genom Bioinform"},{"key":"2024102314100357100_ref16","doi-asserted-by":"publisher","first-page":"316","DOI":"10.1093\/bioinformatics\/bts712","article-title":"SRmapper: a fast and sensitive genome-hashing alignment tool","volume":"29","author":"Gontarz","year":"2013","journal-title":"Bioinformatics"},{"key":"2024102314100357100_ref17","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/s13015-016-0069-5","article-title":"Bitpacking techniques for indexing genomes: I Hash tables","volume":"11","author":"Wu","year":"2016","journal-title":"Algorithms Mol Biol"},{"key":"2024102314100357100_ref18","volume-title":"Algorithms and Data Structures: The Basic Toolbox","author":"Mehlhorn","year":"2008"},{"key":"2024102314100357100_ref19","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1109\/TNB.2016.2523501","article-title":"A comparative analysis between k-mers and community detection-based features for the task of protein classification","volume":"15","author":"Tangirala","year":"2016","journal-title":"IEEE Trans Nanobioscience"},{"key":"2024102314100357100_ref20","doi-asserted-by":"publisher","first-page":"3349","DOI":"10.1093\/bioinformatics\/btab196","article-title":"KEC: unique sequence search by k-mer exclusion","volume":"37","author":"Beran","year":"2021","journal-title":"Bioinformatics"},{"key":"2024102314100357100_ref21","doi-asserted-by":"publisher","first-page":"D733","DOI":"10.1093\/nar\/gkv1189","article-title":"Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation","volume":"44","author":"O\u2019Leary","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2024102314100357100_ref22","doi-asserted-by":"publisher","first-page":"1153","DOI":"10.1111\/2041-210X.13440","article-title":"Taxadb: a high-performance local taxonomic database interface","volume":"11","author":"Norman","year":"2020","journal-title":"Methods Ecol Evol"},{"key":"2024102314100357100_ref23","first-page":"11","volume-title":"Proceedings of the 7th Python in Science Conference (SciPy2008), (Pasadena, CA USA)","author":"Hagberg","year":"2008"},{"key":"2024102314100357100_ref24","doi-asserted-by":"publisher","first-page":"1608","DOI":"10.1093\/bioinformatics\/btq249","article-title":"PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes","volume":"26","author":"Yu","year":"2010","journal-title":"Bioinformatics"},{"key":"2024102314100357100_ref25","first-page":"487609","article-title":"DeepTMHMM predicts alpha and beta transmembrane proteins using deep neural networks","author":"Hallgren","year":"2022","journal-title":"bioRxiv"},{"key":"2024102314100357100_ref26","doi-asserted-by":"publisher","first-page":"884","DOI":"10.1093\/bioinformatics\/btt607","article-title":"Protter: interactive protein feature visualization and integration with experimental proteomic data","volume":"30","author":"Omasits","year":"2014","journal-title":"Bioinformatics"},{"key":"2024102314100357100_ref27","doi-asserted-by":"publisher","first-page":"1090","DOI":"10.4236\/ns.2010.210136","article-title":"Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms","volume":"02","author":"Chou","year":"2010","journal-title":"Natural Science"},{"key":"2024102314100357100_ref28","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/1471-2105-8-4","article-title":"VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines","volume":"8","author":"Doytchinova","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2024102314100357100_ref29","doi-asserted-by":"publisher","first-page":"420","DOI":"10.1038\/s41587-019-0036-z","article-title":"SignalP 5.0 improves signal peptide predictions using deep neural networks","volume":"37","author":"Almagro Armenteros","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2024102314100357100_ref30","volume-title":"A Practical Guide to Ubuntu","author":"Sobell","year":"2015"},{"key":"2024102314100357100_ref31","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1016\/j.gene.2014.05.043","article-title":"K-mer natural vector and its application to the phylogenetic analysis of genetic sequences","volume":"546","author":"Wen","year":"2014","journal-title":"Gene"},{"key":"2024102314100357100_ref32","doi-asserted-by":"publisher","first-page":"944","DOI":"10.3390\/ijms21030944","article-title":"Ozoline ON unique k-mers as strain-specific barcodes for phylogenetic analysis and natural microbiome profiling","volume":"21","author":"Panyukov","year":"2020","journal-title":"Int J Mol Sci"},{"key":"2024102314100357100_ref33","doi-asserted-by":"publisher","first-page":"365","DOI":"10.3390\/biology9110365","article-title":"Amino acid k-mer feature extraction for quantitative antimicrobial resistance (AMR) prediction by machine learning and model interpretation for biological insights","volume":"9","author":"ValizadehAslani","year":"2020","journal-title":"Biology"},{"key":"2024102314100357100_ref34","doi-asserted-by":"publisher","first-page":"113","DOI":"10.3389\/fimmu.2019.00113","article-title":"Comparison of open-source reverse vaccinology programs for bacterial vaccine antigen discovery","volume":"10","author":"Dalsass","year":"2019","journal-title":"Front Immunol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/6\/bbae545\/59996608\/bbae545.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/6\/bbae545\/59996608\/bbae545.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T14:10:23Z","timestamp":1729692623000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae545\/7832362"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,23]]},"references-count":34,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,9,23]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae545","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,11]]},"published":{"date-parts":[[2024,9,23]]},"article-number":"bbae545"}}