{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T02:06:44Z","timestamp":1774922804718,"version":"3.50.1"},"reference-count":63,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2022,8,31]],"date-time":"2022-08-31T00:00:00Z","timestamp":1661904000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62173282"],"award-info":[{"award-number":["62173282"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1605213"],"award-info":[{"award-number":["U1605213"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31872564"],"award-info":[{"award-number":["31872564"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2018YFD0901401"],"award-info":[{"award-number":["2018YFD0901401"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fujian Provincial Science and Technology Project","award":["2019N0001"],"award-info":[{"award-number":["2019N0001"]}]},{"name":"Fujian Provincial Science and Technology Project","award":["2017FJSCZY02"],"award-info":[{"award-number":["2017FJSCZY02"]}]},{"name":"Open Fund of Engineering Research Center for Medical Data Mining and Application of Fujian Province","award":["MDM2018002"],"award-info":[{"award-number":["MDM2018002"]}]},{"name":"Natural Science Foundation of Fujian","award":["2018J01097"],"award-info":[{"award-number":["2018J01097"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,20]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Many biological applications are essentially pairwise comparison problems, such as evolutionary relationships on genomic sequences, contigs binning on metagenomic data, cell type identification on gene expression profiles of single-cells, etc. To make pair-wise comparison, it is necessary to adopt suitable dissimilarity metric. However, not all the metrics can be fully adapted to all possible biological applications. It is necessary to employ metric learning based on data adaptive to the application of interest. Therefore, in this study, we proposed MEtric Learning with Triplet network (MELT), which learns a nonlinear mapping from original space to the embedding space in order to keep similar data closer and dissimilar data far apart. MELT is a weakly supervised and data-driven comparison framework that offers more adaptive and accurate dissimilarity learned in the absence of the label information when the supervised methods are not applicable. We applied MELT in three typical applications of genomic data comparison, including hierarchical genomic sequences, longitudinal microbiome samples and longitudinal single-cell gene expression profiles, which have no distinctive grouping information. In the experiments, MELT demonstrated its empirical utility in comparison to many widely used dissimilarity metrics. And MELT is expected to accommodate a more extensive set of applications in large-scale genomic comparisons. MELT is available at https:\/\/github.com\/Ying-Lab\/MELT.<\/jats:p>","DOI":"10.1093\/bib\/bbac345","type":"journal-article","created":{"date-parts":[[2022,9,1]],"date-time":"2022-09-01T21:41:15Z","timestamp":1662068475000},"source":"Crossref","is-referenced-by-count":6,"title":["Metric learning for comparing genomic data with triplet network"],"prefix":"10.1093","volume":"23","author":[{"given":"Zhi","family":"Ma","sequence":"first","affiliation":[{"name":"Department of Automation, Xiamen University , China"},{"name":"National Institute for Data Science in Health and Medicine, Xiamen University"}]},{"given":"Yang Young","family":"Lu","sequence":"additional","affiliation":[{"name":"Cheriton School of Computer Science, University of Waterloo , Waterloo, Ontario , Canada"}]},{"given":"Yiwen","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , China"}]},{"given":"Renhao","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , China"}]},{"given":"Zizi","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , China"}]},{"given":"Fang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Cheriton School of Computer Science, University of Waterloo , Waterloo, Ontario , Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8766-5950","authenticated-orcid":false,"given":"Ying","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Automation, Xiamen University , China"},{"name":"National Institute for Data Science in Health and Medicine, Xiamen University"},{"name":"Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision , Xiamen, Fujian 361005 , China"},{"name":"Fujian Key Laboratory of Genetics and Breeding of Marine Organisms , Xiamen, 361100 , China"}]}],"member":"286","published-online":{"date-parts":[[2022,8,31]]},"reference":[{"issue":"5","key":"2022092013223990100_ref1","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1038\/nrg1603","article-title":"Phylogenomics and the reconstruction of the tree of life","volume":"6","author":"Delsuc","year":"2005","journal-title":"Nat Rev Genet"},{"issue":"W1","key":"2022092013223990100_ref2","doi-asserted-by":"crossref","first-page":"W554","DOI":"10.1093\/nar\/gkx351","article-title":"CAFE: aCcelerated Alignment-FrEe sequence analysis","volume":"45","author":"Lu","year":"2017","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"2022092013223990100_ref3","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1093\/bib\/bbx067","article-title":"Alignment-free inference of hierarchical and reticulate phylogenomic relationships","volume":"20","author":"Bernard","year":"2019","journal-title":"Brief Bioinform"},{"issue":"2","key":"2022092013223990100_ref4","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1093\/bioinformatics\/btaa699","article-title":"CRAFT: Compact genome Representation towards large-scale Alignment-Free daTabase","volume":"37","author":"Lu","year":"2021","journal-title":"Bioinformatics"},{"key":"2022092013223990100_ref5","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1016\/j.neucom.2020.08.017","article-title":"A tutorial on distance metric learning: mathematical foundations, algorithms, experimental analysis, prospects and challenges","volume":"425","author":"Su\u00e1rez","year":"2021","journal-title":"Neurocomputing"},{"issue":"suppl_2","key":"2022092013223990100_ref6","doi-asserted-by":"crossref","first-page":"W45","DOI":"10.1093\/nar\/gkh362","article-title":"CVTree: a phylogenetic tree reconstruction tool based on whole genomes","volume":"32","author":"Qi","year":"2004","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2022092013223990100_ref7","doi-asserted-by":"crossref","first-page":"730","DOI":"10.1186\/1471-2164-13-730","article-title":"Comparison of metagenomic samples using sequence signatures","volume":"13","author":"Jiang","year":"2012","journal-title":"BMC Genomics"},{"issue":"1","key":"2022092013223990100_ref8","doi-asserted-by":"crossref","first-page":"e84348","DOI":"10.1371\/journal.pone.0084348","article-title":"Comparison of metatranscriptomic samples based on k-tuple frequencies","volume":"9","author":"Wang","year":"2014","journal-title":"PLoS One"},{"issue":"1","key":"2022092013223990100_ref9","doi-asserted-by":"crossref","first-page":"24175","DOI":"10.1038\/srep24175","article-title":"Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes","volume":"6","author":"Lin","year":"2016","journal-title":"Sci Rep"},{"issue":"6","key":"2022092013223990100_ref10","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1093\/bioinformatics\/btw290","article-title":"COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment, and paired-end read LinkAge","volume":"33","author":"Lu","year":"2017","journal-title":"Bioinformatics"},{"issue":"5","key":"2022092013223990100_ref11","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1038\/s41576-018-0088-9","article-title":"Challenges in unsupervised clustering of single-cell RNA-seq data","volume":"20","author":"Kiselev","year":"2019","journal-title":"Nat Rev Genet"},{"issue":"2","key":"2022092013223990100_ref12","doi-asserted-by":"crossref","first-page":"lqaa044","DOI":"10.1093\/nargab\/lqaa044","article-title":"A network-based integrated framework for predicting virus\u2013prokaryote interactions","volume":"2","author":"Wang","year":"2020","journal-title":"NAR Genom Bioinform"},{"key":"2022092013223990100_ref13","first-page":"193","volume-title":"Proceedings of the 26th International Conference on World Wide Web. 2017, International World Wide Web Conferences Steering Committee","author":"Hsieh"},{"issue":"2","key":"2022092013223990100_ref14","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1109\/TITB.2012.2229286","article-title":"Identifying mammalian MicroRNA targets based on supervised distance metric learning","volume":"17","author":"Liu","year":"2013","journal-title":"IEEE J Biomed Health Inform"},{"issue":"5","key":"2022092013223990100_ref15","doi-asserted-by":"crossref","first-page":"868","DOI":"10.1109\/TCBB.2015.2495186","article-title":"A guaranteed similarity metric learning framework for biological sequence comparison","volume":"13","author":"Hua","year":"2016","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2022092013223990100_ref16","first-page":"2078","volume-title":"AAAI Conference on Artificial Intelligence","author":"Shi","year":"2014"},{"key":"2022092013223990100_ref17","volume-title":"2017 IEEE 18th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)","author":"Kimothi","year":"2017"},{"issue":"12","key":"2022092013223990100_ref18","doi-asserted-by":"crossref","first-page":"1499","DOI":"10.1038\/nbt1205-1499","article-title":"How does gene expression clustering work?","volume":"23","author":"D'Haeseleer","year":"2005","journal-title":"Nat Biotechnol"},{"key":"2022092013223990100_ref19","volume-title":"Advances in Bioinformatics and Computational Biology","author":"Jaskowiak","year":"2012"},{"issue":"1","key":"2022092013223990100_ref20","doi-asserted-by":"crossref","first-page":"1304","DOI":"10.1186\/s40064-016-2941-7","article-title":"The distance function effect on k-nearest neighbor classification for medical datasets","volume":"5","author":"Hu","year":"2016","journal-title":"Springerplus"},{"key":"2022092013223990100_ref21","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/j.compbiomed.2018.11.011","article-title":"Genetic algorithm for assigning weights to gene expressions using functional annotations","volume":"104","author":"Ray","year":"2019","journal-title":"Comput Biol Med"},{"issue":"4","key":"2022092013223990100_ref22","doi-asserted-by":"crossref","first-page":"1182","DOI":"10.1093\/bioinformatics\/btz731","article-title":"Metric learning on expression data for gene function prediction","volume":"36","author":"Makrodimitris","year":"2020","journal-title":"Bioinformatics"},{"issue":"12","key":"2022092013223990100_ref23","doi-asserted-by":"crossref","first-page":"i293","DOI":"10.1093\/bioinformatics\/btv253","article-title":"Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival","volume":"31","author":"Schissler","year":"2015","journal-title":"Bioinformatics"},{"key":"2022092013223990100_ref24","volume-title":"2017 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","author":"Najat","year":"2017"},{"issue":"16","key":"2022092013223990100_ref25","doi-asserted-by":"crossref","first-page":"e94","DOI":"10.1093\/nar\/gkaa582","article-title":"Variance-adjusted Mahalanobis (VAM): a fast and accurate method for cell-specific gene set scoring","volume":"48","author":"Frost","year":"2020","journal-title":"Nucleic Acids Res"},{"issue":"11","key":"2022092013223990100_ref26","doi-asserted-by":"crossref","first-page":"1820","DOI":"10.1093\/bioinformatics\/bty887","article-title":"SENSE: Siamese neural network for sequence embedding and alignment-free comparison","volume":"35","author":"Zheng","year":"2019","journal-title":"Bioinformatics"},{"key":"2022092013223990100_ref27","first-page":"49","article-title":"On the generalized distance in statistics","volume":"2","author":"Mahalanobis","year":"1936","journal-title":"Proc Natl Inst Sci India"},{"key":"2022092013223990100_ref28","first-page":"521","volume-title":"Proceedings of the 15th International Conference on Neural Information Processing Systems","author":"Xing","year":"2002"},{"issue":"10","key":"2022092013223990100_ref29","doi-asserted-by":"crossref","first-page":"1570","DOI":"10.1016\/j.neucom.2009.11.037","article-title":"A new kernelization framework for Mahalanobis distance learning algorithms","volume":"73","author":"Chatpatanasiri","year":"2010","journal-title":"Neurocomputing"},{"key":"2022092013223990100_ref30","volume-title":"2018 1st Annual International Conference on Information and Sciences (AiCIS)","author":"Al-Mejibli","year":"2018"},{"issue":"3","key":"2022092013223990100_ref31","first-page":"23","article-title":"Evaluation of SVM kernels and conventional machine learning algorithms for speaker identification","volume":"3","author":"Mezghani","year":"2010","journal-title":"Int J Hybrid Inf Technol"},{"issue":"3","key":"2022092013223990100_ref32","first-page":"11","article-title":"Large scale online learning of image similarity through ranking","volume":"5524","author":"Chechik","year":"2009","journal-title":"J Mach Learn Res"},{"key":"2022092013223990100_ref33","volume-title":"International workshop on similarity-based pattern recognition","author":"Hoffer","year":"2015"},{"key":"2022092013223990100_ref34","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Schroff","year":"2015"},{"key":"2022092013223990100_ref35","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Kumar","year":"2016"},{"key":"2022092013223990100_ref36","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Oh Song","year":"2016"},{"key":"2022092013223990100_ref37","volume-title":"2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)","author":"Lennox","year":"2021"},{"key":"2022092013223990100_ref38","doi-asserted-by":"crossref","DOI":"10.1109\/TCBB.2021.3108718","article-title":"TripletProt: deep representation learning of proteins based on siamese networks","volume-title":"IEEE\/ACM Trans Comput Biol Bioinform","author":"Nourani","year":"2021"},{"issue":"6","key":"2022092013223990100_ref39","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbab248","article-title":"Improving protein fold recognition using triplet network and ensemble deep learning","volume":"22","author":"Liu","year":"2021","journal-title":"Brief Bioinform"},{"issue":"1","key":"2022092013223990100_ref40","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1186\/s13059-017-1319-7","article-title":"Alignment-free sequence comparison: benefits, applications, and tools","volume":"18","author":"Zielezinski","year":"2017","journal-title":"Genome Biol"},{"issue":"6","key":"2022092013223990100_ref41","doi-asserted-by":"crossref","first-page":"00257","DOI":"10.1128\/mSystems.00257-18","article-title":"K-mer similarity, networks of microbial genomes, and taxonomic rank","volume":"3","author":"Bernard","year":"2018","journal-title":"mSystems"},{"issue":"1","key":"2022092013223990100_ref42","doi-asserted-by":"crossref","first-page":"949","DOI":"10.3390\/life5010949","article-title":"Phylogeny and taxonomy of archaea: a comparison of the whole-genome-based CVTree approach with 16S rRNA sequence analysis","volume":"5","author":"Zuo","year":"2015","journal-title":"Life"},{"issue":"11","key":"2022092013223990100_ref43","doi-asserted-by":"crossref","first-page":"1467","DOI":"10.1089\/cmb.2010.0056","article-title":"Alignment-free sequence comparison (II): theoretical power of comparison statistics","volume":"17","author":"Wan","year":"2010","journal-title":"J Comput Biol"},{"issue":"1\u20133","key":"2022092013223990100_ref44","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/0169-7439(87)80084-9","article-title":"Principal component analysis","volume":"2","author":"Wold","year":"1987","journal-title":"Chemom Intell Lab Syst"},{"issue":"Nov","key":"2022092013223990100_ref45","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J Mac Lear Res"},{"key":"2022092013223990100_ref46","article-title":"Umap: uniform manifold approximation and projection for dimension reduction","author":"McInnes","year":"2018"},{"issue":"12","key":"2022092013223990100_ref47","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat Methods"},{"key":"2022092013223990100_ref48","first-page":"2837","article-title":"Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance","volume":"11","author":"Vinh","year":"2010","journal-title":"J Mach Learn Res"},{"issue":"1","key":"2022092013223990100_ref49","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J Classif"},{"issue":"Dec","key":"2022092013223990100_ref50","first-page":"583","article-title":"Cluster ensembles---a knowledge reuse framework for combining multiple partitions","volume":"3","author":"Strehl","year":"2002","journal-title":"J Mach Learn Res"},{"issue":"suppl_1","key":"2022092013223990100_ref51","first-page":"501","article-title":"NCBI reference sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"33","author":"Pruitt","year":"2005","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"2022092013223990100_ref52","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1101\/gr.5969107","article-title":"MEGAN analysis of metagenomic data","volume":"17","author":"Huson","year":"2007","journal-title":"Genome Res"},{"key":"2022092013223990100_ref53","volume-title":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","author":"Wu","year":"2015"},{"key":"2022092013223990100_ref54","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/j.neucom.2011.10.021","article-title":"Fast neighborhood component analysis","volume":"83","author":"Yang","year":"2012","journal-title":"Neurocomputing"},{"key":"2022092013223990100_ref55","first-page":"1","article-title":"Metric-learn: metric learning algorithms in python","volume":"21","author":"De Vazelhes","year":"2020","journal-title":"J Mach Learn Res"},{"issue":"3","key":"2022092013223990100_ref56","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1023\/A:1009752403260","article-title":"On comparing classifiers: pitfalls to avoid and a recommended approach","volume":"1","author":"Salzberg","year":"1997","journal-title":"Data Min Knowl Discov"},{"key":"2022092013223990100_ref57","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","author":"Dem\u0161ar","year":"2006","journal-title":"J Mach Learn Res"},{"issue":"11","key":"2022092013223990100_ref58","doi-asserted-by":"crossref","first-page":"1787","DOI":"10.1101\/gr.177725.114","article-title":"Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing","volume":"24","author":"Biase","year":"2014","journal-title":"Genome Res"},{"issue":"6167","key":"2022092013223990100_ref59","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1126\/science.1245316","article-title":"Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells","volume":"343","author":"Deng","year":"2014","journal-title":"Science"},{"issue":"1","key":"2022092013223990100_ref60","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.cell.2016.01.047","article-title":"Heterogeneity in Oct4 and Sox2 targets biases cell fate in 4-cell mouse embryos","volume":"165","author":"Goolam","year":"2016","journal-title":"Cell"},{"issue":"1","key":"2022092013223990100_ref61","first-page":"100","article-title":"Algorithm AS 136: a k-means clustering algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"J R Stat Soc Ser C Appl Stat"},{"issue":"3","key":"2022092013223990100_ref62","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1016\/j.chom.2014.08.014","article-title":"The Integrative human microbiome project: dynamic analysis of microbiome-host omics profiles during periods of human health and disease","volume":"16","author":"Integrative","year":"2014","journal-title":"Cell Host Microbe"},{"issue":"6","key":"2022092013223990100_ref63","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.1038\/s41591-019-0450-2","article-title":"The vaginal microbiome and preterm birth","volume":"25","author":"Fettweis","year":"2019","journal-title":"Nat Med"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac345\/45935141\/bbac345.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac345\/45935141\/bbac345.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,15]],"date-time":"2023-02-15T20:39:28Z","timestamp":1676493568000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac345\/6679451"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,31]]},"references-count":63,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac345","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9]]},"published":{"date-parts":[[2022,8,31]]},"article-number":"bbac345"}}