{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T10:52:14Z","timestamp":1740135134641,"version":"3.37.3"},"reference-count":69,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,2,12]],"date-time":"2021-02-12T00:00:00Z","timestamp":1613088000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2021,2,12]],"date-time":"2021-02-12T00:00:00Z","timestamp":1613088000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"ProFi project","award":["ANR-10-INBS-08"],"award-info":[{"award-number":["ANR-10-INBS-08"]}]},{"name":"GRAL project","award":["ANR-10-LABX-49-01"],"award-info":[{"award-number":["ANR-10-LABX-49-01"]}]},{"name":"DATA@UGA and SYMER projects","award":["ANR-15-IDEX-02"],"award-info":[{"award-number":["ANR-15-IDEX-02"]}]},{"name":"MIAI @ Grenoble Alpes","award":["ANR-19-P3IA-0003"],"award-info":[{"award-number":["ANR-19-P3IA-0003"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest to extract meaningful chemical or biological patterns. 
However, recent instrumental pipelines deliver data whose size, dimensionality and expected number of clusters are too large to be processed by classical machine learning algorithms, so that most of the state-of-the-art relies on single pass linkage-based algorithms.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We propose a clustering algorithm that solves the powerful but computationally demanding kernel <jats:italic>k<\/jats:italic>-means objective function in a scalable way. As a result, it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, Nystr\u00f6m approximation and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data coming from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters which differ from those resulting from state-of-the-art methods, while achieving similar performances. 
This highlights the complementarity of differently principled algorithms to extract the best from complex LC-MS data.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-021-03969-0","type":"journal-article","created":{"date-parts":[[2021,2,12]],"date-time":"2021-02-12T20:30:21Z","timestamp":1613161821000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis"],"prefix":"10.1186","volume":"22","author":[{"given":"Olga","family":"Permiakova","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Romain","family":"Guibert","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexandra","family":"Kraut","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Fortin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anne-Marie","family":"Hesse","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3539-3564","authenticated-orcid":false,"given":"Thomas","family":"Burger","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,2,12]]},"reference":[{"issue":"6","key":"3969_CR1","doi-asserted-by":"publisher","first-page":"1537","DOI":"10.1074\/mcp.O114.037879","volume":"13","author":"J Teleman","year":"2014","unstructured":"Teleman J, Dowsey AW, Gonzalez-Galarza FF, Perkins S, Pratt B, R\u00f6st HL, et al. Numerical compression schemes for proteomics mass spectrometry data. Mol Cell Proteomics. 
2014;13(6):1537\u201342.","journal-title":"Mol Cell Proteomics."},{"issue":"1","key":"3969_CR2","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1093\/biostatistics\/kxs030","volume":"14","author":"B Klaus","year":"2013","unstructured":"Klaus B, Strimmer K. Signal identification for rare and weak features: Higher criticism or false discovery rates? Biostatistics. 2013;14(1):129\u201343.","journal-title":"Biostatistics."},{"issue":"10","key":"3969_CR3","doi-asserted-by":"publisher","first-page":"2470","DOI":"10.1021\/ac026424o","volume":"75","author":"DL Tabb","year":"2003","unstructured":"Tabb DL, MacCoss MJ, Wu CC, Anderson SD, Yates JR. Similarity among tandem mass spectra from proteomic experiments: Detection, significance, and utility. Anal Chem. 2003;75(10):2470\u20137.","journal-title":"Anal Chem."},{"issue":"8","key":"3969_CR4","doi-asserted-by":"publisher","first-page":"1250","DOI":"10.1016\/j.jasms.2005.04.010","volume":"16","author":"DL Tabb","year":"2005","unstructured":"Tabb DL, Thompson MR, Khalsa-Moyers G, VerBerkmoes NC, McDonald WH. MS2Grouper: Group assessment and synthetic replacement of duplicate proteomic tandem mass spectra. J Am Soc Mass Spectrom. 2005;16(8):1250\u201361.","journal-title":"J Am Soc Mass Spectrom."},{"issue":"4","key":"3969_CR5","doi-asserted-by":"publisher","first-page":"950","DOI":"10.1002\/pmic.200300652","volume":"4","author":"I Beer","year":"2004","unstructured":"Beer I, Barnea E, Ziv T, Admon A. Improving large-scale proteomics by clustering of mass spectrometry data. Proteomics. 2004;4(4):950\u201360.","journal-title":"Proteomics."},{"issue":"18","key":"3969_CR6","doi-asserted-by":"publisher","first-page":"3245","DOI":"10.1002\/pmic.200700160","volume":"7","author":"K Flikka","year":"2007","unstructured":"Flikka K, Meukens J, Helsens K, Vandekerckhove J, Eidhammer I, Gevaert K, et al. Implementation and application of a versatile clustering tool for tandem mass spectrometry data. Proteomics. 
2007;7(18):3245\u201358.","journal-title":"Proteomics."},{"issue":"1","key":"3969_CR7","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1021\/pr070361e","volume":"7","author":"AM Frank","year":"2008","unstructured":"Frank AM, Bandeira N, Shen Z, Tanner S, Briggs SP, Smith RD, et al. Clustering millions of tandem mass spectra. J Proteome Res. 2008;7(1):113\u201322.","journal-title":"J Proteome Res."},{"issue":"7","key":"3969_CR8","doi-asserted-by":"publisher","first-page":"587","DOI":"10.1038\/nmeth.1609","volume":"8","author":"AM Frank","year":"2011","unstructured":"Frank AM, Monroe ME, Shah AR, Carver JJ, Bandeira N, Moore RJ, et al. Spectral archives: Extending spectral libraries to analyze both identified and unidentified spectra. Nat Methods. 2011;8(7):587\u201394.","journal-title":"Nat Methods."},{"issue":"2","key":"3969_CR9","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1038\/nmeth.2343","volume":"10","author":"J Griss","year":"2013","unstructured":"Griss J, Foster JM, Hermjakob H, Vizca\u00edno JA. PRIDE Cluster: building a consensus of proteomics data. Nat Methods. 2013;10(2):95\u20136.","journal-title":"Nat Methods."},{"issue":"8","key":"3969_CR10","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1038\/nmeth.3902","volume":"13","author":"J Griss","year":"2016","unstructured":"Griss J, Perez-Riverol Y, Lewis S, Tabb DL, Dianes JA, Del-Toro N, et al. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nat Methods. 2016;13(8):651\u20136.","journal-title":"Nat Methods."},{"issue":"11","key":"3969_CR11","doi-asserted-by":"publisher","first-page":"4614","DOI":"10.1021\/pr800226w","volume":"7","author":"JA Falkner","year":"2008","unstructured":"Falkner JA, Falkner JW, Yocum AK, Andrews PC. A spectral clustering approach to MS\/MS identification of post-translational modifications. J Proteome Res. 
2008;7(11):4614\u201322.","journal-title":"J Proteome Res."},{"issue":"1","key":"3969_CR12","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1109\/TCBB.2013.152","volume":"11","author":"F Saeed","year":"2014","unstructured":"Saeed F, Hoffert JD, Knepper MA. CAMS-RS: Clustering algorithm for large-scale mass spectrometry data using restricted search space and intelligent random sampling. IEEE\/ACM Trans Comput Biol Bioinf. 2014;11(1):128\u201341.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinf."},{"issue":"3","key":"3969_CR13","doi-asserted-by":"publisher","first-page":"713","DOI":"10.1021\/acs.jproteome.5b00749","volume":"15","author":"M The","year":"2016","unstructured":"The M, K\u00e4ll L. MaRaCluster: a fragment rarity metric for clustering fragment spectra in shotgun proteomics. J Proteome Res. 2016;15(3):713\u201320.","journal-title":"J Proteome Res."},{"issue":"5","key":"3969_CR14","doi-asserted-by":"publisher","first-page":"1993","DOI":"10.1021\/acs.jproteome.7b00824","volume":"17","author":"J Griss","year":"2018","unstructured":"Griss J, Perez-Riverol Y, The M, K\u00e4ll L, Vizca\u00edno JA. Response to \u201ccomparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra\u2019\u2019. J Proteome Res. 2018;17(5):1993\u20136.","journal-title":"J Proteome Res."},{"issue":"1","key":"3969_CR15","first-page":"147","volume":"18","author":"L Wang","year":"2019","unstructured":"Wang L, Li S, Tang H. MsCRUSH: fast tandem mass spectral clustering using locality sensitive hashing. J Proteome Res. 2019;18(1):147\u201358.","journal-title":"J Proteome Res."},{"issue":"14","key":"3969_CR16","doi-asserted-by":"publisher","first-page":"1700454","DOI":"10.1002\/pmic.201700454","volume":"18","author":"Y Perez-Riverol","year":"2018","unstructured":"Perez-Riverol Y, Vizca\u00edno JA, Griss J. Future prospects of spectral clustering approaches in proteomics. Proteomics. 
2018;18(14):1700454.","journal-title":"Proteomics."},{"issue":"7","key":"3969_CR17","doi-asserted-by":"publisher","first-page":"2771","DOI":"10.1021\/acs.jproteome.9b00068","volume":"18","author":"M Gutierrez","year":"2019","unstructured":"Gutierrez M, Handy K, Smith R. XNet: a Bayesian approach to extracted ion chromatogram clustering for precursor mass spectrometry data. J Proteome Res. 2019;18(7):2771\u20138.","journal-title":"J Proteome Res."},{"issue":"14","key":"3969_CR18","doi-asserted-by":"publisher","first-page":"e132","DOI":"10.1093\/bioinformatics\/btl219","volume":"22","author":"B Fischer","year":"2006","unstructured":"Fischer B, Grossmann J, Roth V, Gruissem W, Baginsky S, Buhmann JM. Semi-supervised LC\/MS alignment for differential proteomics. Bioinformatics. 2006;22(14):e132-40.","journal-title":"Bioinformatics."},{"issue":"8","key":"3969_CR19","doi-asserted-by":"publisher","first-page":"4152","DOI":"10.1021\/pr1003856","volume":"9","author":"S Houel","year":"2010","unstructured":"Houel S, Abernathy R, Renganathan K, Meyer-Arendt K, Ahn NG, Old WM. Quantifying the impact of chimera MS\/MS spectra on peptide identification in large-scale proteomics studies. J Proteome Res. 2010;9(8):4152\u201360.","journal-title":"J Proteome Res."},{"issue":"6","key":"3969_CR20","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1002\/mas.21400","volume":"33","author":"JD Chapman","year":"2014","unstructured":"Chapman JD, Goodlett DR, Masselon CD. Multiplexed and data-independent tandem mass spectrometry for global proteome profiling. Mass Spectrom Rev. 2014;33(6):452\u201370.","journal-title":"Mass Spectrom Rev."},{"issue":"5","key":"3969_CR21","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1038\/nmeth.4643","volume":"15","author":"R Peckner","year":"2018","unstructured":"Peckner R, Myers SA, Jacome ASV, Egertson JD, Abelin JG, MacCoss MJ, et al. 
Specter: linear deconvolution for targeted analysis of data-independent acquisition mass spectrometry proteomics. Nat Methods. 2018;15(5):371\u20138.","journal-title":"Nat Methods."},{"issue":"1","key":"3969_CR22","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1021\/acs.jproteome.7b00386","volume":"18","author":"A Hu","year":"2019","unstructured":"Hu A, Lu YY, Bilmes J, Noble WS. Joint precursor elution profile inference via regression for peptide detection in data-independent acquisition mass spectra. J Proteome Res. 2019;18(1):86\u201394.","journal-title":"J Proteome Res"},{"issue":"3","key":"3969_CR23","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1038\/nmeth.3255","volume":"12","author":"CC Tsou","year":"2015","unstructured":"Tsou CC, Avtonomov D, Larsen B, Tucholska M, Choi H, Gingras AC, et al. DIA-umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods. 2015;12(3):258\u201364.","journal-title":"Nat Methods"},{"key":"3969_CR24","doi-asserted-by":"crossref","unstructured":"Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26(12):1367\u201372.","DOI":"10.1038\/nbt.1511"},{"key":"3969_CR25","doi-asserted-by":"crossref","unstructured":"Bertsch A, Gr\u00f6pl C, Reinert K, Kohlbacher O. OpenMS and TOPP: open source software for LC-MS data analysis. In: Methods in molecular biology (Clifton, N.J.). vol. 696. Springer; 2011; 353\u2013367.","DOI":"10.1007\/978-1-60761-987-1_23"},{"issue":"15","key":"3969_CR26","doi-asserted-by":"publisher","first-page":"1902","DOI":"10.1093\/bioinformatics\/btl276","volume":"22","author":"M Bellew","year":"2006","unstructured":"Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph T, Wang P, et al. A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics. 
2006;22(15):1902\u20139.","journal-title":"Bioinformatics"},{"key":"3969_CR27","doi-asserted-by":"publisher","DOI":"10.1201\/9781584889977","volume-title":"Constrained clustering: advances in algorithms, theory, and applications","author":"S Basu","year":"2008","unstructured":"Basu S, Davidson I, Wagstaff K. Constrained clustering: advances in algorithms, theory, and applications. Boca Raton: CRC Press; 2008."},{"issue":"1","key":"3969_CR28","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1093\/comjnl\/16.1.30","volume":"16","author":"R Sibson","year":"1973","unstructured":"Sibson R. SLINK: an optimally efficient algorithm for the single-link cluster method. Comput J. 1973;16(1):30\u20134.","journal-title":"Comput J"},{"issue":"4","key":"3969_CR29","doi-asserted-by":"publisher","first-page":"364","DOI":"10.1093\/comjnl\/20.4.364","volume":"20","author":"D Defays","year":"1977","unstructured":"Defays D. An efficient algorithm for a complete link method. Comput J. 1977;20(4):364\u20136.","journal-title":"Comput J."},{"key":"3969_CR30","unstructured":"Ester M, Kriegel HP, Sander J, Xu X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. 1996; 96: 226\u2013231."},{"key":"3969_CR31","unstructured":"Michener SR. A statistical method for evaluating systematic relationships. Univ Kans Sci Bull. 1958;38:1409\u20131438. Available from: http:\/\/ci.nii.ac.jp\/naid\/10011579647\/en\/."},{"key":"3969_CR32","unstructured":"Von Luxburg U, Williamson RC, Guyon I. Clustering: Science or art? In: Proceedings of ICML workshop on unsupervised and transfer learning; 2012; 65\u201379."},{"key":"3969_CR33","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/j.patcog.2018.10.026","volume":"88","author":"A Adolfsson","year":"2019","unstructured":"Adolfsson A, Ackerman M, Brownstein NC. 
To cluster, or not to cluster: an analysis of clusterability methods. Pattern Recogn. 2019;88:13\u201326.","journal-title":"Pattern Recogn"},{"issue":"4","key":"3969_CR34","doi-asserted-by":"publisher","first-page":"459","DOI":"10.1093\/bioinformatics\/btg025","volume":"19","author":"S Datta","year":"2003","unstructured":"Datta S, Datta S. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics. 2003;19(4):459\u201366.","journal-title":"Bioinformatics"},{"issue":"8","key":"3969_CR35","doi-asserted-by":"publisher","first-page":"888","DOI":"10.1109\/34.868688","volume":"22","author":"J Shi","year":"2000","unstructured":"Shi J, Malik J. Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell. 2000;22(8):888\u2013905.","journal-title":"IEEE Trans Pattern Anal Mach Intell."},{"key":"3969_CR36","unstructured":"Ng AY, Jordan MI, Weiss Y. On spectral clustering: Analysis and an algorithm. In: Advances in neural information processing systems; 2002; 849\u2013856."},{"issue":"4","key":"3969_CR37","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1007\/s11222-007-9033-z","volume":"17","author":"U Von Luxburg","year":"2007","unstructured":"Von Luxburg U. A tutorial on spectral clustering. Stat Comput. 2007;17(4):395\u2013416.","journal-title":"Stat Comput."},{"issue":"1","key":"3969_CR38","first-page":"571","volume":"18","author":"H Borges","year":"2019","unstructured":"Borges H, Guibert R, Permiakova O, Burger T. Distinguishing between Spectral Clustering and Cluster Analysis of Mass Spectra. J Proteome Res. 2019;18(1):571\u20133.","journal-title":"J Proteome Res."},{"issue":"8","key":"3969_CR39","doi-asserted-by":"publisher","first-page":"790","DOI":"10.1109\/34.400568","volume":"17","author":"Y Cheng","year":"1995","unstructured":"Cheng Y. Mean shift, mode seeking, and clustering. IEEE Trans Pattern Anal Mach Intell. 
1995;17(8):790\u20139.","journal-title":"IEEE Trans Pattern Anal Mach Intell."},{"issue":"5","key":"3969_CR40","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1109\/34.1000236","volume":"24","author":"D Comaniciu","year":"2002","unstructured":"Comaniciu D, Meer P. Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell. 2002;24(5):603\u201319.","journal-title":"IEEE Trans Pattern Anal Mach Intell."},{"key":"3969_CR41","doi-asserted-by":"crossref","unstructured":"Schubert E, Rousseeuw PJ. Faster k-Medoids clustering: improving the PAM, CLARA, and CLARANS algorithms. In: International conference on similarity search and applications. Springer; 2019; 171\u2013187.","DOI":"10.1007\/978-3-030-32047-8_16"},{"key":"3969_CR42","unstructured":"Macqueen J. Some methods for classification and analysis. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. vol. 233. Oakland, CA, USA; 1967. p. 281\u2013297. Available from: http:\/\/projecteuclid.org\/bsmsp."},{"issue":"2","key":"3969_CR43","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","volume":"28","author":"SP Lloyd","year":"1982","unstructured":"Lloyd SP. Least Squares Quantization in PCM. IEEE Trans Inf Theory. 1982;28(2):129\u201337.","journal-title":"IEEE Trans Inf Theory."},{"issue":"8","key":"3969_CR44","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","volume":"31","author":"AK Jain","year":"2010","unstructured":"Jain AK. Data clustering: 50 years beyond K-means. Pattern Recogn Lett. 2010;31(8):651\u201366.","journal-title":"Pattern Recogn Lett."},{"key":"3969_CR45","volume-title":"Learning with kernels: support vector machines, regularization, optimization, and beyond","author":"CKI Williams","year":"2003","unstructured":"Williams CKI. Learning with kernels: support vector machines, regularization, optimization, and beyond, vol. 98. 
Cambridge: MIT press; 2003."},{"issue":"5","key":"3969_CR46","doi-asserted-by":"publisher","first-page":"1299","DOI":"10.1162\/089976698300017467","volume":"10","author":"B Sch\u00f6lkopf","year":"1998","unstructured":"Sch\u00f6lkopf B, Smola A, M\u00fcller KR. Nonlinear component analysis as a Kernel eigenvalue problem. Neural Comput. 1998;10(5):1299\u2013319.","journal-title":"Neural Comput."},{"issue":"1","key":"3969_CR47","first-page":"392","volume":"18","author":"J Henning","year":"2019","unstructured":"Henning J, Tostengard A, Smith R. A peptide-level fully annotated data set for quantitative evaluation of precursor-aware mass spectrometry data processing algorithms. J Proteome Res. 2019;18(1):392\u20138.","journal-title":"J Proteome Res."},{"issue":"10","key":"3969_CR48","doi-asserted-by":"publisher","first-page":"918","DOI":"10.1038\/nbt.2377","volume":"30","author":"MC Chambers","year":"2012","unstructured":"Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S, et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat Biotechnol. 2012;30(10):918\u201320.","journal-title":"Nat Biotechnol."},{"key":"3969_CR49","unstructured":"Yu Z, Herman G. On the earth mover\u2019s distance as a histogram similarity metric for image retrieval. In: IEEE international conference on multimedia and expo, ICME 2005. 2005;2005(2):686\u2013689."},{"key":"3969_CR50","doi-asserted-by":"crossref","unstructured":"Courty N, Flamary R, Tuia D. Domain adaptation with regularized optimal transport. In: Joint European conference on machine learning and knowledge discovery in databases. Springer; 2014; 274\u2013289.","DOI":"10.1007\/978-3-662-44848-9_18"},{"key":"3969_CR51","unstructured":"Majewski S, Ciach MA, Startek M, Niemyska W, Miasojedow B, Gambin A. The wasserstein distance as a dissimilarity measure for mass spectra with application to spectral deconvolution. In: 18th international workshop on algorithms in bioinformatics (WABI 2018). 
Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik; 2018."},{"key":"3969_CR52","unstructured":"Sch\u00f6lkopf B. The kernel trick for distances. In: Advances in Neural Information Processing Systems; 2001; 301\u2013307."},{"issue":"1","key":"3969_CR53","first-page":"431","volume":"20","author":"S Wang","year":"2019","unstructured":"Wang S, Gittens A, Mahoney MW. Scalable kernel K-means clustering with Nystr\u00f6m approximation: relative-error bounds. J Mach Learn Res. 2019;20(1):431\u201379.","journal-title":"J Mach Learn Res."},{"issue":"3","key":"3969_CR54","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1093\/imaiai\/iax015","volume":"7","author":"N Keriven","year":"2018","unstructured":"Keriven N, Bourrier A, Gribonval R, P\u00e9rez P. Sketching for large-scale learning of mixture models. Inf Inference J IMA. 2018;7(3):447\u2013508.","journal-title":"Inf Inference J IMA."},{"issue":"1","key":"3969_CR55","doi-asserted-by":"publisher","first-page":"100","DOI":"10.2307\/2346830","volume":"28","author":"JA Hartigan","year":"1979","unstructured":"Hartigan JA, Wong MA. Algorithm AS 136: a K-means clustering algorithm. Appl Stat. 1979;28(1):100.","journal-title":"Appl Stat."},{"key":"3969_CR56","doi-asserted-by":"crossref","unstructured":"Keriven N, Tremblay N, Traonmilin Y, Gribonval R. Compressive K-means. In: ICASSP, IEEE international conference on acoustics, speech and signal processing - proceedings. Institute of Electrical and Electronics Engineers Inc.; 2017; 6369\u20136373.","DOI":"10.1109\/ICASSP.2017.7953382"},{"key":"3969_CR57","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1307\/mmj\/1029003026","volume":"31","author":"CR Givens","year":"1984","unstructured":"Givens CR, Shortt RM. A class of Wasserstein metrics for probability distributions. Mich Math J. 
1984;31(2):231\u201340.","journal-title":"Mich Math J."},{"issue":"3","key":"3969_CR58","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1111\/j.1751-5823.2002.tb00178.x","volume":"70","author":"AL Gibbs","year":"2002","unstructured":"Gibbs AL, Su FE. On choosing and bounding probability metrics. Int Stat Rev. 2002;70(3):419\u201335.","journal-title":"Int Stat Rev."},{"issue":"3","key":"3969_CR59","doi-asserted-by":"publisher","first-page":"1171","DOI":"10.1214\/009053607000000677","volume":"36","author":"T Hofmann","year":"2008","unstructured":"Hofmann T, Sch\u00f6lkopf B, Smola AJ. Kernel methods in machine learning. Ann Stat. 2008;36(3):1171\u2013220.","journal-title":"Ann Stat."},{"key":"3969_CR60","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4419-9096-9","volume-title":"Reproducing Kernel Hilbert spaces in probability and statistics","author":"A Berlinet","year":"2004","unstructured":"Berlinet A, Thomas-Agnan C. Reproducing Kernel Hilbert spaces in probability and statistics. Berlin: Springer; 2004."},{"key":"3969_CR61","doi-asserted-by":"crossref","unstructured":"Feragen A, Lauze F, Hauberg S. Geodesic exponential kernels: When curvature and linearity conflict. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015; 3032\u20133042.","DOI":"10.1109\/CVPR.2015.7298922"},{"key":"3969_CR62","unstructured":"Calandriello D, Rosasco L. Statistical and computational trade-offs in kernel K-means. In: Advances in neural information processing systems. vol. 2018-Decem; 2018; 9357\u20139367."},{"key":"3969_CR63","unstructured":"Rahimi A, Recht B. Random features for large-scale kernel machines. In: Advances in neural information processing systems; 2008; 1177\u20131184."},{"key":"3969_CR64","volume-title":"Fourier analysis on groups","author":"SE Puckette","year":"1965","unstructured":"Puckette SE, Rudin W. Fourier analysis on groups. 
Hoboken: Wiley; 1965."},{"key":"3969_CR65","doi-asserted-by":"crossref","unstructured":"Arias P, Randall G, Sapiro G. Connecting the out-of-sample and pre-image problems in Kernel methods. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. IEEE; 2007; 1\u20138.","DOI":"10.1109\/CVPR.2007.383038"},{"key":"3969_CR66","unstructured":"Mika S, Sch\u00f6lkopf B, Smola A, M\u00fcller KR, Scholz M, R\u00e4tsch G. Kernel PCA and de-noising in feature spaces. In: Advances in neural information processing systems; 1999; 536\u2013542."},{"issue":"16","key":"3969_CR67","doi-asserted-by":"publisher","first-page":"2781","DOI":"10.1093\/bioinformatics\/bty185","volume":"34","author":"F Prive","year":"2018","unstructured":"Prive F, Aschard H, Ziyatdinov A, Blum MGB. Efficient analysis of large-scale genome-wide data with two R packages: Bigstatsr and bigsnpr. Bioinformatics. 2018;34(16):2781\u20137.","journal-title":"Bioinformatics."},{"key":"3969_CR68","unstructured":"Permiakova O,\u00a0Guibert R, Burger T. Gitlab of CHICKN (Chromatogram HIerarchical Compressive K-means with Nystrom approximation) R package; 2020. Available from: https:\/\/gitlab.com\/Olga.Permiakova\/chickn."},{"key":"3969_CR69","unstructured":"Permiakova O, Guibert R, Burger T. CRAN repository of CHICKN (Chromatogram HIerarchical Compressive K-means with Nystrom approximation) R package; 2020. 
Available from: https:\/\/CRAN.R-project.org\/package=chickn."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-03969-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-021-03969-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-03969-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,12]],"date-time":"2021-02-12T20:38:32Z","timestamp":1613162312000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-03969-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,12]]},"references-count":69,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["3969"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-03969-0","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2021,2,12]]},"assertion":[{"value":"3 June 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 January 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 February 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for 
publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"68"}}