{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T11:54:23Z","timestamp":1776858863106,"version":"3.51.2"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,9,14]],"date-time":"2023-09-14T00:00:00Z","timestamp":1694649600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,9,14]],"date-time":"2023-09-14T00:00:00Z","timestamp":1694649600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p> Partitioning around medoids (PAM) is one of the most widely used and successful clustering methods in many fields. One of its key advantages is that it only requires a distance or a dissimilarity between the individuals, and the fact that cluster centers are actual points in the data set means they can be taken as reliable representatives of their classes. However, its wider application is hampered by the large amount of memory needed to store the distance matrix (quadratic in the number of individuals) and also by the high computational cost of computing such a distance matrix and, less importantly, by the cost of the clustering algorithm itself.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p> Therefore, new software has been provided that addresses these issues. 
This software, provided under GPL license and usable as either an R package or a C++ library, calculates in parallel the distance matrix for different distances\/dissimilarities (<jats:inline-formula><jats:alternatives><jats:tex-math>$$L_1$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:msub>\n                      <mml:mi>L<\/mml:mi>\n                      <mml:mn>1<\/mml:mn>\n                    <\/mml:msub>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula>, <jats:inline-formula><jats:alternatives><jats:tex-math>$$L_2$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                    <mml:msub>\n                      <mml:mi>L<\/mml:mi>\n                      <mml:mn>2<\/mml:mn>\n                    <\/mml:msub>\n                  <\/mml:math><\/jats:alternatives><\/jats:inline-formula>, Pearson, cosine and weighted Euclidean) and also implements a parallel fast version of PAM (FASTPAM1) using any data type to reduce memory usage. Moreover, the parallel implementation uses all the cores available in modern computers which greatly reduces the execution time. Besides its general application, the software is especially useful for processing data of single cell experiments. It has been tested in problems including clustering of single cell experiments with up to 289,000 cells with the expression of about 29,000 genes per cell.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p> Comparisons with other current packages in terms of execution time have been made. The method greatly outperforms the available R packages for distance matrix calculation and also improves the packages that implement the PAM itself. 
The software is available as an R package at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/CRAN.R-project.org\/package=scellpam\">https:\/\/CRAN.R-project.org\/package=scellpam<\/jats:ext-link> and as C++ libraries at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/JdMDE\/jmatlib\">https:\/\/github.com\/JdMDE\/jmatlib<\/jats:ext-link> and <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/JdMDE\/ppamlib\">https:\/\/github.com\/JdMDE\/ppamlib<\/jats:ext-link>. The package is useful for single cell RNA-seq studies but it is also applicable in other contexts where clustering of large data sets is required.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-023-05471-1","type":"journal-article","created":{"date-parts":[[2023,9,14]],"date-time":"2023-09-14T15:03:03Z","timestamp":1694703783000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Scellpam: an R package\/C++ library to perform parallel partitioning around medoids on scRNAseq data sets"],"prefix":"10.1186","volume":"24","author":[{"given":"Juan","family":"Domingo","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Teresa","family":"Leon","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Esther","family":"Dura","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,9,14]]},"reference":[{"key":"5471_CR1","doi-asserted-by":"publisher","DOI":"10.15252\/msb.20188746","author":"MD Luecken","year":"2019","unstructured":"Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol. 2019. 
https:\/\/doi.org\/10.15252\/msb.20188746.","journal-title":"Mol Syst Biol"},{"key":"5471_CR2","doi-asserted-by":"publisher","first-page":"1141","DOI":"10.12688\/f1000research.15666.3","volume":"7","author":"A Du\u00f2","year":"2020","unstructured":"Du\u00f2 A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Research. 2020;7:1141. https:\/\/doi.org\/10.12688\/f1000research.15666.3.","journal-title":"F1000Research"},{"key":"5471_CR3","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1002\/9780470316801","volume-title":"Finding groups in data: an introduction to cluster analysis","author":"L Kaufman","year":"1990","unstructured":"Kaufman L, Rousseeuw P. Finding groups in data: an introduction to cluster analysis. New York: Wiley; 1990. p. 68\u2013125. https:\/\/doi.org\/10.1002\/9780470316801."},{"key":"5471_CR4","doi-asserted-by":"publisher","first-page":"101804","DOI":"10.1016\/j.is.2021.101804","volume":"101","author":"E Schubert","year":"2021","unstructured":"Schubert E, Rousseeuw PJ. Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms. Inf Syst. 2021;101:101804. https:\/\/doi.org\/10.1016\/j.is.2021.101804.","journal-title":"Inf Syst"},{"key":"5471_CR5","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","volume":"20","author":"PJ Rousseeuw","year":"1987","unstructured":"Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53\u201365. 
https:\/\/doi.org\/10.1016\/0377-0427(87)90125-7.","journal-title":"J Comput Appl Math"},{"key":"5471_CR6","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1038\/s41592-019-0654-x","volume":"17","author":"R Amezquita","year":"2020","unstructured":"Amezquita R, Lun A, Becht E, Carey V, Carpp L, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pages H, Smith M, Huber W, Morgan M, Gottardo R, Hicks S. Orchestrating single-cell analysis with bioconductor. Nat Methods. 2020;17:137\u201345. https:\/\/doi.org\/10.1038\/s41592-019-0654-x.","journal-title":"Nat Methods"},{"key":"5471_CR7","doi-asserted-by":"publisher","first-page":"495","DOI":"10.1038\/nbt.3192","volume":"33","author":"R Satija","year":"2015","unstructured":"Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33:495\u2013502. https:\/\/doi.org\/10.1038\/nbt.3192.","journal-title":"Nat Biotechnol"},{"key":"5471_CR8","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1038\/nbt.4096","volume":"36","author":"A Butler","year":"2018","unstructured":"Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36:411\u201320. https:\/\/doi.org\/10.1038\/nbt.4096.","journal-title":"Nat Biotechnol"},{"key":"5471_CR9","doi-asserted-by":"publisher","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","volume":"177","author":"T Stuart","year":"2019","unstructured":"Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive integration of single-cell data. Cell. 2019;177:1888\u2013902. 
https:\/\/doi.org\/10.1016\/j.cell.2019.05.031.","journal-title":"Cell"},{"key":"5471_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.cell.2021.04.048","author":"Y Hao","year":"2021","unstructured":"...Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zagar M, Hoffman P, Stoeckius M, Papalexi E, Mimitou EP, Jain J, Srivastava A, Stuart T, Fleming LB, Yeung B, Rogers AJ, McElrath JM, Blish CA, Gottardo R, Smibert P, Satija R. Integrated analysis of multimodal single-cell data. Cell. 2021. https:\/\/doi.org\/10.1016\/j.cell.2021.04.048.","journal-title":"Cell"},{"key":"5471_CR11","unstructured":"Du\u00f2 A, Soneson C. DuoClustering2018: data, clustering results and visualization functions from Du\u00f2 et al (2018). (2021). R package version 1.10.0"},{"issue":"10","key":"5471_CR12","doi-asserted-by":"publisher","first-page":"1644","DOI":"10.1038\/s41591-020-1040-z","volume":"26","author":"W Wang","year":"2020","unstructured":"Wang W, Vilella F, Alama P, Moreno I, Mignardi M, Isakova A, Pan W, Simon C, Quake SR. Single-cell transcriptomic atlas of the human endometrium during the menstrual cycle. Nat Med. 2020;26(10):1644\u201353. https:\/\/doi.org\/10.1038\/s41591-020-1040-z.","journal-title":"Nat Med"},{"key":"5471_CR13","unstructured":"Domingo J, Kutsyr-Kolesnyk O, Leon T, Perez-Moraga R, Ayala G, Roson B. A cell abundance analysis based on efficient pam clustering for a better understanding of the dynamics of endometrial remodelling. Submitted to BMC Bioinformatics (under review) (2023). https:\/\/johnford.uv.es\/BMCDraft\/BMC_under_review.pdf"},{"key":"5471_CR14","unstructured":"R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2022). R Foundation for Statistical Computing. https:\/\/www.R-project.org\/"},{"key":"5471_CR15","unstructured":"Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K. Cluster: cluster analysis basics and extensions. 
R package version 2.1.4 \u2014 For new features, see the \u2019Changelog\u2019 file (in the package source); 2022. https:\/\/CRAN.R-project.org\/package=cluster"},{"key":"5471_CR16","unstructured":"Eckert A. parallelDist: parallel distance matrix computation using multiple threads. R package version 0.2.6; 2022. https:\/\/CRAN.R-project.org\/package=parallelDist"},{"key":"5471_CR17","unstructured":"Lucas A. Amap: another multidimensional analysis package. R package version 0.8-19; 2022. https:\/\/CRAN.R-project.org\/package=amap"},{"key":"5471_CR18","unstructured":"Li, X.: Fastkmedoids: Faster K-Medoids Clustering Algorithms: FastPAM, FastCLARA, FastCLARANS. (2021). https:\/\/CRAN.R-project.org\/package=fastkmedoids"},{"key":"5471_CR19","unstructured":"Mouselimis L. ClusterR: Gaussian mixture models, K-means, mini-batch-Kmeans, K-medoids and affinity propagation clustering. R package version 1.3.0; 2023. https:\/\/CRAN.R-project.org\/package=ClusterR"},{"key":"5471_CR20","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v001.i04","author":"A Struyf","year":"1997","unstructured":"Struyf A, Hubert M, Rousseeuw P. Clustering in an object-oriented environment. J Stat Softw. 1997. https:\/\/doi.org\/10.18637\/jss.v001.i04.","journal-title":"J Stat Softw"},{"key":"5471_CR21","unstructured":"Budiaji W. Kmed: distance-based k-medoids. R package version 0.4.2; 2022. https:\/\/CRAN.R-project.org\/package=kmed"},{"key":"5471_CR22","doi-asserted-by":"publisher","unstructured":"Defferrard M, Benzi K, Vandergheynst P, Bresson X. FMA: a dataset for music analysis. UCI Machine Learning Repository; 2017. https:\/\/doi.org\/10.24432\/C5HW28","DOI":"10.24432\/C5HW28"},{"key":"5471_CR23","unstructured":"Defferrard M, Benzi K, Vandergheynst P, Bresson X. FMA: a dataset for music analysis. 
In: International society for music information retrieval conference; 2016."},{"issue":"12","key":"5471_CR24","doi-asserted-by":"publisher","first-page":"1698","DOI":"10.1038\/s41588-021-00972-2","volume":"53","author":"L Garcia-Alonso","year":"2021","unstructured":"...Garcia-Alonso L, Handfield L-F, Roberts K, Nikolakopoulou K, Fernando RC, Gardner L, Woodhams B, Arutyunyan A, Polanski K, Hoo R, Sancho-Serra C, Li T, Kwakwa K, Tuck E, Lorenzi V, Massalha H, Prete M, Kleshchevnikov V, Tarkowska A, Porter T, Mazzeo CI, Dongen S, Dabrowska M, Vaskivskyi V, Mahbubani KT, Park J-E, Jimenez-Linan M, Campos L, Kiselev VY, Lindskog C, Ayuk P, Prigmore E, Stratton MR, Saeb-Parsy K, Moffett A, Moore L, Bayraktar OA, Teichmann SA, Turco MY, Vento-Tormo R. Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro. Nat Genet. 2021;53(12):1698\u2013711. https:\/\/doi.org\/10.1038\/s41588-021-00972-2.","journal-title":"Nat Genet"},{"issue":"2","key":"5471_CR25","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1038\/s41588-022-01254-1","volume":"55","author":"MAS Fonseca","year":"2023","unstructured":"...Fonseca MAS, Haro M, Wright KN, Lin X, Abbasi F, Sun J, Hernandez L, Orr NL, Hong J, Choi-Kuaea Y, Maluf HM, Balzer BL, Fishburn A, Hickey R, Cass I, Goodridge HS, Truong M, Wang Y, Pisarska MD, Dinh HQ, EL-Naggar A, Huntsman DG, Anglesio MS, Goodman MT, Medeiros F, Siedhoff M, Lawrenson K. Single-cell transcriptomic analysis of endometriosis. Nat Genet. 2023;55(2):255\u201367. 
https:\/\/doi.org\/10.1038\/s41588-022-01254-1.","journal-title":"Nat Genet"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05471-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-023-05471-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05471-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T22:05:26Z","timestamp":1700517926000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-023-05471-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,14]]},"references-count":25,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["5471"],"URL":"https:\/\/doi.org\/10.1186\/s12859-023-05471-1","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,14]]},"assertion":[{"value":"29 April 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 September 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"No human or animal subjects were used for this study. No ethical issues are involved. 
Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"342"}}