{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T00:32:59Z","timestamp":1771461179892,"version":"3.50.1"},"reference-count":30,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T00:00:00Z","timestamp":1739318400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006769","name":"Russian Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100006769","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>Processing biological data is a challenge of paramount importance as the amount of accumulated data has been annually increasing along with the emergence of new methods for studying biological objects. Blind application of mathematical methods in biology may lead to erroneous hypotheses and conclusions. Here we narrow our focus down to a small set of mathematical methods applied upon standard processing of scRNA-seq data: preprocessing, dimensionality reduction, integration, and clustering (using machine learning methods for clustering). Normalization and scaling are standard manipulations for the pre-processing with LogNormalize (natural-log transformation), CLR (centered log ratio transformation), and RC (relative counts) being employed as methods for data transformation. The justification for applying these methods in biology is not discussed in methodological articles. The essential aspect of dimensionality reduction is to identify the stable patterns which are deliberately removed upon mathematical data processing as being redundant, albeit containing important minor details for biological interpretation. There are no established rules for integration of datasets obtained at different sampling times or conditions. Clustering calls for reconsidering its application specifically for biological data processing. The novelty of the present study lies in an integrated approach of biology and bioinformatics to elucidate biological insights upon data processing.<\/jats:p>","DOI":"10.3389\/fbinf.2025.1519468","type":"journal-article","created":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T07:28:35Z","timestamp":1739345315000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation"],"prefix":"10.3389","volume":"5","author":[{"given":"Mikhail","family":"Arbatsky","sequence":"first","affiliation":[]},{"given":"Ekaterina","family":"Vasilyeva","sequence":"additional","affiliation":[]},{"given":"Veronika","family":"Sysoeva","sequence":"additional","affiliation":[]},{"given":"Ekaterina","family":"Semina","sequence":"additional","affiliation":[]},{"given":"Valeri","family":"Saveliev","sequence":"additional","affiliation":[]},{"given":"Kseniya","family":"Rubina","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,2,12]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"665","DOI":"10.1038\/s41592-023-01814-1","article-title":"Comparison of transformations for single-cell RNA-seq data","volume":"20","author":"Ahlmann-Eltze","year":"2023","journal-title":"Nat. Methods"},{"key":"B2","doi-asserted-by":"publisher","first-page":"3422","DOI":"10.1093\/bioinformatics\/btaa176","article-title":"Exploring high-dimensional biological data with sparse contrastive principal component analysis","volume":"36","author":"Boileau","year":"2020","journal-title":"Bioinformatics"},{"key":"B3","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1006\/bmme.1997.2576","article-title":"Developmental expression of morphoregulatory genes in the mouse embryo: an analytical approach using a novel technology","volume":"60","author":"Craig","year":"1997","journal-title":"Biochem. Mol. Med."},{"key":"B4","doi-asserted-by":"publisher","first-page":"1141","DOI":"10.12688\/f1000research.15666.3","article-title":"A systematic performance evaluation of clustering methods for single-cell RNA-seq data","volume":"7","author":"Du\u00f2","year":"2018","journal-title":"F1000Res"},{"key":"B5","doi-asserted-by":"publisher","first-page":"e2100062","DOI":"10.1002\/bies.202100062","article-title":"Dimensional reduction in complex living systems: where, why, and how","volume":"43","author":"Eckmann","year":"2021","journal-title":"Bioessays"},{"key":"B6","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-540-73750-6","volume-title":"Principal manifolds for data visualization and dimension reduction","author":"Gorban","year":"2008"},{"key":"B7","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1186\/s13059-019-1874-1","article-title":"Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression","volume":"20","author":"Hafemeister","year":"2019","journal-title":"Genome Biol."},{"key":"B8","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1038\/s41587-023-01767-y","article-title":"Dictionary learning for integrative, multimodal and scalable single-cell analysis","volume":"42","author":"Hao","year":"2024","journal-title":"Nat. Biotechnol."},{"key":"B30","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1038\/s43586-024-00363-x","article-title":"Uniform manifold approximation and projection","volume":"4","author":"Healy","year":"2024","journal-title":"Nat. Rev. Methods Primers"},{"key":"B9","doi-asserted-by":"publisher","first-page":"685","DOI":"10.1038\/s41587-019-0113-3","article-title":"Efficient integration of heterogeneous single-cell transcriptomes using Scanorama","volume":"37","author":"Hie","year":"2019","journal-title":"Nat. Biotechnol."},{"key":"B10","doi-asserted-by":"publisher","first-page":"973","DOI":"10.3389\/fonc.2020.00973","article-title":"Impact of data preprocessing on integrative matrix factorization of single cell data","volume":"10","author":"Hsu","year":"2020","journal-title":"Front. Oncol."},{"key":"B11","doi-asserted-by":"publisher","first-page":"719","DOI":"10.1038\/s42003-022-03628-x","article-title":"Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization","volume":"5","author":"Huang","year":"2022","journal-title":"Commun. Biol."},{"key":"B12","doi-asserted-by":"publisher","first-page":"1009316","DOI":"10.3389\/fgene.2022.1009316","article-title":"Influence of single-cell RNA sequencing data integration on the performance of differential gene expression analysis","volume":"13","author":"Kujawa","year":"2022","journal-title":"Front. Genet."},{"key":"B13","doi-asserted-by":"publisher","first-page":"494","DOI":"10.1038\/s41586-018-0414-6","article-title":"RNA velocity of single cells","volume":"560","author":"La Manno","year":"2018","journal-title":"Nature"},{"key":"B14","doi-asserted-by":"publisher","first-page":"146","DOI":"10.1007\/s10618-012-0268-8","article-title":"Visualizing dimensionality reduction of systems biology data","volume":"27","author":"Lehrmann","year":"2013","journal-title":"Data Min. Knowl. Discov."},{"key":"B15","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1038\/s41592-021-01336-8","article-title":"Benchmarking atlas-level data integration in single-cell genomics","volume":"19","author":"Luecken","year":"2022","journal-title":"Nat. Methods"},{"key":"B16","doi-asserted-by":"publisher","first-page":"e8746","DOI":"10.15252\/msb.20188746","article-title":"Current best practices in single-cell RNA-seq analysis: a tutorial","volume":"15","author":"Luecken","year":"2019","journal-title":"Mol. Syst. Biol."},{"key":"B17","first-page":"2579","article-title":"Visualizing Data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"Journal of Machine Learning Research"},{"key":"B18","doi-asserted-by":"publisher","first-page":"628","DOI":"10.1093\/bib\/bbv108","article-title":"Dimension reduction techniques for the integrative analysis of multi-omics data","volume":"17","author":"Meng","year":"2016","journal-title":"Brief. Bioinform"},{"key":"B19","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1142\/9789814447331_0043","article-title":"Principal components analysis to summarize microarray experiments: application to sporulation time series","author":"Raychaudhuri","year":"2000","journal-title":"Pac Symp. Biocomput"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.1101\/2021.08.04.453579","article-title":"A comparison of data integration methods for single-cell RNA sequencing of cancer samples","volume":"2021","author":"Richards","year":"2021","journal-title":"bioRxiv"},{"key":"B21","doi-asserted-by":"publisher","first-page":"106","DOI":"10.14348\/molcells.2023.0009","article-title":"Integration of single-cell RNA-seq datasets: a review of computational methods","volume":"46","author":"Ryu","year":"2023","journal-title":"Mol. Cells"},{"key":"B22","doi-asserted-by":"publisher","first-page":"715","DOI":"10.1089\/cmb.2015.0085","article-title":"Discovering what dimensionality reduction really tells us about RNA-seq data","volume":"22","author":"Simmons","year":"2015","journal-title":"J. Comput. Biol."},{"key":"B23","doi-asserted-by":"publisher","first-page":"5233","DOI":"10.1038\/s41598-019-41695-z","article-title":"From Louvain to Leiden: guaranteeing well-connected communities","volume":"9","author":"Traag","year":"2019","journal-title":"Sci. Rep."},{"key":"B24","doi-asserted-by":"publisher","first-page":"471","DOI":"10.1140\/epjb\/e2013-40829-0","article-title":"A smart local moving algorithm for large-scale modularity-based community detection","volume":"86","author":"Waltman","year":"2013","journal-title":"Eur. Phys. J. B"},{"key":"B25","doi-asserted-by":"publisher","first-page":"440","DOI":"10.1186\/s12859-020-03797-8","article-title":"Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data","volume":"21","author":"Wang","year":"2020","journal-title":"BMC Bioinforma."},{"key":"B26","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1186\/1471-2105-13-24","article-title":"Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets","volume":"13","author":"Yao","year":"2012","journal-title":"BMC Bioinforma."},{"key":"B27","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1186\/s13059-021-02552-3","article-title":"Benchmarking UMI-based single-cell RNA-seq preprocessing workflows","volume":"22","author":"You","year":"2021","journal-title":"Genome Biol."},{"key":"B28","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1186\/s13059-022-02622-0","article-title":"Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data","volume":"23","author":"Yu","year":"2022","journal-title":"Genome Biol."},{"key":"B29","doi-asserted-by":"publisher","first-page":"517","DOI":"10.1261\/rna.078965.121","article-title":"Review of single-cell RNA-seq data clustering for cell-type identification and characterization","volume":"29","author":"Zhang","year":"2023","journal-title":"RNA"}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1519468\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T07:28:38Z","timestamp":1739345318000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1519468\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,12]]},"references-count":30,"alternative-id":["10.3389\/fbinf.2025.1519468"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2025.1519468","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,12]]},"article-number":"1519468"}}