{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,17]],"date-time":"2026-07-17T13:20:46Z","timestamp":1784294446008,"version":"3.55.0"},"reference-count":49,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,12,24]],"date-time":"2022-12-24T00:00:00Z","timestamp":1671840000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-19-P3IA-0001"],"award-info":[{"award-number":["ANR-19-P3IA-0001"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["075-15-2021-634"],"award-info":[{"award-number":["075-15-2021-634"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004569","name":"Ministry of Science and Higher Education of the Russian Federation","doi-asserted-by":"publisher","award":["ANR-19-P3IA-0001"],"award-info":[{"award-number":["ANR-19-P3IA-0001"]}],"id":[{"id":"10.13039\/501100004569","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004569","name":"Ministry of Science and Higher Education of the Russian Federation","doi-asserted-by":"publisher","award":["075-15-2021-634"],"award-info":[{"award-number":["075-15-2021-634"]}],"id":[{"id":"10.13039\/501100004569","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Domain adaptation is a popular paradigm in modern machine learning which aims at tackling the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, frequently making them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. DAPCA algorithm introduces positive and negative weights between pairs of data points, and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for solving the domain adaptation task. We also show the benefit of using DAPCA in analyzing single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications leading to reduced dataset representations, taking into account possible divergence between source and target domains.<\/jats:p>","DOI":"10.3390\/e25010033","type":"journal-article","created":{"date-parts":[[2022,12,27]],"date-time":"2022-12-27T04:40:39Z","timestamp":1672116039000},"page":"33","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Domain Adaptation Principal Component Analysis: Base Linear Method for Learning with Out-of-Distribution Data"],"prefix":"10.3390","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1474-1734","authenticated-orcid":false,"given":"Evgeny M.","family":"Mirkes","sequence":"first","affiliation":[{"name":"School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8504-5448","authenticated-orcid":false,"given":"Jonathan","family":"Bac","sequence":"additional","affiliation":[{"name":"Institut Curie, PSL Research University, 75005 Paris, France"},{"name":"Institut National de la Sant\u00e9 et de la Recherche M\u00e9dicale (INSERM), U900, 75012 Paris, France"},{"name":"CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75005 Paris, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7877-8769","authenticated-orcid":false,"given":"Aziz","family":"Fouch\u00e9","sequence":"additional","affiliation":[{"name":"Institut Curie, PSL Research University, 75005 Paris, France"},{"name":"Institut National de la Sant\u00e9 et de la Recherche M\u00e9dicale (INSERM), U900, 75012 Paris, France"},{"name":"CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75005 Paris, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3032-5469","authenticated-orcid":false,"given":"Sergey V.","family":"Stasenko","sequence":"additional","affiliation":[{"name":"Laboratory of Advanced Methods for High-Dimensional Data Analysis, Lobachevsky University, 603000 Nizhniy Novgorod, Russia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9517-7284","authenticated-orcid":false,"given":"Andrei","family":"Zinovyev","sequence":"additional","affiliation":[{"name":"Institut Curie, PSL Research University, 75005 Paris, France"},{"name":"Institut National de la Sant\u00e9 et de la Recherche M\u00e9dicale (INSERM), U900, 75012 Paris, France"},{"name":"CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75005 Paris, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6224-1430","authenticated-orcid":false,"given":"Alexander N.","family":"Gorban","sequence":"additional","affiliation":[{"name":"School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,24]]},"reference":[{"key":"ref_1","first-page":"2030","article-title":"Domain-adversarial training of neural networks","volume":"17","author":"Ganin","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"You, K., Long, M., Cao, Z., Wang, J., and Jordan, M.I. (2019, January 15\u201320). Universal Domain Adaptation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00283"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1109\/TNN.2010.2091281","article-title":"Domain Adaptation via Transfer Component Analysis","volume":"22","author":"Pan","year":"2011","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Stahlbock, R., Weiss, G.M., Abou-Nasr, M., Yang, C.Y., Arabnia, H.R., and Deligiannidis, L. (2021). A Brief Review of Domain Adaptation. Advances in Data Science and Information Engineering, Springer International Publishing.","DOI":"10.1007\/978-3-030-71704-9"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1007\/s10994-009-5152-4","article-title":"A theory of learning from different domains","volume":"79","author":"Blitzer","year":"2010","journal-title":"Mach. Learn."},{"key":"ref_6","unstructured":"Shen, Z., Liu, J., He, Y., Zhang, X., Xu, R., Yu, H., and Cui, P. (2021). Towards Out-Of-Distribution Generalization: A Survey. arXiv."},{"key":"ref_7","unstructured":"Chen, M., Xu, Z.E., Weinberger, K.Q., and Sha, F. (July, January 26). Marginalized Denoising Autoencoders for Domain Adaptation. Proceedings of the 29th International Conference on Machine Learning, ICML 2012, icml.cc \/Omnipress, Edinburgh, Scotland, UK."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2639","DOI":"10.1162\/0899766042321814","article-title":"Canonical Correlation Analysis: An Overview with Application to Learning Methods","volume":"16","author":"Hardoon","year":"2004","journal-title":"Neural Computation"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1006\/jmva.2000.1908","article-title":"Common Principal Components for Dependent Random Vectors","volume":"75","author":"Neuenschwander","year":"2000","journal-title":"J. Multivar. Anal."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1137\/0718026","article-title":"Towards a Generalized Singular Value Decomposition","volume":"18","author":"Paige","year":"2006","journal-title":"SIAM J. Numer. Anal."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Liu, J., Wang, C., Gao, J., and Han, J. (2013, January 2\u20134). Multi-view clustering via joint nonnegative matrix factorization. Proceedings of the 13th SIAM International Conference on Data Mining, Austin, TX, USA.","DOI":"10.1137\/1.9781611972832.28"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"e49","DOI":"10.1093\/bioinformatics\/btl242","article-title":"Integrating structured biological data by Kernel Maximum Mean Discrepancy","volume":"22","author":"Borgwardt","year":"2006","journal-title":"Bioinformatics"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Fernando, B., Habrard, A., Sebban, M., and Tuytelaars, T. (2013, January 1\u20138). Unsupervised Visual Domain Adaptation Using Subspace Alignment. Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.368"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Csurka, G. (2017). Correlation Alignment for Unsupervised Domain Adaptation. Domain Adaptation in Computer Vision Applications, Springer International Publishing.","DOI":"10.1007\/978-3-319-58347-1"},{"key":"ref_15","unstructured":"Hua, G., and J\u00e9gou, H. (2016). Deep CORAL: Correlation Alignment for Deep Domain Adaptation. Computer Vision\u2014ECCV 2016 Workshops, Springer International Publishing."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1027","DOI":"10.1109\/TPAMI.2018.2832198","article-title":"Aggregating Randomized Clustering-Promoting Invariant Projections for Domain Adaptation","volume":"41","author":"Liang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat. Biotechnol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1561\/2200000073","article-title":"Computational Optimal Transport: With Applications to Data Science","volume":"11","author":"Cuturi","year":"2019","journal-title":"Found. Trends\u00ae Mach. Learn."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Gorban, A.N., Grechuk, B., Mirkes, E.M., Stasenko, S.V., and Tyukin, I.Y. (2021). High-dimensional separability for one-and few-shot learning. Entropy, 23.","DOI":"10.20944\/preprints202106.0718.v1"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1080\/14786440109462720","article-title":"On lines and planes of closest fit to systems of points in space","volume":"2","author":"Pearson","year":"1901","journal-title":"Lond. Edinb. Dublin Philos. Mag. J. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1357","DOI":"10.1016\/j.patcog.2010.12.015","article-title":"Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds","volume":"44","author":"Barshan","year":"2011","journal-title":"Pattern Recognit."},{"key":"ref_22","first-page":"329","article-title":"The Use and Interpretation of Principal Component Analysis in Applied Research","volume":"26","author":"Rao","year":"1964","journal-title":"Sankhy\u0101: Indian J. Stat. Ser. A"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1069","DOI":"10.1016\/j.drudis.2017.01.005","article-title":"The application of principal component analysis to drug discovery and biomedical data","volume":"22","author":"Giuliani","year":"2017","journal-title":"Drug Discov. Today"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Jolliffe, I.T. (1986). Principal Component Analysis, Springer.","DOI":"10.1007\/978-1-4757-1904-8"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Gorban, A., K\u00e9gl, B., Wunch, D., and Zinovyev, A. (2008). Principal Manifolds for Data Visualisation and Dimension Reduction, Springer. Lecture Notes in Computational Science and Engineering.","DOI":"10.1007\/978-3-540-73750-6"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"459","DOI":"10.1109\/TVCG.2004.17","article-title":"Robust linear dimensionality reduction","volume":"10","author":"Koren","year":"2004","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"2789","DOI":"10.1016\/j.patcog.2008.01.001","article-title":"A unified framework for semi-supervised dimensionality reduction","volume":"41","author":"Song","year":"2008","journal-title":"Pattern Recognit."},{"key":"ref_28","unstructured":"Gorban, A.N., Mirkes, E.M., and Zinovyev, A. (2016, September 09). Supervised PCA. Available online: https:\/\/github.com\/Mirkes\/SupervisedPCA."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Sompairac, N., Nazarov, P.V., Czerwinska, U., Cantini, L., Biton, A., Molkenov, A., Zhumadilov, Z., Barillot, E., Radvanyi, F., and Gorban, A. (2019). Independent component analysis for unraveling the complexity of cancer omics datasets. Int. J. Mol. Sci., 20.","DOI":"10.3390\/ijms20184414"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"562","DOI":"10.1093\/biostatistics\/kxx053","article-title":"Missing data and technical variability in single-cell RNA-sequencing experiments","volume":"19","author":"Hicks","year":"2018","journal-title":"Biostatistics"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1525","DOI":"10.1101\/gr.138115.112","article-title":"Copy number variation detection and genotyping from exome sequence data","volume":"22","author":"Krumm","year":"2012","journal-title":"Genome Res."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1745-6150-2-2","article-title":"Component retention in principal component analysis with application to cDNA microarray data","volume":"2","author":"Cangelosi","year":"2007","journal-title":"Biol. Direct"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"388","DOI":"10.1007\/s12559-019-09667-7","article-title":"How deep should be the depth of convolutional neural networks: A backyard dog case study","volume":"12","author":"Gorban","year":"2020","journal-title":"Cogn. Comput."},{"key":"ref_34","first-page":"723","article-title":"A Kernel Two-Sample Test","volume":"13","author":"Gretton","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_35","first-page":"1","article-title":"Eleven grand challenges in single-cell data science","volume":"21","author":"Szczurek","year":"2020","journal-title":"Genome Biol."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1038\/s41587-021-00895-7","article-title":"Computational principles and challenges in single-cell data integration","volume":"39","author":"Argelaguet","year":"2021","journal-title":"Nat. Biotechnol."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"619","DOI":"10.1038\/s41586-020-2922-4","article-title":"A molecular cell atlas of the human lung from single-cell RNA sequencing","volume":"587","author":"Travaglini","year":"2020","journal-title":"Nature"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1186\/s13059-019-1900-3","article-title":"Benchmarking principal component analysis for large-scale single-cell RNA-sequencing","volume":"21","author":"Tsuyuzaki","year":"2020","journal-title":"Genome Biol."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Cuccu, A., Francescangeli, F., De Angelis, M.L., Bruselles, A., Giuliani, A., and Zeuner, A. (2022). Analysis of Dormancy-Associated Transcriptional Networks Reveals a Shared Quiescence Signature in Lung and Colorectal Cancer. Int. J. Mol. Sci., 23.","DOI":"10.3390\/ijms23179869"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Bac, J., Mirkes, E.M., Gorban, A.N., Tyukin, I., and Zinovyev, A. (2021). Scikit-dimension: A python package for intrinsic dimension estimation. Entropy, 23.","DOI":"10.3390\/e23101368"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"12140","DOI":"10.1038\/s41598-017-11873-y","article-title":"Estimating the intrinsic dimension of datasets by a minimal neighborhood information","volume":"7","author":"Facco","year":"2017","journal-title":"Sci. Rep."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1427","DOI":"10.1016\/j.camwa.2012.09.011","article-title":"Is the k-NN classifier in high dimensions affected by the curse of dimensionality?","volume":"65","author":"Pestov","year":"2013","journal-title":"Comput. Math. Appl."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Mirkes, E.M., Allohibi, J., and Gorban, A.N. (2020). Fractional Norms and Quasinorms Do Not Help to Overcome the Curse of Dimensionality. Entropy, 22.","DOI":"10.3390\/e22101105"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1016\/j.aml.2006.04.022","article-title":"Topological grammars for data approximation","volume":"20","author":"Gorban","year":"2007","journal-title":"Appl. Math. Lett."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Albergante, L., Mirkes, E., Bac, J., Chen, H., Martin, A., Faure, L., Barillot, E., Pinello, L., Gorban, A., and Zinovyev, A. (2020). Robust and scalable learning of complex intrinsic dataset geometry via ElPiGraph. Entropy, 22.","DOI":"10.3390\/e22030296"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1016\/j.ins.2015.10.013","article-title":"SOM: Stochastic initialization versus principal components","volume":"364\u2013365","author":"Akinduko","year":"2016","journal-title":"Inf. Sci."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"861","DOI":"10.21105\/joss.00861","article-title":"UMAP: Uniform Manifold Approximation and Projection","volume":"3","author":"McInnes","year":"2018","journal-title":"J. Open Source Softw."},{"key":"ref_48","first-page":"2579","article-title":"Visualizing Data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1002\/aic.690370209","article-title":"Nonlinear principal component analysis using autoassociative neural networks","volume":"37","author":"Kramer","year":"1991","journal-title":"AIChE J."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/1\/33\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:49:53Z","timestamp":1760147393000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/1\/33"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,24]]},"references-count":49,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["e25010033"],"URL":"https:\/\/doi.org\/10.3390\/e25010033","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,24]]}}}