{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T02:00:52Z","timestamp":1776132052358,"version":"3.50.1"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,12,30]],"date-time":"2024-12-30T00:00:00Z","timestamp":1735516800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)\u2014SFB","award":["442419336"],"award-info":[{"award-number":["442419336"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2025,2,28]]},"abstract":"<jats:p>\n            Machine learning has become increasingly important in biomechanics. It allows researchers to unveil hidden patterns in large and complex data, which leads to a more comprehensive understanding of biomechanical processes and deeper insights into human movement. However, machine learning models are often trained on a single dataset with a limited number of participants, which negatively affects their robustness and generalizability. Combining data from multiple existing sources provides an opportunity to overcome these limitations without spending more time on recruiting participants and recording new data. It is furthermore an opportunity for researchers who lack the financial resources or laboratory equipment to conduct expensive motion capture studies themselves. At the same time, subtle interlaboratory differences can be problematic in an analysis due to the bias that they introduce. In our study, we investigated differences in motion capture datasets in the context of machine learning, for which we combined overground walking trials from four existing studies. 
Specifically, our goal was to examine whether a machine learning model was able to predict the original data source based on marker and ground reaction force (GRF) trajectories of single strides and how different scaling methods and pooling procedures affected the outcome. Layer-wise relevance propagation was applied to understand which factors were influential in distinguishing the original data sources. We found that the model could predict the original data source with very high accuracy (up to\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({\\gt}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            99%), which decreased by about 15 percentage points when we scaled every dataset individually prior to pooling. However, none of the proposed scaling methods could fully remove the dataset bias. Layer-wise relevance propagation revealed that no single factor differed between all datasets. Instead, every dataset had its own unique characteristics that were picked up by the model. These variables differed between the scaling and pooling approaches but were mostly consistent between trials belonging to the same dataset. Our results show that motion capture data is sensitive even to small deviations in marker placement and experimental setup and that small inter-group differences should not be overinterpreted during data analysis, especially when the data was collected in different labs. Furthermore, we recommend scaling datasets individually prior to pooling them, which led to the lowest accuracy in predicting the data source. We want to raise awareness that differences in datasets always exist and are recognizable by machine learning models. 
Researchers should thus think about how these differences might affect their results when combining data from different studies.\n          <\/jats:p>","DOI":"10.1145\/3702646","type":"journal-article","created":{"date-parts":[[2024,11,11]],"date-time":"2024-11-11T14:05:15Z","timestamp":1731333915000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Exploring Dataset Bias and Scaling Techniques in Multi-Source Gait Biomechanics: An Explainable Machine Learning Approach"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0105-5244","authenticated-orcid":false,"given":"Sophie","family":"Fleischmann","sequence":"first","affiliation":[{"name":"Machine Learning and Data Analytics Lab, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8963-6220","authenticated-orcid":false,"given":"Simon","family":"Dietz","sequence":"additional","affiliation":[{"name":"Machine Learning and Data Analytics Lab, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3806-9922","authenticated-orcid":false,"given":"Julian","family":"Shanbhag","sequence":"additional","affiliation":[{"name":"Engineering Design, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8481-2969","authenticated-orcid":false,"given":"Annika","family":"Wuensch","sequence":"additional","affiliation":[{"name":"Machine Learning and Data Analytics Lab, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7318-9578","authenticated-orcid":false,"given":"Marlies","family":"Nitschke","sequence":"additional","affiliation":[{"name":"Machine Learning and Data Analytics Lab, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8610-1966","authenticated-orcid":false,"given":"J\u00f6rg","family":"Miehling","sequence":"additional","affiliation":[{"name":"Engineering Design, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0244-5033","authenticated-orcid":false,"given":"Sandro","family":"Wartzack","sequence":"additional","affiliation":[{"name":"Engineering Design, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8585-2725","authenticated-orcid":false,"given":"Sigrid","family":"Leyendecker","sequence":"additional","affiliation":[{"name":"Institute of Applied Dynamics, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0417-0336","authenticated-orcid":false,"given":"Bjoern M.","family":"Eskofier","sequence":"additional","affiliation":[{"name":"Machine Learning and Data Analytics Lab, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, Germany and Translational Digital Health Group, Institute of AI for Health, Helmholtz Zentrum M\u00fcnchen - German Research Center for Environmental Health, Neuherberg, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1867-0374","authenticated-orcid":false,"given":"Anne D.","family":"Koelewijn","sequence":"additional","affiliation":[{"name":"Machine Learning and Data Analytics Lab, Friedrich-Alexander-Universit\u00e4t Erlangen-N\u00fcrnberg (FAU), Erlangen, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,12,30]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsif.2020.0770"},{"key":"e_1_3_2_3_2","unstructured":"David Alvarez-Melis and Tommi S. Jaakkola. 2018. Towards robust interpretability with self-explaining neural networks. arXiv:1806.07538. Retrieved from https:\/\/arxiv.org\/abs\/1806.07538"},{"key":"e_1_3_2_4_2","unstructured":"Christopher J. Anders David Neumann Wojciech Samek Klaus-Robert M\u00fcller and Sebastian Lapuschkin. 2023. Software for dataset-wide XAI: From local explanations to global insights with Zennit CoRelAy and ViRelAy. arXiv:2106.13200v2. Retrieved from https:\/\/arxiv.org\/abs\/2106.13200v2"},{"key":"e_1_3_2_5_2","volume-title":"Proceedings of the NIPS Workshop on Large Scale Visual Recognition and Retrieval","author":"Gong Boqing","year":"2021","unstructured":"Boqing Gong, Fei Sha, and Kristen Graumann. 2021. Overcoming dataset bias: An unsupervised domain adaptation approach. In Proceedings of the NIPS Workshop on Large Scale Visual Recognition and Retrieval."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","unstructured":"Benjamin R. Babcock Astrid Kosters Junkai Yang Mackenzie L. White and Eliver E. B. Ghosn. 2021. Data matrix normalization and merging strategies minimize batch-specific systemic variation in scRNA-Seq data. bioRxiv. 
DOI: 10.1101\/2021.08.18.456898","DOI":"10.1101\/2021.08.18.456898"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0130140"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbiomech.2004.05.002"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.gaitpost.2013.04.022"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.3389\/fbioe.2020.00260"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.humov.2021.102891"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbiomech.2020.110182"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.88591.4"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TBME.2007.901024"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.3389\/fbioe.2020.00604"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbiomech.2019.07.022"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-71704-9_65"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbiomech.2016.10.033"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1186\/s13643-019-1063-z"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tibtech.2022.02.005"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.gaitpost.2008.10.060"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbiomech.2018.09.009"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbiomech.2010.06.025"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41576-023-00586-w"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/0966-6362(95)01057-2"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.3390\/s21217145"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0249657"},{"key":"e_1_3_2_28_2","doi-asserted-by":"p
ublisher","DOI":"10.1007\/978-3-031-04083-2_2"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-019-38748-8"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","unstructured":"Fabian Horst Sebastian Lapuschkin Samek Samek Wojciech Klaus-Robert M\u00fcller and Wolfgang I Sch\u00f6llhorn. 2019. A public dataset of overground walking kinetics and full-body kinematics in healthy adult individuals. Mendeley Data. V3. DOI: 10.17632\/svx74xcrjr.3","DOI":"10.17632\/svx74xcrjr.3"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.gaitpost.2020.07.114"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1055\/a-1231-5304"},{"key":"e_1_3_2_33_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2017. Adam: A method for stochastic optimization. arXiv:1412.6980. Retrieved from https:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1080\/14763141.2016.1246603"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","unstructured":"Tiziana Lencioni Ilaria Carpinella Marco Rabuffetti Alberto Marzegan and Alberto Ferrari. 2019. Human kinematic kinetic and EMG data during level walking toe\/heel-walking stairs ascending\/descending. figshare. Collection. DOI: 10.6084\/m9.figshare.c.4494755.v1","DOI":"10.6084\/m9.figshare.c.4494755.v1"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-019-0323-z"},{"key":"e_1_3_2_37_2","unstructured":"Christoffer Loeffler Wei-Cheng Lai Bjoern Eskofier Dario Zanca Lukas Schmidt and Christopher Mutschler. 2023. Don\u2019t get me wrong: How to apply deep visual interpretations to time series. arXiv:2203.07861. 
Retrieved from https:\/\/arxiv.org\/abs\/2203.07861"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10723-021-09595-7"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295230"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","unstructured":"Florent Moissenet and C\u00e9line Schreiber. 2019. A multimodal dataset of human gait at different walking speeds established on injury-free adult participants. V8. figshare. DOI: 10.6084\/m9.figshare.7734767","DOI":"10.6084\/m9.figshare.7734767"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-28954-6_10"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.3389\/fnhum.2023.1205881"},{"key":"e_1_3_2_43_2","doi-asserted-by":"crossref","unstructured":"Soham Raste Rahul Singh Joel Vaughan and Vijayan N. Nair. 2022. Quantifying inherent randomness in machine learning algorithms. arXiv:2206.12353. Retrieved from https:\/\/arxiv.org\/abs\/2206.12353","DOI":"10.2139\/ssrn.4146989"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbiomech.2021.110451"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.14348\/molcells.2023.0009"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2021.3060483"},{"key":"e_1_3_2_48_2","unstructured":"Warren S. Sarle. 2023. comp.ai.neural-nets FAQ Part 2 of 7: Learning. 
Retrieved from http:\/\/www.faqs.org\/faqs\/ai-faq\/neural-nets\/part2\/"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-019-0124-4"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1006223"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-14418-4_26"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-58347-1_2"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbiomech.2020.109820"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995347"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbiomech.2023.111623"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.3390\/app12010136"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2022.3215921"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2020.101879"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","unstructured":"Keenon Werling Michael Raitor Jon Stingel Jennifer L. Hicks Steve Collins Scott L. Delp and C. Karen Liu. 2022. Rapid bilevel optimization to concurrently solve musculoskeletal scaling marker registration and inverse kinematic problems for human motion reconstruction. bioRxiv 2022.08.22.504896. [bioRxiv.] DOI: 10.1101\/2022.08.22.504896","DOI":"10.1101\/2022.08.22.504896"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-022-07054-1"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0007431"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.3389\/fbioe.2020.638793"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0255597"},{"key":"e_1_3_2_64_2","unstructured":"Sicheng Zhao Bo Li Colorado Reed Pengfei Xu and Kurt Keutzer. 2020. Multi-source domain adaptation in the deep learning era: A systematic survey. arXiv:2002.12169. 
Retrieved from https:\/\/arxiv.org\/abs\/2002.12169"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3702646","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3702646","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:03Z","timestamp":1750295883000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3702646"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,30]]},"references-count":63,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2,28]]}},"alternative-id":["10.1145\/3702646"],"URL":"https:\/\/doi.org\/10.1145\/3702646","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,30]]},"assertion":[{"value":"2023-11-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-14","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}