{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T16:01:55Z","timestamp":1774368115090,"version":"3.50.1"},"reference-count":23,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T00:00:00Z","timestamp":1754524800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004955","name":"FFG","doi-asserted-by":"publisher","award":["FO999892220"],"award-info":[{"award-number":["FO999892220"]}],"id":[{"id":"10.13039\/501100004955","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Influence-based data selection methods, such as TracIn, aim to estimate the impact of individual training samples on model predictions and are increasingly used for dataset curation and reduction. This study investigates whether selecting the most positively influential training examples can be used to create compressed yet effective training datasets for transfer learning in plastic waste classification. Using a ResNet-18 model trained on a custom dataset of plastic waste images, TracIn was applied to compute influence scores across multiple training checkpoints. The top 50 influential samples per class were extracted and used to train a new model. Contrary to expectations, models trained on these highly influential subsets significantly underperformed compared to models trained on either the full dataset or an equally sized random sample. Further analysis revealed that many top-ranked influential images originated from different classes, indicating model biases and potential label confusion. These findings highlight the limitations of using influence scores for dataset compression. However, TracIn proved valuable for identifying problematic or ambiguous samples, class imbalance issues, and issues with fuzzy class boundaries. Based on the results, the utilized TracIn approach is recommended as a diagnostic instrument rather than for dataset curation.<\/jats:p>","DOI":"10.3390\/data10080127","type":"journal-article","created":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T08:33:06Z","timestamp":1754555586000},"page":"127","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Limitations of Influence-Based Dataset Compression for Waste Classification"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6605-2634","authenticated-orcid":false,"given":"Julian","family":"Aberger","sequence":"first","affiliation":[{"name":"Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lena","family":"Brensberger","sequence":"additional","affiliation":[{"name":"Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1740-7432","authenticated-orcid":false,"given":"Gerald","family":"Koinig","sequence":"additional","affiliation":[{"name":"Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Benedikt","family":"H\u00e4cker","sequence":"additional","affiliation":[{"name":"Siemens Aktiengesellschaft, 1210 Vienna, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0093-3092","authenticated-orcid":false,"given":"Jes\u00fas","family":"Pestana","sequence":"additional","affiliation":[{"name":"Pro2Future GmbH, 8010 Graz, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3008-4703","authenticated-orcid":false,"given":"Renato","family":"Sarc","sequence":"additional","affiliation":[{"name":"Chair of Waste Processing Technology and Waste Management, Technical University of Leoben, 8700 Leoben, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1016\/j.wasman.2024.10.022","article-title":"Deep learning approaches for classification of copper-containing metal scrap in recycling processes","volume":"190","author":"Koinig","year":"2024","journal-title":"Waste Manag."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1016\/j.wasman.2025.01.027","article-title":"Prototype of AI-powered assistance system for digitalisation of manual waste sorting","volume":"194","author":"Aberger","year":"2025","journal-title":"Waste Manag."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1186\/s40537-024-00943-4","article-title":"Data oversampling and imbalanced datasets: An investigation of performance for machine learning and feature engineering","volume":"11","author":"Mujahid","year":"2024","journal-title":"J. Big Data"},{"key":"ref_4","unstructured":"Innsbruck University Press, Deutsche Gesellschaft f\u00fcr Abfallwirtschaft e.V., and Fakult\u00e4t f\u00fcr Bau- und Umweltingenieurwesen der Technischen Universit\u00e4t Wien (2024, January 15\u201316). Kreislauf- und Ressourcenwirtschaft. Proceedings of the Wissenschaftskongress, Vienna, Austria."},{"key":"ref_5","unstructured":"Pruthi, G., Liu, F., Sundararajan, M., and Kale, S. (2020). Estimating Training Data Influence by Tracing Gradient Descent. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2351","DOI":"10.1007\/s10994-023-06495-7","article-title":"Training data influence analysis and estimation: A survey","volume":"113","author":"Hammoudeh","year":"2024","journal-title":"Mach. Learn."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1007\/s00506-023-00999-1","article-title":"Gro\u00dftechnische experimentelle Forschung im Digital Waste Research Lab und Digitale Abfallanalytik und -behandlung","volume":"76","author":"Kandlbauer","year":"2024","journal-title":"\u00d6sterr. Wasser Abfallw."},{"key":"ref_8","unstructured":"Umweltbundesamt (2022, December 16). Sortierung und Recycling von Kunststoffabf\u00e4llen in \u00d6sterreich: Stand 2019. Available online: https:\/\/www.umweltbundesamt.at\/fileadmin\/site\/publikationen\/rep0744_hauptteil.pdf."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"959","DOI":"10.1007\/s13762-019-02526-w","article-title":"Sampling and analysis of coarsely shredded mixed commercial waste. Part I: Procedure, particle size and sorting analysis","volume":"17","author":"Khodier","year":"2020","journal-title":"Int. J. Environ. Sci. Technol."},{"key":"ref_10","unstructured":"Zhao, Z.-Q., Zheng, P., Xu, S., and Wu, X. (2018). Object Detection with Deep Learning: A Review. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.rse.2018.06.028","article-title":"Detecting Mammals in UAV Images: Best Practices to address a substantially Imbalanced Dataset with Deep Learning","volume":"216","author":"Kellenberger","year":"2018","journal-title":"Remote Sens. Environ."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018, January 18\u201323). MobileNetV2: Inverted Residuals and Linear Bottlenecks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_13","unstructured":"(2025, June 03). PyTorch. resnet18 Documentation: PyTorch\/Torchvision Models. Available online: https:\/\/docs.pytorch.org\/vision\/main\/models\/generated\/torchvision.models.resnet18.html."},{"key":"ref_14","unstructured":"Koh, P.W., and Liang, P. (2017). Understanding Black-box Predictions via Influence Functions. arXiv."},{"key":"ref_15","unstructured":"Ghorbani, A., and Zou, J. (2019). Data Shapley: Equitable Valuation of Data for Machine Learning. arXiv."},{"key":"ref_16","unstructured":"Basu, S., You, X., and Feizi, S. (2019). On Second-Order Group Influence Functions for Black-Box Predictions. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Jia, R., Dao, D., Wang, B., Hubis, F.A., Gurel, N.M., Li, B., Zhang, C., Spanos, C.J., and Song, D. (2019). Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms. arXiv.","DOI":"10.14778\/3342263.3342637"},{"key":"ref_18","unstructured":"Yeh, C.-K., Kim, J.S., Yen, I.E.H., and Ravikumar, P. (2018). Representer Point Selection for Explaining Deep Neural Networks. arXiv."},{"key":"ref_19","unstructured":"Da Costa-Luis, C., Larroque, S.K., Altendorf, K., Mary, H., Korobov, M., Yorav-Raphael, N., Ivanov, I., Bargull, M., Rodrigues, N., and Chen, G. (2024). tqdm: A fast, Extensible Progress Bar for Python and CLI, Zenodo."},{"key":"ref_20","unstructured":"Powers, D.M.W. (2020). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"5303","DOI":"10.1109\/TIP.2022.3193758","article-title":"Weighted Correlation Embedding Learning for Domain Adaptation","volume":"31","author":"Lu","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2017","DOI":"10.1109\/TIP.2023.3261758","article-title":"Guided Discrimination and Correlation Subspace Learning for Domain Adaptation","volume":"32","author":"Lu","year":"2023","journal-title":"IEEE Trans. Image Process."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"4273","DOI":"10.1109\/TIP.2025.3581007","article-title":"Adaptive Dispersal and Collaborative Clustering for Few-Shot Unsupervised Domain Adaptation","volume":"34","author":"Lu","year":"2025","journal-title":"IEEE Trans. Image Process."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/8\/127\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:25:10Z","timestamp":1760034310000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/8\/127"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,7]]},"references-count":23,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["data10080127"],"URL":"https:\/\/doi.org\/10.3390\/data10080127","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,7]]}}}