{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T12:22:28Z","timestamp":1764937348558},"reference-count":0,"publisher":"IOS Press","license":[{"start":{"date-parts":[[2021,12,2]],"date-time":"2021-12-02T00:00:00Z","timestamp":1638403200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,12,2]]},"abstract":"<jats:p>Machine learning research typically starts with a fixed data set created early in the process. The focus of the experiments is finding a model and training procedure that result in the best possible performance in terms of some selected evaluation metric. This paper explores how changes in a data set influence the measured performance of a model. Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train\/test splits, and the human labelling accuracy impact the performance of a trained deep learning classifier. Our experiments suggest that analyzing how data set properties affect performance can be an important step in improving the results of trained classifiers, and leads to better understanding of the obtained results.<\/jats:p>","DOI":"10.3233\/faia210316","type":"book-chapter","created":{"date-parts":[[2021,12,7]],"date-time":"2021-12-07T09:13:46Z","timestamp":1638868426000},"source":"Crossref","is-referenced-by-count":5,"title":["Data-Centric Machine Learning: Improving Model Performance and Understanding Through Dataset Analysis"],"prefix":"10.3233","author":[{"given":"Hannes","family":"Westermann","sequence":"first","affiliation":[{"name":"Cyberjustice Laboratory, Facult\u00e9 de droit, Universit\u00e9 de Montr\u00e9al"}]},{"given":"Jarom\u00edr","family":"\u0160avelka","sequence":"additional","affiliation":[{"name":"School of Computer Science, Carnegie Mellon University"}]},{"given":"Vern R.","family":"Walker","sequence":"additional","affiliation":[{"name":"LLT Lab, Maurice A. Deane School of Law, Hofstra University"}]},{"given":"Kevin D.","family":"Ashley","sequence":"additional","affiliation":[{"name":"School of Computing and Information, University of Pittsburgh"}]},{"given":"Karim","family":"Benyekhlef","sequence":"additional","affiliation":[{"name":"Cyberjustice Laboratory, Facult\u00e9 de droit, Universit\u00e9 de Montr\u00e9al"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","Legal Knowledge and Information Systems"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA210316","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,7]],"date-time":"2021-12-07T09:55:33Z","timestamp":1638870933000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA210316"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,2]]},"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia210316","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,2]]}}}