{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,30]],"date-time":"2025-08-30T16:49:55Z","timestamp":1756572595966,"version":"3.38.0"},"reference-count":14,"publisher":"China Science Publishing & Media Ltd.","issue":"2","license":[{"start":{"date-parts":[[2022,4,27]],"date-time":"2022-04-27T00:00:00Z","timestamp":1651017600000},"content-version":"vor","delay-in-days":116,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>There is a huge gap between (1) the state of workflow technology on the one hand and the practices in the many labs working with data driven methods on the other and (2) the awareness of the FAIR principles and the lack of changes in practices during the last 5 years. The CWFR concept has been defined which is meant to combine these two intentions, increasing the use of workflow technology and improving FAIR compliance. In the study described in this paper we indicate how this could be applied to machine learning which is now used by almost all research disciplines with the well-known effects of a huge lack of repeatability and reproducibility.<\/jats:p>\n               <jats:p>Researchers will only change practices if they can work efficiently and are not loaded with additional tasks. A comprehensive CWFR framework would be an umbrella for all steps that need to be carried out to do machine learning on selected data collections and immediately create a comprehensive and FAIR compliant documentation. The researcher is guided by such a framework and information once entered can easily be shared and reused. 
The many iterations normally required in machine learning can be dealt with efficiently using CWFR methods.<\/jats:p>\n               <jats:p>Libraries of components that can be easily orchestrated using FAIR Digital Objects as a common entity to document all actions and to exchange information between steps without the researcher needing to understand anything about PIDs and FDO details are probably the way to increase efficiency in repeating research workflows. As the Galaxy project indicates, the availability of supporting tools will be important to let researchers use these methods. Unlike what the Galaxy framework suggests, however, it would be necessary to include all steps required for a machine learning task, including those that require human interaction, and to document all phases with the help of structured FDOs.<\/jats:p>","DOI":"10.1162\/dint_a_00124","type":"journal-article","created":{"date-parts":[[2022,4,27]],"date-time":"2022-04-27T14:36:48Z","timestamp":1651070208000},"page":"173-185","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":3,"title":["Canonical Workflow for Machine Learning Tasks"],"prefix":"10.3724","volume":"4","author":[{"given":"Christophe","family":"Blanchi","sequence":"first","affiliation":[{"name":"DONA Foundation, C\/O Universit\u00e9 de Gen\u00e8ve, Rue du G\u00e9n\u00e9ral-Dufour 24, 1204 Gen\u00e8ve, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Binyam","family":"Gebre","sequence":"additional","affiliation":[{"name":"bol.com, Papendorpseweg 100, 3528 BJ Utrecht, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Wittenburg","sequence":"additional","affiliation":[{"name":"FDO Forum, Gemeindweg 55, 47533 Kleve, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"2026","published-online":{"date-parts":[[2022,4,1]]},"reference":[{"issue":"1","key":"2022042714321296800_ref1","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1162\/dint_a_00084","article-title":"Not ready for convergence in data infrastructures","volume":"3","author":"Jeffery","year":"2021","journal-title":"Data Intelligence"},{"volume-title":"Canonical Workflow Framework for Research (CWFR)\u2014position paper\u2014 version 2","author":"Hardisty","key":"2022042714321296800_ref2"},{"volume-title":"Welcome to the Galaxy community hub","key":"2022042714321296800_ref3"},{"volume-title":"Jupyter Notebook","key":"2022042714321296800_ref4"},{"key":"2022042714321296800_ref5","doi-asserted-by":"crossref","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR guiding principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Scientific Data"},{"volume-title":"FDO Forum","key":"2022042714321296800_ref6"},{"volume-title":"Analysis of scientific practice towards FAIR digital objects","author":"de Smedt","key":"2022042714321296800_ref7"},{"key":"2022042714321296800_ref8","doi-asserted-by":"crossref","DOI":"10.1186\/s40537-016-0043-6","article-title":"A survey of transfer learning","volume":"3","author":"Weiss","year":"2016","journal-title":"Journal of Big Data"},{"volume-title":"Digital object interface protocol specification\u2014version 2.0","author":"DONA Foundation","key":"2022042714321296800_ref9"},{"volume-title":"Business process execution language","key":"2022042714321296800_ref10"},{"volume-title":"Common workflow 
language","key":"2022042714321296800_ref11"},{"key":"2022042714321296800_ref12","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"issue":"13","key":"2022042714321296800_ref13","doi-asserted-by":"crossref","first-page":"3521","DOI":"10.1073\/pnas.1611835114","article-title":"Overcoming catastrophic forgetting in neural networks","volume":"114","author":"Kirkpatrick","year":"2017","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"2022042714321296800_ref14","first-page":"2503","volume-title":"Hidden technical debt in machine learning systems","author":"Sculley","year":"2015"}],"container-title":["Data Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/4\/2\/173\/2012411\/dint_a_00124.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/4\/2\/173\/2012411\/dint_a_00124.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T07:42:23Z","timestamp":1741938143000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.sciengine.com\/doi\/10.1162\/dint_a_00124"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":14,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4,1]]}},"URL":"https:\/\/doi.org\/10.1162\/dint_a_00124","relation":{},"ISSN":["2641-435X"],"issn-type":[{"type":"electronic","value":"2641-435X"}],"subject":[],"published-other":{"date-parts":[[2022]]},"published":{"date-parts":[[2022]]}}}