{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T11:56:15Z","timestamp":1773143775517,"version":"3.50.1"},"reference-count":42,"publisher":"China Science Publishing & Media Ltd.","issue":"2","license":[{"start":{"date-parts":[[2024,1,19]],"date-time":"2024-01-19T00:00:00Z","timestamp":1705622400000},"content-version":"vor","delay-in-days":18,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,5,1]]},"abstract":"<jats:title>ABSTRACT<\/jats:title>\n               <jats:p>Active learning can be used for optimizing and speeding up the screening phase of systematic reviews. Running simulation studies mimicking the screening process can be used to test the performance of different machine-learning models or to study the impact of different training data. This paper presents an architecture design with a multiprocessing computational strategy for running many such simulation studies in parallel, using the ASReview Makita workflow generator and Kubernetes software for deployment with cloud technologies. We provide a technical explanation of the proposed cloud architecture and its usage. In addition to that, we conducted 1140 simulations investigating the computational time using various numbers of CPUs and RAM settings. Our analysis demonstrates the degree to which simulations can be accelerated with multiprocessing computing usage. The parallel computation strategy and the architecture design that was developed in the present paper can contribute to future research with more optimal simulation time and, at the same time, ensure the safe completion of the needed processes.<\/jats:p>","DOI":"10.1162\/dint_a_00244","type":"journal-article","created":{"date-parts":[[2024,1,19]],"date-time":"2024-01-19T05:12:27Z","timestamp":1705641147000},"page":"320-343","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":6,"title":["Optimizing ASReview Simulations: A generic Multiprocessing Solution for \u2018Light-data\u2019 and \u2018Heavy-data\u2019 Users"],"prefix":"10.3724","volume":"6","author":[{"given":"Sergei","family":"Romanov","sequence":"first","affiliation":[{"name":"Applied Data Science, Department of Information and Computing Science, Faculty of Science, Utrecht University"}]},{"given":"Abel Soares","family":"Siqueira","sequence":"additional","affiliation":[{"name":"Netherlands eScience Center, Amsterdam, NL"}]},{"given":"Jonathan","family":"de Bruin","sequence":"additional","affiliation":[{"name":"Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands"}]},{"given":"Jelle","family":"Teijema","sequence":"additional","affiliation":[{"name":"Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University"}]},{"given":"Laura","family":"Hofstee","sequence":"additional","affiliation":[{"name":"Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University"}]},{"given":"Rens","family":"van de Schoot","sequence":"additional","affiliation":[{"name":"Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University"}]}],"member":"2026","published-online":{"date-parts":[[2024,5,1]]},"reference":[{"key":"2024071119545468100_ref1","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1197\/jamia.M1929","article-title":"Reducing workload in systematic review preparation using automated citation classification","volume":"13","author":"Cohen","year":"2006","journal-title":"Journal of the American Medical Informatics Association"},{"key":"2024071119545468100_ref2","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1038\/s42256-020-00287-7","article-title":"An open source machine learning framework for efficient and transparent systematic reviews","volume":"3","author":"van de Schoot","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2024071119545468100_ref3","volume-title":"Active learning literature survey","author":"Settles","year":"2009"},{"key":"2024071119545468100_ref4","volume-title":"Simulation-based active learning for systematic reviews: a systematic review of the literature","author":"Teijema","year":"2023"},{"key":"2024071119545468100_ref5","article-title":"Statistical problems in assessing methods of medical diagnosis, with special reference to X-ray techniques","volume":"1432-1449","author":"Yerushalmy","year":"1947","journal-title":"Public Health Reports (1896-1970)"},{"key":"2024071119545468100_ref6","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1197\/jamia.M1929","article-title":"Reducing workload in systematic review preparation using automated citation classification","volume":"13","author":"Cohen","year":"2006","journal-title":"Journal of the American Medical Informatics Association"},{"key":"2024071119545468100_ref7","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1186\/s13643-023-02257-7","article-title":"Performance of active learning models for screening prioritization in systematic reviews: a simulation study into the average time to discover relevant records","volume":"12","author":"Ferdinands","year":"2023","journal-title":"Systematic Reviews"},{"key":"2024071119545468100_ref8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12874-020-01129-1","article-title":"An evaluation of DistillerSR's machine learning-based prioritization tool for title\/abstract screening-impact on reviewer-relevant outcomes","volume":"20","author":"Hamel","year":"2020","journal-title":"BMC Medical Research Methodology"},{"key":"2024071119545468100_ref9","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1038\/s42256-020-00287-7","article-title":"An open source machine learning framework for efficient and transparent systematic reviews","volume":"3","author":"Van De Schoot","year":"2021","journal-title":"Nature Machine Intelligence"},{"key":"2024071119545468100_ref10","volume-title":"ASReview Makita: a workflow generator for simulation studies using the command line interface of ASReview LAB","author":"Teijema","year":"2023"},{"key":"2024071119545468100_ref11","doi-asserted-by":"crossref","first-page":"1178181","DOI":"10.3389\/frma.2023.1178181","article-title":"Active learning-based Systematic reviewing using switching classification models: the case of the onset, maintenance, and relapse of depressive disorders","volume":"8","author":"Teijema","year":"2023","journal-title":"Frontiers in Research Metrics and Analytics"},{"key":"2024071119545468100_ref12","doi-asserted-by":"crossref","first-page":"38","DOI":"10.3389\/fninf.2013.00038","article-title":"Virk: an active learning-based system for bootstrapping knowledge base development in the neurosciences","volume":"7","author":"Ambert","year":"2013","journal-title":"Frontiers In Neuroinformatics"},{"key":"2024071119545468100_ref13","volume-title":"The influence of active learning model and prior knowledge choice on how long it takes to find hard-to-find relevant papers: Examining the variability of the time to discovery and the stability of its rank-orders","author":"Byrne","year":"2023"},{"key":"2024071119545468100_ref14","doi-asserted-by":"crossref","DOI":"10.31219\/osf.io\/w6qbg","volume-title":"Active learning for screening prioritization in systematic reviews-A simulation study","author":"Ferdinands","year":"2020"},{"key":"2024071119545468100_ref15","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.1080\/14737167.2023.2234639","article-title":"Can artificial intelligence separate the wheat from the chaff in systematic reviews of health economic articles?","volume":"23","author":"Oude Wolcherink","year":"2023","journal-title":"Expert Review of Pharmacoeconomics & Outcomes Research"},{"key":"2024071119545468100_ref16","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1007\/s10648-024-09862-5","article-title":"Screening Smarter, Not Harder: A Comparative Analysis of Machine Learning Screening Algorithms and Heuristic Stopping Criteria for Systematic Reviews in Educational Research","volume":"36","author":"Campos","year":"2024","journal-title":"Educational Psychology Review"},{"key":"2024071119545468100_ref17","volume-title":"The issue of reconstructing a database using search queries and its possible solution","author":"Neeleman","year":"2022"},{"key":"2024071119545468100_ref18","volume-title":"Artificial intelligence supports literature screening in medical guideline development: towards up-to-date medical guidelines","author":"Harmsen","year":"2021"},{"key":"2024071119545468100_ref19","doi-asserted-by":"crossref","DOI":"10.31234\/osf.io\/2w3rm","volume-title":"Large-Scale simulation study of active learning models for systematic reviews","author":"Teijema","year":"2023"},{"key":"2024071119545468100_ref20","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1109\/TPS.2012.2229298","article-title":"Efficient parallelization of a three-dimensional high-order particle-in-cell method for the simulation of a 170 GHz gyrotron resonator","volume":"41","author":"Neudorfer","year":"2012","journal-title":"IEEE Transactions on Plasma Science"},{"key":"2024071119545468100_ref21","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1016\/j.ces.2015.06.066","article-title":"On speeding up stochastic simulations by parallelization of random number generation","volume":"137","author":"Shu","year":"2015","journal-title":"Chemical Engineering Science"},{"key":"2024071119545468100_ref22","volume-title":"Highly parallel computing","author":"Gottlieb","year":"1989"},{"key":"2024071119545468100_ref23","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1145\/2157136.2157285","article-title":"Introducing parallelism and concurrency in the data structures course","volume-title":"Proceedings of the 43rd ACM Technical Symposium on Computer Science Education","author":"Grossman","year":"2012"},{"key":"2024071119545468100_ref24","first-page":"1","article-title":"Virtualization and containerization of application infrastructure: a comparison","volume":"21","author":"Scheepers","year":"2014","journal-title":"21st Twente Student Conference on IT"},{"issue":"5","key":"2024071119545468100_ref25","doi-asserted-by":"crossref","first-page":"e0177459","DOI":"10.1371\/journal.pone.0177459","article-title":"Singularity: Scientific containers for mobility of compute","volume":"12","author":"Kurtzer","year":"2017","journal-title":"PloS ONE"},{"issue":"239","key":"2024071119545468100_ref26","first-page":"2","article-title":"Docker: lightweight linux containers for consistent development and deployment","volume":"2014","author":"Merkel","year":"2014","journal-title":"Linux Journal"},{"key":"2024071119545468100_ref27","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/ISPRAS47671.2019.00016","article-title":"Kubernetes container orchestration as a framework for flexible and effective scientific data analysis","volume-title":"2019 Ivannikov Ispras Open Conference (ISPRAS)","author":"Tesliuk","year":"2019"},{"key":"2024071119545468100_ref28","volume-title":"Service-oriented architecture: concepts, technology, and design","author":"Erl","year":"2005"},{"key":"2024071119545468100_ref29","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1109\/MCC.2017.4250933","article-title":"Key characteristics of a container orchestration platform to enable a modern application","volume":"4","author":"Khan","year":"2017","journal-title":"IEEE Cloud Computing"},{"key":"2024071119545468100_ref30","volume-title":"LCE: The failure of operating systems and how we can fix it","author":"Kerrisk","year":"2012"},{"key":"2024071119545468100_ref31","volume-title":"ASReview LAB - A tool for AI-assisted systematic reviews","author":"ASReview LAB Developers","year":"2023"},{"key":"2024071119545468100_ref32","volume-title":"ASReview Makita: a workflow generator for simulation studies using the command line interface of ASReview LAB","author":"Teijema","year":"2023"},{"key":"2024071119545468100_ref33","first-page":"42","article-title":"Gnu parallel-the command-line power tool","volume":"36","author":"Tange","year":"2011","journal-title":"Usenix Mag"},{"key":"2024071119545468100_ref34","doi-asserted-by":"crossref","first-page":"408","DOI":"10.1109\/12.21127","article-title":"Speedup versus efficiency in parallel systems","volume":"38","author":"Eager","year":"1989","journal-title":"IEEE Transactions on Computers"},{"key":"2024071119545468100_ref35","volume-title":"Topics in Parallel and Distributed Computing.","author":"Prasad","year":"2015"},{"key":"2024071119545468100_ref36","volume-title":"SYNERGY - Open machine learning dataset on study selection in systematic reviews","author":"De Bruin","year":"2023"},{"key":"2024071119545468100_ref37","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1080\/00273171.2017.1412293","article-title":"Bayesian PTSD-trajectory analysis with informed priors based on a systematic literature search and expert elicitation","volume":"53","author":"Van De Schoot","year":"2018","journal-title":"Multivariate Behavioral Research"},{"key":"2024071119545468100_ref38","volume-title":"Language morphology in active learning aided systematic reviews","author":"Kroft","year":"2022"},{"key":"2024071119545468100_ref39","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1145\/192007.192027","article-title":"Impact of sharing-based thread placement on multithreaded architectures","volume":"22","author":"Thekkath","year":"1994","journal-title":"ACM SIGARCH Computer Architecture News"},{"key":"2024071119545468100_ref40","doi-asserted-by":"crossref","first-page":"919","DOI":"10.1109\/TPDS.2016.2603511","article-title":"A parallel random forest algorithm for big data in a spark cloud computing environment","volume":"28","author":"Chen","year":"2016","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"2024071119545468100_ref41","doi-asserted-by":"crossref","volume-title":"GPUs and the future of parallel computing","author":"Keckler","DOI":"10.1109\/MM.2011.89"},{"key":"2024071119545468100_ref42","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1002\/9781118305393.ch16","article-title":"Green cloud computing and environmental sustainability","volume-title":"Harnessing Green IT","author":"Kumar","year":"2012"}],"container-title":["Data Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/6\/2\/320\/2458977\/dint_a_00244.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/dint\/article-pdf\/6\/2\/320\/2458977\/dint_a_00244.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T07:41:29Z","timestamp":1741938089000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.sciengine.com\/doi\/10.1162\/dint_a_00244"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,5,1]]}},"URL":"https:\/\/doi.org\/10.1162\/dint_a_00244","relation":{},"ISSN":["2641-435X"],"issn-type":[{"value":"2641-435X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}