{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T12:59:48Z","timestamp":1770814788929,"version":"3.50.1"},"reference-count":74,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,3,14]],"date-time":"2024-03-14T00:00:00Z","timestamp":1710374400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF CCF award","award":["1908762"],"award-info":[{"award-number":["1908762"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>When data is scarce, software analytics can make many mistakes. For example, consider learning predictors for open source project health (e.g., the number of closed pull requests in 12 months time). The training data for this task may be very small (e.g., 5 years of data, collected every month means just 60 rows of training data). The models generated from such tiny datasets can make many prediction errors.<\/jats:p>\n          <jats:p>\n            Those errors can be tamed by a\n            <jats:italic>landscape analysis<\/jats:italic>\n            that selects better learner control parameters. 
Our\n            <jats:sans-serif>niSNEAK<\/jats:sans-serif>\n            tool (a)\u00a0clusters the data to find the general landscape of the hyperparameters, then (b)\u00a0explores a few representatives from each part of that landscape.\n            <jats:sans-serif>niSNEAK<\/jats:sans-serif>\n            is both faster and more effective than prior state-of-the-art hyperparameter optimization algorithms (e.g., FLASH, HYPEROPT, OPTUNA).\n          <\/jats:p>\n          <jats:p>\n            The configurations found by\n            <jats:sans-serif>niSNEAK<\/jats:sans-serif>\n            have far less error than other methods. For example, for project health indicators such as\n            <jats:italic>C<\/jats:italic>\n            = number of commits,\n            <jats:italic>I<\/jats:italic>\n            = number of closed issues, and\n            <jats:italic>R<\/jats:italic>\n            = number of closed pull requests,\n            <jats:sans-serif>niSNEAK<\/jats:sans-serif>\n            \u2019s 12-month prediction errors are {I=0%, R=33%\u00a0C=47%}, whereas other methods have far larger errors of {I=61%,R=119%\u00a0C=149%}. We conjecture that\n            <jats:sans-serif>niSNEAK<\/jats:sans-serif>\n            works so well since it finds the most informative regions of the hyperparameters, then jumps to those regions. Other methods (that do not reflect over the landscape) can waste time exploring less informative options.\n          <\/jats:p>\n          <jats:p>\n            Based on the preceding, we recommend landscape analytics (e.g.,\n            <jats:sans-serif>niSNEAK<\/jats:sans-serif>\n            ) especially when learning from very small datasets. This article only explores the application of\n            <jats:sans-serif>niSNEAK<\/jats:sans-serif>\n            to project health. 
That said, we see nothing in principle that prevents the application of this technique to a wider range of problems.\n          <\/jats:p>\n          <jats:p>To assist other researchers in repeating, improving, or even refuting our results, all our scripts and data are available on GitHub at https:\/\/github.com\/zxcv123456qwe\/niSneak.<\/jats:p>","DOI":"10.1145\/3630252","type":"journal-article","created":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T11:44:18Z","timestamp":1699530258000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Learning from Very Little Data: On the Value of Landscape Analysis for Predicting Software Project Health"],"prefix":"10.1145","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1202-3130","authenticated-orcid":false,"given":"Andre","family":"Lustosa","sequence":"first","affiliation":[{"name":"North Carolina State University, Raleigh, NC, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5040-3196","authenticated-orcid":false,"given":"Tim","family":"Menzies","sequence":"additional","affiliation":[{"name":"North Carolina State University, Raleigh, NC, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,3,14]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2019.2945020"},{"key":"e_1_3_2_3_2","first-page":"1050","volume-title":"Proceedings of the 2018 IEEE\/ACM 40th International Conference on Software Engineering (ICSE \u201918)","author":"Agrawal Amritanshu","year":"2018","unstructured":"Amritanshu Agrawal and Tim Menzies. 2018. Is \u201cbetter data\u201d better than \u201cbetter data miners\u201d? In Proceedings of the 2018 IEEE\/ACM 40th International Conference on Software Engineering (ICSE \u201918). 
IEEE, Los Alamitos, CA, 1050\u20131061."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-020-09808-9"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2021.3073242"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330701"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/1985793.1985795"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CSIT.2018.8486222"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2019.2918536"},{"key":"e_1_3_2_10_2","first-page":"242","volume-title":"Artificial Evolution","author":"Belaidouni Meriema","year":"1999","unstructured":"Meriema Belaidouni and Jin-Kao Hao. 1999. Landscapes and the maximal constraint satisfaction problem. In Artificial Evolution. Lecture Notes in Computer Science, Vol. 1829. Springer, 242\u2013253."},{"key":"e_1_3_2_11_2","article-title":"Algorithms for hyper-parameter optimization","volume":"24","author":"Bergstra James","year":"2011","unstructured":"James Bergstra, R\u00e9mi Bardenet, Yoshua Bengio, and Bal\u00e1zs K\u00e9gl. 2011. Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems 24 (2011), 2546\u20132554.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"2","key":"e_1_3_2_12_2","article-title":"Random search for hyper-parameter optimization.","volume":"13","author":"Bergstra James","year":"2012","unstructured":"James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. 
Journal of Machine Learning Research 13, 2 (2012), 281\u2013305.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1088\/1749-4699\/8\/1\/014008"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.02.113"},{"key":"e_1_3_2_15_2","first-page":"108","volume-title":"Proceedings of the ECML PKDD Workshop: Languages for Data Mining and Machine Learning","author":"Buitinck Lars","year":"2013","unstructured":"Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake VanderPlas, Arnaud Joly, Brian Holt, and Ga\u00ebl Varoquaux. 2013. API design for machine learning software: Experiences from the scikit-learn project. In Proceedings of the ECML PKDD Workshop: Languages for Data Mining and Machine Learning. 108\u2013122."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/9780262033589.001.0001"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2627508.2627515"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2018.2790925"},{"key":"e_1_3_2_20_2","article-title":"Calibrating software cost models using Bayesian analysis","volume":"573583","author":"Chulani Sunita","year":"1999","unstructured":"Sunita Chulani, Barry Boehm, and Bert Steece. 1999. Calibrating software cost models using Bayesian analysis. 
IEEE Transactions on Software Engineering 573583 (1999), 1\u201311.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2006.152"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.5555\/1248547.1248548"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/223784.223812"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177731944"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2016.04.017"},{"key":"e_1_3_2_26_2","article-title":"Why is differential evolution better than grid search for tuning defect predictors?","author":"Fu Wei","year":"2016","unstructured":"Wei Fu, Vivek Nair, and Tim Menzies. 2016. Why is differential evolution better than grid search for tuning defect predictors? arXiv preprint arXiv:1609.02613 (2016).","journal-title":"arXiv preprint arXiv:1609.02613"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMPSAC.2019.00013"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2017.2748129"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2018.2790413"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1038\/scientificamerican0792-66"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2014.04.006"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2018.03.010"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/2901739.2901751"},{"key":"e_1_3_2_34_2","unstructured":"Georg J. P. Link and Matt Germonprez. 2018. Assessing open source project health. 
In Proceedings of the 2018 24th Americas Conference on Information Systems."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10796-016-9724-0"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524842.3527934"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.3390\/a14020040"},{"key":"e_1_3_2_38_2","first-page":"33","volume-title":"Proceedings of the 5th International Workshop on Software Ecosystems (IWSECO \u201913)","author":"Manikas Konstantinos","year":"2013","unstructured":"Konstantinos Manikas and Klaus Marius Hansen. 2013. Reviewing the health of software ecosystems\u2014A conceptual framework proposal. In Proceedings of the 5th International Workshop on Software Ecosystems (IWSECO \u201913). 33\u201344."},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","first-page":"482","DOI":"10.1007\/978-3-030-30241-2_41","volume-title":"Progress in Artificial Intelligence","author":"Mashlakov Aleksei","year":"2019","unstructured":"Aleksei Mashlakov, Ville Tikka, Lasse Lensu, Aleksei Romanenko, and Samuli Honkapuro. 2019. Hyper-parameter optimization of multi-attention recurrent neural network for battery state-of-charge forecasting. In Progress in Artificial Intelligence. Lecture Notes in Computer Science, Vol. 11804. Springer, 482\u2013494."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/2001576.2001690"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2019.01.042"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2018.2870895"},{"key":"e_1_3_2_43_2","volume-title":"Distribution-Free Multiple Comparisons.","author":"Nemenyi Peter Bjorn","year":"1963","unstructured":"Peter Bjorn Nemenyi. 1963. 
Distribution-Free Multiple Comparisons. Princeton University."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.116217"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/1389095.1389204"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377930.3389817"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2017.07.015"},{"key":"e_1_3_2_48_2","article-title":"Software effort estimation using radial basis and generalized regression neural networks","author":"Reddy P. V. G. D. Prasad","year":"2010","unstructured":"P. V. G. D. Prasad Reddy, K. R. Sudha, P. Rama Sree, and S. N. S. V. S. C. Ramesh. 2010. Software effort estimation using radial basis and generalized regression neural networks. arXiv preprint arXiv:1005.4021 (2010).","journal-title":"arXiv preprint arXiv:1005.4021"},{"key":"e_1_3_2_49_2","article-title":"Development effort estimation in free\/open source software from activity in version control systems","author":"Robles Gregorio","year":"2022","unstructured":"Gregorio Robles, Andrea Capiluppi, Jesus M. Gonzalez-Barahona, Bjorn Lundell, and Jonas Gamalielsson. 2022. Development effort estimation in free\/open source software from activity in version control systems. arXiv preprint arXiv:2203.09898 (2022).","journal-title":"arXiv preprint arXiv:2203.09898"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3084226.3084243"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884830"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.5555\/2486788.2486853"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2011.12.008"},{"key":"e_1_3_2_54_2","first-page":"448","volume-title":"Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE \u201921)","author":"Shrikanth N. C.","year":"2021","unstructured":"N. C. Shrikanth, Suvodeep Majumder, and Tim Menzies. 
2021. Early life cycle software defect prediction. Why? How? In Proceedings of the 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE \u201921). IEEE, Los Alamitos, CA, 448\u2013459."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008202821328"},{"key":"e_1_3_2_56_2","unstructured":"Huy Tu and Tim Menzies. 2021. FRUGAL: Unlocking SSL for software analytics. arXiv:2108.09847 [cs.SE] (2021)."},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-022-10121-w"},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","unstructured":"H. Tu and T. Menzies. 2021. FRUGAL: Unlocking semi-supervised learning for software analytics. In Proceedings of the 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE \u201921). 394\u2013406.","DOI":"10.1109\/ASE51524.2021.9678617"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR52588.2021.00013"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-019-05855-6"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.5555\/1762545.1762608"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1108\/17440080710829252"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2014.55"},{"key":"e_1_3_2_64_2","first-page":"231","volume-title":"Proceedings of the International Conference on Genetic Algorithms (ICGA \u201995)","author":"Whitley L. Darrell","year":"1995","unstructured":"L. Darrell Whitley, Keith E. Mathias, and Larry D. Pyeatt. 1995. Hyperplane ranking in simple genetic algorithms. In Proceedings of the International Conference on Genetic Algorithms (ICGA \u201995). 
231\u2013238."},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-29044-2"},{"key":"e_1_3_2_66_2","first-page":"209","article-title":"The roles of mutation, inbreeding, crossbreeding and selection in evolution","volume":"8","author":"Wright S.","year":"1932","unstructured":"S. Wright. 1932. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the XI International Congress of Genetics 8 (1932), 209\u2013222.","journal-title":"Proceedings of the XI International Congress of Genetics"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.4018\/978-1-59904-210-7.ch011"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-022-10171-0"},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","unstructured":"Tianpei Xia, Wei Fu, Rui Shu, and Tim Menzies. 2022. Predicting health indicators for open source projects (using hyperparameter optimization). Empirical Software Engineering 27, 6 (2022), 122. https:\/\/arxiv.org\/pdf\/2006.07240.pdf","DOI":"10.1007\/s10664-022-10171-0"},{"key":"e_1_3_2_70_2","article-title":"Hyperparameter optimization for effort estimation","author":"Xia Tianpei","year":"2018","unstructured":"Tianpei Xia, Rahul Krishna, Jianfeng Chen, George Mathew, Xipeng Shen, and Tim Menzies. 2018. Hyperparameter optimization for effort estimation. arXiv preprint arXiv:1805.00336 (2018).","journal-title":"arXiv preprint arXiv:1805.00336"},{"key":"e_1_3_2_71_2","article-title":"Sequential model optimization for software effort estimation","author":"Xia Tianpei","year":"2020","unstructured":"Tianpei Xia, Rui Shu, Xipeng Shen, and Tim Menzies. 2020. Sequential model optimization for software effort estimation. 
IEEE Transactions on Software Engineering 48, 6 (2020), 1994\u20132009.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2021.3079841"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2020.3031401"},{"key":"e_1_3_2_74_2","first-page":"117","volume-title":"Medical Imaging 2022: Image-Guided Procedures, Robotic Interventions, and Modeling","author":"Zhou Zhiguo","year":"2022","unstructured":"Zhiguo Zhou, Meijuan Zhou, Zhilong Wang, and Xi Chen. 2022. Predicting treatment outcome in metastatic melanoma through automated multi-objective model with hyperparameter optimization. In Medical Imaging 2022: Image-Guided Procedures, Robotic Interventions, and Modeling, Vol. 12034. SPIE, 117\u2013121."},{"key":"e_1_3_2_75_2","first-page":"95","volume-title":"Evolutionary Methods for Design, Optimisation, and Control","author":"Zitzler Eckart","year":"2002","unstructured":"Eckart Zitzler, Marco Laumanns, and Lothar Thiele. 2002. SPEA2: Improving the strength Pareto evolutionary algorithm for multiobjective optimization. In Evolutionary Methods for Design, Optimisation, and Control. 
CIMNE, Barcelona, Spain, 95\u2013100."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3630252","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3630252","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:45:52Z","timestamp":1750178752000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3630252"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,14]]},"references-count":74,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3630252"],"URL":"https:\/\/doi.org\/10.1145\/3630252","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,14]]},"assertion":[{"value":"2022-12-05","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-09","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}