{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T01:57:44Z","timestamp":1767664664014,"version":"3.37.3"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,3,5]],"date-time":"2020-03-05T00:00:00Z","timestamp":1583366400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,3,5]],"date-time":"2020-03-05T00:00:00Z","timestamp":1583366400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Fondi di Ateneo UNiversita di Firenze","award":["ricatena2019"],"award-info":[{"award-number":["ricatena2019"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The analysis of big data is a fundamental challenge for the current and future stream of data coming from many different sources. Geospatial data is one of the less investigated sources. A typical example of an ever-growing dataset is the distribution data of invasive species in the affected territories. The dataset of <jats:italic>Drosophila suzukii<\/jats:italic> invasion sites in Europe up to 2011 was used to test a possible method to pinpoint its outliers (anomalies). Our aim was to find a method of analysis able to handle large amounts of data and to produce easily readable outputs that summarize and predict the status and, possibly, the future development of a biological invasion. To do that, we aimed to identify the so-called anomalies of the dataset, using a Python script based on the machine learning algorithm \u201cIsolation Forest\u201d. 
We also used the K-Means clustering method to partition the dataset. In our test, based on a real dataset, the Silhouette method indicated 10 clusters as the best partition. The clusters were drawn on the map with a Voronoi tessellation, showing that eight clusters were centered on industrial harbours, while the remaining two were in the hinterland. This led us to hypothesize that: (1) the main entry mechanism into Europe may be the import of goods through ports, apparently occurring several times; (2) the spread inland may be due to road transportation of goods; (3) the outliers (anomalies) found with the Isolation Forest method identify individuals or populations that tend to detach from their original cluster and hence indicate the lines of further spread of the invasion. This type of analysis thus aims to identify the future direction of an invasion, rather than the center of origin as in the case of geographic profiling. Isolation Forest therefore provides results complementary to those of PGP. The recent records of the invasive species, mainly located close to the outlier positions, indicate that the Isolation Forest method can be considered predictive and is a useful method for treating large geospatial datasets. 
<\/jats:p>","DOI":"10.1186\/s40537-020-00288-8","type":"journal-article","created":{"date-parts":[[2020,3,5]],"date-time":"2020-03-05T10:03:31Z","timestamp":1583402611000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Tracing outliers in the dataset of Drosophila suzukii records with the Isolation Forest method"],"prefix":"10.1186","volume":"7","author":[{"given":"Ugo","family":"Santosuosso","sequence":"first","affiliation":[]},{"given":"Alessandro","family":"Cini","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7904-0336","authenticated-orcid":false,"given":"Alessio","family":"Papini","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,3,5]]},"reference":[{"key":"288_CR1","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1007\/s10340-015-0681-z","volume":"88","author":"MK Asplen","year":"2015","unstructured":"Asplen MK, Anfora G, Biondi A, et al. Invasion biology of spotted wing Drosophila (Drosophila suzukii): a global perspective and future priorities. J Pest Sci. 2015;88:469\u201394.","journal-title":"J Pest Sci"},{"issue":"3","key":"288_CR2","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1145\/116873.116880","volume":"23","author":"F Aurenhammer","year":"1991","unstructured":"Aurenhammer F. Voronoi diagrams\u2014a survey of a fundamental geometric data structure. ACM Comput Surv. 1991;23(3):345\u2013405.","journal-title":"ACM Comput Surv"},{"key":"288_CR3","doi-asserted-by":"publisher","DOI":"10.1080\/14498596.2019.1642249","author":"DT Aygin","year":"2019","unstructured":"Aygin DT, Cox LA, Faulkner SC, Stevens MCA, Verity R, Le Comber SC. Double cross: geographic profiling of V-2 impact sites. J Spat Sci. 2019. 
https:\/\/doi.org\/10.1080\/14498596.2019.1642249.","journal-title":"J Spat Sci"},{"key":"288_CR4","first-page":"5","volume":"13","author":"M Bolda","year":"2010","unstructured":"Bolda M, Goodhue RE, Zalom FG. Spotted wing Drosophila: potential economic impact of a newly established pest. Agric Res Econ Updat. 2010;13:5\u20138.","journal-title":"Agric Res Econ Updat"},{"key":"288_CR5","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1016\/j.diin.2018.12.001","volume":"28","author":"A Butkovic","year":"2019","unstructured":"Butkovic A, Mrdovic S, Uludag S, Tanovic A. Geographic profiling for serial cybercrime investigation. Digit Invest. 2019;28:176\u201382.","journal-title":"Digit Invest"},{"key":"288_CR6","doi-asserted-by":"publisher","DOI":"10.1007\/s10530-019-02115-5","author":"J Cerri","year":"2019","unstructured":"Cerri J, Mori E, Zozzoli R, Gigliotti A, Chirco A, Bertolino S. Managing invasive Siberian chipmunks Eutamias sibiricus in Italy: a matter of attitudes and risk of dispersal. Biol Invasions. 2019. https:\/\/doi.org\/10.1007\/s10530-019-02115-5.","journal-title":"Biol Invasions"},{"key":"288_CR7","doi-asserted-by":"publisher","unstructured":"Cheng Z, Zou C, Dong J. Outlier detection using isolation forest and local outlier factor. Proceedings of the Conference on Research in Adaptive and Convergent Systems. 2019; 161\u2013168. Chongqing, China \u2014 September 24\u201327, 2019. ACM New York, NY, USA. ISBN: 978-1-4503-6843-8 https:\/\/doi.org\/10.1145\/3338840.3355641.","DOI":"10.1145\/3338840.3355641"},{"issue":"4","key":"288_CR8","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1007\/s10340-014-0617-z","volume":"87","author":"A Cini","year":"2014","unstructured":"Cini A, Anfora G, Escudero-Colomar LA, Grassi A, Santosuosso U, Seljak G, Papini A. Tracking the invasion of the alien fruit pest Drosophila suzukii in Europe. J Pest Sci. 
2014;87(4):559\u201366.","journal-title":"J Pest Sci"},{"key":"288_CR9","first-page":"149","volume":"65","author":"A Cini","year":"2012","unstructured":"Cini A, Ioriatti C, Anfora G. A review of the invasion of Drosophila suzukii in Europe and a draft research agenda for integrated pest management. B Insectol. 2012;65:149\u201360.","journal-title":"B Insectol"},{"issue":"1","key":"288_CR10","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/j.rbe.2018.11.005","volume":"63","author":"A Cini","year":"2019","unstructured":"Cini A, Santosuosso U, Papini A. Uncovering the spatial pattern of invasion of the honeybee pest small hive beetle, Aethina tumida in Italy. Rev Bras Entomol. 2019;63(1):12\u20137.","journal-title":"Rev Bras Entomol"},{"key":"288_CR11","first-page":"317","volume":"91","author":"G De Ros","year":"2013","unstructured":"De Ros G, Anfora G, Grassi A, Ioriatti C. The potential economic impact of Drosophila suzukii on small fruits production in Trentino (Italy). IOBC-WPRS Bul. 2013;91:317\u201321.","journal-title":"IOBC-WPRS Bul."},{"key":"288_CR12","first-page":"28","volume":"128","author":"L Delbac","year":"2017","unstructured":"Delbac L, Rouzes R, Rusch A, Thiery D. Geographical area extension of Drosophila suzukii (Diptera: Drosophilidae) in Bordeaux vineyards. Integr Prot Prod Viticulture IOBC\u2013WPRS Bull. 2017;128:28\u201336.","journal-title":"Integr Prot Prod Viticulture IOBC\u2013WPRS Bull"},{"key":"288_CR13","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1186\/s40537-019-0275-3","volume":"6","author":"SA Dheyab","year":"2019","unstructured":"Dheyab SA, Abdullah MN, Abed BF. A novel approach for big data processing using message passing interface based on memory mapping. J Big Data. 2019;6:112. https:\/\/doi.org\/10.1186\/s40537-019-0275-3.","journal-title":"J Big Data"},{"key":"288_CR14","unstructured":"Faulkner S. Integrating GIS approaches with geographic profiling as a novel conservation tool. 
PhD thesis, Queen Mary University, London, 2018. https:\/\/qmro.qmul.ac.uk\/xmlui\/handle\/123456789\/46763."},{"issue":"1","key":"288_CR15","doi-asserted-by":"publisher","first-page":"425","DOI":"10.1093\/jee\/toy321","volume":"112","author":"P Ferronato","year":"2018","unstructured":"Ferronato P, Woch AL, Soares PL, Bernardi D, Botton M, Andreazza F, Oliveira E, Corr\u00eaa AS. A phylogeographic approach to the Drosophila suzukii (Diptera: Drosophilidae) invasion in Brazil. J Econ Entomol. 2018;112(1):425\u201333.","journal-title":"J Econ Entomol"},{"issue":"7","key":"288_CR16","doi-asserted-by":"publisher","first-page":"938","DOI":"10.1016\/j.ejmp.2016.06.007","volume":"32","author":"A Gnerucci","year":"2016","unstructured":"Gnerucci A, Romano G, Ratto F, Fusi F. Statistical detection of nanoparticles in cells by darkfield microscopy. Physica Med. 2016;32(7):938\u201343.","journal-title":"Physica Med"},{"key":"288_CR17","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1109\/MCSE.2007.55","volume":"9","author":"JD Hunter","year":"2007","unstructured":"Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007;9:90\u20135.","journal-title":"Comput Sci Eng"},{"issue":"8","key":"288_CR18","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","volume":"31","author":"AK Jain","year":"2010","unstructured":"Jain AK. Data clustering: 50\u00a0years beyond K-Means. Pattern Recogn Lett. 2010;31(8):651\u201366.","journal-title":"Pattern Recogn Lett"},{"key":"288_CR19","volume-title":"Algorithms for Clustering Data","author":"AK Jain","year":"1988","unstructured":"Jain AK, Dubes RC. Algorithms for Clustering Data. New Jersey: Prentice Hall; 1988."},{"issue":"3","key":"288_CR20","doi-asserted-by":"publisher","first-page":"987","DOI":"10.1653\/024.098.0332","volume":"98","author":"R Lasa","year":"2015","unstructured":"Lasa R, Tadeo E. 
Invasive drosophilid pests Drosophila suzukii and Zaprionus indianus (Diptera: Drosophilidae) in Veracruz, Mexico. Florida Entomol. 2015;98(3):987\u20139.","journal-title":"Florida Entomol"},{"key":"288_CR21","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1016\/j.jtbi.2005.09.012","volume":"240","author":"SC Le Comber","year":"2006","unstructured":"Le Comber SC, Nicholls B, Rossmo DK, Racey PA. Geographic profiling and animal foraging. J Theor Biol. 2006;240:233\u201340.","journal-title":"J Theor Biol"},{"issue":"1","key":"288_CR22","first-page":"3","volume":"6","author":"FT Liu","year":"2012","unstructured":"Liu FT, Ting KM, Zhou ZH. Isolation-based anomaly detection. ACM Trans Knowl Discov Data (TKDD). 2012;6(1):3.","journal-title":"ACM Trans Knowl Discov Data (TKDD)"},{"key":"288_CR23","doi-asserted-by":"crossref","unstructured":"Liu FT, Ting KM, Zhou ZH. Isolation forests. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), 2008. pp. 413\u2013422.","DOI":"10.1109\/ICDM.2008.17"},{"key":"288_CR24","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1111\/j.1469-7998.2009.00586.x","volume":"279","author":"RA Martin","year":"2009","unstructured":"Martin RA, Rossmo DK, Hammerschlag N. Hunting patterns and geographic profiling of white shark predation. J Zool. 2009;279:111\u20138.","journal-title":"J Zool"},{"key":"288_CR25","doi-asserted-by":"publisher","first-page":"1613","DOI":"10.1007\/s10530-012-0396-5","volume":"15","author":"A Papini","year":"2013","unstructured":"Papini A, Mosti S, Santosuosso U. Tracking the origin of the invading Caulerpa (Caulerpales, Chlorophyta) with geographic profiling, a criminological technique for a killer alga. Biol Invasions. 
2013;15:1613\u201321.","journal-title":"Biol Invasions"},{"key":"288_CR26","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1016\/j.ecoinf.2017.02.001","volume":"38","author":"A Papini","year":"2017","unstructured":"Papini A, Rossmo DK, Le Comber SC, Verity R, Stevenson MD, Santosuosso U. The use of jackknifing for the evaluation of geographic profiling reliability. Ecol Inform. 2017;38:76\u201381.","journal-title":"Ecol Inform"},{"issue":"1","key":"288_CR27","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1016\/j.bjid.2016.09.010","volume":"21","author":"A Papini","year":"2016","unstructured":"Papini A, Santosuosso U. Snow\u2019s case revisited: new tool in geographic profiling of epidemiology. Braz J Infect Dis. 2016;21(1):112\u20135.","journal-title":"Braz J Infect Dis"},{"issue":"12","key":"288_CR28","doi-asserted-by":"publisher","first-page":"0190237","DOI":"10.1371\/journal.pone.0190237","volume":"12","author":"A Papini","year":"2017","unstructured":"Papini A, Signorini MA, Foggi B, Della Giovampaola E, Ongaro L, Vivona L, Santosuosso U, Tani C, Bruschi P. History vs. legend: retracing invasion and spread of Oxalis pes-caprae L. in Europe and the Mediterranean area. PLoS ONE. 2017;12(12):0190237.","journal-title":"PLoS ONE"},{"key":"288_CR29","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825\u201330.","journal-title":"J Mach Learn Res"},{"key":"288_CR30","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1098\/rsif.2008.0242","volume":"6","author":"NE Raine","year":"2009","unstructured":"Raine NE, Rossmo DK, Le Comber SC. Geographic profiling applied to testing models of bumble-bee foraging. J R Soc Interface. 
2009;6:307\u201319.","journal-title":"J R Soc Interface"},{"key":"288_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/BF02885950","volume":"172","author":"DK Rossmo","year":"1993","unstructured":"Rossmo DK. A methodological model. Am J Crim Justice. 1993;172:1\u201321.","journal-title":"Am J Crim Justice"},{"key":"288_CR32","volume-title":"Geographic profiling","author":"DK Rossmo","year":"2000","unstructured":"Rossmo DK. Geographic profiling. Boca Raton: CRC Press; 2000."},{"issue":"1","key":"288_CR33","doi-asserted-by":"publisher","first-page":"R8","DOI":"10.1016\/j.cub.2012.11.021","volume":"23","author":"O Rota-Stabelli","year":"2013","unstructured":"Rota-Stabelli O, Blaxter M, Anfora G. Quick guide: Drosophila suzukii. Curr Biol. 2013;23(1):R8.","journal-title":"Curr Biol"},{"key":"288_CR34","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","volume":"20","author":"PJ Rousseeuw","year":"1987","unstructured":"Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Comput Appl Math. 1987;20:53\u201365.","journal-title":"Comput Appl Math"},{"issue":"8","key":"288_CR35","doi-asserted-by":"publisher","first-page":"2037","DOI":"10.1007\/s13762-016-1032-1","volume":"13","author":"U Santosuosso","year":"2016","unstructured":"Santosuosso U, Papini A. Methods for geographic profiling of biological invasions with multiple origin sites. Int J Environ Sci Technol. 2016;13(8):2037\u201344.","journal-title":"Int J Environ Sci Technol"},{"issue":"4","key":"288_CR36","doi-asserted-by":"publisher","first-page":"362","DOI":"10.1134\/S1067413618040112","volume":"49","author":"U Santosuosso","year":"2018","unstructured":"Santosuosso U, Papini A. Geo-profiling: beyond the current limits. A preliminary study of mathematical methods to improve the monitoring of invasive species. Russ J Immunol Ecol. 
2018;49(4):362\u201370.","journal-title":"Russ J Immunol Ecol"},{"issue":"1","key":"288_CR37","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1198\/106186006X94072","volume":"15","author":"T Shi","year":"2006","unstructured":"Shi T, Horvath S. Unsupervised learning with random forest predictors. J Comput Graph Stat. 2006;15(1):118\u201338.","journal-title":"J Comput Graph Stat"},{"key":"288_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1111\/j.1600-0587.2011.07292.x","volume":"35","author":"MD Stevenson","year":"2012","unstructured":"Stevenson MD, Rossmo DK, Knell RJ, Le Comber SC. Geographic profiling as a novel spatial tool for targeting the control of invasive species. Ecography. 2012;35:1\u201312.","journal-title":"Ecography"},{"key":"288_CR39","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1016\/j.jtbi.2010.04.010","volume":"265","author":"Y Suzuki-Ohno","year":"2010","unstructured":"Suzuki-Ohno Y, Inoue MN, Ohno K. Applying geographic profiling used in the field of criminology for predicting the nest locations of bumble bees. J Theor Biol. 2010;265:211\u20137.","journal-title":"J Theor Biol"},{"issue":"2","key":"288_CR40","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1089\/big.2012.0002","volume":"1","author":"M Swan","year":"2013","unstructured":"Swan M. The quantified self: fundamental disruption in big data science and biological discovery. Big Data. 2013;1(2):85\u201399.","journal-title":"Big Data"},{"key":"288_CR41","unstructured":"Tian P, Che D. GI-IsolationForest: Genomic Island Discovery Using Isolation Forest Algorithm Internatonal Conf. Bioinformatics and Computational Biology| BIOCOMP\u201918 17\u201323 ISBN: 1-60132-471-5, CSREA Press, 2018. 
https:\/\/csce.ucmss.com\/cr\/books\/2018\/LFS\/CSREA2018\/BIC4116.pdf."},{"key":"288_CR42","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1890\/080083","volume":"8","author":"M Vil\u00e0","year":"2010","unstructured":"Vil\u00e0 M, Basnou C, Pysek P, Josefsson M, Genovesi P, Gollasch S, Nentwig W, Olenin S, Roques A, Roy D, Hulme PE, DAISIE partners. How well do we understand the impacts of alien species on ecosystem services? A pan-European, cross-taxa assessment. Front Ecol Environ. 2010;8:135\u201344.","journal-title":"Front Ecol Environ"},{"key":"288_CR43","doi-asserted-by":"publisher","first-page":"702","DOI":"10.1111\/j.1461-0248.2011.01628.x","volume":"14","author":"M Vil\u00e0","year":"2011","unstructured":"Vil\u00e0 M, Espinar JL, Hejda M, Hulme PE, Jarosik V, Maron JL, Pergl J, Schaffner U, Sun Y, Py\u0161ek P. Ecological impacts of invasive alien plants: a meta-analysis of their effects on species, communities and ecosystems. Ecol Lett. 2011;14:702\u20138.","journal-title":"Ecol Lett"},{"key":"288_CR44","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1603\/IPM10010","volume":"1","author":"DB Walsh","year":"2011","unstructured":"Walsh DB, Bolda MP, Goodhue RE, Dreeves AJ, Lee JC, Bruck DJ, Walton VM, O\u2019Neal SD, Zalom FG. Drosophila suzukii (Diptera: Drosophilidae): Invasive pest of ripening soft fruit expanding its geographic range and damage potential. J Integr Pest Manag. 
2011;1:1\u20137.","journal-title":"J Integr Pest Manag"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00288-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s40537-020-00288-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00288-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,5]],"date-time":"2021-03-05T00:12:41Z","timestamp":1614903161000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-020-00288-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,5]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["288"],"URL":"https:\/\/doi.org\/10.1186\/s40537-020-00288-8","relation":{},"ISSN":["2196-1115"],"issn-type":[{"type":"electronic","value":"2196-1115"}],"subject":[],"published":{"date-parts":[[2020,3,5]]},"assertion":[{"value":"20 October 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 February 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 March 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for 
publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"14"}}