{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T10:20:49Z","timestamp":1770891649513,"version":"3.50.1"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T00:00:00Z","timestamp":1708300800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T00:00:00Z","timestamp":1708300800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Universit\u00e1 degli Studi di Cagliari, Dept. of Business and Economics","award":["1.005.14\/2019"],"award-info":[{"award-number":["1.005.14\/2019"]}]},{"DOI":"10.13039\/501100003407","name":"Ministero dell\u2019Istruzione, dell\u2019Universit\u00e0 e della Ricerca","doi-asserted-by":"publisher","award":["PE00000018"],"award-info":[{"award-number":["PE00000018"]}],"id":[{"id":"10.13039\/501100003407","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100013003","name":"Universit\u00e0 degli Studi di Cagliari","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100013003","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comput Stat"],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Network-based Semi-Supervised Clustering (NeSSC) is a semi-supervised approach for clustering in the presence of an outcome variable. It uses a classification or regression model on resampled versions of the original data to produce a proximity matrix that indicates the magnitude of the similarity between pairs of observations measured with respect to the outcome. This matrix is transformed into a complex network on which a community detection algorithm is applied to search for underlying community structures which is a partition of the instances into highly homogeneous clusters to be evaluated in terms of the outcome. In this paper, we focus on the case the outcome variable to be used in NeSSC is numeric and propose an alternative selection criterion of the optimal partition based on a measure of overlapping between density curves as well as a penalization criterion which takes accounts for the number of clusters in a candidate partition. Next, we consider the performance of the proposed method for some artificial datasets and for 20 different real datasets and compare NeSSC with the other three popular methods of semi-supervised clustering with a numeric outcome. Results show that NeSSC with the overlapping criterion works particularly well when a reduced number of clusters are scattered localized.<\/jats:p>","DOI":"10.1007\/s00180-024-01457-6","type":"journal-article","created":{"date-parts":[[2024,2,19]],"date-time":"2024-02-19T18:02:00Z","timestamp":1708365720000},"page":"3831-3854","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Overlapping coefficient in network-based semi-supervised clustering"],"prefix":"10.1007","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2020-5129","authenticated-orcid":false,"given":"Claudio","family":"Conversano","sequence":"first","affiliation":[]},{"given":"Luca","family":"Frigau","sequence":"additional","affiliation":[]},{"given":"Giulia","family":"Contu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,19]]},"reference":[{"key":"1457_CR1","doi-asserted-by":"crossref","unstructured":"Aggarwal CC (2014) Data classification: algorithms and applications. CRC Press","DOI":"10.1201\/b17320"},{"issue":"1","key":"1457_CR2","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1016\/j.patcog.2012.07.021","volume":"46","author":"O Arbelaitz","year":"2013","unstructured":"Arbelaitz O, Gurrutxaga I, Muguerza J, P\u00e9rez JM, Perona I (2013) An extensive comparative study of cluster validity indices. Pattern Recogn 46(1):243\u2013256","journal-title":"Pattern Recogn"},{"issue":"4","key":"1457_CR3","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pbio.0020108","volume":"2","author":"E Bair","year":"2004","unstructured":"Bair E, Tibshirani R (2004) Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol 2(4):e108","journal-title":"PLoS Biol"},{"issue":"473","key":"1457_CR4","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1198\/016214505000000628","volume":"101","author":"E Bair","year":"2006","unstructured":"Bair E, Hastie T, Paul D, Tibshirani R (2006) Prediction by supervised principal components. J Am Stat Assoc 101(473):119\u2013137","journal-title":"J Am Stat Assoc"},{"issue":"10","key":"1457_CR5","doi-asserted-by":"publisher","first-page":"P10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","volume":"2008","author":"VD Blondel","year":"2008","unstructured":"Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech: Theory Exp 2008(10):P10008","journal-title":"J Stat Mech: Theory Exp"},{"issue":"6","key":"1457_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v061.i06","volume":"61","author":"M Charrad","year":"2014","unstructured":"Charrad M, Ghazzali N, Boiteau V, Niknafs A (2014) NbClust: an R package for determining the relevant number of clusters in a data set. J Stat Softw 61(6):1\u201336","journal-title":"J Stat Softw"},{"key":"1457_CR8","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1016\/S0167-9473(99)00074-2","volume":"34","author":"TE Clemons","year":"2000","unstructured":"Clemons TE, Bradley EL Jr (2000) A nonparametric measure of the overlapping coefficient. Comput Stat Data Anal 34:51\u201361","journal-title":"Comput Stat Data Anal"},{"issue":"1","key":"1457_CR9","first-page":"108","volume":"12","author":"C Conversano","year":"2019","unstructured":"Conversano C, Contu G, Mola F (2019) Online promotion of UNESCO heritage sites in Southern Europe: website information content and managerial implications. Electron J Appl Stat Anal 12(1):108\u2013139","journal-title":"Electron J Appl Stat Anal"},{"issue":"8","key":"1457_CR10","doi-asserted-by":"publisher","first-page":"3510","DOI":"10.1109\/TNNLS.2020.3015200","volume":"32","author":"J de Jesus Rubio","year":"2021","unstructured":"de Jesus Rubio J (2021) Stability analysis of the modified Levenberg\u2013Marquardt algorithm for the artificial neural network training. IEEE Trans Neural Netw Learn Syst 32(8):3510\u20133524","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"1457_CR11","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1016\/j.ins.2021.11.038","volume":"585","author":"J de Jesus Rubio","year":"2022","unstructured":"de Jesus Rubio J, Islas MA, Ochoa G, Cruz DR, Garcia E, Pacheco J (2022) Convergent newton method and neural network for the electric energy usage prediction. Inf Sci 585:89\u2013112","journal-title":"Inf Sci"},{"key":"1457_CR12","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1002\/asmb.2618","volume":"37","author":"L Frigau","year":"2021","unstructured":"Frigau L, Contu G, Mola F, Conversano C (2021) Network-based semi supervised clustering. Appl Stoch Model Bus Ind 37:182\u2013202","journal-title":"Appl Stoch Model Bus Ind"},{"key":"1457_CR13","unstructured":"Halkidi M, Vazirgiannis M, Hennig C (2015) Method-independent indices for cluster validation and estimating the number of clusters. In: Hennig C, Meila M, Murtagh F, Rocci R (eds) Handbook of cluster analysis. Chapman and Hall\/CRC, pp 595\u2013618"},{"issue":"1","key":"1457_CR14","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193\u2013218","journal-title":"J Classif"},{"key":"1457_CR15","doi-asserted-by":"publisher","first-page":"3851","DOI":"10.1080\/03610928908830127","volume":"18","author":"HF Inman","year":"1989","unstructured":"Inman HF, Bradley EL Jr (1989) The overlapping coefficient as a measure of agreement between probability distributions and point estimation of the overlap of two normal densities. Commun Stat: Theory Methods 18:3851\u20133874","journal-title":"Commun Stat: Theory Methods"},{"issue":"20","key":"1457_CR16","doi-asserted-by":"publisher","first-page":"2578","DOI":"10.1093\/bioinformatics\/btq470","volume":"26","author":"DC Koestler","year":"2010","unstructured":"Koestler DC, Marsit CJ, Christensen BC et al (2010) Semi-supervised recursively partitioned mixture models for identifying cancer subtypes. Bioinformatics 26(20):2578\u20132585","journal-title":"Bioinformatics"},{"issue":"382","key":"1457_CR18","first-page":"427","volume":"78","author":"CR Mehta","year":"1983","unstructured":"Mehta CR, Patel NR (1983) A network algorithm for performing Fisher\u2019s exact test in r$$\\times$$ c contingency tables. J Am Stat Assoc 78(382):427\u2013434","journal-title":"J Am Stat Assoc"},{"key":"1457_CR19","doi-asserted-by":"publisher","first-page":"1089","DOI":"10.3389\/fpsyg.2019.01089","volume":"10","author":"M Pastore","year":"2019","unstructured":"Pastore M, Calcagni A (2019) Measuring distribution similarities between samples: a distribution-free overlapping index. Front Psychol 10:1089","journal-title":"Front Psychol"},{"key":"1457_CR20","series-title":"Lecture Notes in Computer Science","volume-title":"Computer and information sciences\u2013ISCIS 2005","author":"P Pons","year":"2005","unstructured":"Pons P, Latapy M (2005) Computing communities in large networks using random walks. In: Yolum P, Gungor T, Gurgen F, Ozturan C (eds) Computer and information sciences\u2013ISCIS 2005, vol 3733. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg"},{"issue":"1","key":"1457_CR21","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1002\/jae.1026","volume":"24","author":"G Porro","year":"2009","unstructured":"Porro G, Iacus SM (2009) Random Recursive Partitioning: a matching method for the estimation of the average treatment effect. J Appl Economet 24(1):163\u2013165","journal-title":"J Appl Economet"},{"key":"1457_CR22","unstructured":"R Core Team (2021) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https:\/\/www.R-project.org"},{"issue":"3","key":"1457_CR23","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.76.036106","volume":"76","author":"UN Raghavan","year":"2007","unstructured":"Raghavan UN, R\u00e9ka A, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106","journal-title":"Phys Rev E"},{"key":"1457_CR24","doi-asserted-by":"publisher","first-page":"1583","DOI":"10.1016\/j.csda.2005.01.014","volume":"50","author":"F Schmid","year":"2006","unstructured":"Schmid F, Schmidt A (2006) Nonparametric estimation of the coefficient of overlapping\u2014theory and empirical application. Comput Stat Data Anal 50:1583\u20131596","journal-title":"Comput Stat Data Anal"},{"issue":"2","key":"1457_CR25","doi-asserted-by":"publisher","first-page":"99","DOI":"10.2307\/3001913","volume":"5","author":"JW Tukey","year":"1949","unstructured":"Tukey JW (1949) Comparing individual means in the analysis of variance. Biometrics 5(2):99\u2013114","journal-title":"Biometrics"},{"key":"1457_CR26","unstructured":"Van Mechelen I, Boulesteix AL, Dangl R, Dean N, Guyon I, Hennig C, Leisch F, Steinley D (2018) Benchmarking in cluster analysis: a white paper. arXiv preprint arXiv:1809.10496"},{"key":"1457_CR27","unstructured":"Yee TW (2019) VGAM: vector generalized linear and additive models. R package version 1.1-2. https:\/\/CRAN.R-project.org\/package=VGAM"},{"key":"1457_CR28","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1016\/S0167-9473(03)00030-6","volume":"44","author":"A Zeileis","year":"2003","unstructured":"Zeileis A, Kleiber C, Kr\u00e4mer W, Hornik K (2003) Testing and dating of structural changes in practice. Comput Stat Data Anal 44:109\u2013123","journal-title":"Comput Stat Data Anal"}],"container-title":["Computational Statistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-024-01457-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00180-024-01457-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-024-01457-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T08:23:45Z","timestamp":1732091025000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00180-024-01457-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,19]]},"references-count":26,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["1457"],"URL":"https:\/\/doi.org\/10.1007\/s00180-024-01457-6","relation":{},"ISSN":["0943-4062","1613-9658"],"issn-type":[{"value":"0943-4062","type":"print"},{"value":"1613-9658","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,19]]},"assertion":[{"value":"3 March 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 January 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 February 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}