{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T03:51:42Z","timestamp":1761709902002,"version":"3.37.3"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"24","license":[{"start":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T00:00:00Z","timestamp":1630540800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T00:00:00Z","timestamp":1630540800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100011596","name":"Conselleria d\u2019Educaci\u00f3, Investigaci\u00f3, Cultura i Esport","doi-asserted-by":"publisher","award":["ACIF\/2019\/042","APOSTD\/2020\/256"],"award-info":[{"award-number":["ACIF\/2019\/042","APOSTD\/2020\/256"]}],"id":[{"id":"10.13039\/501100011596","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010198","name":"Ministerio de Econom\u00eda, Industria y Competitividad, Gobierno de Espa\u00f1a","doi-asserted-by":"publisher","award":["TIN2017-86576-R"],"award-info":[{"award-number":["TIN2017-86576-R"]}],"id":[{"id":"10.13039\/501100010198","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100009092","name":"universidad de alicante","doi-asserted-by":"publisher","award":["GRE19-04"],"award-info":[{"award-number":["GRE19-04"]}],"id":[{"id":"10.13039\/100009092","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Soft Comput"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The <jats:italic>k<\/jats:italic>-nearest neighbor (<jats:italic>k<\/jats:italic>NN) rule is one of the best-known distance-based classifiers, and is usually associated with high performance and versatility as it requires only the definition of a dissimilarity measure. Nevertheless, <jats:italic>k<\/jats:italic>NN is also coupled with low-efficiency levels since, for each new query, the algorithm must carry out an exhaustive search of the training data, and this drawback is much more relevant when considering complex structural representations, such as graphs, trees or strings, owing to the cost of the dissimilarity metrics. This issue has generally been tackled through the use of data reduction (DR) techniques, which reduce the size of the reference set, but the complexity of structural data has historically limited their application in the aforementioned scenarios. A DR algorithm denominated as reduction through homogeneous clusters (RHC) has recently been adapted to string representations but as obtaining the exact median value of a set of string data is known to be computationally difficult, its authors resorted to computing the set-median value. Under the premise that a more exact median value may be beneficial in this context, we, therefore, present a new adaptation of the RHC algorithm for string data, in which an approximate median computation is carried out. The results obtained show significant improvements when compared to those of the set-median version of the algorithm, in terms of both classification performance and reduction rates.\n<\/jats:p>","DOI":"10.1007\/s00500-021-06178-2","type":"journal-article","created":{"date-parts":[[2021,9,3]],"date-time":"2021-09-03T20:59:34Z","timestamp":1630702774000},"page":"15403-15415","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Prototype generation in the string space via approximate median for data reduction in nearest neighbor classification"],"prefix":"10.1007","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9949-5522","authenticated-orcid":false,"given":"Francisco J.","family":"Castellanos","sequence":"first","affiliation":[]},{"given":"Jose J.","family":"Valero-Mas","sequence":"additional","affiliation":[]},{"given":"Jorge","family":"Calvo-Zaragoza","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,9,2]]},"reference":[{"key":"6178_CR1","doi-asserted-by":"crossref","unstructured":"Abdel-Hamid O, Mohamed AR, Jiang H, Penn G (2012) Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. In: 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4277\u20134280","DOI":"10.1109\/ICASSP.2012.6288864"},{"key":"6178_CR2","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1016\/j.patrec.2013.09.014","volume":"36","author":"J Abreu","year":"2014","unstructured":"Abreu J, Rico-Juan JR (2014) A new iterative algorithm for computing a quality approximate median of strings based on edit operations. Pattern Recogn Lett 36:74\u201380","journal-title":"Pattern Recogn Lett"},{"issue":"3","key":"6178_CR3","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1007\/s00500-008-0323-y","volume":"13","author":"J Alcal\u00e1-Fdez","year":"2009","unstructured":"Alcal\u00e1-Fdez J, S\u00e1nchez L, Garcia S, del Jesus MJ, Ventura S, Garrell JM, Otero J, Romero C, Bacardit J, Rivas VM et al (2009) Keel: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput 13(3):307\u2013318","journal-title":"Soft Comput"},{"issue":"1\u20133","key":"6178_CR4","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1016\/j.tcs.2004.12.030","volume":"337","author":"P Bille","year":"2005","unstructured":"Bille P (2005) A survey on tree edit distance and related problems. Theoret Comput Sci 337(1\u20133):217\u2013239","journal-title":"Theoret Comput Sci"},{"issue":"7","key":"6178_CR5","doi-asserted-by":"publisher","first-page":"811","DOI":"10.1016\/j.patrec.2011.04.017","volume":"33","author":"H Bunke","year":"2012","unstructured":"Bunke H, Riesen K (2012) Towards the unification of structural and statistical pattern recognition. Pattern Recogn Lett 33(7):811\u2013825","journal-title":"Pattern Recogn Lett"},{"key":"6178_CR6","unstructured":"Calvo-Zaragoza J, Rizo D, I\u00f1esta JM (2016) Two (note) heads are better than one: pen-based multimodal interaction with music scores. In: Proceedings of the 17th international society for music information retrieval conference (ISMIR). New York City, pp 509\u2013514"},{"issue":"05","key":"6178_CR7","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1142\/S0129054117400093","volume":"28","author":"J Calvo-Zaragoza","year":"2017","unstructured":"Calvo-Zaragoza J, Oncina J, de la Higuera C (2017a) Computing the expected edit distance from a string to a probabilistic finite-state automaton. Int J Found Comput Sci 28(05):603\u2013621","journal-title":"Int J Found Comput Sci"},{"issue":"9","key":"6178_CR8","doi-asserted-by":"publisher","first-page":"2415","DOI":"10.1007\/s00521-016-2278-8","volume":"28","author":"J Calvo-Zaragoza","year":"2017","unstructured":"Calvo-Zaragoza J, Valero-Mas JJ, Rico-Juan JR (2017b) Prototype generation on structural data using dissimilarity space representation. Neural Comput Appl 28(9):2415\u20132424","journal-title":"Neural Comput Appl"},{"issue":"5","key":"6178_CR9","doi-asserted-by":"publisher","first-page":"654","DOI":"10.3390\/app8050654","volume":"8","author":"J Calvo-Zaragoza","year":"2018","unstructured":"Calvo-Zaragoza J, Castellanos FJ, Vigliensoni G, Fujinaga I (2018) Deep neural networks for document processing of music score images. Appl Sci 8(5):654","journal-title":"Appl Sci"},{"key":"6178_CR10","doi-asserted-by":"crossref","unstructured":"Chakraborty D, Das D, Krauthgamer R (2021) Approximating the median under the ulam metric. In: Proceedings of the 2021 ACM-SIAM symposium on discrete algorithms (SODA). SIAM, pp 761\u2013775","DOI":"10.1137\/1.9781611976465.48"},{"key":"6178_CR11","doi-asserted-by":"crossref","unstructured":"Ciregan D, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification. In: Computer vision and pattern recognition (CVPR), 2012 IEEE conference on, IEEE. pp 3642\u20133649","DOI":"10.1109\/CVPR.2012.6248110"},{"issue":"1","key":"6178_CR12","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","volume":"13","author":"TM Cover","year":"1967","unstructured":"Cover TM, Hart PE (1967) Nearest neighbor pattern classification. Inf Theory IEEE Trans 13(1):21\u201327","journal-title":"Inf Theory IEEE Trans"},{"key":"6178_CR13","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1\u201330","journal-title":"J Mach Learn Res"},{"key":"6178_CR14","volume-title":"Pattern classification","author":"RO Duda","year":"2012","unstructured":"Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley, New Jersey"},{"issue":"7","key":"6178_CR15","doi-asserted-by":"publisher","first-page":"826","DOI":"10.1016\/j.patrec.2011.04.019","volume":"33","author":"RP Duin","year":"2012","unstructured":"Duin RP, Pekalska E (2012) The dissimilarity space:bridging structural and statistical pattern recognition. Pattern Recogn Lett 33(7):826\u2013832","journal-title":"Pattern Recogn Lett"},{"key":"6178_CR16","unstructured":"Fischer I, Zell A (2000) String averages and self-organizing maps for strings. In: Proceedings of the second ICSC symposium on neural computation (NC\u20192000)"},{"key":"6178_CR17","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1109\/TEC.1961.5219197","volume":"2","author":"H Freeman","year":"1961","unstructured":"Freeman H (1961) On the encoding of arbitrary geometric configurations. IRE Trans Electron Comput 2:260\u2013268","journal-title":"IRE Trans Electron Comput"},{"issue":"1","key":"6178_CR18","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1007\/s10044-008-0141-y","volume":"13","author":"X Gao","year":"2010","unstructured":"Gao X, Xiao B, Tao D, Li X (2010) A survey of graph edit distance. Pattern Anal Appl 13(1):113\u2013129","journal-title":"Pattern Anal Appl"},{"key":"6178_CR19","doi-asserted-by":"crossref","unstructured":"Garc\u00eda S, Luengo J, Herrera F (2015) Data preprocessing in data mining. In: Intelligent systems reference library","DOI":"10.1007\/978-3-319-10247-4"},{"key":"6178_CR20","unstructured":"Hinarejos CDM (2003) La cadena media y su aplicaci\u00f3n en reconocimiento de formas. PhD thesis, Universitat Polit\u00e8cnica de Val\u00e8ncia"},{"issue":"5","key":"6178_CR21","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1109\/34.291440","volume":"16","author":"JJ Hull","year":"1994","unstructured":"Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550\u2013554","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"8","key":"6178_CR22","doi-asserted-by":"publisher","first-page":"1363","DOI":"10.3390\/sym12081363","volume":"12","author":"MS Kaysar","year":"2020","unstructured":"Kaysar MS, Khan MI (2020) A modified median string algorithm for gene regulatory motif classification. Symmetry 12(8):1363","journal-title":"Symmetry"},{"issue":"5","key":"6178_CR23","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1016\/0167-8655(85)90061-3","volume":"3","author":"T Kohonen","year":"1985","unstructured":"Kohonen T (1985) Median strings. Pattern Recogn Lett 3(5):309\u2013313. https:\/\/doi.org\/10.1016\/0167-8655(85)90061-3","journal-title":"Pattern Recogn Lett"},{"issue":"2","key":"6178_CR24","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1137\/1025045","volume":"25","author":"JB Kruskal","year":"1983","unstructured":"Kruskal JB (1983) An overview of sequence comparison: time warps, string edits, and macromolecules. SIAM Rev 25(2):201\u2013237","journal-title":"SIAM Rev"},{"issue":"11","key":"6178_CR25","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"Y LeCun","year":"1998","unstructured":"LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278\u20132324","journal-title":"Proc IEEE"},{"issue":"8","key":"6178_CR26","first-page":"707","volume":"10","author":"VI Levenshtein","year":"1966","unstructured":"Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Sov Phys Dokl 10(8):707\u2013710","journal-title":"Sov Phys Dokl"},{"issue":"2","key":"6178_CR27","doi-asserted-by":"publisher","first-page":"556","DOI":"10.1109\/TASLP.2013.2294580","volume":"22","author":"M McVicar","year":"2014","unstructured":"McVicar M, Santos-Rodr\u00edguez R, Ni Y, De Bie T (2014) Automatic chord estimation from audio: a review of the state of the art. IEEE\/ACM Trans Audio Speech Lang. Process. (TASLP) 22(2):556\u2013575","journal-title":"IEEE\/ACM Trans Audio Speech Lang. Process. (TASLP)"},{"key":"6178_CR28","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1016\/j.patrec.2019.02.004","volume":"120","author":"P Mirabal","year":"2019","unstructured":"Mirabal P, Abreu J, Seco D (2019) Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string. Pattern Recogn Lett 120:104\u2013111","journal-title":"Pattern Recogn Lett"},{"key":"6178_CR29","volume-title":"Machine learning","author":"TM Mitchell","year":"1997","unstructured":"Mitchell TM (1997) Machine learning. McGraw-Hill, New York"},{"issue":"9","key":"6178_CR30","doi-asserted-by":"publisher","first-page":"11820","DOI":"10.1016\/j.eswa.2011.03.070","volume":"38","author":"L Nanni","year":"2011","unstructured":"Nanni L, Lumini A (2011) Prototype reduction techniques: a comparison among different approaches. Expert Syst Appl 38(9):11820\u201311828","journal-title":"Expert Syst Appl"},{"issue":"2\u20134","key":"6178_CR31","doi-asserted-by":"publisher","first-page":"390","DOI":"10.1016\/j.jda.2004.08.015","volume":"3","author":"F Nicolas","year":"2005","unstructured":"Nicolas F, Rivals E (2005) Hardness results for the center and median string problems under the weighted and unweighted edit distances. J Discrete Algorithms 3(2\u20134):390\u2013415","journal-title":"J Discrete Algorithms"},{"issue":"1","key":"6178_CR32","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1007\/s10044-014-0393-7","volume":"19","author":"S Ougiaroglou","year":"2016","unstructured":"Ougiaroglou S, Evangelidis G (2016) Rhc: a non-parametric cluster-based data reduction for efficient $$k$$-nn classification. IEEE Trans Pattern Anal Appl 19(1):93\u2013109","journal-title":"IEEE Trans Pattern Anal Appl"},{"issue":"1","key":"6178_CR33","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1109\/34.824821","volume":"22","author":"R Plamondon","year":"2000","unstructured":"Plamondon R, Srihari SN (2000) Online and off-line handwriting recognition: a comprehensive survey. IEEE Trans Pattern Anal Mach Intell 22(1):63\u201384","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"9","key":"6178_CR34","first-page":"1427","volume":"24","author":"JR Rico-Juan","year":"2003","unstructured":"Rico-Juan JR, Mic\u00f3 L (2003) Comparison of AESA and LAESA search algorithms using string and tree edit distances. Pattern Recogn Lett 24(9):1427\u20131436","journal-title":"Pattern Recogn Lett"},{"key":"6178_CR35","doi-asserted-by":"publisher","first-page":"105803","DOI":"10.1016\/j.asoc.2019.105803","volume":"85","author":"JR Rico-Juan","year":"2019","unstructured":"Rico-Juan JR, Valero-Mas JJ, Calvo-Zaragoza J (2019) Extensions to rank-based prototype selection in k-nearest neighbour classification. Appl Soft Comput 85:105803. https:\/\/doi.org\/10.1016\/j.asoc.2019.105803","journal-title":"Appl Soft Comput"},{"issue":"1","key":"6178_CR36","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1007\/s10032-019-00316-1","volume":"22","author":"K Riesen","year":"2019","unstructured":"Riesen K, Schmidt R (2019) Online signature verification based on string edit distance. Int J Doc Anal Recogn 22(1):41\u201354","journal-title":"Int J Doc Anal Recogn"},{"issue":"1","key":"6178_CR37","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1109\/TSMCC.2010.2103939","volume":"42","author":"I Triguero","year":"2012","unstructured":"Triguero I, Derrac J, Garcia S, Herrera F (2012) A taxonomy and experimental study on prototype generation for nearest neighbor classification. IEEE Trans Syst Man Cybern Part C Appl Rev 42(1):86\u2013100","journal-title":"IEEE Trans Syst Man Cybern Part C Appl Rev"},{"issue":"10","key":"6178_CR38","doi-asserted-by":"publisher","first-page":"3356","DOI":"10.3390\/app10103356","volume":"10","author":"JJ Valero-Mas","year":"2020","unstructured":"Valero-Mas JJ, Castellanos FJ (2020) Data reduction in the string space for efficient knn classification through space partitioning. Appl Sci 10(10):3356","journal-title":"Appl Sci"},{"key":"6178_CR39","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1016\/j.neucom.2016.04.018","volume":"203","author":"JJ Valero-Mas","year":"2016","unstructured":"Valero-Mas JJ, Calvo-Zaragoza J, Rico-Juan JR (2016) On the suitability of prototype selection methods for knn classification with distributed data. Neurocomputing 203:150\u2013160","journal-title":"Neurocomputing"},{"key":"6178_CR40","unstructured":"Wilkinson RA (1992) The first census optical character recognition system conference, vol 4912. US Department of Commerce, National Institute of Standards and Technology"},{"key":"6178_CR41","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-019-03865-z","author":"L Yang","year":"2019","unstructured":"Yang L, Zhu QS, Jinlong H, Wu Q, Cheng D, Hong X (2019) Constraint nearest neighbor for instance reduction. Soft Comput. https:\/\/doi.org\/10.1007\/s00500-019-03865-z","journal-title":"Soft Comput."}],"container-title":["Soft Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00500-021-06178-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00500-021-06178-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00500-021-06178-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,5]],"date-time":"2021-11-05T15:02:47Z","timestamp":1636124567000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00500-021-06178-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,2]]},"references-count":41,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["6178"],"URL":"https:\/\/doi.org\/10.1007\/s00500-021-06178-2","relation":{},"ISSN":["1432-7643","1433-7479"],"issn-type":[{"type":"print","value":"1432-7643"},{"type":"electronic","value":"1433-7479"}],"subject":[],"published":{"date-parts":[[2021,9,2]]},"assertion":[{"value":"20 August 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 September 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability"}},{"value":"This paper contains no cases of studies with human participants performed by any of the authors.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}}]}}