{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T16:09:22Z","timestamp":1761581362091,"version":"3.37.3"},"reference-count":54,"publisher":"Wiley","license":[{"start":{"date-parts":[[2015,1,1]],"date-time":"2015-01-01T00:00:00Z","timestamp":1420070400000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["SFRH\/BPD\/92978\/2013","TIN2014-57251-P"],"award-info":[{"award-number":["SFRH\/BPD\/92978\/2013","TIN2014-57251-P"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008530","name":"European Regional Development Fund","doi-asserted-by":"crossref","award":["SFRH\/BPD\/92978\/2013","TIN2014-57251-P"],"award-info":[{"award-number":["SFRH\/BPD\/92978\/2013","TIN2014-57251-P"]}],"id":[{"id":"10.13039\/501100008530","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100006280","name":"Spanish Ministry of Science and Technology","doi-asserted-by":"crossref","award":["SFRH\/BPD\/92978\/2013","TIN2014-57251-P"],"award-info":[{"award-number":["SFRH\/BPD\/92978\/2013","TIN2014-57251-P"]}],"id":[{"id":"10.13039\/501100006280","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BioMed Research International"],"published-print":{"date-parts":[[2015]]},"abstract":"<jats:p>Orthology detection requires more effective scaling algorithms. 
In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: <jats:italic>Saccharomyces cerevisiae<\/jats:italic>-<jats:italic>Kluyveromyces lactis<\/jats:italic>, <jats:italic>Saccharomyces cerevisiae<\/jats:italic>-<jats:italic>Candida glabrata<\/jats:italic>, and <jats:italic>Saccharomyces cerevisiae<\/jats:italic>-<jats:italic>Schizosaccharomyces pombe<\/jats:italic> as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. 
From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.<\/jats:p>","DOI":"10.1155\/2015\/748681","type":"journal-article","created":{"date-parts":[[2015,10,29]],"date-time":"2015-10-29T21:01:30Z","timestamp":1446152490000},"page":"1-12","source":"Crossref","is-referenced-by-count":15,"title":["An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species"],"prefix":"10.1155","volume":"2015","author":[{"given":"Deborah","family":"Galpert","sequence":"first","affiliation":[{"name":"Departamento de Ciencias de la Computaci\u00f3n, Universidad Central \u201cMarta Abreu\u201d de Las Villas (UCLV), 54830 Santa Clara, Cuba"}]},{"given":"Sara","family":"del R\u00edo","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071 Granada, Spain"}]},{"given":"Francisco","family":"Herrera","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071 Granada, Spain"}]},{"given":"Evys","family":"Ancede-Gallardo","sequence":"additional","affiliation":[{"name":"Centro de Bioactivos Qu\u00edmicos, Universidad Central \u201cMarta Abreu\u201d de Las Villas (UCLV), 54830 Santa Clara, Cuba"}]},{"given":"Agostinho","family":"Antunes","sequence":"additional","affiliation":[{"name":"Centro Interdisciplinar de Investiga\u00e7\u00e3o Marinha e Ambiental (CIMAR\/CIIMAR), Universidade do Porto, Rua dos Bragas 177, 4050-123 Porto, Portugal"},{"name":"Departamento de Biologia, Faculdade de Ci\u00eancias, Universidade do Porto, Rua 
do Campo Alegre, 4169-007 Porto, Portugal"}]},{"given":"Guillermin","family":"Ag\u00fcero-Chapin","sequence":"additional","affiliation":[{"name":"Centro de Bioactivos Qu\u00edmicos, Universidad Central \u201cMarta Abreu\u201d de Las Villas (UCLV), 54830 Santa Clara, Cuba"},{"name":"Centro Interdisciplinar de Investiga\u00e7\u00e3o Marinha e Ambiental (CIMAR\/CIIMAR), Universidade do Porto, Rua dos Bragas 177, 4050-123 Porto, Portugal"}]}],"member":"311","reference":[{"key":"1","doi-asserted-by":"publisher","DOI":"10.2307\/2412448"},{"key":"2","doi-asserted-by":"publisher","DOI":"10.1126\/science.278.5338.631"},{"key":"3","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl213"},{"key":"4","doi-asserted-by":"publisher","DOI":"10.1101\/gr.1224503"},{"key":"5","doi-asserted-by":"publisher","DOI":"10.1007\/11554714_6"},{"key":"6","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-12-11"},{"key":"7","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bts006"},{"key":"8","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0105015"},{"key":"9","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btk040"},{"key":"10","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkp951"},{"key":"11","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkn485"},{"key":"12","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkq953"},{"key":"13","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkq1109"},{"key":"16","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/25.17.3389"},{"key":"17","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.96.6.2896"},{"issue":"6841","key":"18","first-page":"1040","volume":"411","year":"2001","journal-title":"Nature"},{"key":"19","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btg213"},{"key":"21","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-9-518"},{"key":"22","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbr030"},{"key":"23","doi-asserted-by":"publisher","DOI":"10.1016\/j.tig.2008
.08.009"},{"key":"24","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0018755"},{"year":"2005","key":"25"},{"key":"26","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bti045"},{"key":"27","doi-asserted-by":"publisher","DOI":"10.1109\/tcbb.2005.48"},{"key":"28","doi-asserted-by":"publisher","DOI":"10.1089\/cmb.2007.0048"},{"key":"29","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-11-s7-s6"},{"key":"30","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btu492"},{"key":"31","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1134"},{"key":"33","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2014.01.015"},{"key":"34","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2014.03.043"},{"key":"35","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"37","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2012-13-7-r57"},{"key":"38","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(81)90087-5"},{"key":"39","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(70)90057-4"},{"year":"2006","key":"40"},{"key":"41","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0011147"},{"key":"42","doi-asserted-by":"publisher","DOI":"10.1002\/(sici)1097-0134(19990101)34:160;49::aid-prot562;3.0.co;2-l"},{"key":"43","series-title":"Lecture Notes in Computer Science","volume-title":"Rough sets in ortholog gene detection","volume":"8537","year":"2014"},{"issue":"1","key":"44","first-page":"19","volume":"18","year":"2014","journal-title":"Computaci\u00f3n y 
Sistemas"},{"year":"2012","key":"45"},{"year":"2011","key":"46"},{"key":"49","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(02)00257-1"},{"key":"50","doi-asserted-by":"publisher","DOI":"10.1016\/s0031-3203(96)00142-2"},{"key":"51","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.239"},{"key":"52","doi-asserted-by":"publisher","DOI":"10.1101\/gr.3672305"},{"key":"53","doi-asserted-by":"publisher","DOI":"10.1002\/0471250953.bi0305s43"},{"key":"54","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2015.05.027"},{"key":"55","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl286"},{"key":"56","doi-asserted-by":"publisher","DOI":"10.1007\/4735_97"},{"key":"57","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2010.09.018"},{"key":"58","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm585"},{"key":"59","first-page":"163","volume":"13","year":"2002","journal-title":"Genome Informatics"},{"key":"60","doi-asserted-by":"publisher","DOI":"10.1101\/gr.2289704"},{"key":"61","doi-asserted-by":"publisher","DOI":"10.1101\/gr.5232407"}],"container-title":["BioMed Research 
International"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/bmri\/2015\/748681.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/bmri\/2015\/748681.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/bmri\/2015\/748681.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,5,16]],"date-time":"2020-05-16T01:49:12Z","timestamp":1589593752000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.hindawi.com\/journals\/bmri\/2015\/748681\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015]]},"references-count":54,"alternative-id":["748681","748681"],"URL":"https:\/\/doi.org\/10.1155\/2015\/748681","relation":{},"ISSN":["2314-6133","2314-6141"],"issn-type":[{"type":"print","value":"2314-6133"},{"type":"electronic","value":"2314-6141"}],"subject":[],"published":{"date-parts":[[2015]]}}}