{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,20]],"date-time":"2025-02-20T05:24:54Z","timestamp":1740029094072,"version":"3.37.3"},"reference-count":28,"publisher":"IGI Global","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,1,1]]},"abstract":"<p>Protein-protein interaction (PPI) networks are essential to understanding the fundamental processes governing cell biology. Studying PPI networks has recently become possible due to advances in experimental high-throughput genomics and proteomics technologies. Many interactions from such high-throughput studies and most interactions from small-scale studies are reported only in the scientific literature and thus are not accessible in a readily analyzable format. This has led to the birth of manual curation initiatives such as the International Molecular Exchange Consortium (IMEx). The manual curation of PPI knowledge can be accelerated by text mining systems that retrieve PPI-relevant articles (article retrieval) and extract PPI-relevant knowledge (information extraction). In this article, the authors focus on article retrieval and define the task as binary classification where PPI-relevant articles are positives and the others are negatives. To build such a classifier, an annotated corpus is needed. It is very expensive to obtain an annotated corpus manually, but a noisy and imbalanced annotated corpus can be obtained automatically, where a collection of positive documents can be retrieved from existing PPI knowledge bases and a large number of unlabeled documents (most of which are negatives) can be retrieved from PubMed. 
They compared the performance of several machine learning algorithms by varying the ratio of the number of positives to the number of unlabeled documents and the number of features used.<\/p>","DOI":"10.4018\/jcmam.2010072003","type":"journal-article","created":{"date-parts":[[2010,4,16]],"date-time":"2010-04-16T17:53:52Z","timestamp":1271440432000},"page":"34-44","source":"Crossref","is-referenced-by-count":0,"title":["Classification Systems for Bacterial Protein-Protein Interaction Document Retrieval"],"prefix":"10.4018","volume":"1","author":[{"given":"Hongfang","family":"Liu","sequence":"first","affiliation":[{"name":"Georgetown University Medical Center, USA"}]},{"given":"Manabu","family":"Torii","sequence":"additional","affiliation":[{"name":"Georgetown University Medical Center, USA"}]},{"given":"Guixian","family":"Xu","sequence":"additional","affiliation":[{"name":"Minzu University of China, China"}]},{"given":"Johannes","family":"Goll","sequence":"additional","affiliation":[{"name":"The J. Craig Venter Institute, USA"}]}],"member":"2432","reference":[{"key":"jcmam.2010072003-0","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1007\/978-3-540-30115-8_7","article-title":"Applying support vector machines to imbalanced datasets.","volume":"3201","author":"R.Akbani","year":"2004","journal-title":"Lecture Notes in Computer Science"},{"key":"jcmam.2010072003-1","unstructured":"Chang, C.-C., & Lin, C.-J. (2009). LIBSVM: A library for support vector machines. Software available at http:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvm\/."},{"key":"jcmam.2010072003-2","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007733"},{"key":"jcmam.2010072003-3","doi-asserted-by":"crossref","unstructured":"Chen, X., & Wasikowski, M. (2008). 
FAST: a roc-based feature selection metric for small samples and imbalanced data classification problems.","DOI":"10.1145\/1401890.1401910"},{"key":"jcmam.2010072003-4","doi-asserted-by":"publisher","DOI":"10.1162\/089976698300017197"},{"key":"jcmam.2010072003-5","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45014-9_1"},{"key":"jcmam.2010072003-6","doi-asserted-by":"crossref","unstructured":"Elkan, C., & Noto, K. (2008). Learning classifiers from only positive and unlabeled data.","DOI":"10.1145\/1401890.1401920"},{"key":"jcmam.2010072003-7","doi-asserted-by":"publisher","DOI":"10.1111\/j.0824-7935.2004.t01-1-00228.x"},{"key":"jcmam.2010072003-8","doi-asserted-by":"publisher","DOI":"10.1162\/153244303322753670"},{"key":"jcmam.2010072003-9","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btn285"},{"key":"jcmam.2010072003-10","unstructured":"Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of Data Mining. MIT Press."},{"key":"jcmam.2010072003-11","doi-asserted-by":"crossref","unstructured":"Joachims, T. (1998). Text categorization with Support Vector Machines: Learning with many relevant features. In Proc of Tenth European Conference on Machine Learning (ECML-98).","DOI":"10.1007\/BFb0026683"},{"key":"jcmam.2010072003-12","unstructured":"Joachims, T. (1999). Making large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning. MIT-Press."},{"key":"jcmam.2010072003-13","unstructured":"Komarek, P., & Moore, A. (2005). Making logistic regression a core data mining tool: A practical investigation of accuracy, speed, and simplicity (pp. 685-688). Institute, Carnegie Mellon University."},{"key":"jcmam.2010072003-14","doi-asserted-by":"crossref","unstructured":"Molinara, M., Ricamato, M. T., & Tortorella, F. (2007). Facing Imbalanced Classes through Aggregation of Classifiers. 
IEEE Computer Society Washington, DC, USA.","DOI":"10.1109\/ICIAP.2007.4362755"},{"key":"jcmam.2010072003-15","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-6-233"},{"key":"jcmam.2010072003-16","doi-asserted-by":"crossref","unstructured":"Ng, W., & Dash, M. (2006). An Evaluation of Progressive Sampling for Imbalanced Data Sets. IEEE Computer Society Washington, DC, USA.","DOI":"10.1109\/ICDMW.2006.28"},{"key":"jcmam.2010072003-17","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btn481"},{"key":"jcmam.2010072003-18","doi-asserted-by":"crossref","unstructured":"Scholkopf, B., & Smola, A. J. (2001). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MA, USA: MIT Press Cambridge.","DOI":"10.7551\/mitpress\/4175.001.0001"},{"key":"jcmam.2010072003-19","doi-asserted-by":"crossref","unstructured":"Vapnik, V. N. (2000). The Nature of Statistical Learning Theory. Springer.","DOI":"10.1007\/978-1-4757-3264-1"},{"key":"jcmam.2010072003-20","unstructured":"Visa, S., & Ralescu, A. (2005). Issues in mining imbalanced data sets-a review paper."},{"key":"jcmam.2010072003-21","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007734"},{"key":"jcmam.2010072003-22","doi-asserted-by":"crossref","unstructured":"Weng, C. G., & Poon, J. (2006). A Data Complexity Analysis on Imbalanced Datasets and an Alternative Imbalance Recovering Strategy. IEEE Computer Society Washington, DC, USA.","DOI":"10.1109\/WI.2006.9"},{"key":"jcmam.2010072003-23","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkj161"},{"key":"jcmam.2010072003-24","doi-asserted-by":"crossref","unstructured":"Xu, G., Niu, Z., Uetz, P., Gao, X., Qin, X., & Liu, H. (2009). Semi-Supervised Learning of Text Classification on Bacterial Protein-Protein Interaction Documents. 
International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing (IJCBS'09).","DOI":"10.1109\/IJCBS.2009.68"},{"key":"jcmam.2010072003-25","unstructured":"Yang, Y., & Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. Fourteenth International Conference on Machine Learning."},{"key":"jcmam.2010072003-26","unstructured":"Yen, S. J., Lee, Y. S., Lin, C. H., & Ying, J. C. (n.d.). Investigating the Effect of Sampling Methods for Imbalanced Data Distributions."},{"key":"jcmam.2010072003-27","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007741"}],"container-title":["International Journal of Computational Models and Algorithms in Medicine"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=38943","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,20]],"date-time":"2025-02-20T01:32:46Z","timestamp":1740015166000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/jcmam.2010072003"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2010,1,1]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,1]]}},"URL":"https:\/\/doi.org\/10.4018\/jcmam.2010072003","relation":{},"ISSN":["1947-3133","1947-3141"],"issn-type":[{"type":"print","value":"1947-3133"},{"type":"electronic","value":"1947-3141"}],"subject":[],"published":{"date-parts":[[2010,1,1]]}}}