{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T13:48:26Z","timestamp":1778593706908,"version":"3.51.4"},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivations: Technological advances in biomedical research are generating a plethora of heterogeneous data at a high rate. There is a critical need for extraction, integration and management tools for information discovery and synthesis from these heterogeneous data.<\/jats:p><jats:p>Results: In this paper, we present a general architecture, called ALFA, for information extraction and representation from diverse biological data. The ALFA architecture consists of: (i) a networked, hierarchical, hyper-graph object model for representing information from heterogeneous data sources in a standardized, structured format; and (ii) a suite of integrated, interactive software tools for information extraction and representation from diverse biological data sources. As part of our research efforts to explore this space, we have currently prototyped the ALFA object model and a set of interactive software tools for searching, filtering, and extracting information from scientific text. In particular, we describe BioFerret, a meta-search tool for searching and filtering relevant information from the web, and ALFA Text Viewer, an interactive tool for user-guided extraction, disambiguation, and representation of information from scientific text. We further demonstrate the potential of our tools in integrating the extracted information with experimental data and diagrammatic biological models via the common underlying ALFA representation.<\/jats:p><jats:p>Contact: \u00a0aditya_vailaya@agilent.com<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti187","type":"journal-article","created":{"date-parts":[[2004,12,18]],"date-time":"2004-12-18T01:16:41Z","timestamp":1103332601000},"page":"430-438","source":"Crossref","is-referenced-by-count":53,"title":["An architecture for biological information extraction and representation"],"prefix":"10.1093","volume":"21","author":[{"given":"Aditya","family":"Vailaya","sequence":"first","affiliation":[{"name":"Agilent Laboratories 3500 Deer Creek Road, MS 26U-16, Palo Alto, CA 94304, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Bluvas","sequence":"additional","affiliation":[{"name":"Agilent Laboratories 3500 Deer Creek Road, MS 26U-16, Palo Alto, CA 94304, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert","family":"Kincaid","sequence":"additional","affiliation":[{"name":"Agilent Laboratories 3500 Deer Creek Road, MS 26U-16, Palo Alto, CA 94304, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Allan","family":"Kuchinsky","sequence":"additional","affiliation":[{"name":"Agilent Laboratories 3500 Deer Creek Road, MS 26U-16, Palo Alto, CA 94304, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Creech","sequence":"additional","affiliation":[{"name":"Agilent Laboratories 3500 Deer Creek Road, MS 26U-16, Palo Alto, CA 94304, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Annette","family":"Adler","sequence":"additional","affiliation":[{"name":"Agilent Laboratories 3500 Deer Creek Road, MS 26U-16, Palo Alto, CA 94304, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2004,12,17]]},"reference":[{"key":"2023013107235575800_B1","unstructured":"Clare, A. and King, R.D. 2002How well do we understand the clusters found in microarray data?. InSilico Biol.2511\u2013522"},{"key":"2023013107235575800_B2","doi-asserted-by":"crossref","unstructured":"Collier, N., Nobata, C., Tsujii, J. 2000Extracting the names of genes and gene products with a hidden Markov model. Proceedings of the 18th International Conference on Computational Linguistics, Universit\u00e4t des Saarlandes , Saarbr\u00fccken July 31\u2013August 4. Association for Computational Linguistics, Morristown, NJ Germany, pp. 201\u2013207","DOI":"10.3115\/990820.990850"},{"key":"2023013107235575800_B3","doi-asserted-by":"crossref","unstructured":"Friedman, C., Kra, P., Yu, H., Krauthammer, M., Rzhetsky, A. 2001GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics11\u20139","DOI":"10.1093\/bioinformatics\/17.suppl_1.S74"},{"key":"2023013107235575800_B4","unstructured":"Fukuda, K., Tsunoda, T., Tamura, A., Takagi, A. 1998Toward information extraction: identifying protein names from biological papers. Pac. Symp. Biocomput.3705\u2013716"},{"key":"2023013107235575800_B5","unstructured":"Gene Ontology\u2122. 2004Gene Ontology Consortium"},{"key":"2023013107235575800_B6","unstructured":"Humphreys, K., Demetriou, G., Gaizauskas, R. 2000Two applications of information extraction to biological science journal articles: enzyme interactions and protein structures. Pac. Symp. Biocomput.5502\u2013513"},{"key":"2023013107235575800_B7","unstructured":"Iliopoulos, I., Enright, A.J., Ouzounis, C.A. 2001TEXTQUEST: document clustering of MEDLINE abstracts for concept discovery in molecular biology. Pac. Symp. Biocomput.6384\u2013395"},{"key":"2023013107235575800_B8","unstructured":"Jain, A.K. and Dubes, R.C. Algorithms for Clustering Data1998, Englewood, NJ Prentice Hall"},{"key":"2023013107235575800_B9","unstructured":"Kincaid, R., Kleusing, D., Vailaya, A. 2002BNS: an LDAP-based biomolecule naming service. Proceedings of the Conference on Objects in Bio- & Chem-Informatics 2002 (OiBC-2002) , Arlington, VA November 18\u201319"},{"key":"2023013107235575800_B10","doi-asserted-by":"crossref","unstructured":"Krauthammer, M., Rzhetsky, A., Morozov, P., Friedman, C. 2000Using BLAST for identifying gene and protein names in journal articles. Gene259, pp. 245\u2013252","DOI":"10.1016\/S0378-1119(00)00431-5"},{"key":"2023013107235575800_B11","unstructured":"Ng, S.-K. and Wong, M. 1999Toward routine automatic pathway discovery from on-line scientific text abstracts. Genome Inform.10104\u2013112"},{"key":"2023013107235575800_B12","unstructured":"O'Day, V.L., Adler, A., Kuchinsky, A., Bouch, A. 2001When worlds collide: molecular biology as interdisciplinary collaboration. In Prinz, W., Jarke, M., Rogers, Y., Schmidt, K., Wulf, V. (Eds.). Proceedings of the Seventh European Conference on Computer Supported Cooperative Work , Germany September 16\u201320. Kluwer Academic Publishers, Dordrecht Bonn, pp. 399\u2013418"},{"key":"2023013107235575800_B13","unstructured":"Palakal, M., Mukhopadhyay, S., Mostafa, J., Raje, R., N'Cho, M., Mishra, S. 2002An intelligent biological information management system. Bioinformatics181283\u20131288"},{"key":"2023013107235575800_B14","unstructured":"Palakal, M., Stephens, M., Mukhpodhyay, S., Raje, R., Rhodes, S. 2002A multi-level text mining method to extract biological relationships. Proceedings of the 1st IEEE Computer Society Bioinformatics Conference (CSB 2002) , Stanford, CA August 14\u201316 IEEE Computer Society"},{"key":"2023013107235575800_B15","doi-asserted-by":"crossref","unstructured":"Park, J.C., Kim, H.S., Kim, J.J. 2001Bidirectional incremental parsing for automatic pathway identification with combinatory categorical grammar. Pac. Symp. Biocomput.6, pp. 396\u2013407","DOI":"10.1142\/9789814447362_0039"},{"key":"2023013107235575800_B16","unstructured":"Porter, M.F. 1980An algorithm for suffix stripping. Program14130\u2013137"},{"key":"2023013107235575800_B17","unstructured":"Rindflesch, T.C., Tanabe, L., Weinstein, J.N., Hunter, L. 2000EDGAR: extraction of and drugs, genes, and relations from the biomedical literature. Pac. Symp. Biocomput.5514\u2013525"},{"key":"2023013107235575800_B18","unstructured":"Sekimizu, T., Park, H.S., Tsujii, J. 1998Identifying the interaction between genes and gene products based on frequently seen verbs in medline abstracts. Genome Inform.962\u201371"},{"key":"2023013107235575800_B19","unstructured":"Stephens, M., Palakal, M., Mukhopadhyay, S., Raje, R., Mostafa, J. 2001Detecting gene relations from MEDLINE abstracts. Pac. Symp. Biocomput.6483\u2013496"},{"key":"2023013107235575800_B20","doi-asserted-by":"crossref","unstructured":"Wahl, M., Howes, T., Kille, S. 1997Lightweight Directory Access Protocol (v3). IETF RFC2551","DOI":"10.17487\/rfc2251"},{"key":"2023013107235575800_B21","unstructured":"Wong, L. 2001PIES, a Protein Interaction Extraction System. Pac. Symp. Biocomput.6520\u2013531"},{"key":"2023013107235575800_B22","unstructured":"Yakushiji, A., Tateisi, Y., Miyao, Y., Tsujii, J. 2001Event extraction from biomedical papers using a full parser. Pac. Symp. Biocomput.6408\u2013419"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/4\/430\/48965185\/bioinformatics_21_4_430.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/4\/430\/48965185\/bioinformatics_21_4_430.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,21]],"date-time":"2024-12-21T12:46:31Z","timestamp":1734785191000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/4\/430\/203523"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,12,17]]},"references-count":22,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2005,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti187","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2005,2,15]]},"published":{"date-parts":[[2004,12,17]]}}}