{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T18:09:23Z","timestamp":1754158163316,"version":"3.41.2"},"reference-count":28,"publisher":"Emerald","issue":"4","license":[{"start":{"date-parts":[[2008,8,8]],"date-time":"2008-08-08T00:00:00Z","timestamp":1218153600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,8,8]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-heading\">Purpose<\/jats:title><jats:p>The purpose of this paper is to provide support for automation of the annotation process of large corpora of digital content.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Design\/methodology\/approach<\/jats:title><jats:p>The paper presents and discusses an information extraction pipeline from digital document acquisition to information extraction, processing and management. 
An overall architecture that supports such an extraction pipeline is detailed and discussed.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Findings<\/jats:title><jats:p>The proposed pipeline is implemented in a working prototype of an autonomous digital library (A\u2010DL) system called ScienceTreks that: supports a broad range of methods for document acquisition; does not rely on any external information sources and is solely based on the existing information in the document itself and in the overall set in a given digital archive; and provides application programming interfaces (API) to support easy integration of external systems and tools in the existing pipeline.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Practical implications<\/jats:title><jats:p>The proposed A\u2010DL system can be used in automating end\u2010to\u2010end information retrieval and processing, supporting the control and elimination of error\u2010prone human intervention in the process.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-heading\">Originality\/value<\/jats:title><jats:p>High quality automatic metadata extraction is a crucial step in the move from linguistic entities to logical entities, relation information and logical relations, and therefore to the semantic level of digital library usability. 
This in turn creates the opportunity for value\u2010added services within existing and future semantic\u2010enabled digital library systems.<\/jats:p><\/jats:sec>","DOI":"10.1108\/14684520810897368","type":"journal-article","created":{"date-parts":[[2008,8,30]],"date-time":"2008-08-30T07:10:41Z","timestamp":1220080241000},"page":"488-499","source":"Crossref","is-referenced-by-count":3,"title":["ScienceTreks: an autonomous digital library system"],"prefix":"10.1108","volume":"32","author":[{"given":"Alexander","family":"Ivanyukovich","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maurizio","family":"Marchese","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fausto","family":"Giunchiglia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","reference":[{"key":"key2022012120403459200_b1","doi-asserted-by":"crossref","unstructured":"Brin, S. and Page, L. (1998), \u201cThe anatomy of a large\u2010scale hypertextual web search engine\u201d, Proceedings of the 7th World Wide Web Conference, Computer Networks and ISDN Systems, Vol. 30 Nos 1\/7, pp. 107\u201017.","DOI":"10.1016\/S0169-7552(98)00110-X"},{"key":"key2022012120403459200_b2","doi-asserted-by":"crossref","unstructured":"Cho, J. and Garcia\u2010Molina, H. (2002), \u201cParallel crawlers\u201d, Proceedings of the WWW2002, Honolulu, Hawaii, 7\u201011 May, available at: www2002.org\/CDROM\/refereed\/108\/.","DOI":"10.1145\/511446.511464"},{"key":"key2022012120403459200_b3","doi-asserted-by":"crossref","unstructured":"Conrad, J.G. and Schriber, C.P. (2006), \u201cManaging d\u00e9j\u00e0 vu: collection building for the identification of nonidentical duplicate documents\u201d, Journal of the American Society for Information Science and Technology, Vol. 57 No. 7, pp. 
921\u201032.","DOI":"10.1002\/asi.20363"},{"key":"key2022012120403459200_b4","doi-asserted-by":"crossref","unstructured":"Cordy, J. (2004), \u201cTxl \u2013 a language for programming language tools and applications\u201d, Proceedings of 4th International Workshop on Language Descriptions, Tools and Applications. Electronic Notes in Theoretical Computer Science, Vol. 110, pp. 3\u201031.","DOI":"10.1016\/j.entcs.2004.11.006"},{"key":"key2022012120403459200_b5","unstructured":"Diligenti, M., Coetzee, F.M., Lawrence, S., Giles, C.L. and Gori, M. (2000), \u201cFocused crawling using context graphs\u201d, Proceedings of the 26th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, pp. 527\u201034."},{"key":"key2022012120403459200_b6","doi-asserted-by":"crossref","unstructured":"Giles, C.L., Bollacker, K.D. and Lawrence, S. (1998), \u201cCiteSeer: an automatic citation indexing system\u201d, Proceedings of the 3rd ACM Conference on Digital Libraries, ACM, New York, NY, pp. 89\u201098.","DOI":"10.1145\/276675.276685"},{"key":"key2022012120403459200_b7","unstructured":"Han, H., Giles, C.L., Manavoglu, E., Zha, H., Zhang, Z. and Fox, E.A. (2003), \u201cAutomatic document metadata extraction using support vector machines\u201d, Proceedings of the 3rd ACM\/IEEE\u2010CS joint Conference on Digital Libraries, IEEE Computer Society, Washington, DC, pp. 37\u201048."},{"key":"key2022012120403459200_b8","doi-asserted-by":"crossref","unstructured":"Heath, B.P., McArthur, D.J., McClelland, M.K. and Vetter, R.J. (2005), \u201cMetadata lessons from the iLumina digital library\u201d, Communications of the ACM, Vol. 49 No. 7, pp. 68\u201074.","DOI":"10.1145\/1070838.1070839"},{"key":"key2022012120403459200_b9","doi-asserted-by":"crossref","unstructured":"Ioannidis, Y., Maier, D., Abiteboul, S., Buneman, P., Davidson, S., Fox, E., Halevy, A., Knoblock, C., Rabitti, F., Schek, H. and Weikum, G. 
(2005), \u201cDigital library information\u2010technology infrastructures\u201d, International Journal on Digital Libraries, Vol. 5 No. 4, pp. 266\u201074.","DOI":"10.1007\/s00799-004-0094-8"},{"key":"key2022012120403459200_b10","unstructured":"Ivanyukovich, A. and Marchese, M. (2006a), \u201cUnsupervised free\u2010text processing and structuring in digital archives\u201d, Proceedings of the 1st International Conference on Multidisciplinary Information Sciences and Technologies, InScit2006, Merida, Spain, 25\u201028 October, available at: www.science.unitn.it\/~marchese\/pdf\/inscit2006_full_paper.pdf."},{"key":"key2022012120403459200_b11","unstructured":"Ivanyukovich, A. and Marchese, M. (2006b), \u201cUnsupervised metadata extraction in scientific digital libraries using a\u2010priori domain\u2010specific knowledge\u201d, SWAP 2006 \u2013 Semantic Web Applications and Perspectives, Proceedings of the 3rd Italian Semantic Web Workshop, Pisa, Italy, 18\u201020 December, CEUR Workshop Proceedings, 201, available at: http:\/\/ceur-ws.org\/Vol-201\/19.pdf."},{"key":"key2022012120403459200_b12","doi-asserted-by":"crossref","unstructured":"Ivanyukovich, A., Marchese, M. and Reuther, P. (2007), \u201cAssessing quality dynamics in unsupervised metadata extraction for digital libraries\u201d, Research and Advanced Technology for Digital Libraries, Lecture Notes in Computer Science, Vol. 4675, Springer, Berlin, pp. 454\u20107.","DOI":"10.1007\/978-3-540-74851-9_41"},{"key":"key2022012120403459200_b13","unstructured":"Kiyavitskaya, N., Zeni, N., Cordy, J.R., Mich, L. and Mylopoulos, J. 
(2006), \u201cSemi\u2010automatic semantic annotations for next generation information systems\u201d, Advanced Information Systems Engineering: 18th International Conference, CAiSE 2006, Luxembourg, Luxembourg, Proceedings, Lecture Notes in Computer Science, Springer, Berlin, 5\u20109 June."},{"key":"key2022012120403459200_b14","doi-asserted-by":"crossref","unstructured":"Klink, S., Reuther, P., Weber, A., Walter, B. and Ley, M. (2006), \u201cAnalysing social networks within bibliographical data\u201d, Database and Expert Systems Applications, 17th International Conference, DEXA 2006, Krak\u00f3w, Poland, Proceedings, Lecture Notes in Computer Science, Vol. 4080, Springer, Berlin, 4\u20108 September, pp. 234\u201043.","DOI":"10.1007\/11827405_23"},{"key":"key2022012120403459200_b15","doi-asserted-by":"crossref","unstructured":"Kruk, S.R., Decker, S. and Zieborak, L. (2005), \u201cJeromeDL \u2013 adding semantic web technologies to digital libraries\u201d, Database and Expert Systems Applications, Lecture Notes in Computer Science, Vol. 3588, pp. 716\u201025.","DOI":"10.1007\/11546924_70"},{"key":"key2022012120403459200_b16","doi-asserted-by":"crossref","unstructured":"Lagoze, C., Krafft, D., Cornwell, T., Dushay, N., Eckstrom, D. and Saylor, J. (2006), \u201cMetadata aggregation and \u2018automated digital libraries\u2019: a retrospective on the NSDL experience\u201d, Proceedings of the 6th ACM\/IEEE\u2010CS Joint Conference on Digital Libraries, ACM, New York, NY, pp. 230\u20109.","DOI":"10.1145\/1141753.1141804"},{"key":"key2022012120403459200_b17","unstructured":"Ley, M. and Reuther, P. (2006), \u201cMaintaining an online bibliographical database: the problem of data quality\u201d, Extraction et Gestion des Connaissances (EGC'2006), Revue des Nouvelles Technologies de l'Information, Vol. RNTI\u2010E\u20106, pp. 5\u201010."},{"key":"key2022012120403459200_b18","doi-asserted-by":"crossref","unstructured":"Newman, M.E.J. 
(2001), \u201cScientific collaboration networks. I. Network construction and fundamental results\u201d, Physical Review E, Vol. 64, 016131, available at: www-personal.umich.edu\/~mejn\/papers\/016131.pdf.","DOI":"10.1103\/PhysRevE.64.016131"},{"key":"key2022012120403459200_b19","doi-asserted-by":"crossref","unstructured":"Noyons, E.C.M., Moed, H.F. and Luwel, M. (1999), \u201cCombining mapping and citation analysis for evaluative bibliometric purposes: a bibliometric study\u201d, Journal of the American Society for Information Science, Vol. 50 No. 2, pp. 115\u201031.","DOI":"10.1002\/(SICI)1097-4571(1999)50:2<115::AID-ASI3>3.0.CO;2-J"},{"key":"key2022012120403459200_b20","unstructured":"Peshkin, L. and Pfeffer, A. (2003), \u201cBayesian information extraction network\u201d, Proceedings of the 18th International Joint Conference on Artificial Intelligence, Acapulco, Mexico, 9\u201015 August, available at: www.eecs.harvard.edu\/~pesha\/Public\/BIEN.pdf."},{"key":"key2022012120403459200_b21","doi-asserted-by":"crossref","unstructured":"Petinot, Y., Giles, C.L., Bhatnagar, V., Teregowda, P.B. and Han, H. (2004a), \u201cEnabling interoperability for autonomous digital libraries: an API to CiteSeer services\u201d, Proceedings of the 4th ACM\/IEEE\u2010CS Joint Conference on Digital Libraries, ACM, New York, NY, pp. 372\u20103.","DOI":"10.1145\/996350.996437"},{"key":"key2022012120403459200_b22","doi-asserted-by":"crossref","unstructured":"Petinot, Y., Giles, C.L., Bhatnagar, V., Teregowda, P.B., Han, H. and Councill, I. (2004b), \u201cCiteSeer\u2010API: towards seamless resource location and interlinking for digital libraries\u201d, Proceedings of the 13th ACM International Conference on Information and Knowledge Management, ACM, New York, NY, pp. 553\u201061.","DOI":"10.1145\/1031171.1031275"},{"key":"key2022012120403459200_b23","doi-asserted-by":"crossref","unstructured":"Reuther, P. and Walter, B. 
(2006), \u201cSurvey on test collections and techniques for personal name matching\u201d, International Journal of Metadata, Semantics and Ontologies, Vol. 1 No. 2, pp. 89\u201099.","DOI":"10.1504\/IJMSO.2006.011006"},{"key":"key2022012120403459200_b24","doi-asserted-by":"crossref","unstructured":"Salton, G., Singhal, A., Mitra, M. and Buckley, C. (1997), \u201cAutomatic text structuring and summarization\u201d, Information Processing & Management, Vol. 33 No. 2, pp. 193\u2010207.","DOI":"10.1016\/S0306-4573(96)00062-3"},{"key":"key2022012120403459200_b25","doi-asserted-by":"crossref","unstructured":"Suleman, H. and Fox, E.A. (2002), \u201cDesigning protocols in support of digital library componentization\u201d, Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries, Lecture Notes in Computer Science, Vol. 2458, Springer, London, pp. 568\u201082.","DOI":"10.1007\/3-540-45747-X_43"},{"key":"key2022012120403459200_b26","unstructured":"Tryfonopoulos, C., Idreos, S. and Koubarakis, M. (2005), \u201cLibraRing: an architecture for distributed digital libraries based on DHT\u201d, Proceedings of the 9th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2005), Vienna, Austria, 18\u201023 September, available at: www.mpi-inf.mpg.de\/~trifon\/papers\/pdf\/ecdl05-TIK.pdf."},{"key":"key2022012120403459200_b27","doi-asserted-by":"crossref","unstructured":"van Raan, A. (1997), \u201cScientometrics: state\u2010of\u2010the\u2010art\u201d, Scientometrics, Vol. 38 No. 1, pp. 205\u201018.","DOI":"10.1007\/BF02461131"},{"key":"key2022012120403459200_b28","doi-asserted-by":"crossref","unstructured":"Yang, H., Callan, J. and Shulman, S. (2006), \u201cNext steps in near\u2010duplicate detection for eRulemaking\u201d, Proceedings of the 2006 National Conference on Digital Government Research, ACM International Conference Proceeding Series, Vol. 151, ACM, New York NY, pp. 
239\u201048.","DOI":"10.1145\/1146598.1146663"}],"container-title":["Online Information Review"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.emeraldinsight.com\/doi\/full-xml\/10.1108\/14684520810897368","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/14684520810897368\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/14684520810897368\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T00:41:04Z","timestamp":1753404064000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/oir\/article\/32\/4\/488-499\/314992"}},"subtitle":[],"editor":[{"given":"A.R.D.","family":"Prasad","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2008,8,8]]},"references-count":28,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2008,8,8]]}},"alternative-id":["10.1108\/14684520810897368"],"URL":"https:\/\/doi.org\/10.1108\/14684520810897368","relation":{},"ISSN":["1468-4527"],"issn-type":[{"type":"print","value":"1468-4527"}],"subject":[],"published":{"date-parts":[[2008,8,8]]}}}