{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:25:30Z","timestamp":1750307130978,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2011,9,19]],"date-time":"2011-09-19T00:00:00Z","timestamp":1316390400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2011,9,19]]},"DOI":"10.1145\/2034691.2034720","type":"proceedings-article","created":{"date-parts":[[2011,9,20]],"date-time":"2011-09-20T13:50:16Z","timestamp":1316526616000},"page":"121-128","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["An efficient language-independent method to extract content from news webpages"],"prefix":"10.1145","author":[{"given":"Eduardo","family":"Cardoso","sequence":"first","affiliation":[{"name":"PUC-Rio, Rio de Janeiro, Brazil"}]},{"given":"Iam","family":"Jabour","sequence":"additional","affiliation":[{"name":"PUC-Rio, Rio de Janeiro, Brazil"}]},{"given":"Eduardo","family":"Laber","sequence":"additional","affiliation":[{"name":"PUC-Rio, Rio de Janeiro, Brazil"}]},{"given":"Rog\u00e9rio","family":"Rodrigues","sequence":"additional","affiliation":[{"name":"Microsoft Corporation, Rio de Janeiro, Brazil"}]},{"given":"Pedro","family":"Cardoso","sequence":"additional","affiliation":[{"name":"PUC-Rio, Rio de Janeiro, Brazil"}]}],"member":"320","published-online":{"date-parts":[[2011,9,19]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"w3","article-title":"Cascading style sheets level 2 revision 1 (CSS 2.1) specification. Candidate recommendation","author":"\u00c7elik T.","year":"2009","unstructured":"\u00c7elik , T. , Bos , B. , Hickson , I. , and Lie , H. W . Cascading style sheets level 2 revision 1 (CSS 2.1) specification. Candidate recommendation , W3C , Sept. 2009 . http:\/\/www. w3 .org\/TR\/2009\/CR-CSS2-20090908. \u00c7elik, T., Bos, B., Hickson, I., and Lie, H. W. Cascading style sheets level 2 revision 1 (CSS 2.1) specification. Candidate recommendation, W3C, Sept. 2009. http:\/\/www.w3.org\/TR\/2009\/CR-CSS2-20090908.","journal-title":"W3C"},{"key":"e_1_3_2_1_2_1","volume-title":"http:\/\/cleaneval.sigwac.org.uk. {Online","author":"Cleaneval","year":"2011","unstructured":"Cleaneval home page. http:\/\/cleaneval.sigwac.org.uk. {Online ; accessed 18- January - 2011 }. Cleaneval home page. http:\/\/cleaneval.sigwac.org.uk. {Online; accessed 18-January-2011}."},{"key":"e_1_3_2_1_3_1","volume-title":"https:\/\/developer.mozilla.org\/en\/DOM\/CSS. {Online","author":"DOM CSS Properties -- MDC Doc Center","year":"2011","unstructured":"DOM CSS Properties -- MDC Doc Center . https:\/\/developer.mozilla.org\/en\/DOM\/CSS. {Online ; accessed 13- April - 2011 }. DOM CSS Properties -- MDC Doc Center. https:\/\/developer.mozilla.org\/en\/DOM\/CSS. {Online; accessed 13-April-2011}."},{"key":"e_1_3_2_1_4_1","volume-title":"https:\/\/developer.mozilla.org\/en\/Gecko. {Online","author":"Gecko","year":"2010","unstructured":"Gecko . https:\/\/developer.mozilla.org\/en\/Gecko. {Online ; accessed 01- September - 2010 }. Gecko. https:\/\/developer.mozilla.org\/en\/Gecko. {Online; accessed 01-September-2010}."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1076034.1076079"},{"key":"e_1_3_2_1_7_1","volume-title":"W3C","author":"Jacobs I.","year":"1999","unstructured":"Jacobs , I. , Raggett , D. , and Hors , A. L . HTML 4.01 specification. W3C recommendation , W3C , Dec. 1999 . http:\/\/www.w3.org\/TR\/1999\/REC-html401-19991224. Jacobs, I., Raggett, D., and Hors, A. L. HTML 4.01 specification. W3C recommendation, W3C, Dec. 1999. http:\/\/www.w3.org\/TR\/1999\/REC-html401-19991224."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646204"},{"key":"e_1_3_2_1_9_1","first-page":"4","article-title":"Binary codes with correction of deletions, insertions and substitution of symbols","volume":"163","author":"Levenshtein V. I","year":"1965","unstructured":"Levenshtein , V. I . Binary codes with correction of deletions, insertions and substitution of symbols . Doklady Akademii Nauk SSSR 163 , 4 ( 1965 ), 845--848. Levenshtein, V. I. Binary codes with correction of deletions, insertions and substitution of symbols. Doklady Akademii Nauk SSSR 163, 4 (1965), 845--848.","journal-title":"Doklady Akademii Nauk SSSR"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1600193.1600208"},{"key":"e_1_3_2_1_11_1","first-page":"1","article-title":"Web page cleaning with conditional random fields","volume":"5","author":"Marek M.","year":"2007","unstructured":"Marek , M. , Pecina , P. , and Spousta , M . Web page cleaning with conditional random fields . Cahiers du Cental 5 ( 2007 ), 1 . Marek, M., Pecina, P., and Spousta, M. Web page cleaning with conditional random fields. Cahiers du Cental 5 (2007), 1.","journal-title":"Cahiers du Cental"},{"key":"e_1_3_2_1_12_1","volume-title":"http:\/\/msdn.microsoft.com\/en-us\/library\/bb508516(v=VS.85).aspx. {Online","author":"MSHTML.","year":"2010","unstructured":"MSHTML. http:\/\/msdn.microsoft.com\/en-us\/library\/bb508516(v=VS.85).aspx. {Online ; accessed 01- September - 2010 }. MSHTML. http:\/\/msdn.microsoft.com\/en-us\/library\/bb508516(v=VS.85).aspx. {Online; accessed 01-September-2010}."},{"key":"e_1_3_2_1_13_1","first-page":"w3","article-title":"Document object model (DOM) level 3 core specification. W3C recommendation","author":"Nicol G.","year":"2004","unstructured":"Nicol , G. , Champion , M. , H\u00e9garet , P. L. , Robie , J. , Wood , L. , Hors , A. L. , and Byrne , S . Document object model (DOM) level 3 core specification. W3C recommendation , W3C , Apr. 2004 . http:\/\/www. w3 .org\/TR\/2004\/REC-DOM-Level-3-Core-20040407. Nicol, G., Champion, M., H\u00e9garet, P. L., Robie, J., Wood, L., Hors, A. L., and Byrne, S. Document object model (DOM) level 3 core specification. W3C recommendation, W3C, Apr. 2004. http:\/\/www.w3.org\/TR\/2004\/REC-DOM-Level-3-Core-20040407.","journal-title":"W3C"},{"key":"e_1_3_2_1_14_1","volume-title":"http:\/\/dev.opera.com\/articles\/view\/presto-2--1-web-standards-supported-by\/. {Online","author":"Opera Presto","year":"2011","unstructured":"Opera Presto 2.1 -- Web standards supported by Opera's core -- Dev.Opera. http:\/\/dev.opera.com\/articles\/view\/presto-2--1-web-standards-supported-by\/. {Online ; accessed 13- April - 2011 }. Opera Presto 2.1 -- Web standards supported by Opera's core -- Dev.Opera. http:\/\/dev.opera.com\/articles\/view\/presto-2--1-web-standards-supported-by\/. {Online; accessed 13-April-2011}."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526840"},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings of RIAO 2010, 9th Int. Conf. on Adaptivity, Personalization and Fusion of Heterogeneous Information","author":"Spengler A.","year":"2010","unstructured":"Spengler , A. , Bordes , A. , and Gallinari , P . A comparison of discriminative classifiers for web news content extraction . In Proceedings of RIAO 2010, 9th Int. Conf. on Adaptivity, Personalization and Fusion of Heterogeneous Information ( 2010 ). Spengler, A., Bordes, A., and Gallinari, P. A comparison of discriminative classifiers for web news content extraction. In Proceedings of RIAO 2010, 9th Int. Conf. on Adaptivity, Personalization and Fusion of Heterogeneous Information (2010)."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/WAINA.2009.97"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1860559.1860590"},{"key":"e_1_3_2_1_19_1","volume-title":"Introduction to Data Mining","author":"Tan P.-N.","year":"2005","unstructured":"Tan , P.-N. , Steinbach , M. , and Kumar , V . Introduction to Data Mining . Addison-Wesley , 2005 . Tan, P.-N., Steinbach, M., and Kumar, V. Introduction to Data Mining. Addison-Wesley, 2005."},{"key":"e_1_3_2_1_20_1","volume-title":"http:\/\/www.webkit.org\/projects\/css\/index.html. {Online","author":"The WebKit Open","year":"2011","unstructured":"The WebKit Open Source Project -- CSS (Cascading Style Sheets). http:\/\/www.webkit.org\/projects\/css\/index.html. {Online ; accessed 13- April - 2011 }. The WebKit Open Source Project -- CSS (Cascading Style Sheets). http:\/\/www.webkit.org\/projects\/css\/index.html. {Online; accessed 13-April-2011}."},{"key":"e_1_3_2_1_21_1","volume-title":"W3C","author":"van Kesteren A.","year":"2009","unstructured":"van Kesteren , A. HTML 5 differences from HTML 4. W3C working draft , W3C , Aug. 2009 . http:\/\/www.w3.org\/TR\/2009\/WD-html5-diff-20090825\/. van Kesteren, A. HTML 5 differences from HTML 4. W3C working draft, W3C, Aug. 2009. http:\/\/www.w3.org\/TR\/2009\/WD-html5-diff-20090825\/."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183614.1183654"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1557019.1557163"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526868"},{"key":"e_1_3_2_1_25_1","volume-title":"http:\/\/webkit.org\/. {Online","author":"WebKit","year":"2010","unstructured":"WebKit . http:\/\/webkit.org\/. {Online ; accessed 01- September - 2010 }. WebKit. http:\/\/webkit.org\/. {Online; accessed 01-September-2010}."},{"key":"e_1_3_2_1_26_1","volume-title":"Comparison of layout engines (cascading style sheets) -- Wikipedia, the free encyclopedia","author":"Wikipedia","year":"2010","unstructured":"Wikipedia . Comparison of layout engines (cascading style sheets) -- Wikipedia, the free encyclopedia , 2010 . {Online; accessed 22-September-2010}. Wikipedia. Comparison of layout engines (cascading style sheets) -- Wikipedia, the free encyclopedia, 2010. {Online; accessed 22-September-2010}."},{"key":"e_1_3_2_1_27_1","volume-title":"http:\/\/www.w3c.org\/. {Online","author":"World Wide Web Consortium (W3C).","year":"2010","unstructured":"World Wide Web Consortium (W3C). http:\/\/www.w3c.org\/. {Online ; accessed 14- September - 2010 }. World Wide Web Consortium (W3C). http:\/\/www.w3c.org\/. {Online; accessed 14-September-2010}."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2006.11.007"}],"event":{"name":"DocEng '11: ACM Symposium on Document Engineering","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGDOC ACM Special Interest Group for Design of Communications"],"location":"Mountain View California USA","acronym":"DocEng '11"},"container-title":["Proceedings of the 11th ACM symposium on Document engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2034691.2034720","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2034691.2034720","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:48:38Z","timestamp":1750240118000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2034691.2034720"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,9,19]]},"references-count":28,"alternative-id":["10.1145\/2034691.2034720","10.1145\/2034691"],"URL":"https:\/\/doi.org\/10.1145\/2034691.2034720","relation":{},"subject":[],"published":{"date-parts":[[2011,9,19]]},"assertion":[{"value":"2011-09-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}