{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T14:39:41Z","timestamp":1776695981163,"version":"3.51.2"},"reference-count":42,"publisher":"Proceedings of the National Academy of Sciences","issue":"33","content-domain":{"domain":["www.pnas.org"],"crossmark-restriction":true},"short-container-title":["Proc. Natl. Acad. Sci. U.S.A."],"published-print":{"date-parts":[[2005,8,16]]},"abstract":"<jats:p>We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The <jats:sc>adios<\/jats:sc> (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. 
This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.<\/jats:p>","DOI":"10.1073\/pnas.0409746102","type":"journal-article","created":{"date-parts":[[2005,8,9]],"date-time":"2005-08-09T00:33:51Z","timestamp":1123547631000},"page":"11629-11634","update-policy":"https:\/\/doi.org\/10.1073\/pnas.cm10313","source":"Crossref","is-referenced-by-count":168,"title":["Unsupervised learning of natural languages"],"prefix":"10.1073","volume":"102","author":[{"given":"Zach","family":"Solan","sequence":"first","affiliation":[{"name":"School of Physics and Astronomy and School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel; and Department of Psychology, Cornell University, Ithaca, NY 14853"}]},{"given":"David","family":"Horn","sequence":"additional","affiliation":[{"name":"School of Physics and Astronomy and School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel; and Department of Psychology, Cornell University, Ithaca, NY 14853"}]},{"given":"Eytan","family":"Ruppin","sequence":"additional","affiliation":[{"name":"School of Physics and Astronomy and School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel; and Department of Psychology, Cornell University, Ithaca, NY 14853"}]},{"given":"Shimon","family":"Edelman","sequence":"additional","affiliation":[{"name":"School of Physics and Astronomy and School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel; and Department of Psychology, Cornell University, Ithaca, NY 14853"}]}],"member":"341","published-online":{"date-parts":[[2005,8,8]]},"reference":[{"key":"e_1_3_2_1_2","first-page":"319","volume":"4","year":"2003","unstructured":"Phillips, C. (2003) in Encyclopedia of Cognitive Science, ed. Nadel, L. (Macmillan, London) Vol. 4, pp. 
319\u2013329.","journal-title":"Encyclopedia of Cognitive Science"},{"key":"e_1_3_2_2_2","first-page":"140","volume":"10","year":"1954","unstructured":"Harris, Z. S. (1954) Word 10, 140\u2013162.","journal-title":"Word"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Hemphill C. T. Godfrey J. J. & Doddington G. R. (1990) in Proceedings of a Workshop on Speech and Natural Language ed. Stern R. M. (Morgan Kaufmann San Francisco) pp. 96\u2013101.","DOI":"10.3115\/116580.116613"},{"key":"e_1_3_2_4_2","first-page":"271","volume":"12","year":"1985","unstructured":"MacWhinney, B. & Snow, C. (1985) J. Comput. Lingustics 12, 271\u2013296.","journal-title":"J. Comput. Lingustics"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1001798929185"},{"key":"e_1_3_2_6_2","first-page":"57","volume":"7","year":"2004","unstructured":"Adriaans, P. & van Zaanen, M. (2004) Grammars 7, 57\u201368.","journal-title":"Grammars"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.274.5294.1926"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.275.5306.1599"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.283.5398.77"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.1072901"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.1078094"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsta.2000.0583"},{"key":"e_1_3_2_13_2","first-page":"1","volume":"138","year":"2003","unstructured":"Geman, S. & Johnson, M. (2003) in Mathematical Foundations of Speech and Language Processing, IMA Volumes in Mathematics and Its Applications, eds. Johnson, M., Khudanpur, S., Ostendorf, M. & Rosenfeld, R. (Springer, New York), Vol. 138, pp. 1\u201326.","journal-title":"Mathematical Foundations of Speech and Language Processing"},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","unstructured":"Stolcke A. & Omohundro S. 
(1994) in Grammatical Inference and Applications eds. Carrasco R. C. & Oncina J. (Springer New York) pp. 106\u2013118.","DOI":"10.1007\/3-540-58473-0_141"},{"key":"e_1_3_2_15_2","unstructured":"Guyon I. & Pereira F. (1995) in Proceedings of the Third International Conference on Document Analysis and Recogition ed. Shen C. Y. (IEEE Computer Society Montreal) pp. 454\u2013457."},{"key":"e_1_3_2_16_2","unstructured":"Gross M. (1997) in Finite-State Language Processing eds. Roche E. & Schab\u00e8s Y. (MIT Press Cambridge MA) pp. 329\u2013354."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/0885-2308(90)90022-X"},{"key":"e_1_3_2_18_2","unstructured":"van Zaanen M. (2000) in Proceedings of the 18th International Conference on Computational Linguistics ed. Kay M. (Saarbr\u00fccken Germany) pp. 961\u2013967."},{"key":"e_1_3_2_19_2","unstructured":"Hopcroft J. E. & Ullman J. D. (1979) Introduction to Automata Theory Languages and Computation (Addison\u2013Wesley Reading MA)."},{"key":"e_1_3_2_20_2","unstructured":"Klein D. & Manning C. D. (2004) in Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics eds. Daelemans W. & Walker M. (Barcelona) pp. 478\u2013485."},{"key":"e_1_3_2_21_2","first-page":"35","volume":"14","year":"2002","unstructured":"Klein, D. & Manning, C. D. (2002) in Advances in Neural Information Processing Systems, eds. Dietterich, T. G., Becker, S. & Ghahramani, Z. (MIT Press, Cambridge, MA), Vol. 14, pp. 35\u201342.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45790-9_24"},{"key":"e_1_3_2_23_2","unstructured":"Goodman J. T. (2001) A Bit of Progress in Language Modeling: Extended Version (Microsoft Research Seattle) Technical Report MSR-TR-2001-72."},{"key":"e_1_3_2_24_2","unstructured":"McCandless M. & Glass J. (1993) in Proceedings of EuroSpeech'93 ed. Fellbaum K. (Berlin) pp. 
981\u2013984."},{"key":"e_1_3_2_25_2","first-page":"544a","volume":"1","year":"2001","unstructured":"Chelba, C. (2001) in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ed. Swindlehurst, L. (IEEE, Piscataway, NJ), Vol. 1, pp. 544a\u2013544d.","journal-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1162\/089120101750300526"},{"key":"e_1_3_2_27_2","unstructured":"Kermorvant C. de la Higuera C. & Dupont P. (2004) J. \u00c9lectronique d'Intelligence Artificielle 6."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkg600"},{"key":"e_1_3_2_29_2","unstructured":"Clark A. (2001) Ph.D. thesis (COGS Univ. of Sussex Sussex U.K.)."},{"key":"e_1_3_2_30_2","unstructured":"Wolff J. G. (1988) in Categories and Processes in Language Acquisition eds. Levy Y Schlesinger I. M. & Braine M. D. S. (Lawrence Erlbaum Hillsdale NJ) pp. 179\u2013215."},{"key":"e_1_3_2_31_2","unstructured":"Henrichsen P. J. (2002) in Proceedings of CoNLL-2002 eds. Roth D. & van den Bosch A. (Assoc. Computer Linguistics New Brunswick NJ) pp. 22\u201328."},{"key":"e_1_3_2_32_2","unstructured":"de Marcken C. G. (1996) Ph.D. thesis (MIT Cambridge MA)."},{"key":"e_1_3_2_33_2","unstructured":"Magerman D. M. & Marcus M. P. (1990) in Proceedings of the Eighth National Conference on Artificial Intelligence eds. Dietterich T. & Swartout W. (AAAI Press Menlo Park CA) pp. 984\u2013989."},{"key":"e_1_3_2_34_2","unstructured":"Chomsky N. (1986) Knowledge of Language: Its Nature Origin and Use (Praeger New York)."},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Elman J. L. Bates E. A. Johnson M. H. Karmiloff-Smith A. Parisi D. & Plunkett K. 
(1996) Rethinking Innateness: A Connectionist Perspective on Development (MIT Press Cambridge MA).","DOI":"10.7551\/mitpress\/5929.001.0001"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.291.5501.114"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1111\/1467-9280.00476"},{"key":"e_1_3_2_38_2","first-page":"73","volume":"11","year":"1985","unstructured":"Fillmore, C. J. (1985) Berkeley Linguistic Soc. 11, 73\u201386.","journal-title":"Berkeley Linguistic Soc."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/S1364-6613(03)00080-9"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog2706_2"},{"key":"e_1_3_2_41_2","first-page":"16","volume":"78","year":"1991","unstructured":"Finch, S. & Chater, N. (1991) Artif. Intell. Simul. Behav. Q. 78, 16\u201324.","journal-title":"Artif. Intell. Simul. Behav. Q."},{"key":"e_1_3_2_42_2","unstructured":"Markman E. (1989) Categorization and Naming in Children (MIT Press Cambridge MA)."}],"container-title":["Proceedings of the National Academy of 
Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/pnas.org\/doi\/pdf\/10.1073\/pnas.0409746102","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,3]],"date-time":"2025-01-03T05:52:44Z","timestamp":1735883564000},"score":1,"resource":{"primary":{"URL":"https:\/\/pnas.org\/doi\/full\/10.1073\/pnas.0409746102"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,8,8]]},"references-count":42,"journal-issue":{"issue":"33","published-print":{"date-parts":[[2005,8,16]]}},"alternative-id":["10.1073\/pnas.0409746102"],"URL":"https:\/\/doi.org\/10.1073\/pnas.0409746102","relation":{},"ISSN":["0027-8424","1091-6490"],"issn-type":[{"value":"0027-8424","type":"print"},{"value":"1091-6490","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,8,8]]},"assertion":[{"value":"2004-12-25","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2005-08-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}