{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T13:25:45Z","timestamp":1760707545884,"version":"3.32.0"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"1-3","license":[{"start":{"date-parts":[[2005,6,2]],"date-time":"2005-06-02T00:00:00Z","timestamp":1117670400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2005,9]]},"DOI":"10.1007\/s10994-005-0916-y","type":"journal-article","created":{"date-parts":[[2005,6,10]],"date-time":"2005-06-10T14:37:33Z","timestamp":1118414253000},"page":"195-227","source":"Crossref","is-referenced-by-count":23,"title":["A Neural Syntactic Language Model"],"prefix":"10.1007","volume":"60","author":[{"given":"Ahmad","family":"Emami","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Frederick","family":"Jelinek","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2005,6,2]]},"reference":[{"key":"916_CR1","doi-asserted-by":"crossref","unstructured":"Bellegarda, J. R. (1997). A latent semantic analysis framework for large-span language modeling. In Proceedings of the 5th European Conference on Speech Communication and Technology (pp. 1451\u20131454). Vol. 3. Rhodes, Greece.","DOI":"10.21437\/Eurospeech.1997-421"},{"key":"916_CR2","first-page":"933","volume":"13","author":"Y. Bengio","year":"2001","unstructured":"Bengio, Y., Ducharme, R., & Vincent, P. (2001). A neural probabilistic language model. Advances in Neural Information Processing Systems, 13, 933\u2013938.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"916_CR3","first-page":"1137","volume":"3","author":"Y. 
Bengio","year":"2003","unstructured":"Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137\u20131155.","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"916_CR4","first-page":"39","volume":"22","author":"A. L. Berger","year":"1996","unstructured":"Berger, A. L., Pietra, S. A. D., & Pietra, V. J. D. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22:1, 39\u201372.","journal-title":"Computational Linguistics"},{"key":"916_CR5","unstructured":"Bridle, J. S. (1989). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In F. Fogelman-Soulie and J. Herault (Eds.), Neuro-computing: Algorithms, architectures and applications (pp. 227\u2013236)."},{"key":"916_CR6","unstructured":"Byrne, W., Gunawardana, A., & Khudanpur, S. (1998). Information geometry and EM variants. Technical Report CLSP Research Note (17). Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD."},{"key":"916_CR7","doi-asserted-by":"crossref","unstructured":"Charniak, E. (2001). Immediate-head parsing for language models. In Proceedings of the 39th Annual Meeting and 10th Conference of the European Chapter of ACL (pp. 116\u2013123). Toulouse, France.","DOI":"10.3115\/1073012.1073029"},{"key":"916_CR8","unstructured":"Chelba, C. (1997). A structured language model. In ACL-EACL, Student Section (pp. 498\u2013500). Madrid, Spain."},{"issue":"4","key":"916_CR9","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1006\/csla.2000.0147","volume":"14","author":"C. Chelba","year":"2000","unstructured":"Chelba, C., & Jelinek, F. (2000). Structured language modeling. Computer Speech and Language, 14:4, 283\u2013332.","journal-title":"Computer Speech and Language"},{"key":"916_CR10","doi-asserted-by":"crossref","unstructured":"Chelba, C., & Xu, P. 
(2001). Richer syntactic dependencies for structured language modeling. In Proceedings of the Automatic Speech Recognition and Understanding Workshop. Madonna di Campiglio, Trento-Italy.","DOI":"10.1109\/ASRU.2001.1034623"},{"key":"916_CR11","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1006\/csla.1999.0128","volume":"13","author":"S. F. Chen","year":"1999","unstructured":"Chen, S. F. & Goodman, J. (1999). An empirical study of smoothing techniques for language modeling. Computer Speech and Language, 13, 359\u2013394.","journal-title":"Computer Speech and Language"},{"key":"916_CR12","doi-asserted-by":"crossref","unstructured":"Collins, M. (1996). A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (pp. 184\u2013191). Santa Cruz, CA.","DOI":"10.3115\/981863.981888"},{"issue":"6","key":"916_CR13","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","volume":"41","author":"S. C. Deerwester","year":"1990","unstructured":"Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41:6, 391\u2013407.","journal-title":"Journal of the American Society of Information Science"},{"key":"916_CR14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","volume":"39","author":"A. P. Dempster","year":"1977","unstructured":"Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39, 1\u201338.","journal-title":"Journal of the Royal Statistical Society"},{"key":"916_CR15","first-page":"195","volume":"7","author":"J. L. Elman","year":"1991","unstructured":"Elman, J. L. (1991). 
Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195\u2013225.","journal-title":"Machine Learning"},{"key":"916_CR16","doi-asserted-by":"crossref","unstructured":"Emami, A. (2003). Improving a connectionist based syntactical language model. In Proceedings of the 8th European Conference on Speech Communication and Technology (pp. 413\u2013416), Vol. 1. Geneva, Switzerland.","DOI":"10.21437\/Eurospeech.2003-158"},{"key":"916_CR17","unstructured":"Emami, A., & Jelinek, F. (2004). Exact training of a neural syntactic language model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Montreal, Quebec."},{"key":"916_CR18","unstructured":"Emami, A., Xu, P., & Jelinek, F. (2003). Using a connectionist model in a syntactical based language model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 372\u2013375). Vol. I. Hong Kong."},{"key":"916_CR19","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/0010-0277(88)90031-5","volume":"28","author":"J. A. Fodor","year":"1988","unstructured":"Fodor, J. A. & Pylyshyn, Z. W. (1988). Connectionism and cognitive structure: A critical analysis. Cognition, 28, 3\u201371.","journal-title":"Cognition"},{"key":"916_CR20","doi-asserted-by":"crossref","unstructured":"Goodman, J. (2001). A bit of progress in language modeling. Technical Report MSR-TR-2001-72, Microsoft Research, Redmond, WA.","DOI":"10.1006\/csla.2001.0174"},{"key":"916_CR21","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/7056.001.0001","volume-title":"Using MPI: Portable parallel programming with the message-passing interface","author":"W. Gropp","year":"1999","unstructured":"Gropp, W., Lusk, E., & Skjellum, A. (1999). Using MPI: Portable parallel programming with the message-passing interface. Cambridge, MA: MIT Press."},{"key":"916_CR22","unstructured":"Henderson, J. (2000). 
A neural network parser that handles sparse data. In Proceedings of 6th International Workshop on Parsing Technologies (pp. 123\u2013134). Trento, Italy."},{"key":"916_CR23","unstructured":"Henderson, J. (2003). Inducing history representations for broad coverage statistical parsing. In Proceedings of the North American Chapter of Association Computational Linguistics and Human Language Technology Conference HLT-NAACL."},{"key":"916_CR24","first-page":"46","volume-title":"Parallel distributed processing: Implications for psychology and Neurobiology","author":"G. E. Hinton","year":"1986","unstructured":"Hinton, G. E. (1986). Learning distributed representations of concepts. In R. G. M. Morris (Ed.), Parallel distributed processing: Implications for psychology and Neurobiology (pp. 46\u201361). Oxford, UK: Oxford University Press."},{"issue":"8","key":"916_CR25","doi-asserted-by":"crossref","first-page":"1995","DOI":"10.1162\/089976699300016061","volume":"11","author":"E. Ho","year":"1999","unstructured":"Ho, E. & Chan, L. (1999). How to design a connectionist holistic parser. Neural Computation, 11:8, 1995\u20132016.","journal-title":"Neural Computation"},{"key":"916_CR26","volume-title":"Statistical methods for speech recognition","author":"F. Jelinek","year":"1998","unstructured":"Jelinek, F. (1998). Statistical methods for speech recognition. Cambridge, MA and London: MIT Press."},{"key":"916_CR27","unstructured":"Jelinek, F. and Mercer, R. L. (1980). Interpolated estimation of Markov source parameters from sparse data. In Proceedings of Workshop on Pattern Recognition in Practice (pp. 381\u2013397). Amsterdam, The Netherlands: North Holland Publishing Co."},{"key":"916_CR28","unstructured":"Kim, W., Khudanpur, S., & Wu, J. (2001). Smoothing issues in the structured language model. In Proceedings of the 7th European Conference on Speech Communication and Technology (pp. 717\u2013720). Aalborg, Denmark."},{"key":"916_CR29","unstructured":"Kneser, R., & Ney, H. (1995). 
Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 181\u2013184), Vol. I."},{"key":"916_CR30","doi-asserted-by":"crossref","unstructured":"Lawrence, S., Giles, C. L., & Fong, S. (1996). Can recurrent neural networks learn natural language grammars? In Proceedings of the IEEE International Conference on Neural Networks (pp. 1853\u20131858). Piscataway, NJ: IEEE Press.","DOI":"10.1109\/ICNN.1996.549183"},{"issue":"3","key":"916_CR31","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1145\/355841.355847","volume":"5","author":"C. L. Lawson","year":"1979","unstructured":"Lawson, C. L., Hanson, R. J., Kincaid, D. R., & Krogh, F. T. (1979). Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 5:3, 308\u2013323.","journal-title":"ACM Transactions on Mathematical Software"},{"key":"916_CR32","unstructured":"LeCun, Y. (1985). A learning scheme for asymmetric threshold networks. In Proceedings of Cognitiva 85 (pp. 599\u2013604). Paris, France."},{"key":"916_CR33","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1207\/s15516709cog1503_2","volume":"15","author":"R. Miikkulainen","year":"1991","unstructured":"Miikkulainen, R. & Dyer, M. G. (1991). Natural language processing with modular neural networks and distributed lexicon. Cognitive Science, 15, 343\u2013399.","journal-title":"Cognitive Science"},{"key":"916_CR34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1006\/csla.1994.1001","volume":"8","author":"H. Ney","year":"1994","unstructured":"Ney, H., Essen, U., & Kneser, R. (1994). On structuring probabilistic dependencies in stochastic language modeling. Computer Speech and Language, 8, 1\u201338.","journal-title":"Computer Speech and Language"},{"key":"916_CR35","doi-asserted-by":"crossref","unstructured":"Paul, D. B., & Baker, J. M. (1992). The design for the Wall Street Journal-based CSR corpus. 
In Proceedings of the DARPA SLS Workshop.","DOI":"10.3115\/1075527.1075614"},{"key":"916_CR36","unstructured":"Ratnaparkhi, A. (1997). A linear observed time statistical parser based on maximum entropy models. In Second Conference on Empirical Methods in Natural Language Processing (pp. 1\u201310). Providence, RI."},{"key":"916_CR37","unstructured":"Roark, B. (2001). Robust probabilistic predictive syntactic processing: Motivations, models and applications. Ph.D. thesis, Brown University, Providence, RI."},{"key":"916_CR38","volume-title":"Parallel distributed processing, I","author":"D. E. Rumelhart","year":"1986","unstructured":"Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing, I. Cambridge, MA: MIT Press."},{"key":"916_CR39","doi-asserted-by":"crossref","unstructured":"Schwenk, H., & Gauvain, J.-L. (2002). Connectionist language modeling for large vocabulary continuous speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 765\u2013768). Vol. II. Orlando, FL.","DOI":"10.1109\/ICASSP.2002.1005852"},{"key":"916_CR40","unstructured":"Van Uytsel, D. H., Van Compernolle, D., & Wambacq, P. (2001). Maximum-likelihood training of the PLCG-based language model. In Proceedings of the Automatic Speech Recognition and Understanding Workshop. Madonna di Campiglio, Trento-Italy."},{"key":"916_CR41","unstructured":"Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA."},{"key":"916_CR42","unstructured":"Xu, P., Chelba, C., & Jelinek, F. (2002). A study on richer syntactic dependencies for structured language modeling. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 
Philadelphia, PA."},{"key":"916_CR43","unstructured":"Xu, P., Emami, A., & Jelinek, F. (2003). Training connectionist models for the structured language model. In M. Collins, & M. Steedman (Eds.), Proceedings of the 2003 conference on empirical methods in natural language processing (pp. 160\u2013167). Sapporo, Japan: Association for Computational Linguistics."},{"key":"916_CR44","unstructured":"Xu, W., & Rudnicky, A. (2000). Can artificial neural networks learn language models? In Proceedings of 6th International Conference on Spoken Language Processing. Beijing, China."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-005-0916-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10994-005-0916-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-005-0916-y","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T15:14:30Z","timestamp":1735744470000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10994-005-0916-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,6,2]]},"references-count":44,"journal-issue":{"issue":"1-3","published-print":{"date-parts":[[2005,9]]}},"alternative-id":["916"],"URL":"https:\/\/doi.org\/10.1007\/s10994-005-0916-y","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"type":"print","value":"0885-6125"},{"type":"electronic","value":"1573-0565"}],"subject":[],"published":{"date-parts":[[2005,6,2]]}}}