{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,3]],"date-time":"2024-06-03T15:57:46Z","timestamp":1717430266526},"reference-count":5,"publisher":"World Scientific Pub Co Pte Lt","issue":"02","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Semantic Computing"],"published-print":{"date-parts":[[2018,6]]},"abstract":"<jats:p> The unigram is a fundamental element of n-gram models in natural language processing. However, unigrams collected from a natural language corpus are unsuitable for solving problems in the domain of computer programming languages. In this paper, we analyze the properties of unigrams collected from an ultra-large source code repository. Specifically, we have collected 1.01 billion unigrams from 0.7 million open source projects hosted at GitHub.com. By analyzing these unigrams, we have discovered statistical properties regarding (1) how developers name variables, methods, and classes, and (2) how developers choose abbreviations. We describe a probabilistic model which relies on these properties for solving a well-known problem in source code analysis: how to expand a given abbreviation to its original intended word. Our empirical study shows that using the unigrams extracted from a source code repository outperforms using a natural language corpus by 21% when solving these domain-specific problems. 
<\/jats:p>","DOI":"10.1142\/s1793351x18400123","type":"journal-article","created":{"date-parts":[[2018,7,5]],"date-time":"2018-07-05T06:40:56Z","timestamp":1530772856000},"page":"237-260","source":"Crossref","is-referenced-by-count":1,"title":["Statistical Unigram Analysis for Source Code Repository"],"prefix":"10.1142","volume":"12","author":[{"given":"Weifeng","family":"Xu","sequence":"first","affiliation":[{"name":"Department of Computer Science, Bowie State University, Bowie, Maryland, USA"}]},{"given":"Dianxiang","family":"Xu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Boise State University, Boise, Idaho, USA"}]},{"given":"Abdulrahman","family":"Alatawi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Bowie State University, Bowie, Maryland, USA"}]},{"given":"Omar","family":"El Ariss","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Texas A&M University, Commerce, TX, 75428, USA"}]},{"given":"Yunkai","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Computer & Information Science, Gannon University, Erie, Pennsylvania, USA"}]}],"member":"219","published-online":{"date-parts":[[2018,7,4]]},"reference":[{"key":"S1793351X18400123BIB002","volume-title":"From Discourse to Logic: Introduction to Model Theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory","volume":"42","author":"Kamp H.","year":"2013"},{"key":"S1793351X18400123BIB003","doi-asserted-by":"publisher","DOI":"10.1016\/S0019-9958(59)90362-6"},{"key":"S1793351X18400123BIB004","volume-title":"Programming Language Pragmatics","year":"2009","edition":"3"},{"key":"S1793351X18400123BIB016","volume-title":"Beautiful Data: The Stories Behind Elegant Data Solutions","author":"Segaran T.","year":"2009"},{"key":"S1793351X18400123BIB019","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2009.70"}],"container-title":["International Journal of Semantic Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S1793351X18400123","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,6]],"date-time":"2019-08-06T21:19:26Z","timestamp":1565126366000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S1793351X18400123"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6]]},"references-count":5,"journal-issue":{"issue":"02","published-online":{"date-parts":[[2018,7,4]]},"published-print":{"date-parts":[[2018,6]]}},"alternative-id":["10.1142\/S1793351X18400123"],"URL":"https:\/\/doi.org\/10.1142\/s1793351x18400123","relation":{},"ISSN":["1793-351X","1793-7108"],"issn-type":[{"value":"1793-351X","type":"print"},{"value":"1793-7108","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,6]]}}}