{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T13:15:53Z","timestamp":1768482953182,"version":"3.49.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2018,4,16]],"date-time":"2018-04-16T00:00:00Z","timestamp":1523836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100006754","name":"Army Research Laboratory","doi-asserted-by":"crossref","award":["W911NF-17-1-0021"],"award-info":[{"award-number":["W911NF-17-1-0021"]}],"id":[{"id":"10.13039\/100006754","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center","award":["D12PC000337"],"award-info":[{"award-number":["D12PC000337"]}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation via","doi-asserted-by":"crossref","award":["DGE-1545362 and IIS-1633363"],"award-info":[{"award-number":["DGE-1545362 and IIS-1633363"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2018,8,31]]},"abstract":"<jats:p>Modern studies of societal phenomena rely on the availability of large datasets capturing attributes and activities of synthetic, city-level, populations. For instance, in epidemiology, synthetic population datasets are necessary to study disease propagation and intervention measures before implementation. In social science, synthetic population datasets are needed to understand how policy decisions might affect preferences and behaviors of individuals. 
In public health, synthetic population datasets are necessary to capture diagnostic and procedural characteristics of patient records without violating confidentialities of individuals. To generate such datasets over a large set of categorical variables, we propose the use of the maximum entropy principle to formalize a generative model such that in a statistically well-founded way we can optimally utilize given prior information about the data, and are unbiased otherwise. An efficient inference algorithm is designed to estimate the maximum entropy model, and we demonstrate how our approach is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and US census datasets, and demonstrate its feasibility using an epidemic simulation application.<\/jats:p>","DOI":"10.1145\/3182383","type":"journal-article","created":{"date-parts":[[2018,4,18]],"date-time":"2018-04-18T17:21:50Z","timestamp":1524072110000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Generating Realistic Synthetic Population Datasets"],"prefix":"10.1145","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4392-1307","authenticated-orcid":false,"given":"Hao","family":"Wu","sequence":"first","affiliation":[{"name":"Virginia Tech"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yue","family":"Ning","sequence":"additional","affiliation":[{"name":"Virginia Tech, Arlington, VA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Prithwish","family":"Chakraborty","sequence":"additional","affiliation":[{"name":"Virginia Tech"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jilles","family":"Vreeken","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics and Saarland University, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nikolaj","family":"Tatti","sequence":"additional","affiliation":[{"name":"Aalto University, Aalto, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Naren","family":"Ramakrishnan","sequence":"additional","affiliation":[{"name":"Virginia Tech, Arlington, VA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,4,16]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989395"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the WSC. 1003--1014","author":"Barrett C. L."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1287\/trsc.1120.0408"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the VLDB\u201905","author":"Bruno Nicolas","year":"2005"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1214\/aop\/1176996454"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177692379"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020497"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-010-0209-3"},{"key":"e_1_2_1_9_1","volume-title":"Data Source: Google Flu Trends.","author":"Google Inc.","year":"2014"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/191839.191886"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the VLDB\u201906","author":"Houkj\u00e6r Kenneth","year":"2006"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the ITSC. 
2023--2028","author":"Hu W."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03793-1_6"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40991-2_17"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1111\/mice.12085"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020499"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the VLDB\u201905","author":"Markl V."},{"key":"e_1_2_1_18_1","volume-title":"Axhausen","author":"Mueller Kirill","year":"2011"},{"key":"e_1_2_1_19_1","volume-title":"Generating a dynamic synthetic population--Using an age-structured two-sex model for household dynamics. PLoS One 9, 4","author":"Namazi-Rad Mohammad-Reza","year":"2014"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICHI.2013.76"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807085.1807095"},{"key":"e_1_2_1_23_1","volume-title":"Slim: Directly mining descriptive patterns. In Proceedings of the SDM\u201912","author":"Smets Koen","year":"2012"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-013-0319-9"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.84"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.trc.2015.10.010"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2015.2506182"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2007.1096"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.70"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1146068.1711152"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.39"},{"key":"e_1_2_1_32_1","volume-title":"Proceeding of the ECMLPKDD\u201911","author":"Tatti Nikolaj"},{"key":"e_1_2_1_33_1","unstructured":"United States Census Bureau. 2012. American Community Survey. 
Retrieved from http:\/\/www.census.gov\/acs\/www\/."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-010-0202-x"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-014-0370-1"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2014.2307862"},{"key":"e_1_2_1_37_1","first-page":"1","article-title":"Deep multimodal distance metric learning using click constraints for image ranking","volume":"99","author":"Yu J.","year":"2017","journal-title":"IEEE Transactions on Cybernetics PP"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.3141\/2429-18"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3182383","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3182383","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3182383","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:41:20Z","timestamp":1750282880000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3182383"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,4,16]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,8,31]]}},"alternative-id":["10.1145\/3182383"],"URL":"https:\/\/doi.org\/10.1145\/3182383","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,4,16]]},"assertion":[{"value":"2017-02-01","order":
0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-04-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}