{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T00:24:07Z","timestamp":1779236647556,"version":"3.51.4"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2017,6]]},"abstract":"<jats:p>The growing popularity of the JSON format has fueled increased interest in loading and processing JSON data within analytical data processing systems. However, in many applications, JSON parsing dominates performance and cost. In this paper, we present a new JSON parser called Mison that is particularly tailored to this class of applications, by pushing down both projection and filter operators of analytical queries into the parser. To achieve these features, we propose to deviate from the traditional approach of building parsers using finite state machines (FSMs). Instead, we follow a two-level approach that enables the parser to jump directly to the correct position of a queried field without having to perform expensive tokenizing steps to find the field. At the upper level, Mison speculatively predicts the logical locations of queried fields based on previously seen patterns in a dataset. At the lower level, Mison builds structural indices on JSON data to map logical locations to physical locations. Unlike all existing FSM-based parsers, building structural indices converts control flow into data flow, thereby largely eliminating inherently unpredictable branches in the program and exploiting the parallelism available in modern processors. We experimentally evaluate Mison using representative real-world JSON datasets and the TPC-H benchmark, and show that Mison produces significant performance benefits over the best existing JSON parsers; in some cases, the performance improvement is over one order of magnitude.<\/jats:p>","DOI":"10.14778\/3115404.3115416","type":"journal-article","created":{"date-parts":[[2017,9,7]],"date-time":"2017-09-07T13:35:53Z","timestamp":1504791353000},"page":"1118-1129","source":"Crossref","is-referenced-by-count":67,"title":["Mison"],"prefix":"10.14778","volume":"10","author":[{"given":"Yinan","family":"Li","sequence":"first","affiliation":[{"name":"Microsoft Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nikos R.","family":"Katsipoulakis","sequence":"additional","affiliation":[{"name":"University of Pittsburgh and Microsoft Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Badrish","family":"Chandramouli","sequence":"additional","affiliation":[{"name":"Microsoft Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan","family":"Goldstein","sequence":"additional","affiliation":[{"name":"Microsoft Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Donald","family":"Kossmann","sequence":"additional","affiliation":[{"name":"Microsoft Research"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Apache Avro. https:\/\/avro.apache.org\/."},{"key":"e_1_2_1_2_1","unstructured":"Apache Drill. https:\/\/drill.apache.org\/."},{"key":"e_1_2_1_3_1","unstructured":"Apache Parquet. https:\/\/parquet.apache.org\/."},{"key":"e_1_2_1_4_1","unstructured":"Google Gson. https:\/\/github.com\/google\/gson."},{"key":"e_1_2_1_5_1","unstructured":"Jackson. https:\/\/github.com\/FasterXML\/jackson."},{"key":"e_1_2_1_6_1","unstructured":"RapidJSON. http:\/\/rapidjson.org\/."},{"key":"e_1_2_1_7_1","unstructured":"The JSON Data Interchange Format. Standard ECMA-404 Oct. 2013."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2452376.2452377"},{"key":"e_1_2_1_9_1","volume-title":"Inc.","author":"Aho A. V.","year":"2006"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213864"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402755.3402761"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"E. T. Bray. The JavaScript Object Notation (JSON) Data Interchange Format. RFC 7159 Mar. 2014.","DOI":"10.17487\/rfc7159"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/2033408.2033411"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735496.2735503"},{"key":"e_1_2_1_16_1","volume-title":"WebDB","author":"Chasseur C.","year":"2013"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2593673"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882924"},{"key":"e_1_2_1_19_1","volume-title":"MSR. IEEE Press","author":"Gousios G.","year":"2013"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1042046.1042051"},{"key":"e_1_2_1_21_1","volume-title":"CIDR","author":"Idreos S.","year":"2011"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453925"},{"key":"e_1_2_1_23_1","unstructured":"D. E. Knuth. The Art of Computer Programming Volume 4 Fascicle 1: Bitwise Tricks & Techniques; Binary Decision Diagrams. Addison-Wesley Professional 12th edition 2009."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497471"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465322"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595628"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920886"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556555"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956898"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2612183"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060745.1060845"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595641"},{"key":"e_1_2_1_33_1","volume-title":"NSDI","author":"Zaharia M.","year":"2012"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3115404.3115416","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:49:30Z","timestamp":1672220970000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3115404.3115416"}},"subtitle":["a fast JSON parser for data analytics"],"short-title":[],"issued":{"date-parts":[[2017,6]]},"references-count":33,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2017,6]]}},"alternative-id":["10.14778\/3115404.3115416"],"URL":"https:\/\/doi.org\/10.14778\/3115404.3115416","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2017,6]]}}}