{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T03:14:22Z","timestamp":1764645262847},"reference-count":8,"publisher":"Association for Computing Machinery (ACM)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2014,8]]},"abstract":"<jats:p>\n            The application of machine learning to large datasets has become a core component of many important and exciting software systems being built today. The extreme value in these\n            <jats:italic>trained systems<\/jats:italic>\n            is tempered, however, by the difficulty of constructing them. As shown by the experience of Google, Netflix, IBM, and many others, a critical problem in building trained systems is that of\n            <jats:italic>feature engineering.<\/jats:italic>\n            High-quality machine learning features are crucial for the system's performance but are difficult and time-consuming for engineers to develop. Data-centric developer tools that improve the productivity of feature engineers will thus likely have a large impact on an important area of work.\n          <\/jats:p>\n          <jats:p>\n            We have built a demonstration integrated development environment for feature engineers. It accelerates one particular step in the feature engineering development cycle: evaluating the effectiveness of novel feature code. In particular, it uses an index and runtime execution planner to process raw data objects (\n            <jats:italic>e.g.<\/jats:italic>\n            , Web pages) in order of descending likelihood that the data object will be relevant to the user's feature code. This demonstration IDE allows the user to write arbitrary feature code, evaluate its impact on learner quality, and observe exactly how much faster our technique performs compared to a baseline system.\n          <\/jats:p>","DOI":"10.14778\/2733004.2733054","type":"journal-article","created":{"date-parts":[[2015,5,12]],"date-time":"2015-05-12T15:37:52Z","timestamp":1431445072000},"page":"1657-1660","source":"Crossref","is-referenced-by-count":13,"title":["An integrated development environment for faster feature engineering"],"prefix":"10.14778","volume":"7","author":[{"given":"Michael R.","family":"Anderson","sequence":"first","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Cafarella","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yixing","family":"Jiang","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guan","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bochun","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2014,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"CIDR","author":"Anderson M.","year":"2013","unstructured":"M. Anderson, D. Antenucci, V. Bittorf, M. Burgess, M. Cafarella, A. Kumar, F. Niu, Y. Park, C. R\u00e9, and C. Zhang. Brainwash: A data system for feature engineering. In CIDR, 2013."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013689704352"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000024"},{"key":"e_1_2_1_4_1","volume-title":"OSDI","author":"Dean J.","year":"2004","unstructured":"J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v31i3.2303"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"e_1_2_1_7_1","volume-title":"February","author":"Levy S.","year":"2010","unstructured":"S. Levy. How Google's Algorithm Rules the Web. Wired, February 2010."},{"key":"e_1_2_1_9_1","volume-title":"ACL","author":"Zhang C.","year":"2012","unstructured":"C. Zhang, F. Niu, C. R\u00e9, and J. W. Shavlik. Big data versus the crowd: Looking for relationships in all the right places. In ACL, 2012."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2733004.2733054","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:43:31Z","timestamp":1672220611000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2733004.2733054"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,8]]},"references-count":8,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.14778\/2733004.2733054"],"URL":"https:\/\/doi.org\/10.14778\/2733004.2733054","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2014,8]]}}}