{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T16:01:47Z","timestamp":1780588907908,"version":"3.54.1"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2018,6]]},"abstract":"<jats:p>Today, business analysts and data scientists increasingly need to clean, standardize and transform diverse data sets, such as name, address, date time, and phone number, before they can perform analysis. This process of data transformation is an important part of data preparation, and is known to be difficult and time-consuming for end-users.<\/jats:p>\n          <jats:p>Traditionally, developers have dealt with these longstanding transformation problems using custom code libraries. They have built vast varieties of custom logic for name parsing and address standardization, etc., and shared their source code in places like GitHub. Data transformation would be a lot easier for end-users if they can discover and reuse such existing transformation logic.<\/jats:p>\n          <jats:p>\n            We developed\n            <jats:italic>Transform-Data-by-Example<\/jats:italic>\n            (\n            <jats:italic>TDE<\/jats:italic>\n            ), which works like a search engine for data transformations.\n            <jats:italic>TDE<\/jats:italic>\n            \"indexes\" vast varieties of transformation logic in source code, DLLs, web services and mapping tables, so that users only need to provide a few input\/output examples to demonstrate a desired transformation, and\n            <jats:italic>TDE<\/jats:italic>\n            can interactively find relevant functions to synthesize new programs consistent with all examples. Using an index of 50K functions crawled from GitHub and Stackoverflow,\n            <jats:italic>TDE<\/jats:italic>\n            can already handle many common transformations not currently supported by existing systems. On a benchmark with over 200 transformation tasks,\n            <jats:italic>TDE<\/jats:italic>\n            generates correct transformations for 72% tasks, which is considerably better than other systems evaluated. A beta version of\n            <jats:italic>TDE<\/jats:italic>\n            for Microsoft Excel is available via Office store\n            <jats:sup>1<\/jats:sup>\n            . Part of the\n            <jats:italic>TDE<\/jats:italic>\n            technology also ships in Microsoft Power BI.\n          <\/jats:p>","DOI":"10.14778\/3231751.3231766","type":"journal-article","created":{"date-parts":[[2018,7,27]],"date-time":"2018-07-27T12:21:07Z","timestamp":1532694067000},"page":"1165-1177","source":"Crossref","is-referenced-by-count":48,"title":["Transform-data-by-example (TDE)"],"prefix":"10.14778","volume":"11","author":[{"given":"Yeye","family":"He","sequence":"first","affiliation":[{"name":"Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xu","family":"Chu","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology and Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kris","family":"Ganjam","sequence":"additional","affiliation":[{"name":"Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yudian","family":"Zheng","sequence":"additional","affiliation":[{"name":"Twitter Inc. and Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Vivek","family":"Narasayya","sequence":"additional","affiliation":[{"name":"Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Surajit","family":"Chaudhuri","sequence":"additional","affiliation":[{"name":"Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2018,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Bing maps api. https:\/\/www.microsoft.com\/maps\/choose-your-bing-maps-API.aspx.  Bing maps api. https:\/\/www.microsoft.com\/maps\/choose-your-bing-maps-API.aspx."},{"key":"e_1_2_1_2_1","unstructured":"Informatica Rev. https:\/\/www.informatica.com\/products\/data-quality\/rev.html.  Informatica Rev. https:\/\/www.informatica.com\/products\/data-quality\/rev.html."},{"key":"e_1_2_1_3_1","unstructured":"Openrefine. openrefine.org.  Openrefine. openrefine.org."},{"key":"e_1_2_1_4_1","unstructured":"Paxata. https:\/\/www.paxata.com\/.  Paxata. https:\/\/www.paxata.com\/."},{"key":"e_1_2_1_5_1","unstructured":"Roslyn compiler framework. https:\/\/github.com\/dotnet\/roslyn\/wiki\/RoslynOverview.  Roslyn compiler framework. https:\/\/github.com\/dotnet\/roslyn\/wiki\/RoslynOverview."},{"key":"e_1_2_1_6_1","unstructured":"Talend. https:\/\/www.talend.com\/.  Talend. https:\/\/www.talend.com\/."},{"key":"e_1_2_1_7_1","unstructured":"Transform Data by Example (from Microsoft Office Store). https:\/\/aka.ms\/transform-data-by-example-download.  Transform Data by Example (from Microsoft Office Store). https:\/\/aka.ms\/transform-data-by-example-download."},{"key":"e_1_2_1_8_1","unstructured":"Trifacta. https:\/\/www.trifacta.com\/.  Trifacta. https:\/\/www.trifacta.com\/."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498319"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367605"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_12_1","volume-title":"Data services leveraging bing's data assets","author":"Chakrabarti K.","year":"2016","unstructured":"K. Chakrabarti , S. Chaudhuri , Z. Chen , K. Ganjam , and Y. He . Data services leveraging bing's data assets . IEEE Data Eng. Bull ., 2016 . K. Chakrabarti, S. Chaudhuri, Z. Chen, K. Ganjam, and Y. He. Data services leveraging bing's data assets. IEEE Data Eng. Bull., 2016."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","DOI":"10.1002\/0471448354","volume-title":"Exploratory Data Mining and Data Cleaning","author":"Dasu T.","year":"2003","unstructured":"T. Dasu and T. Johnson . Exploratory Data Mining and Data Cleaning . John Wiley & Sons, Inc. , New York , 2003 . T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, Inc., New York, 2003."},{"key":"e_1_2_1_14_1","volume-title":"Clinical Chemistry","author":"Gollery M.","year":"2005","unstructured":"M. Gollery . Bioinformatics : Sequence and genome analysis . Clinical Chemistry , 2005 . M. Gollery. Bioinformatics: Sequence and genome analysis. Clinical Chemistry, 2005."},{"key":"e_1_2_1_15_1","volume-title":"Inc.","author":"Hare J.","year":"2016","unstructured":"J. Hare , C. Adams , A. Woodward , and H. Swinehart . Forecast snapshot: Self-service data preparation, worldwide, 2016. Gartner , Inc. , February 2016 . J. Hare, C. Adams, A. Woodward, and H. Swinehart. Forecast snapshot: Self-service data preparation, worldwide, 2016. Gartner, Inc., February 2016."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993536"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1230819.1241670"},{"key":"e_1_2_1_18_1","volume-title":"CIDR","author":"Heer J.","year":"2015","unstructured":"J. Heer , J. M. Hellerstein , and S. Kandel . Predictive interaction for data transformation . In CIDR , 2015 . J. Heer, J. M. Hellerstein, and S. Kandel. Predictive interaction for data transformation. In CIDR, 2015."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196889"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064034"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1080\/03081078608934927"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2666356.2594333"},{"key":"e_1_2_1_23_1","volume-title":"Morgan Kaufmann","author":"Lieberman H.","year":"2001","unstructured":"H. Lieberman , editor. Your Wish is My Command: Programming by Example . Morgan Kaufmann , 2001 . H. Lieberman, editor. Your Wish is My Command: Programming by Example. Morgan Kaufmann, 2001."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454163"},{"key":"e_1_2_1_25_1","volume-title":"Gartner: Market guide for self-service data preparation","author":"Sallam R. L.","year":"2016","unstructured":"R. L. Sallam , P. Forry , E. Zaidi , and S. Vashisth . Gartner: Market guide for self-service data preparation . 2016 . R. L. Sallam, P. Forry, E. Zaidi, and S. Vashisth. Gartner: Market guide for self-service data preparation. 2016."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/2977797.2977807"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064010"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213848"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196888"},{"key":"e_1_2_1_31_1","volume-title":"Evaluation of explore-exploit policies in multi-result ranking systems. arXiv preprint arXiv:1504.07662","author":"Yankov D.","year":"2015","unstructured":"D. Yankov , P. Berkhin , and L. Li . Evaluation of explore-exploit policies in multi-result ranking systems. arXiv preprint arXiv:1504.07662 , 2015 . D. Yankov, P. Berkhin, and L. Li. Evaluation of explore-exploit policies in multi-result ranking systems. arXiv preprint arXiv:1504.07662, 2015."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115409"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3231751.3231766","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:42:40Z","timestamp":1672224160000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3231751.3231766"}},"subtitle":["an extensible search engine for data transformations"],"short-title":[],"issued":{"date-parts":[[2018,6]]},"references-count":31,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2018,6]]}},"alternative-id":["10.14778\/3231751.3231766"],"URL":"https:\/\/doi.org\/10.14778\/3231751.3231766","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2018,6]]}}}