{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T09:51:22Z","timestamp":1773481882278,"version":"3.50.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,11]]},"abstract":"<jats:p>Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users as it requires understanding and implementing the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are often under-specified or ambiguous, making it harder for code generation systems to accurately learn the desired rule in a single step. To tackle this problem of under-specification and minimise argument errors, FormaT5 learns to predict placeholders through an abstention objective. These placeholders can then be filled by a second model or, when examples of rows that should be formatted are available, by a programming-by-example system. To evaluate FormaT5 on diverse and real scenarios, we create an extensive benchmark of 1053 CF tasks, containing real-world descriptions collected from four different sources. We release our benchmarks to encourage research in this area. Abstention and filling allow FormaT5 to outperform 8 different neural approaches on our benchmarks, both with and without examples. 
Our results illustrate the value of building domain-specific learning systems.<\/jats:p>","DOI":"10.14778\/3632093.3632111","type":"journal-article","created":{"date-parts":[[2024,1,20]],"date-time":"2024-01-20T11:26:31Z","timestamp":1705749991000},"page":"497-510","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language"],"prefix":"10.14778","volume":"17","author":[{"given":"Mukul","family":"Singh","sequence":"first","affiliation":[{"name":"Microsoft, Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9","family":"Cambronero","sequence":"additional","affiliation":[{"name":"Microsoft, New Haven, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sumit","family":"Gulwani","sequence":"additional","affiliation":[{"name":"Microsoft, Redmond, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vu","family":"Le","sequence":"additional","affiliation":[{"name":"Microsoft, Redmond, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carina","family":"Negreanu","sequence":"additional","affiliation":[{"name":"Microsoft Research, Cambridge, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Elnaz","family":"Nouri","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammad","family":"Raza","sequence":"additional","affiliation":[{"name":"Microsoft, Redmond, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gust","family":"Verbruggen","sequence":"additional","affiliation":[{"name":"Microsoft, Keerbergen, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,1,20]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"OpenAI Platform Documentation. 
https:\/\/platform.openai.com\/docs\/model-index-for-researchers Accessed on","year":"2023","unstructured":"[n.d.]. OpenAI Platform Documentation. https:\/\/platform.openai.com\/docs\/model-index-for-researchers Accessed on December 3, 2023."},{"key":"e_1_2_1_2_1","unstructured":"[n.d.]. The Spider leaderboard. https:\/\/yale-lily.github.io\/spider."},{"key":"e_1_2_1_3_1","volume-title":"2015 IEEE\/ACM 12th Working Conference on Mining Software Repositories. IEEE, 486--489","author":"Barik Titus","year":"2015","unstructured":"Titus Barik, Kevin Lubick, Justin Smith, John Slankas, and Emerson Murphy-Hill. 2015. Fuse: a reproducible, extendable, internet-scale corpus of spreadsheets. In 2015 IEEE\/ACM 12th Working Conference on Mining Software Repositories. IEEE, 486--489."},{"key":"e_1_2_1_4_1","volume-title":"Lin (Eds.)","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877--1901. 
https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00220"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde de Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman Alex Ray Raul Puri Gretchen Krueger Michael Petrov Heidy Khlaaf Girish Sastry Pamela Mishkin Brooke Chan Scott Gray Nick Ryder Mikhail Pavlov Alethea Power Lukasz Kaiser Mohammad Bavarian Clemens Winter Philippe Tillet Felipe Petroski Such Dave Cummings Matthias Plappert Fotios Chantzis Elizabeth Barnes Ariel Herbert-Voss William Hebgen Guss Alex Nichol Alex Paino Nikolas Tezak Jie Tang Igor Babuschkin Suchir Balaji Shantanu Jain William Saunders Christopher Hesse Andrew N. Carr Jan Leike Josh Achiam Vedant Misra Evan Morikawa Alec Radford Matthew Knight Miles Brundage Mira Murati Katie Mayer Peter Welinder Bob McGrew Dario Amodei Sam McCandlish Ilya Sutskever and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. 10.48550\/ARXIV.2107.03374","DOI":"10.48550\/ARXIV.2107.03374"},{"key":"e_1_2_1_7_1","volume-title":"International Conference on Machine Learning.","author":"Chen Xinyun","year":"2021","unstructured":"Xinyun Chen, Petros Maniatis, Rishabh Singh, Charles Sutton, Hanjun Dai, Max Lin, and Denny Zhou. 2021. SpreadsheetCoder: Formula Prediction from Semi-structured Context. In International Conference on Machine Learning."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623617"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.82"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-020-05934-z"},{"key":"e_1_2_1_11_1","volume-title":"International conference on machine learning. 
PMLR, 990--998","author":"Devlin Jacob","year":"2017","unstructured":"Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. 2017. Robustfill: Neural program learning under noisy i\/o. In International conference on machine learning. PMLR, 990--998."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.330169"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3411943"},{"key":"e_1_2_1_14_1","unstructured":"Microsoft Excel. 2023. Color scales data bars and icon sets. https:\/\/support.microsoft.com\/en-us\/office\/use-data-bars-color-scales-and-icon-sets-to-highlight-data-f118d0a6-5921-4e2e-905b-fe00f3378fb9. Last Accessed: 2023-04-30."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the first workshop on End-user software engineering. 1--5.","author":"Fisher Marc","year":"2005","unstructured":"Marc Fisher and Gregg Rothermel. 2005. The EUSES spreadsheet corpus: a shared resource for supporting experimentation with spreadsheet dependability mechanisms. In Proceedings of the first workshop on End-user software engineering. 1--5."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","unstructured":"Daniel Fried Armen Aghajanyan Jessy Lin Sida Wang Eric Wallace Freda Shi Ruiqi Zhong Wen-tau Yih Luke Zettlemoyer and Mike Lewis. 2022. InCoder: A Generative Model for Code Infilling and Synthesis. 10.48550\/ARXIV.2204.05999","DOI":"10.48550\/ARXIV.2204.05999"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1926385.1926423"},{"key":"e_1_2_1_19_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=q79uMSC6ZBT","author":"Guo Daya","year":"2022","unstructured":"Daya Guo, Alexey Svyatkovskiy, Jian Yin, Nan Duan, Marc Brockschmidt, and Miltiadis Allamanis. 2022. 
Learning to Complete Code with Sketches. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=q79uMSC6ZBT"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409732"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.nlp4prog-1.9"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","unstructured":"Chaitra Hegde and Shrikumar Patil. 2020. Unsupervised Paraphrase Generation using Pre-trained Language Models. 10.48550\/ARXIV.2006.05477","DOI":"10.48550\/ARXIV.2006.05477"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Herzig Jonathan","year":"2020","unstructured":"Jonathan Herzig, Pawe\u0142 Krzysztof Nowak, Thomas M\u00fcller, Francesco Piccinno, and Julian Martin Eisenschlos. 2020. Tapas: Weakly Supervised Table Parsing via Pre-training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Seattle, Washington, United States. https:\/\/www.aclweb.org\/anthology\/2020.acl-main.398\/"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1096601.1096623"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00300"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Vu Le and Sumit Gulwani. 2014. FlashExtract: a framework for data extraction by examples. In 2014 Programming Language Design and Implementation. ACM 542--553. 
https:\/\/www.microsoft.com\/en-us\/research\/publication\/flashextract-framework-data-extraction-examples\/","DOI":"10.1145\/2666356.2594333"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2594519"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/2831360.2831369"},{"key":"e_1_2_1_29_1","unstructured":"Raymond Li Loubna Ben Allal Yangtian Zi Niklas Muennighoff Denis Kocetkov Chenghao Mou Marc Marone Christopher Akiki Jia Li Jenny Chim Qian Liu Evgenii Zheltonozhskii Terry Yue Zhuo Thomas Wang Olivier Dehaene Mishig Davaadorj Joel Lamy-Poirier Jo\u00e3o Monteiro Oleh Shliazhko Nicolas Gontier Nicholas Meade Armel Zebaze Ming-Ho Yee Logesh Kumar Umapathi Jian Zhu Benjamin Lipkin Muhtasham Oblokulov Zhiruo Wang Rudra Murthy Jason Stillerman Siva Sankalp Patel Dmitry Abulkhanov Marco Zocca Manan Dey Zhihan Zhang Nourhan Fahmy Urvashi Bhattacharyya W. Yu Swayam Singh Sasha Luccioni Paulo Villegas Maxim Kunakov Fedor Zhdanov Manuel Romero Tony Lee Nadav Timor Jennifer Ding Claire Schlesinger Hailey Schoelkopf Jana Ebert Tri Dao Mayank Mishra Alexander Gu Jennifer Robinson Carolyn Jane Anderson Brendan Dolan-Gavitt Danish Contractor Siva Reddy Daniel Fried Dzmitry Bahdanau Yacine Jernite Carlos Mu\u00f1oz Ferrandis Sean M. Hughes Thomas Wolf Arjun Guha Leandro von Werra and Harm de Vries. 2023. StarCoder: may the source be with you! ArXiv abs\/2305.06161 (2023). https:\/\/api.semanticscholar.org\/CorpusID:258588247"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cad.2005.11.006"},{"key":"e_1_2_1_31_1","volume-title":"Tapex: Table pre-training via learning a neural sql executor. arXiv preprint arXiv:2107.07653","author":"Liu Qian","year":"2021","unstructured":"Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, and Jian-Guang Lou. 2021. Tapex: Table pre-training via learning a neural sql executor. 
arXiv preprint arXiv:2107.07653 (2021)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","unstructured":"Yu Meng Jiaxin Huang Yu Zhang and Jiawei Han. 2022. Generating Training Data with Language Models: Towards Zero-Shot Language Understanding. 10.48550\/ARXIV.2202.04538","DOI":"10.48550\/ARXIV.2202.04538"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","unstructured":"Yu Meng Martin Michalski Jiaxin Huang Yu Zhang Tarek Abdelzaher and Jiawei Han. 2022. Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning. 10.48550\/ARXIV.2211.03044","DOI":"10.48550\/ARXIV.2211.03044"},{"key":"e_1_2_1_34_1","volume-title":"Ben Van Durme, and Elnaz Nouri","author":"Mishra Swaroop","year":"2023","unstructured":"Swaroop Mishra, Justin Payan, Carina Negreanu, Christian Poelitz, Chitta Baral, Subhro Roy, Rasika Chakravarthy, Ben Van Durme, and Elnaz Nouri. 2023. InstructExcel: A Benchmark for Natural Language Instruction in Excel. arXiv preprint (2023)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Avanika Narayan Ines Chami Laurel Orr Simran Arora and Christopher R\u00e9. 2022. Can Foundation Models Wrangle Your Data? arXiv:2205.09911 [cs.LG]","DOI":"10.14778\/3574245.3574258"},{"key":"e_1_2_1_36_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 8024--8035. 
http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf"},{"key":"e_1_2_1_37_1","volume-title":"Synchromesh: Reliable code generation from pre-trained language models. ArXiv abs\/2201.11227","author":"Poesia Gabriel","year":"2022","unstructured":"Gabriel Poesia, Oleksandr Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. 2022. Synchromesh: Reliable code generation from pre-trained language models. ArXiv abs\/2201.11227 (2022)."},{"key":"e_1_2_1_38_1","unstructured":"Mohammadreza Pourreza and Davood Rafiei. 2023. DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. arXiv:2304.11015 [cs.CL]"},{"key":"e_1_2_1_39_1","first-page":"1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1--67. http:\/\/jmlr.org\/papers\/v21\/20-074.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485535"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_1_42_1","unstructured":"Baptiste Rozi\u00e8re Jonas Gehring Fabian Gloeckle Sten Sootla Itai Gat Xiaoqing Ellen Tan Yossi Adi Jingyu Liu Tal Remez J\u00e9r\u00e9my Rapin Artyom Kozhevnikov Ivan Evtimov Joanna Bitton Manish Bhatt Cristian Canton Ferrer Aaron Grattafiori Wenhan Xiong Alexandre D\u00e9fossez Jade Copet Faisal Azhar Hugo Touvron Louis Martin Nicolas Usunier Thomas Scialom and Gabriel Synnaeve. 2023. Code Llama: Open Foundation Models for Code. 
arXiv:2308.12950 [cs.CL]"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994536"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.779"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407858"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3603581.3603600"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9--15","volume":"97","author":"Thulasidasan Sunil","year":"2019","unstructured":"Sunil Thulasidasan, Tanmoy Bhattacharya, Jeff A. Bilmes, Gopinath Chennupati, and Jamal Mohd-Yusof. 2019. Combating Label Noise in Deep Learning using Abstention. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9--15 June 2019, Long Beach, California, USA (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, 6234--6243. http:\/\/proceedings.mlr.press\/v97\/thulasidasan19a.html"},{"key":"e_1_2_1_48_1","volume-title":"\u0141 ukasz Kaiser, and Illia Polosukhin","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141 ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467434"},{"key":"e_1_2_1_51_1","volume-title":"Programming by Example and Text-to-Code Translation for Conversational Code Generation. 
ArXiv abs\/2211.11554","author":"Whitehouse Eli","year":"2022","unstructured":"Eli Whitehouse, William Gerard, Yauhen Klimovich, and Marc Franco-Salvador. 2022. Programming by Example and Text-to-Code Translation for Conversational Code Generation. ArXiv abs\/2211.11554 (2022)."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.745"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1425"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3632093.3632111","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,20]],"date-time":"2024-01-20T11:27:31Z","timestamp":1705750051000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3632093.3632111"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11]]},"references-count":54,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,11]]}},"alternative-id":["10.14778\/3632093.3632111"],"URL":"https:\/\/doi.org\/10.14778\/3632093.3632111","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,11]]},"assertion":[{"value":"2024-01-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}