{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T03:06:16Z","timestamp":1776395176700,"version":"3.51.2"},"reference-count":77,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2020,8]]},"abstract":"<jats:p>\n            As applications in large organizations evolve, the machine learning (ML) models that power them must adapt the same predictive tasks to newly arising data modalities (e.g., a new video content launch in a social media application requires existing text or image models to extend to video). To solve this problem, organizations typically create ML pipelines from scratch. However, this fails to utilize the domain expertise and data they have cultivated from developing tasks for existing modalities. We demonstrate how\n            <jats:italic toggle=\"yes\">organizational resources<\/jats:italic>\n            , in the form of aggregate statistics, knowledge bases, and existing services that operate over related tasks, enable teams to construct a common feature space that connects new and existing data modalities. This allows teams to apply methods for data curation (e.g., weak supervision and label propagation) and model training (e.g., forms of multi-modal learning) across these different data modalities. 
We study how this use of organizational resources composes at production scale in over 5 classification tasks at Google, and demonstrate how it reduces the time needed to develop models for new modalities from months to weeks or days.\n          <\/jats:p>","DOI":"10.14778\/3415478.3415559","type":"journal-article","created":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T18:46:46Z","timestamp":1600109206000},"page":"3396-3410","source":"Crossref","is-referenced-by-count":6,"title":["Leveraging organizational resources to adapt models to new data modalities"],"prefix":"10.14778","volume":"13","author":[{"given":"Sahaana","family":"Suri","sequence":"first","affiliation":[{"name":"Google and Stanford"}]},{"given":"Raghuveer","family":"Chanda","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"Neslihan","family":"Bulut","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"Pradyumna","family":"Narayana","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"Yemao","family":"Zeng","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"Peter","family":"Bailis","sequence":"additional","affiliation":[{"name":"Stanford"}]},{"given":"Sugato","family":"Basu","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"Girija","family":"Narlikar","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"Christopher","family":"R\u00e9","sequence":"additional","affiliation":[{"name":"Stanford"}]},{"given":"Abishek","family":"Sethi","sequence":"additional","affiliation":[{"name":"Google"}]}],"member":"320","published-online":{"date-parts":[[2020,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Quandl 2011. https:\/\/www.quandl.com\/."},{"key":"e_1_2_1_2_1","unstructured":"ONNX 2017. https:\/\/onnx.ai\/."},{"key":"e_1_2_1_3_1","volume-title":"Machine Learning","author":"MarketPlace AWS","year":"2018","unstructured":"AWS MarketPlace, Machine Learning, 2018. 
https:\/\/aws.amazon.com\/marketplace\/solutions\/machine-learning."},{"key":"e_1_2_1_4_1","unstructured":"Google Cloud AI Hub 2019. https:\/\/cloud.google.com\/products\/ai\/."},{"issue":"4","key":"e_1_2_1_5_1","first-page":"419","article-title":"Diff: a relational interface for large-scale data explanation","volume":"12","author":"Abuzaid F.","year":"2018","unstructured":"F. Abuzaid, P. Kraft, S. Suri, E. Gan, E. Xu, A. Shenoy, A. Ananthanarayan, J. Sheu, E. Meijer, X. Wu, et al. Diff: a relational interface for large-scale data explanation. PVLDB, 12(4):419--432, 2018.","journal-title":"PVLDB"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/93.621580"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2007.109"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314036"},{"key":"e_1_2_1_9_1","volume-title":"A general framework for scalable transductive transfer learning. Knowledge and information systems, 38(1):61--83","author":"Bahadori M. T.","year":"2014","unstructured":"M. T. Bahadori, Y. Liu, and D. Zhang. A general framework for scalable transductive transfer learning. Knowledge and information systems, 38(1):61--83, 2014."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035928"},{"key":"e_1_2_1_11_1","volume-title":"Prioritizing attention in fast data: Principles and promise. CIDR, 10(3035918.3035928)","author":"Bailis P.","year":"2017","unstructured":"P. Bailis, E. Gan, K. Rong, and S. Suri. Prioritizing attention in fast data: Principles and promise. 
CIDR, 10(3035918.3035928), 2017."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098021"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-009-5152-4"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/1610075.1610094"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2009.2015974"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3329486.3329491"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281192.1281218"},{"key":"e_1_2_1_18_1","volume-title":"Frustratingly easy domain adaptation. arXiv preprint arXiv:0907.1815","author":"H. Daum\u00e9","year":"2009","unstructured":"H. Daum\u00e9 III. Frustratingly easy domain adaptation. arXiv preprint arXiv:0907.1815, 2009."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-008-0119-9"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687620"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3197387"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544914"},{"key":"e_1_2_1_23_1","volume-title":"Cross-modal data programming enables rapid medical machine learning. arXiv preprint arXiv:1903.11101","author":"Dunnmon J.","year":"2019","unstructured":"J. Dunnmon, A. Ratner, N. Khandwala, K. Saab, M. Markert, H. Sagreiya, R. Goldman, C. Lee-Messer, M. Lungren, D. Rubin, et al. Cross-modal data programming enables rapid medical machine learning. arXiv preprint arXiv:1903.11101, 2019."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206772"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.79"},{"key":"e_1_2_1_26_1","first-page":"2121","volume-title":"Advances in neural information processing systems","author":"Frome A.","year":"2013","unstructured":"A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, and T. 
Mikolov. Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems, pages 2121--2129, 2013."},{"key":"e_1_2_1_27_1","volume-title":"Rekall: Specifying video events using compositions of spatiotemporal labels. arXiv preprint arXiv:1910.02993","author":"Fu D. Y.","year":"2019","unstructured":"D. Y. Fu, W. Crichton, J. Hong, X. Yao, H. Zhang, A. Truong, A. Narayan, M. Agrawala, C. R\u00e9, and K. Fatahalian. Rekall: Specifying video events using compositions of spatiotemporal labels. arXiv preprint arXiv:1910.02993, 2019."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098043"},{"key":"e_1_2_1_29_1","first-page":"6042","volume-title":"Advances in Neural Information Processing Systems","author":"Grave E.","year":"2017","unstructured":"E. Grave, M. M. Cisse, and A. Joulin. Unbounded cache model for online language modeling with open vocabulary. In Advances in Neural Information Processing Systems, pages 6042--6052, 2017."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330658"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00059"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/IV.2000.859745"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3297753.3297756"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305381.3305576"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the annual meeting of the cognitive science society","volume":"33","author":"Lake B.","year":"2011","unstructured":"B. Lake, R. Salakhutdinov, J. Gross, and J. Tenenbaum. One shot learning of simple visual concepts. 
In Proceedings of the annual meeting of the cognitive science society, volume 33, 2011."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206594"},{"key":"e_1_2_1_37_1","first-page":"3","volume-title":"AAAI","volume":"1","author":"Larochelle H.","year":"2008","unstructured":"H. Larochelle, D. Erhan, and Y. Bengio. Zero-data learning of new tasks. In AAAI, volume 1, page 3, 2008."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401951"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-5010"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2000.855856"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/3360093"},{"key":"e_1_2_1_42_1","volume-title":"Materialization trade-offs for feature transfer from deep cnns for multimodal data analytics","author":"Nakandala S.","year":"2018","unstructured":"S. Nakandala and A. Kumar. Materialization trade-offs for feature transfer from deep cnns for multimodal data analytics. 2018."},{"key":"e_1_2_1_43_1","volume-title":"HUSE: Hierarchical universal semantic embeddings. arXiv:1911.05978(cs.CV)","author":"Narayana P.","year":"2019","unstructured":"P. Narayana, A. Pednekar, A. Krishnamoorthy, K. Sone, and S. Basu. HUSE: Hierarchical universal semantic embeddings. arXiv:1911.05978(cs.CV), 2019."},{"key":"e_1_2_1_44_1","volume-title":"Multimodal deep learning","author":"Ngiam J.","year":"2011","unstructured":"J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. Multimodal deep learning. 2011."},{"key":"e_1_2_1_45_1","first-page":"1410","volume-title":"Advances in neural information processing systems","author":"Palatucci M.","year":"2009","unstructured":"M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell. Zero-shot learning with semantic output codes. 
In Advances in neural information processing systems, pages 1410--1418, 2009."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_2_1_47_1","first-page":"1","volume-title":"The VLDB Journal","author":"Ratner A.","year":"2019","unstructured":"A. Ratner, S. H. Bach, H. Ehrenberg, J. Fries, S. Wu, and C. R\u00e9. Snorkel: Rapid training data creation with weak supervision. The VLDB Journal, pages 1--22, 2019."},{"key":"e_1_2_1_48_1","unstructured":"S. Ravi. Graph-powered Machine Learning at Google 2016. https:\/\/ai.googleblog.com\/2016\/10\/graph-powered-machine-learning-at-google.html."},{"key":"e_1_2_1_49_1","first-page":"519","volume-title":"Artificial Intelligence and Statistics","author":"Ravi S.","year":"2016","unstructured":"S. Ravi and Q. Diao. Large scale distributed semi-supervised learning using streaming approximation. In Artificial Intelligence and Statistics, pages 519--528, 2016."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035951"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_2_1_52_1","first-page":"46","volume-title":"Advances in neural information processing systems","author":"Rohrbach M.","year":"2013","unstructured":"M. Rohrbach, S. Ebert, and B. Schiele. Transfer learning in a transductive setting. In Advances in neural information processing systems, pages 46--54, 2013."},{"key":"e_1_2_1_53_1","first-page":"2","article-title":"Semi-supervised self-training of object detection models","author":"Rosenberg C.","year":"2005","unstructured":"C. Rosenberg, M. Hebert, and H. Schneiderman. Semi-supervised self-training of object detection models. WACV\/MOTION, 2, 2005.","journal-title":"WACV\/MOTION"},{"key":"e_1_2_1_54_1","first-page":"1","volume-title":"AAAI Conference on Artificial Intelligence (AAAI)","volume":"18","author":"Safranchik E.","unstructured":"E. Safranchik, S. Luo, S. H. Bach, S. H. Bach, D. Rodriguez, Y. 
Liu, C. Luo, H. Shao, C. Xia, S. Sen, et al. Weakly supervised sequence tagging from noisy rules. In AAAI Conference on Artificial Intelligence (AAAI), volume 18, pages 1--67."},{"key":"e_1_2_1_55_1","first-page":"2503","volume-title":"Advances in neural information processing systems","author":"Sculley D.","year":"2015","unstructured":"D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, and D. Dennison. Hidden technical debt in machine learning systems. In Advances in neural information processing systems, pages 2503--2511, 2015."},{"key":"e_1_2_1_56_1","volume-title":"Technical report","author":"Settles B.","year":"2009","unstructured":"B. Settles. Active learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences, 2009."},{"key":"e_1_2_1_57_1","first-page":"4077","volume-title":"Advances in neural information processing systems","author":"Snell J.","year":"2017","unstructured":"J. Snell, K. Swersky, and R. Zemel. Prototypical networks for few-shot learning. In Advances in neural information processing systems, pages 4077--4087, 2017."},{"key":"e_1_2_1_58_1","first-page":"935","volume-title":"Advances in neural information processing systems","author":"Socher R.","year":"2013","unstructured":"R. Socher, M. Ganjoo, C. D. Manning, and A. Ng. Zero-shot learning through cross-modal transfer. In Advances in neural information processing systems, pages 935--943, 2013."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.5555\/645337.650382"},{"key":"e_1_2_1_60_1","first-page":"2222","volume-title":"Advances in neural information processing systems","author":"Srivastava N.","year":"2012","unstructured":"N. Srivastava and R. R. Salakhutdinov. Multimodal learning with deep boltzmann machines. 
In Advances in neural information processing systems, pages 2222--2230, 2012."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.5555\/3016100.3016186"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00049"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305890.3306024"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_2_1_65_1","first-page":"240","volume-title":"Advances in neural information processing systems","author":"Varma P.","year":"2017","unstructured":"P. Varma, B. D. He, P. Bajaj, N. Khandwala, I. Banerjee, D. Rubin, and C. R\u00e9. Inferring generative model structure with static analysis. In Advances in neural information processing systems, pages 240--250, 2017."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.14778\/3291264.3291268"},{"key":"e_1_2_1_67_1","first-page":"192","volume-title":"Advances in Neural Information Processing Systems","author":"Varma P.","year":"2019","unstructured":"P. Varma, F. Sala, S. Sagawa, J. Fries, D. Fu, S. Khattar, A. Ramamoorthy, K. Xiao, K. Fatahalian, J. Priest, et al. Multi-resolution weak supervision for sequential data. In Advances in Neural Information Processing Systems, pages 192--203, 2019."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939502.2939516"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6247938"},{"key":"e_1_2_1_70_1","first-page":"3630","volume-title":"Advances in neural information processing systems","author":"Vinyals O.","year":"2016","unstructured":"O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. Matching networks for one shot learning. 
In Advances in neural information processing systems, pages 3630--3638, 2016."},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654948"},{"key":"e_1_2_1_72_1","volume-title":"International Conference on Learning Representations","author":"Wu S.","year":"2020","unstructured":"S. Wu, H. Zhang, and C. R\u00e9. Understanding and improving information transfer in multi-task learning. In International Conference on Learning Representations, 2020."},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.314"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2015.7301272"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3041021.3054201"},{"key":"e_1_2_1_76_1","volume-title":"International Conference on Learning Representations","author":"Zhan E.","year":"2019","unstructured":"E. Zhan, S. Zheng, Y. Yue, L. Sha, and P. Lucey. Generating multi-agent trajectories using programmatic weak supervision. In International Conference on Learning Representations, 2019."},{"key":"e_1_2_1_77_1","volume-title":"Learning from labeled and unlabeled data with label propagation","author":"Zhu X.","year":"2002","unstructured":"X. Zhu and Z. Ghahramani. Learning from labeled and unlabeled data with label propagation. 
2002."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3415478.3415559","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T02:44:16Z","timestamp":1758077056000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3415478.3415559"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8]]},"references-count":77,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2020,8]]}},"alternative-id":["10.14778\/3415478.3415559"],"URL":"https:\/\/doi.org\/10.14778\/3415478.3415559","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2020,8]]}}}