{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T18:50:40Z","timestamp":1771699840586,"version":"3.50.1"},"reference-count":23,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:p>We are often thrilled by the abundance of information surrounding us and wish to integrate data from as many sources as possible. However, understanding, analyzing, and using these data are often hard. Too much data can introduce a huge integration cost, such as expenses for purchasing data and resources for integration and cleaning. Furthermore, including low-quality data can even deteriorate the quality of integration results instead of bringing the desired quality gain. Thus, \"the more the better\" does not always hold for data integration and often \"less is more\".<\/jats:p>\n          <jats:p>In this paper, we study how to select a subset of sources before integration such that we can balance the quality of integrated data and integration cost. Inspired by the Marginalism principle in economic theory, we wish to integrate a new source only if its marginal gain, often a function of improved integration quality, is higher than the marginal cost, associated with data-purchase expense and integration resources. As a first step towards this goal, we focus on data fusion tasks, where the goal is to resolve conflicts from different sources. 
We propose a randomized solution for selecting sources for fusion and show empirically its effectiveness and scalability on both real-world data and synthetic data.<\/jats:p>","DOI":"10.14778\/2535568.2448938","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"37-48","source":"Crossref","is-referenced-by-count":95,"title":["Less is more"],"prefix":"10.14778","volume":"6","author":[{"given":"Xin Luna","family":"Dong","sequence":"first","affiliation":[{"name":"AT&T Labs-Research"}]},{"given":"Barna","family":"Saha","sequence":"additional","affiliation":[{"name":"AT&T Labs-Research"}]},{"given":"Divesh","family":"Srivastava","sequence":"additional","affiliation":[{"name":"AT&T Labs-Research"}]}],"member":"320","published-online":{"date-parts":[[2012,12]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Data markets in the cloud: An opportunity for the database community. PVLDB, 4(12)","author":"Balazinska M.","year":"2011","unstructured":"M. Balazinska, B. Howe, and D. Suciu. Data markets in the cloud: An opportunity for the database community. PVLDB, 4(12), 2011."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1456650.1456651"},{"key":"e_1_2_1_3_1","volume-title":"Integrating conflicting data: the role of source dependence. PVLDB, 2(1)","author":"Dong X. L.","year":"2009","unstructured":"X. L. Dong, L. Berti-Equille, and D. Srivastava. Integrating conflicting data: the role of source dependence. PVLDB, 2(1), 2009."},
{"key":"e_1_2_1_4_1","volume-title":"PVLDB","author":"Dong X. L.","year":"2009","unstructured":"X. L. Dong and F. Naumann. Data fusion: resolving data conflicts for integration. PVLDB, 2009."},{"key":"e_1_2_1_5_1","unstructured":"X. L. Dong, B. Saha, and D. Srivastava. Less is more: Selecting sources wisely for integration. http:\/\/lunadong.com\/publication\/marginalism report.pdf"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1577840.1577846"},{"key":"e_1_2_1_7_1","volume":"6","article-title":"Greedy randomized adaptive search procedures","author":"Feo T.","year":"1995","unstructured":"T. Feo and M. G. Resende. Greedy randomized adaptive search procedures. J. of Global Optimization, 6, 1995.","journal-title":"J. of Global Optimization"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1718487.1718504"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2008.01.012"},{"key":"e_1_2_1_10_1","volume-title":"Truth finding on the deep web: Is the problem solved? PVLDB, 6(2)","author":"Li X.","year":"2013","unstructured":"X. Li, X. L. Dong, K. B. Lyons, W. Meng, and D. Srivastava. Truth finding on the deep web: Is the problem solved? PVLDB, 6(2), 2013."},{"key":"e_1_2_1_11_1","volume-title":"Principles of Economics","author":"Marshall A.","year":"1890","unstructured":"A. Marshall. Principles of Economics. Prometheus Books, 1890."},
{"key":"e_1_2_1_12_1","volume-title":"Advanced Metasearch Engine Technology","author":"Meng W.","year":"2010","unstructured":"W. Meng and C. T. Yu. Advanced Metasearch Engine Technology. Morgan & Claypool, 2010."},{"key":"e_1_2_1_13_1","volume-title":"WebDB","author":"Mihaila G. A.","year":"2000","unstructured":"G. A. Mihaila, L. Raschid, and M.-E. Vidal. Using quality of data metadata for source selection and ranking. In WebDB, 2000."},{"key":"e_1_2_1_14_1","volume-title":"IQ","author":"Naumann F.","year":"1998","unstructured":"F. Naumann, J. C. Freytag, and M. Spiliopoulou. Quality driven source selection using data envelope analysis. In IQ, 1998."},{"key":"e_1_2_1_15_1","first-page":"877","volume-title":"COLING","author":"Pasternack J.","year":"2010","unstructured":"J. Pasternack and D. Roth. Knowing what to believe (when you already know something). In COLING, pages 877-885, 2010."},{"key":"e_1_2_1_16_1","first-page":"2324","volume-title":"IJCAI","author":"Pasternack J.","year":"2011","unstructured":"J. Pasternack and D. Roth. Making better informed trust decisions with generalized fact-finding. In IJCAI, pages 2324-2329, 2011. 10.5591\/978-1-57735-516-8\/IJCAI11-387"},
{"key":"e_1_2_1_17_1","volume-title":"SIGMOD","author":"Qu H.","year":"2007","unstructured":"H. Qu, J. Xu, and A. Labrinidis. Quality is in the eye of the beholder: towards user-centric web-databases. In SIGMOD, 2007. 10.1145\/1247480.1247622"},{"key":"e_1_2_1_18_1","volume-title":"WSDM","author":"Suryanto M. A.","year":"2009","unstructured":"M. A. Suryanto, E.-P. Lim, A. Sun, and R. Chiang. Quality-aware collaborative question answering: Methods and evaluation. In WSDM, 2009. 10.1145\/1498759.1498820"},{"key":"e_1_2_1_19_1","volume-title":"DMSN","author":"Wu H.","year":"2009","unstructured":"H. Wu, Q. Luo, J. Li, and A. Labrinidis. Quality aware query scheduling in wireless sensor networks. In DMSN, 2009. 10.1145\/1594187.1594197"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-00672-2_6"},{"key":"e_1_2_1_21_1","volume-title":"Proc. of SIGKDD","author":"Yin X.","year":"2007","unstructured":"X. Yin, J. Han, and P. S. Yu. Truth discovery with multiple conflicting information providers on the web. In Proc. of SIGKDD, 2007. 10.1145\/1281192.1281309"},{"key":"e_1_2_1_22_1","first-page":"217","volume-title":"WWW","author":"Yin X.","year":"2011","unstructured":"X. Yin and W. Tan. Semi-supervised truth discovery. In WWW, pages 217-226, 2011. 10.1145\/1963405.1963439"},
{"issue":"6","key":"e_1_2_1_23_1","first-page":"550","article-title":"A Bayesian approach to discovering truth from conflicting sources for data integration","volume":"5","author":"Zhao B.","year":"2012","unstructured":"B. Zhao, B. I. P. Rubinstein, J. Gemmell, and J. Han. A Bayesian approach to discovering truth from conflicting sources for data integration. PVLDB, 5(6):550-561, 2012.","journal-title":"PVLDB"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2535568.2448938","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:04:05Z","timestamp":1672225445000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2535568.2448938"}},"subtitle":["selecting sources wisely for integration"],"short-title":[],"issued":{"date-parts":[[2012,12]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["10.14778\/2535568.2448938"],"URL":"https:\/\/doi.org\/10.14778\/2535568.2448938","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2012,12]]}}}