{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:16:37Z","timestamp":1763468197874},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>Classification and regression tree learning on massive datasets is a common data mining task at Google, yet many state of the art tree learning algorithms require training data to reside in memory on a single machine. While more scalable implementations of tree learning have been proposed, they typically require specialized parallel computing architectures. In contrast, the majority of Google's computing infrastructure is based on commodity hardware.<\/jats:p>\n          <jats:p>\n            In this paper, we describe PLANET: a scalable distributed framework for learning tree models over large datasets. PLANET defines tree learning as a series of distributed computations, and implements each one using the\n            <jats:italic>MapReduce<\/jats:italic>\n            model of distributed computation. We show how this framework supports scalable construction of classification and regression trees, as well as ensembles of such models. 
We discuss the benefits and challenges of using a MapReduce compute cluster for tree learning, and demonstrate the scalability of this approach by applying it to a real world learning task from the domain of computational advertising.\n          <\/jats:p>","DOI":"10.14778\/1687553.1687569","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1426-1437","source":"Crossref","is-referenced-by-count":187,"title":["PLANET"],"prefix":"10.14778","volume":"2","author":[{"given":"Biswanath","family":"Panda","sequence":"first","affiliation":[{"name":"Google, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joshua S.","family":"Herbach","sequence":"additional","affiliation":[{"name":"Google, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sugato","family":"Basu","sequence":"additional","affiliation":[{"name":"Google, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Roberto J.","family":"Bayardo","sequence":"additional","affiliation":[{"name":"Google, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Clouds: A decision tree classifier for large datasets. Technical report","author":"Alsabti K.","year":"1998"},{"key":"e_1_2_1_2_1","volume-title":"Large Scale Learning Challenge Workshop at the International Conference on Machine Learning (ICML)","author":"Ben-Haim Y.","year":"2008"},{"key":"e_1_2_1_3_1","volume-title":"Characterization and parallelization of decision tree induction. Technical report","author":"Bradford J. 
P.","year":"1999"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018054314350"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_2_1_6_1","volume-title":"Wadsworth and Brooks","author":"Breiman L.","year":"1984"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/1232788.1232798"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390169"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143865"},{"key":"e_1_2_1_10_1","first-page":"227","volume-title":"Workshop on Knowledge Discovery in Databases at the Conference of Association for the Advancement of Artificial Intelligence (AAAI)","author":"Chan P. K.","year":"1993"},{"key":"e_1_2_1_11_1","first-page":"281","volume-title":"Advances in Neural Information Processing Systems (NIPS) 19","author":"Chu C.-T.","year":"2007"},{"key":"e_1_2_1_12_1","volume-title":"Symposium on Operating System Design and Implementation (OSDI)","author":"Dean J.","year":"2004"},{"key":"e_1_2_1_13_1","volume-title":"Pattern Classification","author":"Duda R. 
O.","year":"2001"},{"key":"e_1_2_1_14_1","first-page":"148","volume-title":"International Conference on Machine Learning (ICML)","author":"Freund Y.","year":"1996"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1013203451"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304197"},{"key":"e_1_2_1_17_1","first-page":"416","volume-title":"International Conference on Very Large Data Bases (VLDB)","author":"Gehrke J.","year":"1998"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/1032649.1033438"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972733.11"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956821"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/876880.879586"},{"key":"e_1_2_1_22_1","volume-title":"MarketingProfs","author":"Kaushik A.","year":"2007"},{"key":"e_1_2_1_23_1","volume-title":"May","author":"Kaushik A.","year":"2007"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/502512.502557"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304204"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/645337.650384"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009876119989"},{"key":"e_1_2_1_28_1","unstructured":"G. Ridgeway. Generalized boosted models: A guide to the gbm package. http:\/\/cran.r-project.org\/web\/packages\/gbm 2006."},{"key":"e_1_2_1_29_1","volume-title":"Data Mining with Decision Trees: Theory and Applications","author":"Rokach L.","year":"2008"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1557019.1557161"},{"key":"e_1_2_1_31_1","first-page":"544","volume-title":"International Conference on Very Large Data Bases (VLDB)","author":"Shafer J. 
C.","year":"1996"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/211359"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/846218.847248"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687553.1687569","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:58:12Z","timestamp":1672225092000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687553.1687569"}},"subtitle":["massively parallel learning of tree ensembles with MapReduce"],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":33,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687553.1687569"],"URL":"https:\/\/doi.org\/10.14778\/1687553.1687569","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}