{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T06:58:57Z","timestamp":1775199537244,"version":"3.50.1"},"reference-count":116,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2018,8]]},"abstract":"<jats:p>In order to democratize data science, we need to fundamentally rethink the current analytics stack, from the user interface to the \"guts.\" Most importantly, enabling a broader range of users to unfold the potential of (their) data requires a change in the interface and the \"protection\" we offer them. On the one hand, visual interfaces for data science have to be intuitive, easy, and interactive to reach users without a strong background in computer science or statistics. On the other hand, we need to protect users from making false discoveries. Furthermore, it requires that technically involved (and often boring) tasks have to be automatically done by the system so that the user can focus on contributing their domain expertise to the problem. 
In this paper, we present Northstar, the Interactive Data Science System, which we have developed over the last 4 years to explore designs that make advanced analytics and model building more accessible.<\/jats:p>","DOI":"10.14778\/3229863.3240493","type":"journal-article","created":{"date-parts":[[2018,9,10]],"date-time":"2018-09-10T12:12:28Z","timestamp":1536581548000},"page":"2150-2164","source":"Crossref","is-referenced-by-count":54,"title":["Northstar"],"prefix":"10.14778","volume":"11","author":[{"given":"Tim","family":"Kraska","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology"}]}],"member":"320","published-online":{"date-parts":[[2018,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Project tungsten: Bringing apache spark closer to bare metal. https:\/\/databricks.com\/blog\/2015\/04\/28\/project-tungsten-bringing-spark-closer-to-bare-metal.html. Accessed: 2018-07-15."},{"key":"e_1_2_1_2_1","unstructured":"Trifacta. http:\/\/www.trifacta.com."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335450"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/304181.304581"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465355"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939502.2939512"},{"issue":"5","key":"e_1_2_1_7_1","article-title":"Controlling the false discovery rate","volume":"57","author":"Benjamini Y.","year":"1995","journal-title":"Journal of the Royal Statistical Society, Series B"},{"key":"e_1_2_1_8_1","unstructured":"J. Bergstra and Y. Bengio. 
Random search for hyper-parameter optimization. Journal of Machine Learning Research 13(Feb):281--305 2012."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209889.3209891"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/2904483.2904485"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066175"},{"key":"e_1_2_1_12_1","unstructured":"C. E. Bonferroni. Teoria statistica delle classi e calcolo delle probabilita. Libreria internazionale Seeber 1936."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056097"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2665069"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915245"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"J. Chomicki and J. Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Information and Computation 197(1--2):90--121 2005.","DOI":"10.1016\/j.ic.2004.04.007"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Y. Chung T. Kraska N. Polyzotis and S. E. Whang. Slice finder: Automated data slicing for model validation. 
CoRR abs\/1807.06068 2018.","DOI":"10.1109\/ICDE.2019.00139"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115414"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882909"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3167970"},{"key":"e_1_2_1_21_1","first-page":"313","volume-title":"NSDI","author":"Condie T.","year":"2010"},{"key":"e_1_2_1_22_1","first-page":"315","volume-title":"Proc. of VLDB","author":"Cong G.","year":"2007"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346298"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824045"},{"key":"e_1_2_1_25_1","volume-title":"CIDR","author":"Crotty A.","year":"2015"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824127"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939502.2939513"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"A. Doan A. Halevy and Z. Ives. Principles of Data Integration. Morgan Kaufmann Publishers Inc. San Francisco CA USA 1st edition 2012.","DOI":"10.1016\/B978-0-12-416044-6.00001-6"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035957"},{"key":"e_1_2_1_30_1","first-page":"2350","volume-title":"NIPS","author":"Dwork C.","year":"2015"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882945"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498285"},{"key":"e_1_2_1_33_1","unstructured":"P. Eichmann C. Binnig T. Kraska and E. Zgraggen. Idebench: A benchmark for interactive data exploration. 
CoRR abs\/1804.02593 2018."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3027063.3053222"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939502.2939507"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1177\/1473871611413180"},{"key":"e_1_2_1_37_1","unstructured":"G. M. Essertel R. Y. Tahboub J. M. Decker K. J. Brown K. Olukotun and T. Rompf. Flare: Native compilation for heterogeneous workloads in apache spark. CoRR abs\/1703.08219 2017."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1659225.1659228"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.154"},{"key":"e_1_2_1_40_1","first-page":"2962","volume-title":"NIPS","author":"Feurer M.","year":"2015"},{"key":"e_1_2_1_41_1","first-page":"1128","volume-title":"AAAI","author":"Feurer M.","year":"2015"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2168931.2168943"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115418"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3106237.3106277"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098043"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077257.3077266"},{"key":"e_1_2_1_47_1","doi-asserted-by":"crossref","unstructured":"T. Hastie R. Tibshirani and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc. 
New York NY USA 2001.","DOI":"10.1007\/978-0-387-21606-5"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.781635"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/253262.253291"},{"key":"e_1_2_1_50_1","first-page":"68","volume-title":"CIDR","author":"Idreos S.","year":"2007"},{"key":"e_1_2_1_51_1","unstructured":"INRIA. scikit-learn: Machine learning in python. http:\/\/scikit-learn.org. Accessed: 2018-07-15."},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","unstructured":"J. P. A. Ioannidis. Why most published research findings are false. Plos Med 2(8) 2005.","DOI":"10.1371\/journal.pmed.0020124"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559879"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247560"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2014.6816674"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/1978942.1979444"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1710115.1710121"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.14778\/3055330.3055333"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2220357.2220359"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00027"},{"key":"e_1_2_1_61_1","unstructured":"L. Kotthoff C. Thornton H. H. Hoos F. Hutter and K. Leyton-Brown. Auto-weka 2.0: Automatic model selection and hyperparameter optimization in weka. 
The Journal of Machine Learning Research 18(1):826--830 2017."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196909"},{"key":"e_1_2_1_63_1","volume-title":"CIDR","author":"Kraska T.","year":"2013"},{"key":"e_1_2_1_64_1","unstructured":"S. Krishnan M. J. Franklin K. Goldberg and E. Wu. Boostclean: Automated error detection and repair for machine learning. arXiv preprint arXiv:1711.01299 2017."},{"key":"e_1_2_1_65_1","unstructured":"L. Li K. Jamieson G. DeSalvo A. Rostamizadeh and A. Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization. arXiv preprint arXiv:1603.06560 2016."},{"key":"e_1_2_1_66_1","first-page":"185","article-title":"Hyperband: A novel bandit-based approach to hyperparameter optimization","volume":"18","author":"Li L.","year":"2017","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3187009.3177737"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346452"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.12129"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367867"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13721-016-0125-6"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.14778\/3231751.3231753"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/22949.22950"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211954.3211958"},{"key":"e_1_2_1_75_1","unstructured":"Microsoft hololens. https:\/\/www.microsoft.com\/en-us\/hololens. 
Accessed: 2018-07-15."},{"key":"e_1_2_1_76_1","unstructured":"Mimic ii data set. https:\/\/mimic.physionet.org\/. Accessed: 2018-07-15."},{"key":"e_1_2_1_77_1","unstructured":"B. Myers M. Oskin and B. Howe. Compiling queries for high-performance computing. https:\/\/dada.cs.washington.edu\/research\/tr\/2016\/02\/UW-CSE-16-02-02.pdf. Accessed: 2018-07-15."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544837"},{"key":"e_1_2_1_79_1","unstructured":"J. Nielsen. Powers of 10: Time scales in user experience. Retrieved January 5:2015 2009."},{"key":"e_1_2_1_80_1","unstructured":"B. of Transportation Statistics. Bureau of transportation statistics. http:\/\/www.transtats.bts.gov 2017. Accessed: 2017-10-21."},{"key":"e_1_2_1_81_1","unstructured":"F. Olken. Random Sampling from Databases. 
PhD thesis University of California at Berkeley 1993."},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.14778\/3213880.3213890"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064013"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1145\/505248.506010"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209900.3209904"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.14778\/3199517.3199522"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.26599\/BDMA.2018.9020007"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137645"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/191666.191719"},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1145\/1659225.1659227"},{"key":"e_1_2_1_91_1","unstructured":"W. Shen. Data-driven discovery of models (d3m). https:\/\/www.darpa.mil\/program\/data-driven-discovery-of-models. Accessed: 2018-07-15."},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/2514.2517"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.14778\/3025111.3025126"},{"key":"e_1_2_1_94_1","unstructured":"E. R. Sparks A. Talwalkar M. J. Franklin M. I. Jordan and T. Kraska. Tupaq: An efficient planner for large-scale predictive analytic queries. 
CoRR abs\/1502.00068 2015."},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1145\/2806777.2806945"},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.1109\/2945.981851"},{"key":"e_1_2_1_97_1","first-page":"633","volume-title":"ICDE","author":"Tan K.","year":"2001"},{"key":"e_1_2_1_98_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487629"},{"key":"e_1_2_1_99_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544865"},{"key":"e_1_2_1_100_1","doi-asserted-by":"publisher","DOI":"10.1145\/2845644"},{"key":"e_1_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2339857"},{"key":"e_1_2_1_102_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733035"},{"key":"e_1_2_1_103_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610505"},{"key":"e_1_2_1_104_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-016-0043-6"},{"key":"e_1_2_1_105_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732951.2732964"},{"key":"e_1_2_1_106_1","volume-title":"CIDR","author":"Wu E.","year":"2017"},{"key":"e_1_2_1_107_1","doi-asserted-by":"publisher","DOI":"10.14778\/1952376.1952378"},{"key":"e_1_2_1_108_1","doi-asserted-by":"publisher","DOI":"10.1145\/3085504.3085526"},{"key":"e_1_2_1_109_1","doi-asserted-by":"publisher","DOI":"10.14778\/3055330.3055335"},{"key":"e_1_2_1_110_1","unstructured":"zdnet. Microsoft: Surface hub demand is strong; product is now in stock. 
https:\/\/www.cs.waikato.ac.nz\/ml\/weka\/."},{"key":"e_1_2_1_111_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2735381"},{"key":"e_1_2_1_112_1","volume-title":"TVCG","author":"Zgraggen E.","year":"2016"},{"key":"e_1_2_1_113_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346293"},{"key":"e_1_2_1_114_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174053"},{"key":"e_1_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064019"},{"key":"e_1_2_1_116_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3058749"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3229863.3240493","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:16:21Z","timestamp":1672222581000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3229863.3240493"}},"subtitle":["an interactive data science system"],"short-title":[],"issued":{"date-parts":[[2018,8]]},"references-count":116,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2018,8]]}},"alternative-id":["10.14778\/3229863.3240493"],"URL":"https:\/\/doi.org\/10.14778\/3229863.3240493","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2018,8]]}}}