{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T07:11:26Z","timestamp":1779174686530,"version":"3.51.4"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>Data is often stored in a database management system (DBMS) but dataframe libraries are widely used among data scientists. An important but challenging problem is how to bridge the gap between databases and dataframes. To solve this problem, we present ConnectorX, a client library that enables fast and memory-efficient data loading from various databases to different dataframes. We first investigate why the loading process is slow and consumes large memory. We surprisingly find that the main overhead comes from the client-side rather than query execution or data transfer. We integrate several existing and new techniques to reduce the overhead and carefully design the system architecture and interface to make ConnectorX easy to extend to various databases and dataframes. Moreover, we propose server-side result partitioning that can be adopted by DBMSs in order to better support exporting data to data science tools. We conduct extensive experiments to evaluate ConnectorX and compare it with popular libraries. The results show that ConnectorX significantly outperforms existing solutions. 
ConnectorX is open sourced at: https:\/\/github.com\/sfu-db\/connector-x.<\/jats:p>","DOI":"10.14778\/3551793.3551847","type":"journal-article","created":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T22:25:03Z","timestamp":1664490303000},"page":"2994-3003","source":"Crossref","is-referenced-by-count":10,"title":["ConnectorX"],"prefix":"10.14778","volume":"15","author":[{"given":"Xiaoying","family":"Wang","sequence":"first","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weiyuan","family":"Wu","sequence":"additional","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinze","family":"Wu","sequence":"additional","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yizhou","family":"Chen","sequence":"additional","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nick","family":"Zrymiak","sequence":"additional","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Changbo","family":"Qu","sequence":"additional","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lampros","family":"Flokas","sequence":"additional","affiliation":[{"name":"Columbia University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"George","family":"Chow","sequence":"additional","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiannan","family":"Wang","sequence":"additional","affiliation":[{"name":"Simon Fraser University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianzheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Simon Fraser 
University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eugene","family":"Wu","sequence":"additional","affiliation":[{"name":"Columbia University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qingqing","family":"Zhou","sequence":"additional","affiliation":[{"name":"Tencent Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,9,29]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2001--2022. pickle --- Python object serialization. https:\/\/docs.python.org\/3\/library\/pickle.html. Accessed: 2022-05-01."},{"key":"e_1_2_1_2_1","unstructured":"2011--2019. Apache Sqoop. https:\/\/sqoop.apache.org\/. Accessed: 2022-05-01."},{"key":"e_1_2_1_3_1","volume-title":"Ibis: Write your analytics code once, run it everywhere","unstructured":"2014--2021. Ibis: Write your analytics code once, run it everywhere. http:\/\/ibis-project.org. Accessed: 2022-01-27."},{"key":"e_1_2_1_4_1","unstructured":"2016. pandas read_sql is unusually slow. https:\/\/stackoverflow.com\/questions\/40045093\/pandas-read-sql-is-unusually-slow. Accessed: 2022-01-27."},{"key":"e_1_2_1_5_1","unstructured":"2016. Pandas using too much memory with read_sql_table. https:\/\/stackoverflow.com\/questions\/41253326\/pandas-using-too-much-memory-with-read-sql-table. 
Accessed: 2022-01-27."},{"key":"e_1_2_1_6_1","unstructured":"2016--2022. Apache Arrow. https:\/\/arrow.apache.org\/. Accessed: 2022-01-27."},{"key":"e_1_2_1_7_1","unstructured":"2017. Program (Time) Bottleneck is Database Interaction. https:\/\/stackoverflow.com\/questions\/44154430\/program-time-bottleneck-is-database-interaction. Accessed: 2022-01-27."},{"key":"e_1_2_1_8_1","unstructured":"2017. Use Turbodbc\/Arrow for read_sql_table. https:\/\/github.com\/pandas-dev\/pandas\/issues\/17790. Accessed: 2022-01-27."},{"key":"e_1_2_1_9_1","unstructured":"2017-2021. Turbodbc - Turbocharged database access for data scientists. https:\/\/turbodbc.readthedocs.io\/en\/latest\/. Accessed: 2022-01-27."},{"key":"e_1_2_1_10_1","unstructured":"2018. AWS CLI s3. https:\/\/awscli.amazonaws.com\/v2\/documentation\/api\/latest\/reference\/s3\/index.html. Accessed: 2022-05-01."},{"key":"e_1_2_1_11_1","unstructured":"2021. ConnectorX for ETL workload. https:\/\/github.com\/sfu-db\/connector-x\/discussions\/133. Accessed: 2022-01-27."},{"key":"e_1_2_1_12_1","unstructured":"2021. 
ConnectorX for ML feature fetching. https:\/\/github.com\/sfu-db\/connector-x\/issues\/140#issuecomment-948918848. Accessed: 2022-01-27."},{"key":"e_1_2_1_13_1","unstructured":"2021. ConnectorX integrates with dataframe system. https:\/\/pola-rs.github.io\/polars-book\/user-guide\/howcani\/io\/read_db.html. Accessed: 2022-01-27."},{"key":"e_1_2_1_14_1","unstructured":"2021. DataPrep: The easiest way to prepare data in Python. https:\/\/dataprep.ai\/. Accessed: 2021-01-27."},{"key":"e_1_2_1_15_1","unstructured":"2021. DDoS Dataset. https:\/\/www.kaggle.com\/devendra416\/ddos-datasets. Accessed: 2022-01-27."},{"key":"e_1_2_1_16_1","unstructured":"2021. Google BigQuery. https:\/\/cloud.google.com\/bigquery. Accessed: 2022-01-27."},{"key":"e_1_2_1_17_1","unstructured":"2021. Polars: Fast multi-threaded DataFrame library in Rust and Python. https:\/\/github.com\/pola-rs\/polars. Accessed: 2022-01-27."},{"key":"e_1_2_1_18_1","unstructured":"2021. TPC-H Homepage. http:\/\/www.tpc.org\/tpch. Accessed: 2022-01-27."},{"key":"e_1_2_1_19_1","unstructured":"2022. Amazon Redshift. https:\/\/aws.amazon.com\/redshift\/. Accessed: 2022-05-01."},{"key":"e_1_2_1_20_1","unstructured":"2022. Amazon Redshift: Unloading data to Amazon S3. 
https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/t_Unloading_tables.html. Accessed: 2022-05-01."},{"key":"e_1_2_1_21_1","unstructured":"2022. Amazon S3. https:\/\/aws.amazon.com\/s3\/. Accessed: 2022-05-01."},{"key":"e_1_2_1_22_1","unstructured":"2022. Apache Avro. https:\/\/avro.apache.org\/. Accessed: 2022-05-01."},{"key":"e_1_2_1_23_1","unstructured":"2022. Apache ORC. https:\/\/orc.apache.org\/. Accessed: 2022-05-01."},{"key":"e_1_2_1_24_1","unstructured":"2022. Apache Parquet. https:\/\/parquet.apache.org\/. Accessed: 2022-05-01."},{"key":"e_1_2_1_25_1","unstructured":"2022. Azure Blob Storage. https:\/\/azure.microsoft.com\/en-us\/services\/storage\/blobs\/. Accessed: 2022-05-01."},{"key":"e_1_2_1_26_1","unstructured":"2022. Azure Data Lake. https:\/\/azure.microsoft.com\/en-us\/solutions\/data-lake\/. Accessed: 2022-05-01."},{"key":"e_1_2_1_27_1","unstructured":"2022. DistCp. https:\/\/hadoop.apache.org\/docs\/r3.1.3\/hadoop-distcp\/DistCp.html. Accessed: 2022-05-01."},{"key":"e_1_2_1_28_1","unstructured":"2022. Google BigQuery: Exporting table data. 
https:\/\/cloud.google.com\/bigquery\/docs\/exporting-data. Accessed: 2022-05-01."},{"key":"e_1_2_1_29_1","unstructured":"2022. Google Cloud Storage. https:\/\/cloud.google.com\/storage. Accessed: 2022-05-01."},{"key":"e_1_2_1_30_1","unstructured":"2022. Ray Data. https:\/\/docs.ray.io\/en\/latest\/data\/getting-started.html. Accessed: 2022-05-01."},{"key":"e_1_2_1_31_1","unstructured":"2022. The Rust Programming Language - Macro. https:\/\/doc.rust-lang.org\/book\/ch19-06-macros.html. Accessed: 2022-01-27."},{"key":"e_1_2_1_32_1","unstructured":"2022. Spark SQL DataFrames and Datasets Guide. https:\/\/spark.apache.org\/docs\/latest\/sql-programming-guide.html. Accessed: 2022-05-01."},{"key":"e_1_2_1_33_1","unstructured":"2022. Unloading Data from Snowflake. https:\/\/docs.snowflake.com\/en\/user-guide-data-unload.html. Accessed: 2022-05-01."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_35_1","unstructured":"SQLAlchemy authors and contributors. 2007--2022. Using Server Side Cursors (a.k.a. stream results). https:\/\/docs.sqlalchemy.org\/en\/14\/core\/connections.html#using-server-side-cursors-a-k-a-stream-results. 
Accessed: 2022-01-27."},{"key":"e_1_2_1_36_1","volume-title":"Juliusz Sompolski and Reynold Xin","author":"Bogdan Ionut Ghit Stefania Leone","year":"2021","unstructured":"Stefania Leone Bogdan Ionut Ghit, Juliusz Sompolski and Reynold Xin. 2021. How We Achieved High-bandwidth Connectivity With BI Tools. https:\/\/databricks.com\/blog\/2021\/08\/11\/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html. Accessed: 2022-01-27."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137633"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903741"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807275"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236194"},{"key":"e_1_2_1_42_1","unstructured":"The Apache Software Foundation. 2016--2021. Apache Arrow Flight. https:\/\/arrow.apache.org\/docs\/format\/Flight.html. Accessed: 2022-01-27."},{"key":"e_1_2_1_43_1","unstructured":"The Python Software Foundation. 2001. PEP 249 - Python Database API Specification v2.0. https:\/\/www.python.org\/dev\/peps\/pep-0249\/. 
Accessed: 2022-01-27."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402755.3402764"},{"key":"e_1_2_1_45_1","volume-title":"11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11--15, 2021, Online Proceedings. www.cidrdb.org. http:\/\/cidrdb.org\/cidr2021\/papers\/cidr2021_paper07","author":"Hagedorn Stefan","year":"2021","unstructured":"Stefan Hagedorn, Steffen Kl\u00e4be, and Kai-Uwe Sattler. 2021. Putting Pandas in a Box. In 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11--15, 2021, Online Proceedings. www.cidrdb.org. http:\/\/cidrdb.org\/cidr2021\/papers\/cidr2021_paper07.pdf"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-020-2649-2"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2007.55"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380575"},{"key":"e_1_2_1_49_1","volume-title":"Andreas Mueller, et al.","author":"Jindal Alekh","year":"2021","unstructured":"Alekh Jindal, K Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas Mueller, et al. 2021. Magpie: Python at speed and scale using cloud backends. 
In CIDR."},{"key":"e_1_2_1_50_1","volume-title":"10th Conference on Innovative Data Systems Research, CIDR","author":"Karanasos Konstantinos","year":"2020","unstructured":"Konstantinos Karanasos, Matteo Interlandi, Fotis Psallidas, Rathijit Sen, Kwanghyun Park, Ivan Popivanov, Doris Xin, Supun Nakandala, Subru Krishnan, Markus Weimer, Yuan Yu, Raghu Ramakrishnan, and Carlo Curino. 2020. Extending Relational Query Processing with ML Inference. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12--15, 2020, Online Proceedings. www.cidrdb.org. http:\/\/cidrdb.org\/cidr2020\/papers\/p24-karanasos-cidr20.pdf"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196959.3196960"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2618243.2618265"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3436905.3436913"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137812"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415568"},{"key":"e_1_2_1_56_1","article-title":"MLlib: Machine Learning in Apache Spark","volume":"17","author":"Meng Xiangrui","year":"2016","unstructured":"Xiangrui Meng, Joseph K. Bradley, Burak Yavuz, Evan R. Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, D. B. Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. 
Franklin, Reza Zadeh, Matei Zaharia, and Ameet Talwalkar. 2016. MLlib: Machine Learning in Apache Spark. J. Mach. Learn. Res. 17 (2016), 34:1--34:7. http:\/\/jmlr.org\/papers\/v17\/15-237.html","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_57_1","volume-title":"Ray: A Distributed Framework for Emerging AI Applications. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018","author":"Moritz Philipp","year":"2018","unstructured":"Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. 2018. Ray: A Distributed Framework for Emerging AI Applications. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018, Carlsbad, CA, USA, October 8--10, 2018, Andrea C. Arpaci-Dusseau and Geoff Voelker (Eds.). USENIX Association, 561--577. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/nishihara"},{"key":"e_1_2_1_58_1","unstructured":"The pandas development team. 2008--2021. pandas.read_sql. 
https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.read_sql.html. Accessed: 2022-01-27."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.3509134"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407807"},{"key":"e_1_2_1_62_1","volume-title":"R: A Language and Environment for Statistical Computing","author":"Team R Core","year":"2021","unstructured":"R Core Team. 2021. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https:\/\/www.R-project.org\/"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115408"},{"key":"e_1_2_1_64_1","volume-title":"10th Conference on Innovative Data Systems Research, CIDR","author":"Raasveldt Mark","year":"2020","unstructured":"Mark Raasveldt and Hannes M\u00fchleisen. 2020. Data Management for Data Science - Towards Embedded Analytics. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12--15, 2020, Online Proceedings. www.cidrdb.org. 
http:\/\/cidrdb.org\/cidr2020\/papers\/p23-raasveldt-cidr20.pdf"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-7b98e3ed-013"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3329486.3329494"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447738"},{"key":"e_1_2_1_68_1","unstructured":"Itamar Turner-Trauring. 2021. Loading SQL data into Pandas without running out of memory. https:\/\/pythonspeed.com\/articles\/pandas-sql-chunking\/. Accessed: 2022-01-27."},{"key":"e_1_2_1_69_1","unstructured":"Jinze Wu Yizhou Chen Nick Zrymiak Changbo Qu Lampros Flokas George Chow Jiannan Wang Tianzheng Wang Eugene Wu Qingqing Zhou Xiaoying Wang Weiyuan Wu. 2021. [Technical Report] ConnectorX: Accelerating Data Loading From Database to Dataframe. http:\/\/raw.githubusercontent.com\/sfu-db\/connector-x\/main\/assets\/Technical_Report__ConnectorX.pdf."},{"key":"e_1_2_1_70_1","volume-title":"11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11--15, 2021, Online Proceedings. www.cidrdb.org. http:\/\/cidrdb.org\/cidr2021\/papers\/cidr2021_paper17","author":"Zaharia Matei","year":"2021","unstructured":"Matei Zaharia, Ali Ghodsi, Reynold Xin, and Michael Armbrust. 2021. Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. 
In 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11--15, 2021, Online Proceedings. www.cidrdb.org. http:\/\/cidrdb.org\/cidr2021\/papers\/cidr2021_paper17.pdf"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3551793.3551847","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:48:32Z","timestamp":1672224512000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3551793.3551847"}},"subtitle":["accelerating data loading from databases to dataframes"],"short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":70,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["10.14778\/3551793.3551847"],"URL":"https:\/\/doi.org\/10.14778\/3551793.3551847","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,7]]}}}