{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,24]],"date-time":"2025-08-24T01:48:59Z","timestamp":1756000139637},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2021,7]]},"abstract":"<jats:p>\n            Data scientists spend most of their time dealing with data preparation, rather than doing what they know best: build machine learning models and algorithms to solve previously unsolvable problems. In this paper, we describe the Visual Data Management System (VDMS), and demonstrate how it can be used to simplify the data preparation process and consequently gain in efficiency simply because we are using a system designed for the job. To demonstrate this, we use one of the largest available public datasets (YFCC100M), with 100 million images and videos, plus additional data including machine-generated tags, for a total of about ~12TB of data. VDMS differs from existing data management systems due to its focus on supporting machine learning and data analytics pipelines that rely on images, videos, and feature vectors, treating these as first class citizens. We demonstrate how VDMS outperforms well-known and widely used systems for data management by up to ~364x, with an average improvement of about 85x for our use-cases, and particularly at scale, for a\n            <jats:italic>image search<\/jats:italic>\n            engine implementation. 
At the same time, VDMS simplifies the process of data preparation and data access, and provides functionalities non-existent in alternative options.\n          <\/jats:p>","DOI":"10.14778\/3476311.3476381","type":"journal-article","created":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T22:48:43Z","timestamp":1635461323000},"page":"3240-3252","source":"Crossref","is-referenced-by-count":5,"title":["Using VDMS to index and search 100M images"],"prefix":"10.14778","volume":"14","author":[{"given":"Luis","family":"Remis","sequence":"first","affiliation":[{"name":"ApertureData"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chaunt\u00e9 W.","family":"Lacewell","sequence":"additional","affiliation":[{"name":"Intel Labs"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,28]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46759-7_15"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/276305.276386"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/1924943.1924947"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/2523356"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807271"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365815.1365816"},{"key":"e_1_2_1_7_1","volume-title":"Adalbert Gerald Soosai Raj, and Jignesh M Patel","author":"Fan Jing","year":"2015"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2005.142"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063444"},{"key":"e_1_2_1_11_1","volume-title":"Retrieved","author":"Hoydalsvik 
Geir","year":"2019"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1096673.1096686"},{"key":"e_1_2_1_13_1","volume-title":"Retrieved","author":"PR.","year":"2015"},{"key":"e_1_2_1_14_1","volume-title":"International Journal of Engineering Research and Technology 1","author":"Jatana Nishtha","year":"2012"},{"key":"e_1_2_1_15_1","volume-title":"Billion-scale similarity search with GPUs. CoRR abs\/1702.08734","author":"Johnson Jeff","year":"2017"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999257"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3054775"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367518"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367518"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.22224\/gistbok\/2018.2.10"},{"key":"e_1_2_1_21_1","unstructured":"Libffmpeg. [n.d.]. FFMPEG Library. Retrieved July 23 2021 from http:\/\/source.ffmpeg.org"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3363554"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the Southern Association for Information Systems Conference","volume":"2324","author":"Miller Justin J","year":"2013"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685078"},{"key":"e_1_2_1_25_1","volume-title":"Retrieved","author":"Oracle Co. [n.d.]. The world's","year":"2021"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3025111.3025117"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415564"},{"key":"e_1_2_1_28_1","volume-title":"VDMS: An Efficient Big-Visual-Data Access for Machine Learning Workloads. 
Systems for Machine Learning Workshop (SysML) at NIPS, Montreal, Canada abs\/1810","author":"Remis Luis","year":"2018"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13174-010-0001-z"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969442.2969519"},{"key":"e_1_2_1_31_1","volume-title":"Retrieved","author":"Inc. [n.d.]. SingleStore","year":"2021"},{"key":"e_1_2_1_32_1","volume-title":"Retrieved","author":"Software Foundation The Apache","year":"2021"},{"key":"e_1_2_1_33_1","volume-title":"Retrieved","author":"Software Foundation The Apache","year":"2021"},{"key":"e_1_2_1_34_1","volume-title":"Retrieved","author":"The PostgreSQL Global Development Group","year":"2021"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2812802"},{"key":"e_1_2_1_36_1","volume-title":"Available at least as early as Jul 72","author":"Varda Kenton","year":"2008"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213957"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465288"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3476311.3476381","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:37:16Z","timestamp":1672227436000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3476311.3476381"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7]]},"references-count":38,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2021,7]]}},"alternative-id":["10.14778\/3476311.3476381"],"URL":"https:\/\/doi.org\/10.14778\/3476311.3476381","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2021,7]]}}}