{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:28:40Z","timestamp":1777854520143,"version":"3.51.4"},"reference-count":22,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2015,7,22]],"date-time":"2015-07-22T00:00:00Z","timestamp":1437523200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2016,4]]},"abstract":"<jats:p>Finding the k-nearest neighbours of every node in a dataset is one of the most important data operations with wide application in various areas such as recommendation and information retrieval. However, a major challenge is that the execution time of existing approaches grows rapidly as the number of nodes or dimensions increases. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbour graph. It selects a fixed number of nodes as candidates for every node by filtering out node pairs that do not have any matching dimensions with large values. Greedy filtering achieves consistent approximation accuracy across nodes in linear execution time. We also present a faster version of greedy filtering that uses inverted indices on the node prefixes. Through theoretical analysis, we show that greedy filtering is effective for datasets whose features have Zipfian distribution, a characteristic observed in majority of large datasets. We also conduct extensive comparative experiments against (a) three state-of-the-art algorithms, and (b) three algorithms in related research domains. Our experimental results show that greedy filtering consistently outperforms other algorithms in various types of high-dimensional datasets.<\/jats:p>","DOI":"10.1177\/0165551515594728","type":"journal-article","created":{"date-parts":[[2015,7,22]],"date-time":"2015-07-22T21:20:12Z","timestamp":1437600012000},"page":"274-288","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":7,"title":["A novel algorithm for scalable\n                    <i>k<\/i>\n                    -nearest neighbour graph construction"],"prefix":"10.1177","volume":"42","author":[{"given":"Youngki","family":"Park","sequence":"first","affiliation":[{"name":"Seoul National University, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heasoo","family":"Hwang","sequence":"additional","affiliation":[{"name":"University of Seoul, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sang-goo","family":"Lee","sequence":"additional","affiliation":[{"name":"Seoul National University, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2015,7,22]]},"reference":[{"key":"bibr1-0165551515594728","doi-asserted-by":"publisher","DOI":"10.1109\/MIC.2003.1167344"},{"key":"bibr2-0165551515594728","first-page":"293","volume-title":"Proceedings of the 4th ACM conference on recommender systems","author":"Davidson J"},{"key":"bibr3-0165551515594728","first-page":"230","volume-title":"Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval","author":"Herlocker JL"},{"key":"bibr4-0165551515594728","first-page":"285","volume-title":"Proceedings of the 10th international conference on World Wide Web","author":"Sarwar B"},{"key":"bibr5-0165551515594728","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071"},{"key":"bibr6-0165551515594728","first-page":"2957","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Heo JP"},{"key":"bibr7-0165551515594728","first-page":"461","volume-title":"Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval","author":"Smucker M"},{"key":"bibr8-0165551515594728","first-page":"1989","volume":"10","author":"Chen J","year":"2009","journal-title":"The Journal of Machine Learning Research"},{"key":"bibr9-0165551515594728","first-page":"430","volume-title":"Proceedings of the 17th international conference on pattern recognition","author":"Hautamaki V"},{"key":"bibr10-0165551515594728","first-page":"327","volume-title":"In: Proceedings of the 19th international conference on database systems for advanced applications","author":"Park Y"},{"key":"bibr11-0165551515594728","first-page":"422","volume-title":"Proceedings of the 21st international conference on database and expert systems applications","author":"Lee D"},{"key":"bibr12-0165551515594728","first-page":"518","volume-title":"Proceedings of the 25th international conference on very large data bases","author":"Gionis A"},{"key":"bibr13-0165551515594728","first-page":"231","volume-title":"Proceedings of the 48th annual meeting of the Association for Computational Linguistics","author":"Durme B"},{"key":"bibr14-0165551515594728","first-page":"380","volume-title":"Proceedings of the 34th annual ACM symposium on theory of computing","author":"Charikar M"},{"key":"bibr15-0165551515594728","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7552(97)00031-7"},{"key":"bibr16-0165551515594728","first-page":"577","volume-title":"Proceedings of the 20th international World Wide Web conference","author":"Dong W"},{"key":"bibr17-0165551515594728","first-page":"92","volume-title":"Proceedings of the international conference on big data and smart computing","author":"Park Y"},{"key":"bibr18-0165551515594728","doi-asserted-by":"publisher","DOI":"10.1145\/2000824.2000825"},{"key":"bibr19-0165551515594728","first-page":"510","volume-title":"Proceedings of the 28th international conference on data engineering","author":"Kim Y"},{"key":"bibr20-0165551515594728","first-page":"916","volume-title":"Proceedings of the 25th international conference on data engineering","author":"Xiao C"},{"key":"bibr21-0165551515594728","first-page":"2035","volume-title":"Proceedings of the 27th symposium on applied computing","author":"Said A"},{"key":"bibr22-0165551515594728","first-page":"131","volume-title":"Proceedings of the 16th international World Wide Web conference","author":"Bayardo R"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551515594728","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0165551515594728","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551515594728","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:09:14Z","timestamp":1777504154000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551515594728"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,7,22]]},"references-count":22,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,4]]}},"alternative-id":["10.1177\/0165551515594728"],"URL":"https:\/\/doi.org\/10.1177\/0165551515594728","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,7,22]]}}}