{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T21:47:59Z","timestamp":1762033679763},"reference-count":34,"publisher":"University of Zielona G\u00f3ra, Poland","issue":"3","license":[{"start":{"date-parts":[[2019,9,1]],"date-time":"2019-09-01T00:00:00Z","timestamp":1567296000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Extracting useful information from astronomical observations represents one of the most challenging tasks of data exploration. This is largely due to the volume of the data acquired using advanced observational tools. While other challenges typical for the class of big data problems (like data variety) are also present, the size of datasets represents the most significant obstacle in visualization and subsequent analysis. This paper studies an efficient data condensation algorithm aimed at providing its compact representation. It is based on fast nearest neighbor calculation using tree structures and parallel processing. In addition to that, the possibility of using approximate identification of neighbors, to even further improve the algorithm time performance, is also evaluated. The properties of the proposed approach, both in terms of performance and condensation quality, are experimentally assessed on astronomical datasets related to the GAIA mission. It is concluded that the introduced technique might serve as a scalable method of alleviating the problem of the dataset size.<\/jats:p>","DOI":"10.2478\/amcs-2019-0034","type":"journal-article","created":{"date-parts":[[2019,10,1]],"date-time":"2019-10-01T02:53:12Z","timestamp":1569898392000},"page":"467-476","source":"Crossref","is-referenced-by-count":2,"title":["Efficient Astronomical Data Condensation Using Approximate Nearest Neighbors"],"prefix":"10.61822","volume":"29","author":[{"given":"Szymon","family":"\u0141ukasik","sequence":"first","affiliation":[{"name":"Faculty of Physics and Applied Computer Science , AGH University of Science and Technology , al. Mickiewicza 30, 30-059 Cracow , Poland"},{"name":"Systems Research Institute , Polish Academy of Sciences , ul. Newelska 6, 01-447 Warsaw , Poland , e-mail: slukasik@ibspan.waw.pl, e-mail: pakowal@ibspan.waw.pl, e-mail: kulpi@ibspan.waw.pl"}]},{"given":"Konrad","family":"Lalik","sequence":"additional","affiliation":[{"name":"Faculty of Physics and Applied Computer Science , AGH University of Science and Technology , al. Mickiewicza 30, 30-059 Cracow , Poland"}]},{"given":"Piotr","family":"Sarna","sequence":"additional","affiliation":[{"name":"Faculty of Physics and Applied Computer Science , AGH University of Science and Technology , al. Mickiewicza 30, 30-059 Cracow , Poland"}]},{"given":"Piotr A.","family":"Kowalski","sequence":"additional","affiliation":[{"name":"Faculty of Physics and Applied Computer Science , AGH University of Science and Technology , al. Mickiewicza 30, 30-059 Cracow , Poland"},{"name":"Systems Research Institute , Polish Academy of Sciences , ul. Newelska 6, 01-447 Warsaw , Poland , e-mail: slukasik@ibspan.waw.pl, e-mail: pakowal@ibspan.waw.pl, e-mail: kulpi@ibspan.waw.pl"}]},{"given":"Ma\u0142gorzata","family":"Charytanowicz","sequence":"additional","affiliation":[{"name":"Systems Research Institute , Polish Academy of Sciences , ul. Newelska 6, 01-447 Warsaw , Poland , e-mail: slukasik@ibspan.waw.pl, e-mail: pakowal@ibspan.waw.pl, e-mail: kulpi@ibspan.waw.pl"},{"name":"Faculty of Electrical Engineering and Computer Science , Lublin University of Technology , ul. Nadbystrzycka 38D, 20-618 Lublin , Poland , e-mail: m.charytanowicz@pollub.pl"}]},{"given":"Piotr","family":"Kulczycki","sequence":"additional","affiliation":[{"name":"Faculty of Physics and Applied Computer Science , AGH University of Science and Technology , al. Mickiewicza 30, 30-059 Cracow , Poland"},{"name":"Systems Research Institute , Polish Academy of Sciences , ul. Newelska 6, 01-447 Warsaw , Poland , e-mail: slukasik@ibspan.waw.pl, e-mail: pakowal@ibspan.waw.pl, e-mail: kulpi@ibspan.waw.pl"}]}],"member":"37438","published-online":{"date-parts":[[2019,9,28]]},"reference":[{"key":"2023050302354231638_j_amcs-2019-0034_ref_001_w2aab3b7b3b1b6b1ab1ab1Aa","doi-asserted-by":"crossref","unstructured":"Abraham, S., Philip, N.S., Kembhavi, A., Wadadekar, Y.G. and Sinha, R. (2012). A photometric catalogue of quasars and other point sources in the Sloan Digital Sky Survey, Monthly Notices of the Royal Astronomical Society419(1): 80\u201394, DOI: 10.1111\/j.1365-2966.2011.19674.x.10.1111\/j.1365-2966.2011.19674.x","DOI":"10.1111\/j.1365-2966.2011.19674.x"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_002_w2aab3b7b3b1b6b1ab1ab2Aa","doi-asserted-by":"crossref","unstructured":"Arefin, A.S., Riveros, C., Berretta, R. and Moscato, P. (2012). GPU-FS-kNN: A software tool for fast and scalable kNN computation using GPUs, PLoS One7: e44000, DOI: 10.1371\/journal.pone.0044000.10.1371\/journal.pone.0044000","DOI":"10.1371\/journal.pone.0044000"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_003_w2aab3b7b3b1b6b1ab1ab3Aa","doi-asserted-by":"crossref","unstructured":"Breunig, M.M., Kriegel, H.-P., Ng, R.T. and Sander, J. (2000). LOF: Identifying density-based local outliers, Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD\u201900, Dallas, TX, USA, pp. 93\u2013104, DOI: 10.1145\/342009.335388.10.1145\/342009.335388","DOI":"10.1145\/342009.335388"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_004_w2aab3b7b3b1b6b1ab1ab4Aa","unstructured":"Bubeck, S. and von Luxburg, U. (2009). Nearest neighbor clustering: A baseline method for consistent clustering with arbitrary objective functions, Journal of Machine Learning Research10: 657\u2013698."},{"key":"2023050302354231638_j_amcs-2019-0034_ref_005_w2aab3b7b3b1b6b1ab1ab5Aa","doi-asserted-by":"crossref","unstructured":"Burgess, R., Falc\u00e3o, A.J., Fernandes, T., Ribeiro, R.A., Gomes, M., Krone-Martins, A. and de Almeida, A.M. (2015). Selection of large-scale 3d point cloud data using gesture recognition, in L. Camarinha-Matos et al. (Eds), Technological Innovation for Cloud-Based Engineering Systems, Springer International Publishing, Cham, pp. 188\u2013195, DOI: 10.1007\/978-3-319-16766-4_20.10.1007\/978-3-319-16766-4_20","DOI":"10.1007\/978-3-319-16766-4_20"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_006_w2aab3b7b3b1b6b1ab1ab6Aa","doi-asserted-by":"crossref","unstructured":"Castro-Ginard, A., Jordi, C., Luri, X., Julbe, F., Morvan, M., Balaguer-N\u00fa\u00f1ez, L. and Cantat-Gaudin, T. (2018). A new method for unveiling open clusters in Gaia: New nearby open clusters confirmed by DR2, Astronomy and Astrophysics618: A59.10.1051\/0004-6361\/201833390","DOI":"10.1051\/0004-6361\/201833390"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_007_w2aab3b7b3b1b6b1ab1ab7Aa","doi-asserted-by":"crossref","unstructured":"Chung, Y.-Y., Tirthapura, S. and Woodruff, D.P. (2016). A simple message-optimal algorithm for random sampling from a distributed stream, IEEE Transactions on Knowledge and Data Engineering28(6): 1356\u20131368, DOI: 10.1109\/TKDE.2016.2518679.10.1109\/TKDE.2016.2518679","DOI":"10.1109\/TKDE.2016.2518679"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_008_w2aab3b7b3b1b6b1ab1ab8Aa","doi-asserted-by":"crossref","unstructured":"Czarnowski, I. and Jedrzejowicz, P. (2017). Learning from examples with data reduction and stacked generalization, Journal of Intelligent & Fuzzy Systems32(2): 1401\u20131411.10.3233\/JIFS-169137","DOI":"10.3233\/JIFS-169137"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_009_w2aab3b7b3b1b6b1ab1ab9Aa","doi-asserted-by":"crossref","unstructured":"Dutta, H., Giannella, C., Borne, K. and Kargupta, H. (2005). Distributed top-K outlier detection from astronomy catalogs using the DEMAC system, 2007 SIAM International Conference on Data Minning, Minneapolis, MN, USA, pp. 473\u2013478, DOI: abs\/10.1137\/1.9781611972771.47.10.1137\/1.9781611972771.47","DOI":"10.1137\/1.9781611972771.47"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_010_w2aab3b7b3b1b6b1ab1ac10Aa","doi-asserted-by":"crossref","unstructured":"Eastman, C. and Weiss, S. (1982). Tree structures for high dimensionality nearest neighbor searching, Information Systems7(2): 115\u2013122.10.1016\/0306-4379(82)90023-0","DOI":"10.1016\/0306-4379(82)90023-0"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_011_w2aab3b7b3b1b6b1ab1ac11Aa","doi-asserted-by":"crossref","unstructured":"Freudling, W. and Romaniello, M. (2016). Delivering data reduction pipelines to science users, SPIE Proceedings9910: 99101U, DOI: 10.1117\/12.2232734.10.1117\/12.2232734","DOI":"10.1117\/12.2232734"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_012_w2aab3b7b3b1b6b1ab1ac12Aa","doi-asserted-by":"crossref","unstructured":"Freudling, W. Romaniello, M., Bramich, D.M., Ballester, P., Forchi, V., Garcia-Dablo, C.E., Moehler, S. and Neeser, M.J. (2013). Automated data reduction workflows for astronomy. The ESO Reflex environment, Astronomy and Astrophysics559: A96, DOI: 10.1051\/0004-6361\/201322494.10.1051\/0004-6361\/201322494","DOI":"10.1051\/0004-6361\/201322494"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_013_w2aab3b7b3b1b6b1ab1ac13Aa","unstructured":"GAIA (2018), GAIA Mission, https:\/\/www.cosmos.esa.int\/gaia."},{"key":"2023050302354231638_j_amcs-2019-0034_ref_014_w2aab3b7b3b1b6b1ab1ac14Aa","unstructured":"Grandinetti, L., Joubert, G., Kunze, M. and Pascucci, V. (2015). Big Data and High Performance Computing, Advances in Parallel Computing, IOS Press, Amsterdam."},{"key":"2023050302354231638_j_amcs-2019-0034_ref_015_w2aab3b7b3b1b6b1ab1ac15Aa","doi-asserted-by":"crossref","unstructured":"Hassan, A. and Fluke, C.J. (2011). Scientific visualization in astronomy: Towards the petascale astronomy era, Publications of the Astronomical Society of Australia28(2): 150\u2013170.10.1071\/AS10031","DOI":"10.1071\/AS10031"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_016_w2aab3b7b3b1b6b1ab1ac16Aa","doi-asserted-by":"crossref","unstructured":"Huang, D. and Chow, T.W.S. (2006). Enhancing density-based data reduction using entropy, Neural Computation18(2): 470\u2013495, DOI: 10.1162\/089976606775093927.10.1162\/08997660677509392716378523","DOI":"10.1162\/089976606775093927"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_017_w2aab3b7b3b1b6b1ab1ac17Aa","doi-asserted-by":"crossref","unstructured":"Kulczycki, P. (2008). Kernel estimators in industrial applications, in B. Prasad (Ed.), Soft Computing Applications in Industry, Springer, Berlin\/Heidelberg, pp. 69\u201391, DOI: org\/10.1007\/978-3-540-77465-5_4.10.1007\/978-3-540-77465-5_4","DOI":"10.1007\/978-3-540-77465-5_4"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_018_w2aab3b7b3b1b6b1ab1ac18Aa","doi-asserted-by":"crossref","unstructured":"Li, L., Zhang, Y. and Zhao, Y. (2008). k-Nearest neighbors for automated classification of celestial objects, Science in China G: Physics, Mechanics and Astronomy51(7): 916\u2013922, DOI: 10.1007\/s11433-008-0088-4.10.1007\/s11433-008-0088-4","DOI":"10.1007\/s11433-008-0088-4"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_019_w2aab3b7b3b1b6b1ab1ac19Aa","doi-asserted-by":"crossref","unstructured":"\u0141ukasik, S., Lalik, K., Sarna, P., Kowalski, P.A., Charytanowicz, M. and Kulczycki, P. (2019). Efficient astronomical data condensation using fast nearest neighbor search, in P. Kulczycki et al. (Eds), Information Technology, Systems Research and Computational Physics, Advances in Intelligent Systems and Computing, Vol. 945, Springer, Berlin\/Heidelberg, pp. 107\u2013115.","DOI":"10.1007\/978-3-030-18058-4_9"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_020_w2aab3b7b3b1b6b1ab1ac20Aa","doi-asserted-by":"crossref","unstructured":"\u0141ukasik, S., Moitinho, A., Kowalski, P.A., Falc\u00e3o, A., Ribeiro, R.A. and Kulczycki, P. (2016). Survey of object-based data reduction techniques in observational astronomy, Open Physics14(1): 64, DOI: 10.1515\/phys-2016-0064.10.1515\/phys-2016-0064","DOI":"10.1515\/phys-2016-0064"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_021_w2aab3b7b3b1b6b1ab1ac21Aa","unstructured":"MPI Forum (2015). MPI: A Message-passing Interface Standard: Version 3.1, High-Performance Computing Center, Stuttgart, https:\/\/www.mpi-forum.org\/docs\/mpi-3.1\/mpi31-report.pdf."},{"key":"2023050302354231638_j_amcs-2019-0034_ref_022_w2aab3b7b3b1b6b1ab1ac22Aa","doi-asserted-by":"crossref","unstructured":"Mitra, P., Murthy, C. and Pal, S.K. (2002). Density-based multiscale data condensation, IEEE Transactions on Pattern Analysis and Machine Intelligence24(6): 734\u2013747, DOI: 10.1109\/TPAMI.2002.1008381.10.1109\/TPAMI.2002.1008381","DOI":"10.1109\/TPAMI.2002.1008381"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_023_w2aab3b7b3b1b6b1ab1ac23Aa","doi-asserted-by":"crossref","unstructured":"Muja, M. and Lowe, D.G. (2014). Scalable nearest neighbor algorithms for high dimensional data, IEEE Transactions on Pattern Analysis & Machine Intelligence36(11): 2227\u20132240, DOI: 10.1109\/TPAMI.2014.2321376.10.1109\/TPAMI.2014.232137626353063","DOI":"10.1109\/TPAMI.2014.2321376"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_024_w2aab3b7b3b1b6b1ab1ac24Aa","doi-asserted-by":"crossref","unstructured":"Olvera-L\u00f3pez, J., Ariel Carrasco-Ochoa, J., Martnez-Trinidad, J.F. and Kittler, J. (2010). A review of instance selection methods, Artificial Intelligence Review34(2): 133\u2013143.10.1007\/s10462-010-9165-y","DOI":"10.1007\/s10462-010-9165-y"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_025_w2aab3b7b3b1b6b1ab1ac25Aa","unstructured":"OpenMP Architecture Review Boards (2015). OpenMP 4.5 Complete Specifications, https:\/\/www.openmp.org\/wp-content\/uploads\/openmp-4.5.pdf."},{"key":"2023050302354231638_j_amcs-2019-0034_ref_026_w2aab3b7b3b1b6b1ab1ac26Aa","doi-asserted-by":"crossref","unstructured":"Rocke, D.M. and Dai, J. (2003). Sampling and subsampling for cluster analysis in data mining: With applications to sky survey data, Data Mining and Knowledge Discovery7(2): 215\u2013232, DOI: 10.1023\/A:1022497517599.10.1023\/A:1022497517599","DOI":"10.1023\/A:1022497517599"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_027_w2aab3b7b3b1b6b1ab1ac27Aa","doi-asserted-by":"crossref","unstructured":"Ros, F. and Guillaume, S. (2017). DIDES: A fast and effective sampling for clustering algorithm, Knowledge and Information Systems50(2): 543\u2013568, DOI: 10.1007\/s10115-016-0946-8.10.1007\/s10115-016-0946-8","DOI":"10.1007\/s10115-016-0946-8"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_028_w2aab3b7b3b1b6b1ab1ac28Aa","doi-asserted-by":"crossref","unstructured":"Schirmer, M. (2013). THELI: Convenient reduction of optical, near-infrared, and mid-infrared imaging data, The Astrophysical Journal Supplement Series209(2): 21, DOI: 10.1088\/0067-0049\/209\/2\/21.10.1088\/0067-0049\/209\/2\/21","DOI":"10.1088\/0067-0049\/209\/2\/21"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_029_w2aab3b7b3b1b6b1ab1ac29Aa","doi-asserted-by":"crossref","unstructured":"Szalay, A. and Gray, J. (2001). The world-wide telescope, Science293(5537): 2037\u20132040.10.1126\/science.293.5537.203711557879","DOI":"10.1126\/science.293.5537.2037"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_030_w2aab3b7b3b1b6b1ab1ac30Aa","doi-asserted-by":"crossref","unstructured":"Wang, D., Shi, L. and Cao, J. (2013). Fast algorithm for approximate k-nearest neighbor graph construction, 2013 IEEE 13th International Conference on Data Mining Workshops, Dallas, TX, USA, pp. 349\u2013356, DOI: 10.1109\/ICDMW.2013.50.10.1109\/ICDMW.2013.50","DOI":"10.1109\/ICDMW.2013.50"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_031_w2aab3b7b3b1b6b1ab1ac31Aa","doi-asserted-by":"crossref","unstructured":"Wang, X., Tino, P., Fardal, M.A., Raychaudhury, S. and Babul, A. (2009). Fast Parzen window density estimator, 2009 International Joint Conference on Neural Networks, Atlanta, GA, USA, pp. 3267\u20133274, DOI: 10.1109\/IJCNN.2009.5178637.10.1109\/IJCNN.2009.5178637","DOI":"10.1109\/IJCNN.2009.5178637"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_032_w2aab3b7b3b1b6b1ab1ac32Aa","unstructured":"Yianilos, P.N. (1993). Data structures and algorithms for nearest neighbor search in general metric spaces, Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA\u201993, Austin, TX, USA, Vol. 93, pp. 311\u2013321."},{"key":"2023050302354231638_j_amcs-2019-0034_ref_033_w2aab3b7b3b1b6b1ab1ac33Aa","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Chung, F. and Wang, S. (2018). Fast reduced set-based exemplar finding and cluster assignment, IEEE Transactions on Systems, Man, and Cybernetics: Systems49(5): 1\u201315.10.1109\/TSMC.2017.2689789","DOI":"10.1109\/TSMC.2017.2689789"},{"key":"2023050302354231638_j_amcs-2019-0034_ref_034_w2aab3b7b3b1b6b1ab1ac34Aa","doi-asserted-by":"crossref","unstructured":"Zhang, Y.-M., Huang, K., Geng, G. and Liu, C.-L. (2013). Fast kNN graph construction with locality sensitive hashing, in H. Blockeel et al. (Eds), Machine Learning and Knowledge Discovery in Databases, Springer, Berlin\/Heidelberg, pp. 660\u2013674, DOI: 10.1007\/978-3-642-40991-2_42.10.1007\/978-3-642-40991-2_42","DOI":"10.1007\/978-3-642-40991-2_42"}],"container-title":["International Journal of Applied Mathematics and Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/content.sciendo.com\/view\/journals\/amcs\/29\/3\/article-p467.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.sciendo.com\/pdf\/10.2478\/amcs-2019-0034","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,29]],"date-time":"2024-02-29T10:29:16Z","timestamp":1709202556000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.sciendo.com\/article\/10.2478\/amcs-2019-0034"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,1]]},"references-count":34,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,9,28]]},"published-print":{"date-parts":[[2019,9,1]]}},"alternative-id":["10.2478\/amcs-2019-0034"],"URL":"https:\/\/doi.org\/10.2478\/amcs-2019-0034","relation":{},"ISSN":["2083-8492"],"issn-type":[{"value":"2083-8492","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,9,1]]}}}