{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:13:11Z","timestamp":1757617991467,"version":"3.44.0"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2025,6,2]],"date-time":"2025-06-02T00:00:00Z","timestamp":1748822400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,2]],"date-time":"2025-06-02T00:00:00Z","timestamp":1748822400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001636","name":"University College Cork","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001636","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Machine Vision and Applications"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Locating multiple objects has become an important task in multimedia research and applications due to the common nature of real-world images. Object localization requires a large number of visual annotations, such as bounding boxes or segmentation, but the annotation process is labour-intensive and sometimes inextricable for human experts in complex domains such as manufacturing and medical fields. Moving beyond single object localization, this paper presents a weakly semi-supervised learning framework based on Graph Transformer Networks using Class Activation Maps to LOcate Multiple Objects (GraphLOMO) in images without visual annotations. Our method overcomes the computational challenges of gradient-based CAM while integrating topological information and prior knowledge into object localization. 
Moreover, we investigate the higher order of object inter-dependencies with the use of a 3D adjacency matrix for better performance. Extensive empirical experiments are conducted on MS-COCO and Pascal VOC to establish a suitable performance measure and baselines, as well as a state-of-the-art for weakly semi-supervised multi-object localization.<\/jats:p>","DOI":"10.1007\/s00138-025-01707-7","type":"journal-article","created":{"date-parts":[[2025,6,2]],"date-time":"2025-06-02T08:36:30Z","timestamp":1748853390000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["GraphLOMO: LOcating multiple objects without visual annotations"],"prefix":"10.1007","volume":"36","author":[{"given":"Alex","family":"To","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joseph G.","family":"Davis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hoang D.","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,6,2]]},"reference":[{"key":"1707_CR1","doi-asserted-by":"crossref","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921\u20132929 (2016)","DOI":"10.1109\/CVPR.2016.319"},{"issue":"11","key":"1707_CR2","doi-asserted-by":"publisher","first-page":"1213","DOI":"10.3390\/jpm11111213","volume":"11","author":"M Esmaeili","year":"2021","unstructured":"Esmaeili, M., Vettukattil, R., Banitalebi, H., Krogh, N.R., Geitung, J.T.: Explainable artificial intelligence for human-machine interaction in brain tumor localization. J. Personal. Med. 11(11), 1213 (2021)","journal-title":"J. Personal. 
Med."},{"key":"1707_CR3","doi-asserted-by":"publisher","DOI":"10.1016\/j.jneumeth.2021.109098","volume":"353","author":"Y Zhang","year":"2021","unstructured":"Zhang, Y., Hong, D., McClement, D., Oladosu, O., Pridham, G., Slaney, G.: Grad-cam helps interpret the deep learning models trained to classify multiple sclerosis types using clinical brain magnetic resonance imaging. J. Neurosci. Methods 353, 109098 (2021)","journal-title":"J. Neurosci. Methods"},{"key":"1707_CR4","doi-asserted-by":"crossref","unstructured":"Ng, H.G., Kerzel, M., Mehnert, J., May, A., Wermter, S.: Classification of MRI migraine medical data using 3D convolutional neural network. In: International Conference on Artificial Neural Networks, pp. 300\u2013309. Springer (2018)","DOI":"10.1007\/978-3-030-01424-7_30"},{"key":"1707_CR5","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248\u2013255. IEEE (2009)","DOI":"10.1109\/CVPR.2009.5206848"},{"issue":"2","key":"1707_CR6","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","volume":"88","author":"M Everingham","year":"2010","unstructured":"Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. Int. J. Comput. Vision 88(2), 303\u2013338 (2010)","journal-title":"Int. J. Comput. Vision"},{"key":"1707_CR7","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740\u2013755. Springer (2014)","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"1707_CR8","doi-asserted-by":"crossref","unstructured":"Choe, J., Shim, H.: Attention-based dropout layer for weakly supervised object localization. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 2219\u20132228 (2019)","DOI":"10.1109\/CVPR.2019.00232"},{"key":"1707_CR9","doi-asserted-by":"crossref","unstructured":"Singh, K.K., Lee, Y.J.: Hide-and-seek: forcing a network to be meticulous for weakly-supervised object and action localization. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3544\u20133553. IEEE (2017)","DOI":"10.1109\/ICCV.2017.381"},{"key":"1707_CR10","doi-asserted-by":"crossref","unstructured":"Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 6023\u20136032 (2019)","DOI":"10.1109\/ICCV.2019.00612"},{"key":"1707_CR11","doi-asserted-by":"crossref","unstructured":"Zhang, X., Wei, Y., Feng, J., Yang, Y., Huang, T.S.: Adversarial complementary learning for weakly supervised object localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1325\u20131334 (2018)","DOI":"10.1109\/CVPR.2018.00144"},{"key":"1707_CR12","doi-asserted-by":"crossref","unstructured":"Zhang, X., Wei, Y., Kang, G., Yang, Y., Huang, T.: Self-produced guidance for weakly-supervised object localization. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 597\u2013613 (2018)","DOI":"10.1007\/978-3-030-01258-8_37"},{"key":"1707_CR13","doi-asserted-by":"crossref","unstructured":"Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 
618\u2013626 (2017)","DOI":"10.1109\/ICCV.2017.74"},{"key":"1707_CR14","doi-asserted-by":"crossref","unstructured":"Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-cam++: generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 839\u2013847. IEEE (2018)","DOI":"10.1109\/WACV.2018.00097"},{"key":"1707_CR15","doi-asserted-by":"crossref","unstructured":"Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., Hu, X.: Score-cam: score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 24\u201325 (2020)","DOI":"10.1109\/CVPRW50498.2020.00020"},{"key":"1707_CR16","unstructured":"Ramaswamy, H.G., et al.: Ablation-cam: visual explanations for deep convolutional network via gradient-free localization. In: Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, pp. 983\u2013991 (2020)"},{"key":"1707_CR17","first-page":"5866","volume":"44","author":"D Zhang","year":"2021","unstructured":"Zhang, D., Han, J., Cheng, G., Yang, M.-H.: Weakly supervised object localization and detection: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5866 (2021)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"1707_CR18","doi-asserted-by":"crossref","unstructured":"Choe, J., Oh, S.J., Lee, S., Chun, S., Akata, Z., Shim, H.: Evaluating weakly supervised object localization methods right. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 
3133\u20133142 (2020)","DOI":"10.1109\/CVPR42600.2020.00320"},{"issue":"2","key":"1707_CR19","doi-asserted-by":"publisher","first-page":"38","DOI":"10.3390\/diagnostics9020038","volume":"9","author":"I Kim","year":"2019","unstructured":"Kim, I., Rajaraman, S., Antani, S.: Visual interpretation of convolutional neural network predictions in classifying medical image modalities. Diagnostics 9(2), 38 (2019)","journal-title":"Diagnostics"},{"key":"1707_CR20","doi-asserted-by":"crossref","unstructured":"Yu, X., Gong, Y., Jiang, N., Ye, Q., Han, Z.: Scale match for tiny person detection. In: Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, pp. 1257\u20131265 (2020)","DOI":"10.1109\/WACV45572.2020.9093394"},{"issue":"4","key":"1707_CR21","doi-asserted-by":"publisher","first-page":"076","DOI":"10.1093\/pnasnexus\/pgad076","volume":"2","author":"S Li","year":"2023","unstructured":"Li, S., Brandt, M., Fensholt, R., Kariryaa, A., Igel, C., Gieseke, F., Nord-Larsen, T., Oehmcke, S., Carlsen, A.H., Junttila, S., et al.: Deep learning enables image-based tree counting, crown segmentation, and height prediction at national scale. PNAS Nexus 2(4), 076 (2023)","journal-title":"PNAS Nexus"},{"issue":"1","key":"1707_CR22","doi-asserted-by":"publisher","first-page":"903","DOI":"10.1038\/s41598-020-79653-9","volume":"11","author":"M Onishi","year":"2021","unstructured":"Onishi, M., Ise, T.: Explainable identification and mapping of trees using UAV RGB image and deep learning. Sci. Rep. 11(1), 903 (2021)","journal-title":"Sci. Rep."},{"key":"1707_CR23","doi-asserted-by":"crossref","unstructured":"Hwang, S., Kim, H.-E.: Self-transfer learning for weakly supervised lesion localization. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 239\u2013246. 
Springer (2016)","DOI":"10.1007\/978-3-319-46723-8_28"},{"key":"1707_CR24","unstructured":"Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)"},{"key":"1707_CR25","unstructured":"Veli\u010dkovi\u0107, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)"},{"key":"1707_CR26","unstructured":"Yun, S., Jeong, M., Kim, R., Kang, J., Kim, H.J.: Graph transformer networks. arXiv preprint arXiv:1911.06455 (2019)"},{"key":"1707_CR27","unstructured":"Zhang, J., Zhang, H., Xia, C., Sun, L.: Graph-bert: only attention is needed for learning graph representations. arXiv preprint arXiv:2001.05140 (2020)"},{"key":"1707_CR28","doi-asserted-by":"crossref","unstructured":"Chen, Z.-M., Wei, X.-S., Wang, P., Guo, Y.: Multi-label image recognition with graph convolutional networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 5177\u20135186 (2019)","DOI":"10.1109\/CVPR.2019.00532"},{"key":"1707_CR29","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532\u20131543 (2014)","DOI":"10.3115\/v1\/D14-1162"},{"key":"1707_CR30","doi-asserted-by":"crossref","unstructured":"Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with convex clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1081\u20131089 (2015)","DOI":"10.1109\/CVPR.2015.7298711"},{"key":"1707_CR31","doi-asserted-by":"crossref","unstructured":"Jain, P., Kapoor, A.: Active learning for large multi-class problems. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 762\u2013769. 
IEEE (2009)","DOI":"10.1109\/CVPR.2009.5206651"},{"key":"1707_CR32","doi-asserted-by":"crossref","unstructured":"Yao, A., Gall, J., Leistner, C., Van\u00a0Gool, L.: Interactive object detection. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3242\u20133249. IEEE (2012)","DOI":"10.1109\/CVPR.2012.6248060"},{"issue":"1","key":"1707_CR33","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s11263-014-0721-9","volume":"108","author":"S Vijayanarasimhan","year":"2014","unstructured":"Vijayanarasimhan, S., Grauman, K.: Large-scale live active learning: training object detectors with crawled data and crowds. Int. J. Comput. Vision 108(1), 97\u2013114 (2014)","journal-title":"Int. J. Comput. Vision"},{"key":"1707_CR34","doi-asserted-by":"crossref","unstructured":"Brust, C.-A., K\u00e4ding, C., Denzler, J.: Active learning for deep object detection. arXiv preprint arXiv:1809.09875 (2018)","DOI":"10.5220\/0007248601810190"},{"key":"1707_CR35","doi-asserted-by":"crossref","unstructured":"Papadopoulos, D.P., Uijlings, J.R., Keller, F., Ferrari, V.: We don\u2019t need no bounding-boxes: training object class detectors using only human verification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 854\u2013863 (2016)","DOI":"10.1109\/CVPR.2016.99"},{"key":"1707_CR36","doi-asserted-by":"crossref","unstructured":"Batchelor, O., Green, R.: Object detection for verification based annotation. In: 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ), pp. 1\u20136. IEEE (2019)","DOI":"10.1109\/IVCNZ48456.2019.8961012"},{"key":"1707_CR37","unstructured":"Zhou, X., Wang, D., Kr\u00e4henb\u00fchl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019)"},{"key":"1707_CR38","doi-asserted-by":"crossref","unstructured":"Zhou, X., Zhuo, J., Krahenbuhl, P.: Bottom-up object detection by grouping extreme and center points. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 850\u2013859 (2019)","DOI":"10.1109\/CVPR.2019.00094"},{"key":"1707_CR39","doi-asserted-by":"crossref","unstructured":"Yang, Z., Liu, S., Hu, H., Wang, L., Lin, S.: Reppoints: point set representation for object detection. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 9657\u20139666 (2019)","DOI":"10.1109\/ICCV.2019.00975"},{"key":"1707_CR40","doi-asserted-by":"crossref","unstructured":"Law, H., Deng, J.: Cornernet: Detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 734\u2013750 (2018)","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"1707_CR41","doi-asserted-by":"crossref","unstructured":"Ribera, J., Guera, D., Chen, Y., Delp, E.J.: Locating objects without bounding boxes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6479\u20136489 (2019)","DOI":"10.1109\/CVPR.2019.00664"},{"key":"1707_CR42","unstructured":"Su, H., Deng, J., Fei-Fei, L.: Crowdsourcing annotations for visual object detection. In: Workshops at the 26th AAAI Conference on Artificial Intelligence (2012)"},{"key":"1707_CR43","unstructured":"Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400 (2013)"},{"key":"1707_CR44","unstructured":"Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)"},{"key":"1707_CR45","doi-asserted-by":"crossref","unstructured":"Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Is object localization for free?-weakly-supervised learning with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
685\u2013694 (2015)","DOI":"10.1109\/CVPR.2015.7298668"},{"key":"1707_CR46","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492\u20131500 (2017)","DOI":"10.1109\/CVPR.2017.634"},{"key":"1707_CR47","unstructured":"Yalniz, I.Z., J\u00e9gou, H., Chen, K., Paluri, M., Mahajan, D.: Billion-scale semi-supervised learning for image classification. arXiv preprint arXiv:1905.00546 (2019)"},{"key":"1707_CR48","unstructured":"Tan, M., Le, Q.: Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105\u20136114. PMLR (2019)"},{"key":"1707_CR49","unstructured":"Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)"},{"key":"1707_CR50","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770\u2013778 (2016)","DOI":"10.1109\/CVPR.2016.90"},{"key":"1707_CR51","doi-asserted-by":"crossref","unstructured":"Ribeiro, M.T., Singh, S., Guestrin, C.: Why should i trust you? Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135\u20131144 (2016)","DOI":"10.1145\/2939672.2939778"},{"issue":"1","key":"1707_CR52","first-page":"1","volume":"10","author":"A Gahramanova","year":"2019","unstructured":"Gahramanova, A.: Locating centers of mass with image processing. Undergrad. J. Math. Model One+ Two 10(1), 1 (2019)","journal-title":"Undergrad. J. Math. 
Model One+ Two"}],"container-title":["Machine Vision and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00138-025-01707-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00138-025-01707-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00138-025-01707-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T16:48:35Z","timestamp":1757177315000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00138-025-01707-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,2]]},"references-count":52,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["1707"],"URL":"https:\/\/doi.org\/10.1007\/s00138-025-01707-7","relation":{},"ISSN":["0932-8092","1432-1769"],"issn-type":[{"type":"print","value":"0932-8092"},{"type":"electronic","value":"1432-1769"}],"subject":[],"published":{"date-parts":[[2025,6,2]]},"assertion":[{"value":"21 December 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 April 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 May 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 June 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"83"}}