{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T05:25:16Z","timestamp":1761110716958,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,4,3]],"date-time":"2025-04-03T00:00:00Z","timestamp":1743638400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003593","name":"Brazilian funding Agency National Council for Scientific and Technological Development (CNPq)","doi-asserted-by":"publisher","award":["177169\/2023-0"],"award-info":[{"award-number":["177169\/2023-0"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>While object detection and recognition have been extensively adopted by many applications in decision-making, new algorithms and methodologies have emerged to enhance the automatic identification of target objects. In particular, the rise of deep learning and language models has opened many possibilities in this area, although challenges in contextual query analysis and human interactions persist. This article presents a novel neuro-symbolic object detection framework that aligns object proposals with textual prompts using a deep learning module while enabling logical reasoning through a symbolic module. By integrating deep learning with symbolic reasoning, object detection and scene understanding are considerably enhanced, enabling complex, query-driven interactions. Using a synthetic 3D image dataset, the results demonstrate that this framework effectively generalizes to complex queries, combining simple attribute-based descriptions without explicit training on compound prompts. We present the numerical results and comprehensive discussions, highlighting the potential of our approach for emerging smart applications.<\/jats:p>","DOI":"10.3390\/s25072258","type":"journal-article","created":{"date-parts":[[2025,4,3]],"date-time":"2025-04-03T05:51:26Z","timestamp":1743659486000},"page":"2258","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Integrating Textual Queries with AI-Based Object Detection: A Compositional Prompt-Guided Approach"],"prefix":"10.3390","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6269-494X","authenticated-orcid":false,"given":"Silvan","family":"Ferreira","sequence":"first","affiliation":[{"name":"Graduate Program in Electrical and Computer Engineering, Federal University of Rio Grande do Norte, Natal 59078-970, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9486-4509","authenticated-orcid":false,"given":"Allan","family":"Martins","sequence":"additional","affiliation":[{"name":"Graduate Program in Electrical and Computer Engineering, Federal University of Rio Grande do Norte, Natal 59078-970, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3988-8476","authenticated-orcid":false,"given":"Daniel G.","family":"Costa","sequence":"additional","affiliation":[{"name":"SYSTEC-ARISE, Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0116-6489","authenticated-orcid":false,"given":"Ivanovitch","family":"Silva","sequence":"additional","affiliation":[{"name":"Graduate Program in Electrical and Computer Engineering, Federal University of Rio Grande do Norte, Natal 59078-970, Brazil"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"124664","DOI":"10.1016\/j.eswa.2024.124664","article-title":"Artificial intelligence based object detection and traffic prediction by autonomous vehicles\u2014A review","volume":"255","author":"Preeti","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/j.jmsy.2022.06.011","article-title":"Deep learning methods for object detection in smart manufacturing: A survey","volume":"64","author":"Ahmad","year":"2022","journal-title":"J. Manuf. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"12253","DOI":"10.1007\/s11042-023-15981-y","article-title":"A systematic review of object detection from images using deep learning","volume":"83","author":"Kaur","year":"2024","journal-title":"Multimed. Tools Appl."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Manakitsa, N., Maraslidis, G.S., Moysis, L., and Fragulis, G.F. (2024). A Review of Machine Learning and Deep Learning for Object Detection, Semantic Segmentation, and Human Action Recognition in Machine and Robotic Vision. Technologies, 12.","DOI":"10.3390\/technologies12020015"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"101153","DOI":"10.1016\/j.iot.2024.101153","article-title":"Internet of Intelligent Things: A convergence of embedded systems, edge computing and machine learning","volume":"26","author":"Oliveira","year":"2024","journal-title":"Internet Things"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_7","unstructured":"Vaswani, A. (2017). Attention is all you need. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.procs.2024.02.148","article-title":"An Artificial Intelligence-based Application for Recognizing and Identifying Aerial Objects based on Voice Input","volume":"234","author":"Affandi","year":"2024","journal-title":"Procedia Comput. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"100283","DOI":"10.1016\/j.dajour.2023.100283","article-title":"An end-to-end pollution analysis and detection system using artificial intelligence and object detection algorithms","volume":"8","author":"Hossain","year":"2023","journal-title":"Decis. Anal. J."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L., and Parikh, D. (2015, January 7\u201313). VQA: Visual question answering. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.279"},{"key":"ref_11","unstructured":"Lu, J., Batra, D., Parikh, D., and Lee, S. (2019). ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Tan, H., and Bansal, M. (2019, January 3\u20137). LXMERT: Learning cross-modality encoder representations from transformers. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1514"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"12809","DOI":"10.1007\/s00521-024-09960-z","article-title":"Neuro-symbolic artificial intelligence: A survey","volume":"36","author":"Bhuyan","year":"2024","journal-title":"Neural Comput. Appl."},{"key":"ref_14","unstructured":"Garcez, A.D., Gori, M., Lamb, L.C., Serafini, L., Spranger, M., and Tran, P. (2019). Neuro-symbolic AI: The third wave. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lamb, L.C., Garcez, A., Gori, M., Prates, M., Avelar, P., and Vardi, M. (2020). Graph neural networks meet neural-symbolic computing: A survey and perspective. Front. Artif. Intell., 3.","DOI":"10.24963\/ijcai.2020\/679"},{"key":"ref_16","unstructured":"Mao, J., Gan, C., Kohli, P., Tenenbaum, J.B., and Wu, J. (2019). Neuro-symbolic concept learner. arXiv."},{"key":"ref_17","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). SSD: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"2420","DOI":"10.1007\/s11263-024-01989-w","article-title":"Oriented R-CNN and beyond","volume":"132","author":"Xie","year":"2024","journal-title":"Int. J. Comput. Vis."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"15171","DOI":"10.1109\/TPAMI.2023.3319634","article-title":"Mutual-assistance learning for object detection","volume":"45","author":"Xie","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"NAI-240719","DOI":"10.3233\/NAI-240719","article-title":"A Survey of Neurosymbolic Visual Reasoning with Scene Graphs and Common Sense Knowledge","volume":"1","author":"Khan","year":"2025","journal-title":"Neurosymb. Artif. Intell."},{"key":"ref_24","unstructured":"Colelough, B., and Regli, W. (2024). Neuro-Symbolic AI in 2024: A Systematic Review. arXiv."},{"key":"ref_25","first-page":"512","article-title":"Neuro-Symbolic Reasoning for Complex Visual Domains","volume":"71","author":"Mashayekhi","year":"2024","journal-title":"J. Artif. Intell. Res."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhang, F., and Xu, C. (2023, January 17\u201324). VQACL: A Novel Visual Question Answering Continual Learning Setting. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01831"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Li, P., Yang, Q., Geng, X., Zhou, W., Ding, Z., and Nian, Y. (2024). Exploring Diverse Methods in Visual Question Answering. arXiv.","DOI":"10.1109\/ICECAI62591.2024.10674838"},{"key":"ref_28","first-page":"88","article-title":"Attention-Driven Multimodal Alignment for Visual Question Answering","volume":"177","author":"Xiao","year":"2023","journal-title":"Pattern Recognit. Lett."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Biten, A.F., Litman, R., Xie, Y., Appalaraju, S., and Manmatha, R. (2022, January 18\u201324). LaTr: Layout-Aware Transformer for Scene-Text VQA. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01605"},{"key":"ref_30","unstructured":"Sundar, R., and Goel, P. (2023, January 7\u201314). LiT-4-RSVQA: Lightweight Transformer for Resource-Constrained VQA. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhang, Y., Long, D., Xie, W., Dai, Z., Tang, J., Lin, H., Yang, B., Xie, P., and Huang, F. (2024). mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval. arXiv.","DOI":"10.18653\/v1\/2024.emnlp-industry.103"},{"key":"ref_32","unstructured":"Rocktaschel, T., and Riedel, S. (2017). End-to-End Differentiable Proving. arXiv."},{"key":"ref_33","unstructured":"Dong, H., Mao, J., Lin, T., Wang, C., Li, L., and Zhou, D. (2019). Neural Logic Machines. arXiv."},{"key":"ref_34","unstructured":"Serafini, L., and Garcez, A.S. (2016, January 19\u201323). Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge. Proceedings of the European Conference on Machine Learning, Riva del Garda, Italy."},{"key":"ref_35","unstructured":"Yi, K., Wu, J., Gan, C., and Tenenbaum, J.B. (2020). CLEVRER: CoLlision Events for Video REasoning. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Marino, K., Rastegari, M., Farhadi, A., and Mottaghi, R. (2021, January 20\u201325). KRISP: Integrating implicit and symbolic knowledge for open-domain knowledge-based VQA. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01389"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_38","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., and Vasudevan, V. (2019). Searching for MobileNetV3. arXiv.","DOI":"10.1109\/ICCV.2019.00140"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). ImageNet: A Large-Scale Hierarchical Image Database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_41","unstructured":"Zhang, H., Li, K., Zhang, L., Li, Y., Zhang, L., Li, Y., Zhang, L., Li, Y., Zhang, L., and Li, Y. (2023). mGTE: Multilingual Generative Text Encoder for Cross-Lingual Text-Image Retrieval. arXiv."},{"key":"ref_42","unstructured":"Loshchilov, I., and Hutter, F. (2018). Decoupled weight decay regularization. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/25\/7\/2258\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:09:22Z","timestamp":1760029762000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/25\/7\/2258"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,3]]},"references-count":42,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["s25072258"],"URL":"https:\/\/doi.org\/10.3390\/s25072258","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2025,4,3]]}}}