{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T08:23:20Z","timestamp":1761899000255,"version":"build-2065373602"},"reference-count":38,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T00:00:00Z","timestamp":1761868800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62201142"],"award-info":[{"award-number":["62201142"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJGI"],"abstract":"<jats:p>Accurate and robust feature matching across multi-view urban imagery is fundamental for urban mapping, 3D reconstruction, and large-scale spatial alignment. Real-world urban scenes involve significant variations in viewpoint, illumination, and occlusion, as well as repetitive architectural patterns that make correspondence estimation challenging. To address these issues, we propose the Cross-Context Aggregation Matcher (CCAM), a detector-free framework that jointly leverages multi-scale local features, long-range contextual information, and geometric priors to produce spatially consistent matches. Specifically, CCAM integrates a multi-scale local enhancement branch with a parallel self- and cross-attention Transformer, enabling the model to preserve detailed local structures while maintaining a coherent global context. In addition, an independent positional encoding scheme is introduced to strengthen geometric reasoning in repetitive or low-texture regions. Extensive experiments demonstrate that CCAM outperforms state-of-the-art methods, achieving up to +31.8%, +19.1%, and +11.5% improvements in AUC@{5\u00b0, 10\u00b0, 20\u00b0} over detector-based approaches and up to 1.72% higher precision compared with detector-free counterparts. These results confirm that CCAM delivers reliable and spatially coherent matches, thereby facilitating downstream geospatial applications.<\/jats:p>","DOI":"10.3390\/ijgi14110425","type":"journal-article","created":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T07:29:42Z","timestamp":1761895782000},"page":"425","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Cross-Context Aggregation for Multi-View Urban Scene and Building Facade Matching"],"prefix":"10.3390","volume":"14","author":[{"given":"Yaping","family":"Yan","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Southeast University, Nanjing 210096, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuhang","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Automation, Southeast University, Nanjing 210096, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"773","DOI":"10.3390\/rs16050773","article-title":"Large-Scale 3D Reconstruction from Multi-View Imagery: A Comprehensive Review","volume":"16","author":"Luo","year":"2024","journal-title":"Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/j.isprsjprs.2021.03.019","article-title":"Urban functional zone mapping by integrating high spatial resolution nighttime light and daytime multi-view imagery","volume":"175","author":"Huang","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2784","DOI":"10.1109\/TAES.2024.3478169","article-title":"Deep Feature Matching of Different-Modal Images for Visual Geo-Localization of AAVs","volume":"61","author":"Zhang","year":"2025","journal-title":"IEEE Trans. Aerosp. Electron. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/LGRS.2025.3623128","article-title":"Adaptive Differentiation Siamese Fusion Network for Remote Sensing Change Detection","volume":"22","author":"Zhang","year":"2025","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vision"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Bay, H., Tuytelaars, T., and Van Gool, L. (2006, January 7\u201313). SURF: Speeded Up Robust Features. Proceedings of the European Conference on Computer Vision, Graz, Austria.","DOI":"10.1007\/11744023_32"},{"key":"ref_7","unstructured":"Rocco, I., Cimpoi, M., Arandjelovi\u0107, R., Torii, A., Pajdla, T., and Sivic, J. (2018, January 3\u20138). Neighbourhood consensus networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_8","unstructured":"Li, X., Han, K., Li, S., and Prisacariu, V. (2020, January 6\u201312). Dual-Resolution Correspondence Networks. Proceedings of the Advances in Neural Information Processing Systems, Virtual."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sun, J., Shen, Z., Wang, Y., Bao, H., and Zhou, X. (2021, January 20\u201325). LoFTR: Detector-Free Local Feature Matching with Transformers. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00881"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Xu, C., Wang, B., Ye, Z., and Mei, L. (2025). ETQ-Matcher: Efficient Quadtree-Attention-Guided Transformer for Detector-Free Aerial\u2013Ground Image Matching. Remote Sens., 17.","DOI":"10.3390\/rs17071300"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011, January 6\u201313). ORB: An efficient alternative to SIFT or SURF. Proceedings of the International Conference on Computer Vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126544"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., and Sattler, T. (2019, January 15\u201320). D2-Net: A Trainable CNN for Joint Description and Detection of Local Features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00828"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"DeTone, D., Malisiewicz, T., and Rabinovich, A. (2018, January 18\u201323). Superpoint: Self-supervised interest point detection and description. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00060"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"5669","DOI":"10.1109\/TNNLS.2021.3130655","article-title":"RDLNet: A Regularized Descriptor Learning Network","volume":"34","author":"Zhang","year":"2023","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1016\/j.isprsjprs.2025.05.009","article-title":"A novel real-time matching and pose reconstruction method for low-overlap agricultural UAV images with repetitive textures","volume":"226","author":"Xiao","year":"2025","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, H., Xiong, W., Zhou, F., He, Z., Zhang, T., and Sheng, Z. (2025). Topology-Aware Multi-View Street Scene Image Matching for Cross-Daylight Conditions Integrating Geometric Constraints and Semantic Consistency. ISPRS Int. J. Geo-Inf., 14.","DOI":"10.3390\/ijgi14060212"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., DeTone, D., Malisiewicz, T., and Rabinovich, A. (2020, January 13\u201319). SuperGlue: Learning Feature Matching With Graph Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00499"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Lu, X., Yan, Y., Kang, B., and Du, S. (2023, January 7\u201314). ParaFormer: Parallel Attention Transformer for Efficient Feature Matching. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA.","DOI":"10.1609\/aaai.v37i2.25275"},{"key":"ref_19","first-page":"1","article-title":"UAV Image Stitching With Transformer and Small Grid Reformation","volume":"20","author":"Cui","year":"2023","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_20","first-page":"1","article-title":"Sequence Matching for Image-Based UAV-to-Satellite Geolocalization","volume":"62","author":"Wang","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yu, J., Chang, J., He, J., Zhang, T., Yu, J., and Wu, F. (2023, January 18\u201322). Adaptive Spot-Guided Transformer for Consistent Local Feature Matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.02097"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Edstedt, J., Sun, Q., B\u00f6kman, G., Wadenb\u00e4ck, M., and Felsberg, M. (2024, January 16\u201322). RoMa: Robust Dense Feature Matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01871"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, D., Yan, Y., Liang, D., and Du, S. (2023, January 4\u201310). MSFORMER: Multi-Scale Transformer with Neighborhood Consensus for Feature Matching. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Rhodes Island, Greece.","DOI":"10.1109\/ICASSP49357.2023.10095240"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Cheng, X., Zhai, X., Xue, L., and Du, S. (2023, January 2\u20134). CSFormer: Cross-Scale Transformer for Feature Matching. Proceedings of the International Conference on Sensing, Measurement and Data Analytics in the Era of Artificial Intelligence, Xi\u2019an, China.","DOI":"10.1109\/ICSMD60522.2023.10490714"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2024.3510781","article-title":"Multimodal Remote Sensing Image Matching via Learning Features and Attention Mechanism","volume":"62","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lin, T., Doll\u00e1r, P., Girshick, R.B., He, K., Hariharan, B., and Belongie, S.J. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-Excitation Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_29","unstructured":"Cuturi, M. (2013, January 5\u201310). Sinkhorn distances: Lightspeed computation of optimal transport. Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, Z., and Snavely, N. (2018, January 18\u201323). MegaDepth: Learning Single-View Depth Prediction from Internet Photos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00218"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sch\u00f6nberger, J.L., and Frahm, J.M. (2016, January 27\u201330). Structure-from-Motion Revisited. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.445"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., and Nie\u00dfner, M. (2017, January 21\u201326). ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.261"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Bian, J., Lin, W.Y., Matsushita, Y., Yeung, S.K., Nguyen, T.D., and Cheng, M.M. (2017, January 21\u201326). GMS: Grid-Based Motion Statistics for Fast, Ultra-Robust Feature Correspondence. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.302"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Yi, K.M., Trulls, E., Ono, Y., Lepetit, V., Salzmann, M., and Fua, P. (2018, January 18\u201323). Learning to Find Good Correspondences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00282"},{"key":"ref_35","unstructured":"Zhang, J., Sun, D., Luo, Z., Yao, A., Zhou, L., Shen, T., Chen, Y., Liao, H., and Quan, L. (November, January 27). Learning Two-View Correspondences and Geometry Using Order-Aware Network. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Luo, Z., Shen, T., Zhou, L., Zhang, J., Yao, Y., Li, S., Fang, T., and Quan, L. (2019, January 15\u201320). ContextDesc: Local Descriptor Augmentation With Cross-Modality Context. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00263"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Sattler, T., Maddern, W., Toft, C., Torii, A., Hammarstrand, L., Stenborg, E., Safari, D., Okutomi, M., Pollefeys, M., and Sivic, J. (2018, January 18\u201323). Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00897"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Sarlin, P.E., Cadena, C., Siegwart, R., and Dymczyk, M. (2019, January 15\u201320). From Coarse to Fine: Robust Hierarchical Localization at Large Scale. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01300"}],"container-title":["ISPRS International Journal of Geo-Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2220-9964\/14\/11\/425\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T07:41:38Z","timestamp":1761896498000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2220-9964\/14\/11\/425"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,31]]},"references-count":38,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["ijgi14110425"],"URL":"https:\/\/doi.org\/10.3390\/ijgi14110425","relation":{},"ISSN":["2220-9964"],"issn-type":[{"value":"2220-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,31]]}}}