{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T03:18:12Z","timestamp":1773112692820,"version":"3.50.1"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T00:00:00Z","timestamp":1695340800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM J. Comput. Sustain. Soc."],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>Critical infrastructure, such as roads and electricity, are core systems that enable economic development. However, these crucial systems are frequently under-monitored in developing regions, resulting in lost opportunities for growth. Recent advances in remote sensing and machine learning have enabled monitoring and measurement of infrastructure faster and more frequently than traditional methods. However, ground data are often unavailable, resulting in a disconnect between labels and remotely sensed data. Furthermore, data from industrialized regions can only sometimes be transferred to regions with sparse data due to differences in the concept of quality between regions. Additionally, inconsistency in data and the complexity of ML models can introduce bias due to learned characteristics across diverse regions, leading to inaccurate predictions and recommendations for action. In this study, we train and compare traditional neural networks and vision transformers to predict road quality from medium-resolution satellite imagery and apply them to realistic data conditions: heterogeneous temporal-spatial resolutions. 
The best models (vision transformers) achieve AUROC scores of 0.934 and 0.685 for binary and five-class classification tasks, respectively, exhibiting results appealing for inference in otherwise unmeasured areas. Furthermore, these experiments and results showed that proper training techniques could produce accurate models from limited, heterogeneous, and low-resolution data.<\/jats:p>","DOI":"10.1145\/3608112","type":"journal-article","created":{"date-parts":[[2023,7,31]],"date-time":"2023-07-31T13:54:59Z","timestamp":1690811699000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Pixel Perfect: Using Vision Transformers to Improve Road Quality Predictions from Medium Resolution and Heterogeneous Satellite Imagery"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0546-9444","authenticated-orcid":false,"given":"Aggrey","family":"Muhebwa","sequence":"first","affiliation":[{"name":"Department of Electrical & Computer Engineering, University of Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-4756-517X","authenticated-orcid":false,"given":"Gabriel","family":"Cadamuro","sequence":"additional","affiliation":[{"name":"Proco Innovation Inc, CA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7590-2509","authenticated-orcid":false,"given":"Jay","family":"Taneja","sequence":"additional","affiliation":[{"name":"Department of Electrical & Computer Engineering, University of Massachusetts, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,9,22]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"Quantification of sand dune movements in the south western part of Egypt, using remotely sensed data and GIS","volume":"2013","author":"El-Magd Islam Abou","year":"2013","unstructured":"Islam Abou El-Magd, Osman Hassan, and Sayed Arafat. 2013. 
Quantification of sand dune movements in the south western part of Egypt, using remotely sensed data and GIS. J. Geog. Inf. Syst. 2013 (2013).","journal-title":"J. Geog. Inf. Syst."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.gloenvcha.2019.101975"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1111\/1467-8667.00263"},{"key":"e_1_3_1_5_2","article-title":"Neural machine translation by jointly learning to align and translate","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).","journal-title":"arXiv preprint arXiv:1409.0473"},{"key":"e_1_3_1_6_2","first-page":"26831","article-title":"Are transformers more robust than CNNs?","volume":"34","author":"Bai Yutong","year":"2021","unstructured":"Yutong Bai, Jieru Mei, Alan L. Yuille, and Cihang Xie. 2021. Are transformers more robust than CNNs? Adv. Neural Inf. Process. Syst. 34 (2021), 26831\u201326843.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2019.04.007"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0253370"},{"key":"e_1_3_1_9_2","article-title":"Assigning a grade: Accurate measurement of road quality using satellite imagery","author":"Cadamuro Gabriel","year":"2018","unstructured":"Gabriel Cadamuro, Aggrey Muhebwa, and Jay Taneja. 2018. Assigning a grade: Accurate measurement of road quality using satellite imagery. 
arXiv preprint arXiv:1812.01699 (2018).","journal-title":"arXiv preprint arXiv:1812.01699"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3314344.3332493"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jenvman.2020.110344"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_13_2","unstructured":"2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)."},{"key":"e_1_3_1_14_2","unstructured":"Google Earth. 2017. Google Earth Pro. Retrieved from https:\/\/earth.google.com\/"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1017\/S1481803500013336"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2017.10.013"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aaa8685"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aaf7894"},{"key":"e_1_3_1_20_2","first-page":"702","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Jiang Heinrich","year":"2020","unstructured":"Heinrich Jiang and Ofir Nachum. 2020. Identifying and correcting label bias in machine learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics. PMLR, 702\u2013712."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITSC.2011.6082921"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aaa8415"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3505244"},{"key":"e_1_3_1_24_2","unstructured":"Bohdan T. Kulakowski and James C. Wambold. 1989. Development of procedures for the calibration of profilographs. 
https:\/\/api.semanticscholar.org\/CorpusID:107792831"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.fcr.2012.08.008"},{"key":"e_1_3_1_27_2","article-title":"Bridging the gap between vision transformers and convolutional neural networks on small datasets","author":"Lu Zhiying","year":"2022","unstructured":"Zhiying Lu, Hongtao Xie, Chuanbin Liu, and Yongdong Zhang. 2022. Bridging the gap between vision transformers and convolutional neural networks on small datasets. arXiv preprint arXiv:2210.05958 (2022).","journal-title":"arXiv preprint arXiv:2210.05958"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.rse.2006.06.018"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.gecco.2020.e01194"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3457607"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.conbuildmat.2020.119397"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.5555\/541177"},{"key":"e_1_3_1_33_2","first-page":"83","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops","author":"Nachmany Yoni","year":"2019","unstructured":"Yoni Nachmany and Hamed Alemohammad. 2019. Detecting roads from satellite imagery in the developing world. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. 83\u201389."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1136\/amiajnl-2011-000464"},{"key":"e_1_3_1_35_2","first-page":"23296","article-title":"Intriguing properties of vision transformers","volume":"34","author":"Naseer Muhammad Muzammal","year":"2021","unstructured":"Muhammad Muzammal Naseer, Kanchana Ranasinghe, Salman H. Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. 2021. Intriguing properties of vision transformers. Adv. Neural Inf. 
Process. Syst. 34 (2021), 23296\u201323308.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_36_2","article-title":"An introduction to convolutional neural networks","author":"O\u2019Shea Keiron","year":"2015","unstructured":"Keiron O\u2019Shea and Ryan Nash. 2015. An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458 (2015).","journal-title":"arXiv preprint arXiv:1511.08458"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tra.2020.09.018"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2119890119"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11416"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aap.2020.105657"},{"key":"e_1_3_1_41_2","first-page":"90","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops","author":"Piaggesi Simone","year":"2019","unstructured":"Simone Piaggesi, Laetitia Gauvin, Michele Tizzoni, Ciro Cattuto, Natalia Adler, Stefaan Verhulst, Andrew Young, Rhiannan Price, Leo Ferres, and Andr\u00e9 Panisson. 2019. Predicting city poverty using satellite imagery. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. 90\u201396."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-020-00235-5"},{"key":"e_1_3_1_43_2","first-page":"12116","article-title":"Do vision transformers see like convolutional neural networks?","volume":"34","author":"Raghu Maithra","year":"2021","unstructured":"Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, and Alexey Dosovitskiy. 2021. Do vision transformers see like convolutional neural networks? Adv. Neural Inf. Process. Syst. 34 (2021), 12116\u201312128.","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-022-05322-8"},{"key":"e_1_3_1_45_2","article-title":"A meta-analysis of overfitting in machine learning","volume":"32","author":"Roelofs Rebecca","year":"2019","unstructured":"Rebecca Roelofs, Vaishaal Shankar, Benjamin Recht, Sara Fridovich-Keil, Moritz Hardt, John Miller, and Ludwig Schmidt. 2019. A meta-analysis of overfitting in machine learning. Adv. Neural Inf. Process. Syst. 32 (2019).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-24638-z"},{"key":"e_1_3_1_47_2","article-title":"Development as freedom (1999)","volume":"525","author":"Sen Amartya","year":"2014","unstructured":"Amartya Sen. 2014. Development as freedom (1999). Globaliz. Devel. Read.: Perspect. Devel. Global Change 525 (2014).","journal-title":"Globaliz. Devel. Read.: Perspect. Devel. Global Change"},{"key":"e_1_3_1_48_2","article-title":"A higher purpose: Measuring electricity access using high-resolution daytime satellite imagery","author":"Shah Zeal","year":"2022","unstructured":"Zeal Shah, Simone Fobi, Gabriel Cadamuro, and Jay Taneja. 2022. A higher purpose: Measuring electricity access using high-resolution daytime satellite imagery. arXiv preprint arXiv:2210.03909 (2022).","journal-title":"arXiv preprint arXiv:2210.03909"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2022.119237"},{"key":"e_1_3_1_50_2","unstructured":"Planet Team. 2017. Planet Application Program Interface: In Space for Life on Earth. Retrieved from https:\/\/api.planet.com"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41893-019-0256-8"},{"key":"e_1_3_1_52_2","unstructured":"World Bank Group. 2017. Africa\u2019s pulse. World Bank. 
Retrieved from http:\/\/documents.worldbank.org\/curated\/en\/348741492463112162\/Africas-pulse"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-022-01771-y"},{"key":"e_1_3_1_54_2","first-page":"10347","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning. PMLR, 10347\u201310357."},{"key":"e_1_3_1_55_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1002\/2014GL060641"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-019-14108-y"},{"key":"e_1_3_1_58_2","unstructured":"Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, and Peter Vajda. 2020. Visual Transformers: Token-based Image Representation and Processing for Computer Vision. 
arxiv:2006.03677"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.9906"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-020-16185-w"}],"container-title":["ACM Journal on Computing and Sustainable Societies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3608112","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3608112","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:32Z","timestamp":1750178792000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3608112"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,22]]},"references-count":59,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3608112"],"URL":"https:\/\/doi.org\/10.1145\/3608112","relation":{},"ISSN":["2834-5533"],"issn-type":[{"value":"2834-5533","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,22]]},"assertion":[{"value":"2023-03-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-04","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}