{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T15:48:39Z","timestamp":1753890519132,"version":"3.41.2"},"reference-count":17,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,11,28]],"date-time":"2023-11-28T00:00:00Z","timestamp":1701129600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>Linking free-text addresses to unique identifiers in a structural address database [the Ordnance Survey unique property reference number (UPRN) in the United Kingdom (UK)] is a necessary step for downstream geospatial analysis in many digital health systems, e.g., for identification of care home residents, understanding housing transitions in later life, and informing decision making on geographical health and social care resource distribution. However, there is a lack of open-source tools for this task with performance validated in a test data set.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>In this article, we propose a generalisable solution (A <jats:bold>F<\/jats:bold>ramework for <jats:bold>L<\/jats:bold>inking free-text <jats:bold>A<\/jats:bold>ddresses to Ordnance Survey U<jats:bold>P<\/jats:bold>RN database, <jats:italic>FLAP<\/jats:italic>) based on a machine learning\u2013based matching classifier coupled with a fuzzy aligning algorithm for feature generation with better performance than existing tools. The framework is implemented in Python as an Open Source tool (available at <jats:ext-link><jats:italic>Link<\/jats:italic><\/jats:ext-link>). We tested the framework in a real-world scenario of linking individual\u2019s (<jats:inline-formula><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"IM1\"><mml:mi>n<\/mml:mi><mml:mo>=<\/mml:mo><mml:mn>771,588<\/mml:mn><\/mml:math><\/jats:inline-formula>) addresses recorded as free text in the Community Health Index (CHI) of National Health Service (NHS) Tayside and NHS Fife to the Unique Property Reference Number database (UPRN DB).<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We achieved an adjusted matching accuracy of 0.992 in a test data set randomly sampled (<jats:inline-formula><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"IM2\"><mml:mi>n<\/mml:mi><mml:mo>=<\/mml:mo><mml:mn>3<\/mml:mn><mml:mo>,<\/mml:mo><mml:mn>876<\/mml:mn><\/mml:math><\/jats:inline-formula>) from NHS Tayside and NHS Fife CHI addresses. <jats:italic>FLAP<\/jats:italic> showed robustness against input variations including typographical errors, alternative formats, and partially incorrect information. It has also improved usability compared to existing solutions allowing the use of a customised threshold of matching confidence and selection of top <jats:inline-formula><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"IM3\"><mml:mi>n<\/mml:mi><\/mml:math><\/jats:inline-formula> candidate records. The use of machine learning also provides better adaptability of the tool to new data and enables continuous improvement.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>In conclusion, we have developed a framework, <jats:italic>FLAP<\/jats:italic>, for linking free-text UK addresses to the UPRN DB with good performance and usability in a real-world task.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fdgth.2023.1186208","type":"journal-article","created":{"date-parts":[[2023,11,28]],"date-time":"2023-11-28T09:59:29Z","timestamp":1701165569000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["FLAP: a framework for linking free-text addresses to the Ordnance Survey Unique Property Reference Number database"],"prefix":"10.3389","volume":"5","author":[{"given":"Huayu","family":"Zhang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arlene","family":"Casey","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Imane","family":"Guellil","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"V\u00edctor","family":"Su\u00e1rez-Paniagua","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Clare","family":"MacRae","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Charis","family":"Marwick","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Honghan","family":"Wu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bruce","family":"Guthrie","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Beatrice","family":"Alex","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2023,11,28]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"1804","DOI":"10.1093\/ije\/dyab176","article-title":"Penetration, impact of COVID-19 in long term care facilities in England: population surveillance study","volume":"50","author":"Chudasama","year":"2021","journal-title":"Int J Epidemiol"},{"key":"B2","doi-asserted-by":"publisher","first-page":"e26","DOI":"10.1016\/j.jinf.2021.04.029","article-title":"Household clustering of SARS-CoV-2 variant of concern B.1.1.7 (VOC-202012\u201301) in England","volume":"83","author":"Chudasama","year":"2021","journal-title":"J Infect"},{"year":"","author":"Scotland","key":"B3"},{"key":"B4","doi-asserted-by":"publisher","first-page":"1666","DOI":"10.23889\/ijpds.v5i4.1666","article-title":"A novel method for identifying care home residents in England: a validation study","volume":"5","author":"Santos","year":"2021","journal-title":"Int J Popul Data Sci"},{"key":"B5","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1055\/s-0038-1634840","article-title":"Record linkage strategies. Part I. Estimating information, evaluating approaches","volume":"30","author":"Roos","year":"1991","journal-title":"Methods Inf Med"},{"key":"B6","doi-asserted-by":"publisher","first-page":"1246","DOI":"10.1093\/ije\/31.6.1246","article-title":"Probabilistic record linkage, a method to calculate the positive predictive value","volume":"31","author":"Blakely","year":"2002","journal-title":"Int J Epidemiol"},{"key":"B7","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1109\/IJCNN.2011.6033192","article-title":"Beyond probabilistic record linkage using neural networks and complex features to improve genealogical record linkage","volume-title":"The 2011 International Joint Conference on Neural Networks","author":"Wilson","year":"2011"},{"key":"B8","doi-asserted-by":"crossref","DOI":"10.23889\/ijpds.v7i3.1946","article-title":"Adding a residential dimension to the Scottish population Spine\u2013CHI-UPRN residential linkage (CURL)","volume-title":"Int J Popul Data Sci","author":"Clark","year":"2022"},{"key":"B9","doi-asserted-by":"publisher","first-page":"1674","DOI":"10.23889\/ijpds.v6i1.1674","article-title":"Evaluation of the assign open-source deterministic address-matching algorithm for allocating unique property reference numbers to general practitioner-recorded patient addresses","volume":"6","author":"Harper","year":"2021","journal-title":"Int J Popul Data Sci"},{"year":"","key":"B10"},{"year":"","author":"GBG","key":"B11"},{"key":"B12","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J Mol Biol"},{"key":"B13","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein","year":"1966","journal-title":"Soviet Phys Dokl"},{"key":"B14","doi-asserted-by":"publisher","first-page":"909","DOI":"10.1109\/TKDE.2014.2349916","article-title":"An LSH-based blocking approach with a homomorphic matching technique for privacy-preserving record linkage","volume":"27","author":"Karapiperis","year":"2015","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"B15","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.3115\/v1\/D14-1162","article-title":"GloVe: global vectors for word representation","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Pennington","year":"2014"},{"article-title":"Efficient estimation of word representations in vector space","year":"2013","author":"Mikolov","key":"B16"},{"key":"B17","doi-asserted-by":"publisher","first-page":"1679","DOI":"10.1109\/TAES.2016.140952","article-title":"On implementing 2D rectangular assignment algorithms","volume":"52","author":"Crouse","year":"2016","journal-title":"IEEE Trans Aerosp Electron Syst"}],"container-title":["Frontiers in Digital Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2023.1186208\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,28]],"date-time":"2023-11-28T09:59:33Z","timestamp":1701165573000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2023.1186208\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,28]]},"references-count":17,"alternative-id":["10.3389\/fdgth.2023.1186208"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2023.1186208","relation":{},"ISSN":["2673-253X"],"issn-type":[{"type":"electronic","value":"2673-253X"}],"subject":[],"published":{"date-parts":[[2023,11,28]]},"article-number":"1186208"}}