key: cord-1009282-kzxqnma8
authors: Santos, Filipe; Conti, Stefano; Wolters, Arne
title: A novel method for identifying care home residents in England: a validation study
date: 2021-09-15
journal: International journal of population data science
DOI: 10.23889/ijpds.v5i4.1666
sha: 1a0339726538794b0332284272846193d0a48dd4
doc_id: 1009282
cord_uid: kzxqnma8

INTRODUCTION: The ability to identify residents of care homes in routinely collected health care data is key to informing healthcare planning decisions and delivery initiatives targeting the older and frail population. National health-care planning and delivery for this population subgroup has suddenly and considerably grown in urgency following the onset of the COVID-19 pandemic, which has hit care homes especially hard. The range of applicability of this information has widened with the increased availability in England of retrospectively collected administrative databases, holding rich patient-level details on the health and prognostic status of people who have been or are in contact with the National Health Service. In practice, the lack of a national registry of care home residents in England complicates assessing an individual's care home residency status, which has typically been identified via manual address matching from pseudonymised patient-level healthcare databases linked with publicly available care home address information. OBJECTIVES: To examine a novel methodology based on linking unique care home address identifiers with primary care patient registration data, enabling routine identification of care home residents in health-care data. METHODS: This study benchmarks the proposed strategy against the standard manual address matching approach through a diagnostic assessment of a stratified random sample of care home postcodes in England.
RESULTS: Derived estimates of diagnostic performance, albeit showing a non-insignificant false negative rate (21.98%), highlight a remarkable true negative rate (99.69%) and positive predictive value (99.35%) as well as a satisfactory negative predictive value (88.25%). CONCLUSIONS: The validation exercise lends confidence to the reliability of the novel address matching method as a viable and general alternative to manual address matching.
Keywords: care home; routine data; test accuracy

Introduction

Older care home residents (aged 65 or over) have varied and complex social and healthcare needs, and healthcare systems and providers require information on both to offer the right care for this patient group [1] [2] [3] [4] [5]. Since the onset of the COVID-19 pandemic in March 2020, it has become especially important for national healthcare and governmental authorities to feed this information into their healthcare policy planning and delivery strategies, in the light of the severity of the burden of the disease among the care home resident population. However, the lack of a central register of care home residents in England [6] limits the potential of using routinely collected administrative data to understand the healthcare needs of this population. This reflects in part how healthcare and social care services in England are funded and commissioned independently of each other. The former is publicly funded and provided by the English National Health Service (NHS) free of charge at the point of use. The latter is (partially) funded with public money by local councils via means-tested criteria and is supplemented by self-funding; it is mainly delivered by private or not-for-profit providers [7, 8]. Consequently, patient data generated from social care does not flow through a centralised system, in contrast with data on hospital care, which is instead held in a central repository, the Secondary Uses Service (SUS) [9].
As the NHS is implementing the Enhanced Health in Care Homes framework across England via Primary Care Networks (PCNs) as part of a 10-year strategic plan [10], policy- and decision-makers are striving to profile and understand the care home population, to design the delivery of the framework, and to understand how this model is affecting the care provided to care home residents. This study presents a novel data linkage approach at national level to help tackle this challenge for older people living in care homes. This methodology allows care home residents to be identified from healthcare data and multiple sources of information on care homes to be linked. The study also includes a validation analysis of the methodology presented.

This study illustrates an algorithm as a solution to the challenge of identifying care home residents aged 65 or over in healthcare data by retrospectively using routinely collected administrative data. An index test is hereby considered and tested against a reference standard, following the 2015 Standards for Reporting of Diagnostic Accuracy Studies (STARD-2015) guidelines [11, 12]. Protocols for the index test and the reference standard were established before investigating the data. People who were registered with a General Practitioner (GP) practice in England on 17 June 2018, were aged 65 or over, and whose registered address was in a postcode area shared with a care home were eligible to be included in the study. The list of postcodes for care homes was drawn from the publicly available care home register kept by the Care Quality Commission (CQC) [13], the health and social care services inspector and regulator in England. The study cohort was selected based on a stratified random sample of care home postcodes.
To account for the heterogeneity of the distribution of key care home characteristics around England, 80 care homes were randomly selected from the CQC registry by: care home type (i.e. nursing or residential); rurality (i.e. located in a rural or urban setting according to the 2011 UK Census' Rural Urban Classification [14]); bed stock size (i.e. above or below the national CQC average of approximately 25 beds [13]). For each combination of these three categories, 10 care homes were randomly sampled. The sample to be used for validating the index test was then created by selecting all addresses with the same postcode as these care homes, therefore yielding a stratified random sample of potential care home residents (i.e. individuals residing at a postcode shared by one of the sampled care homes). As the actual number of care home residents is unknown, the number of care home beds (Table A) was used as a conservative proxy for the computation of the finite population corrections required to account for the sampling taking place without replacement from each care home stratum.

Matchcode, a commercially available address cleaning software from GBG Loqate [15] underpinned by the Ordnance Survey AddressBase Premium database of all addresses in the UK [16], was used to standardise addresses by assigning a Unique Property Reference Number (UPRN, that is the official unique identifier of every spatial address in Great Britain [17]) to each addressable location in the administrative data available. The index test and the reference standard were applied to addresses on the national records of patients registered with a GP practice as well as to addresses of care homes.
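The sampling design described above (two care home types, two rurality classes and two bed stock size classes, with 10 homes drawn per combination) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names `type`, `rurality` and `large`, the synthetic register of 30 homes per stratum, and the function names are all hypothetical. The correction factor shown is the textbook finite population correction for sampling without replacement; the text does not detail how the study applied it beyond using bed counts as a proxy for population size.

```python
import random
from collections import defaultdict


def stratified_sample(care_homes, per_stratum=10, seed=0):
    """Draw `per_stratum` care homes without replacement from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for home in care_homes:
        # Hypothetical field names: one stratum per (type, rurality, size) combination.
        strata[(home["type"], home["rurality"], home["large"])].append(home)
    sample = []
    for key in sorted(strata):
        homes = strata[key]
        sample.extend(rng.sample(homes, min(per_stratum, len(homes))))
    return sample


def finite_population_correction(n_sampled, n_population):
    """Textbook FPC factor sqrt((N - n) / (N - 1)) for sampling without replacement."""
    if n_population <= 1:
        return 0.0
    return ((n_population - n_sampled) / (n_population - 1)) ** 0.5


# Synthetic register (illustrative only): 30 homes in each of the 8 strata.
register = []
for home_type in ("nursing", "residential"):
    for rurality in ("rural", "urban"):
        for large in (False, True):
            for _ in range(30):
                register.append({"id": len(register), "type": home_type,
                                 "rurality": rurality, "large": large})

sample = stratified_sample(register)  # 8 strata x 10 homes each
```

With a real register, strata holding fewer than 10 homes would simply contribute all of their homes, which is a design choice of this sketch rather than something stated in the study.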
Such processing and subsequent pseudonymisation of patient data was conducted by NHS Digital as data processor on behalf of NHS England and NHS Improvement (the Data Controller) under its mandate to fulfil commissioning responsibilities for the NHS in England as per the Health and Social Care Act 2012 [18]. Pseudonymised data were securely transferred from the National Commissioning Data Repository (NCDR [19]), which is the national pseudonymised patient-level data repository managed by NHS England and NHS Improvement, to the Improvement Analytics Unit [20], which acts as a data processor on behalf of NHS England and NHS Improvement. The administrative data used are regularly cleaned, maintained and updated; information on any missing data is outside of the control of the research team.

An index test was developed and validated with the aim of producing a dichotomous outcome as to whether a given person was a care home resident or not. The method, based on UPRN matching, identified people aged 65 or over living at a UPRN associated with a care home; where the matching software could not allocate a UPRN to a specific address spelling, the spelling was assumed not to be that of a care home address (thus leading to a non-care home resident outcome). There is no established source of information identifying care home residents in England, whether based on patients' addresses or on other records such as care home residents' registers. The UPRN index test was validated against manual address matching as the reference standard. Pseudonymised lists of address spellings in a care home postcode were compared to the care home address by an analyst in the Improvement Analytics Unit following a pre-defined study protocol.
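The decision rule of the UPRN index test described above can be sketched in a few lines. This is a hedged illustration only: the record layout (`id`, `age`, `uprn`), the function name and the UPRN values are made up, and the address cleaning step that assigns UPRNs is assumed to have already run. The one rule taken directly from the text is that a record with no assigned UPRN is treated as a non-care home resident.

```python
def uprn_index_test(patients, care_home_uprns, min_age=65):
    """Return {patient id: flag}, where True marks a care home resident.

    Patients below `min_age` are out of scope and are not classified.
    """
    flags = {}
    for patient in patients:
        if patient["age"] < min_age:
            continue
        uprn = patient.get("uprn")  # None when no UPRN could be assigned
        # An unmatched address spelling is conservatively treated as a non-resident.
        flags[patient["id"]] = uprn is not None and uprn in care_home_uprns
    return flags


# Made-up identifiers, for illustration only.
care_home_uprns = {"100000000001", "100000000002"}
patients = [
    {"id": "a", "age": 82, "uprn": "100000000001"},  # matches a care home UPRN
    {"id": "b", "age": 70, "uprn": "100000000099"},  # some other property
    {"id": "c", "age": 91, "uprn": None},            # no UPRN assigned
    {"id": "d", "age": 50, "uprn": "100000000002"},  # under 65: not classified
]
flags = uprn_index_test(patients, care_home_uprns)
```

Here `flags` marks only patient `a` as a care home resident; patient `c` illustrates how unmatched spellings can contribute to false negatives.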
When applying the protocol to the reference standard, ambiguous cases where it was not clear which address the spelling belonged to were resolved via direct assessment by the analyst, resulting in either a positive or negative identification. Address information was not standardised, with the possibility of the same address having more than one spelling. Manual matching makes it possible to handle situations where information was entered in the wrong field or typos occurred at the data entry stage, so that an address can be unambiguously identified. Both the reference standard and the index test were performed retrospectively and independently of each other. Standard diagnostic test parameters (that is, sensitivity, specificity, and positive (PPV) and negative (NPV) predictive values) were estimated and reported [21].

In the collected sample of 80 care homes (Table A), 5,721 people aged 65 or over across 2,199 distinct address spellings were identified as living in a care home postcode. Of all patients identified as care home residents in the sample according to manual address matching (1,812), as per Table B, 78.02% (95% confidence interval (CI): 75.52%-80.34%) were also flagged with the UPRN index test as care home residents (sensitivity). This means that just over three quarters of the individuals living in a postcode area for one of the sampled care homes who were identified as care home residents via manual matching were also flagged as care home residents by the UPRN index test. The agreement between the UPRN index test and manual matching was found to be nearly complete in terms of specificity (99.69%; 95% CI: 99.37%-99.85%), which indicates the likelihood of an individual living in a postcode area where a sampled care home is located not being flagged as a care home resident by the UPRN index test given that they were also identified as a non-care home resident via manual matching.
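For readers less familiar with these measures, the sketch below shows how sensitivity and specificity, together with the predictive values discussed next, follow from a two-by-two table of index test results against the reference standard. The counts are invented round numbers, not the study's data, and the calculation is unweighted, whereas the study reports estimates weighted for its stratified design. The function name is hypothetical.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Unweighted diagnostic accuracy measures from a 2x2 confusion table.

    tp: flagged by the index test and resident per the reference standard
    fp: flagged by the index test but non-resident per the reference standard
    fn: not flagged but resident per the reference standard
    tn: not flagged and non-resident per the reference standard
    """
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
        "false_negative_rate": fn / (tp + fn),  # equals 1 - sensitivity
    }


# Invented counts for illustration only (not taken from Table B).
measures = diagnostic_measures(tp=80, fp=5, fn=20, tn=895)
```

The false negative rate is the complement of sensitivity, which is how the 21.98% quoted in the abstract relates to the 78.02% sensitivity reported above.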
The PPV, which represents the chance of an individual flagged as a care home resident via the UPRN index test also being identified as such through manual matching, is also estimated from the data as near perfect (PPV = 99.35%; 95% CI: 98.67%-99.68%). Lastly, the NPV, which denotes the probability of an individual living in a sampled care home's postcode and not identified via UPRN matching as a care home resident also being identified as a non-resident by manual matching, is estimated at 88.25% (95% CI: 86.84%-89.53%).

An assumption implicitly underpinning the diagnostic exercise presented in the Results section concerns the classification of care home residents living in care homes not forming part of the accrued sample. These individuals may be excluded from the analysis (thus being classified as non-care home residents) on the grounds of not living in any of the sampled care homes, but rather in a care home located in the postcode area of a sampled care home (as spotted via manual address matching). Indeed, the figures in Table B are derived under this assumption. On the other hand, a more liberal approach might treat such individuals as care home residents (which they in fact are, as revealed by manual address matching): the result of implementing such a strategy is shown in Table C, which reports residents' numbers as well as (weighted) estimates of diagnostic test performance revised to reflect the alternative assumption.

Age and postcode are often used to identify care home residents because address data are either not standardised or not available [22, 23]. This leads to exaggerated care home population size estimates since, according to findings illustrated in the Results section, only 31.67% (1,812/5,721) of people aged 65 or over in a care home postcode actually live in a care home. This is a recognised challenge and efforts are being made to use address cleaning algorithms locally [24, 25].
Nevertheless, no commonly accepted solution exists at a national level in England. The approach proposed in this paper maximises the value of the information in the address data available and is applicable at national level. The UPRN index test presents a software-based solution that produces hardly any false positives (specificity of 99.69%), meaning that almost everyone identified as a non-resident via manual matching was also not flagged as a care home resident by the index test. The index test is, however, not perfect, in that it erroneously flagged as non-residents a non-negligible proportion (21.98%) of individuals identified through manual matching as care home residents. The UPRN method's accuracy in spotting genuine care home residents in particular (PPV of 99.35%), but also non-residents (NPV of 88.25%), is also remarkable. Quite reassuringly, these findings remain substantially unaltered if individuals flagged as residents in a care home falling outside of the accrued care homes sample are nevertheless classified as care home residents. It is also worth noting the small proportion of individuals for whom no UPRN could be found, who were conservatively classed as non-care home residents; given their low prevalence, any different handling is not expected to significantly alter the diagnostic performance measures derived in Tables B and C. Quantifying the reliability of manual address matching as a reference standard is made problematic by the unavailability of a gold standard on the list of care home residents [6], such as a care home resident census. To further emphasise that there is no consensus on the overall size of the care home population, it is worth noting the 11.26% difference for the overall nursing and residential care home population size in England between the ONS 2011 Census estimate (that is 360,353; see footnote 1) and the corresponding figure reported for that year by LaingBuisson [26] (that is 406,100).
The latter source is often referenced by a number of third-sector organisations (e.g. Age UK). The ability of the UPRN index test to scale well with national administrative datasets makes it possible to estimate the size of the care home resident population in England as it changes over time in response to evolving demographic patterns. Additional research, which falls outside the scope of this study, will be needed to properly identify and account for the drivers of variations in this population subgroup's size.

The uneven distribution of the selected care home types across England is shown by the figures in Table A. Tabulated statistics emphasise in particular the higher prevalence in the country of residential care homes located in an urban, as opposed to rural, setting. This imbalance alone highlights the appropriateness of the adopted stratified sampling design for the purpose of estimating diagnostic measures for the proposed UPRN index test. Furthermore, the narrow confidence intervals obtained for all estimated diagnostic parameters (Tables B and C) provide additional confidence in their degree of accuracy as descriptors of the full care home population.

Footnote 1. Source: http://www.nomisweb.co.uk/census/2011/dc4210ewla

A limitation that, depending on the scale of application, may hamper the use of the examined UPRN index test lies in its improvable sensitivity, as it is expected to miss, that is to erroneously flag as a non-resident, approximately 1 care home resident out of 5. A previous iteration of this research documented a validation exercise based on an algorithm developed by a different software provider (namely Experian) and three care home resident identification index tests: that is manual matching (again as a standard of reference), the '3+' method (which classed an individual as a care home resident if she/he shared the same address with at least 2 people aged 65 or over) and a hybrid of the UPRN and 3+ approaches.
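The '3+' rule described above can be sketched as follows. This is an illustration under assumptions, not the earlier study's code: it treats two records as sharing an address only when a hypothetical `address` field matches exactly, and the function name and sample records are made up.

```python
from collections import Counter


def three_plus_method(patients, min_age=65, min_cohabitants=2):
    """Flag a person as a care home resident if at least `min_cohabitants`
    other people aged `min_age` or over are recorded at the same address."""
    older = [p for p in patients if p["age"] >= min_age]
    per_address = Counter(p["address"] for p in older)
    # Subtract 1 so that the person is not counted as their own cohabitant.
    return {p["id"]: per_address[p["address"]] - 1 >= min_cohabitants
            for p in older}


# Made-up records, for illustration only.
people = [
    {"id": "a", "age": 70, "address": "1 High St"},
    {"id": "b", "age": 80, "address": "1 High St"},
    {"id": "c", "age": 66, "address": "1 High St"},
    {"id": "d", "age": 70, "address": "2 Low Rd"},
    {"id": "e", "age": 71, "address": "2 Low Rd"},
    {"id": "f", "age": 40, "address": "1 High St"},  # under 65: not classified
]
flags_3plus = three_plus_method(people)
```

In this toy data, `a`, `b` and `c` are flagged while `d` and `e` are not. Because the rule relies only on co-residence counts, it depends on how address spellings are grouped, which exact string matching handles poorly when the same address has several spellings.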
The availability of the GBG Loqate software described in the Data Sources section rendered the three-pronged comparison meaningless due to the clear inferiority of the 3+ method to the updated UPRN-based method. In the Experian validation exercise the UPRN-based approach to care home residence status identification performed well against manual address matching; the Loqate UPRN-based version examined in the present work produced generally improved diagnostic measures. As such, although no roadmap for development of the address matching algorithm is available from Loqate, it would be reasonable to expect further diagnostic performance improvements, especially in sensitivity, from subsequent algorithm development. While additional index tests exist for the purpose of assessing care home residency status (e.g. phonic matching and Markov matching), these were not considered here as they are less common and no more robust than manual address matching [25]. Further limitations of the proposed UPRN method, and also of manual address matching, are typically linked to those intrinsic to the use of patient registration data. Address information is updated when patients notify their GP practice of a new address or when patients interact with other NHS services and report their address change. Furthermore, there may be variations in how promptly this information is updated. Therefore, the UPRN index test is likely to only identify permanent care home residents, excluding temporary ones or those only receiving respite care on a temporary basis. An additional limitation relates to the structure of addresses, whereby a residential building is identified by a house number or name, in that different methods using address data may process this information differently.
In the spelling of addresses for care home residents this problem was compounded by the fact that care homes use their commercial name (which can change over time) and residents may include their room numbers in the address. Although technically possible, it is unlikely that a single care home has multiple UPRNs associated with it. Some care homes will have individual addresses for care home residents (e.g. private occupancies), but the hierarchical nature of UPRNs means that these addresses would be associated by design with the parent UPRN uniquely linked to the care home. UPRNs are assigned to addressable properties registered with the UK Land Registry; a care home comprising multiple physical buildings is often assigned a single UPRN. Anecdotally, care homes in England exist which comprise multiple buildings in close proximity to each other, often providing different specialised services. As a result, these are registered as separate entries with the CQC, even if belonging in reality to the same care home. Ultimately there are rare instances of care homes with multiple UPRNs, with only one UPRN registered with the CQC. For these reasons, the UPRN index test is anticipated to perform worse, relative to manual matching, on care home addresses than on other household addresses. It is also to address these limitations that the Improvement Analytics Unit and the NCDR are further developing the algorithm assigning UPRNs to address spellings, which is expected to drastically improve the performance of the UPRN index test. The UPRN index test for identifying care home residents at a given address examined in this study reflects a trade-off between correctly identifying the care home resident population and avoiding wrongly capturing non-residents as care home residents.
For applications where it is essential that the target population excludes non-residents, the UPRN method would be especially suitable if the sample size is sufficiently large and the identified population can be assumed to be fairly representative of the overall care home population. This would be the case in causal inference studies of health and social care interventions targeting care home residents [27] [28] [29], as well as national analyses aimed at understanding the quality of care in care homes where sample size is not an issue [30]. The methods were assessed using a stratified sample of care homes in England and the algorithm can be applied to the English NHS patient list at national level. Bearing this in mind, the use of these algorithms can, for the first time, provide commissioners, providers and policy makers with valuable insights into health-care use by an important population at a national level.

The UPRN index test presented in this paper provides a means for identifying care home residents in administrative NHS datasets, illustrated by a detailed analysis of its strengths and limitations relative to standard manual address matching. The study highlights the potential of using linked datasets via spatial identifiers, such as bringing together health and social care data via UPRN. These linked datasets can provide a richer picture of the population and their needs, as well as make it possible to investigate the impact of health and social care interventions. The validation exercise carried out in this study lends confidence to the robustness of the proposed algorithm and offers a dependable assessment of the identified care home residents' population. This new methodology can offer commissioners, policy makers and local leaders insights into the national care home population size and enables the evaluation and examination of the impact of interventions on this cohort of individuals.
However, the use of this approach requires healthcare systems to invest in the quality and timeliness of the data they routinely collect as an enabler to understanding the population they cover and their respective health-care needs.

Formative care: defining the purpose and clinical practice of care for the frail
Alzheimer's Society. Fix Dementia Care: NHS and care homes
Enhanced health in care homes: learning from experiences so far
Introduction to Frailty, Fit for Frailty Part 1
Briefing 2: place and cause of death for permanent and temporary residents of care homes
Developing research resources And minimum data set for Care Homes' Adoption and use (the DACHA study)
House of Commons Health and Social Care and Housing, Communities and Local Government Committees. Long-term funding of adult social care, HC 768
Briefing: Health and Care of Older People in England
URL: https://digital.nhs.uk/services/secondary-uses-service-sus (last accessed
Network Contract Directed Enhanced Service (DES) Contract Specification 2020/21 - PCN Entitlements and Requirements
STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies
STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration
Care Quality Commission. Using CQC data
Department for Environment, Food & Rural Affairs. Rural Urban Classification official statistics
Address Verification & Geocoding. Loqate, a GBG solution
AddressBase Premium
UK Public General Acts. Health and Social Care Act
National Commissioning Data Repository
Improvement Analytics Unit
Review of Diagnostic Test Accuracy (DTA) studies in older people
The state of health care and adult social care in England
Continuous monitoring of emergency admissions of older care home residents to hospital
Accurate identification of hospital admissions from care homes; development and validation of an automated algorithm
Identifying care-home residents in routine healthcare datasets: a diagnostic test accuracy study of five methods
Care homes for older people: UK market report, 30th edition
Briefing: the impact of providing enhanced support for care home residents in Rushcliffe
Briefing: the impact of providing enhanced support for Sutton Homes of Care residents
Briefing: the impact of providing enhanced support for care home residents in Wakefield
Emergency admissions to hospital from care homes: how often and what for?

CI: Confidence Interval
COVID-19: COronaVIrus Disease 2019
CQC: Care Quality Commission
GP: General Practitioner
NCDR: National Commissioning Data Repository
NHS: National Health Service
NPV: Negative Predictive Value
PCN: Primary Care Network
PPV: Positive Predictive Value
STARD-2015: Standards for Reporting Diagnostic Accuracy 2015
SUS: Secondary Uses Service
UPRN: Unique Property Reference Number

The authors wish to credit James Lockyer, Mark Marshall and data architects at NCDR for the provision of, and helpful discussions around, the data underpinning this study.

The authors have no conflicts of interest to disclose.

This study requires no ethics board approval in that it reports an analysis of pseudonymised data transferred by the National Commissioning Data Repository to the Improvement Analytics Unit, which is a data processor on behalf of NHS England and NHS Improvement.