key: cord-0202349-hh2mjtdf authors: Powell, Claire; Burns, Luke title: Digital Divide: Mapping the geodemographics of internet accessibility across Great Britain date: 2021-08-03 journal: nan DOI: nan sha: 6934c466716231070671ebc268339926ef05968e doc_id: 202349 cord_uid: hh2mjtdf

Aim: This research proposes the first solely sociodemographic measure of digital accessibility for Great Britain. Digital inaccessibility affects circa 10 million people who are unable to access or make full use of the internet, particularly impacting the disadvantaged in society. Method: A geodemographic classification is developed, analysing literature-guided sociodemographic variables at the district level. Analysis: Resultant clusters are analysed against their sociodemographic variables and spatial extent. Findings suggest three at-risk clusters exist: "Metropolitan Minority Struggle", "Indian Metropolitan Living" and "Pakistani-Bangladeshi Inequality". These are validated through nationwide Ofcom telecommunications performance data and specific case studies using Office for National Statistics internet usage data. Conclusion: Using solely contemporary and open-source sociodemographic variables, this paper enhances previous digital accessibility research. The identification of digitally inaccessible areas allows focussed local and national government resource and policy targeting, particularly important as a key data source and methodology post-2021, following the expected final nationwide census.

Digital accessibility refers to the non-egalitarian divide between those with and those without internet access (Castells, 2002; Singleton, et al., 2020). Overall, inaccessibility levels have declined steadily in recent years, but a considerable proportion of the Great Britain population remains without access (circa 10 million people in 2019 (Blank, et al., 2020; Dutton & Blank, 2013)).
Even with internet access, a minimum speed and connectivity is required to enable multiple users sharing the same internet connection to carry out common daily tasks (UK Government, 2020a), many of which are taken for granted. According to the Office of Communications (Ofcom) and the UK Government (2020a) under the Broadband Universal Service Obligation, a 'decent broadband service' has an upload speed of 1 Mbit/s (megabit per second) and a download speed of 10 Mbit/s. When compared to Europe as a whole, the UK ranks 5th in internet services use (European Commission, 2020). Digital accessibility can be split into three aspects. The first-level digital divide is the ability (or inability) to access the internet due to physical infrastructure or financial constraints. The second-level digital divide is how effectively people engage computer skills to exploit internet benefits (Hargittai, 2002; van Deursen & van Dijk, 2011). The third-level digital divide addresses the consequences of internet use (Selwyn, 2004). This research crosscuts all aspects, with a key focus on the first and second levels. Factors influencing nationwide digital accessibility are multidimensional, often interrelated and generally directly associated with sociodemographic attributes that vary spatially. Geodemographics, regularly defined as "the analysis of people according to areas where they live" (Sleight, 1997, p. 16), takes into account socio-economic and demographic similarities and differences, and has had considerable public and private sector success (Harris, et al., 2005; Webber & Burrows, 2018). In this research, the first solely sociodemographic measure of digital accessibility for Great Britain will be presented, with findings analysed and validated. The most recent, freely available administrative data and open-source software will be used, ensuring that the output is both transparent and easy to replicate and update.
Results are likely to aid future policy recommendations pertaining to internet accessibility (and inclusivity) and influence more broad internet connectivity debates. Geodemographic measures surrounding the topic of digital accessibility are limited but include a nationwide geodemographic 'Internet User Classification' created by Singleton et al (2020) . This work made use of exclusively transactional data, derived between 2013 and 2016. In other work, Blank et al (2017) used small area estimation from individual-level small scale survey data to determine the influence of demographic characteristics versus spatial differences on 2013 internet use. This research builds on the work of Blank et al (2017) , by creating a new bespoke classification, working with more recent and freely available data and using more complete, nationwide survey data collected and verified by the UK Government. Longley et al (2008) used well-known commercial geodemographic classification MOSAIC (Experian) to determine individual levels of engagement with electronic technologies and products for marketing. This research focusses specifically on sociodemographic factors as opposed to the narrow economic influence. Sociodemographic factors encompass multiple dimensions of the population (including direct and indirect impacts). Thus, this research will further develop knowledge and contemporise past works in the domain. Currently, no existing work has explored digital accessibility using only sociodemographic variables across Great Britain. A core focus is on the less well researched first and second digital divides. This research sets out to identify those at greatest risk of not gaining maximum benefit from the internet. Digital accessibility research is increasingly important globally, with the plethora of data generation and technological advances. 
The 5G network is a wireless mobile network with increased bandwidth (Médard, 2020), higher data capacity, greater reliability (99.9%) and lower latency (the delay between data transfers) than its predecessor, 4G (Ilderem, 2020). Since May 2019, 5G has been progressively set up across Great Britain. Banning new Huawei technology will slow and reduce some 5G access in the short term; however, 5G is still set to underpin the near future of internet connectivity (National Cyber Security Centre, 2020; UK Government, 2020b). This nationwide digital accessibility research will help identify the types of people most affected by loss of digital access and their spatial extent, particularly important as a data source and methodology to follow after the last UK census in 2021. Use of government administrative data in this research could highlight accurate district-level data sources for data analysis at national and local level for future policy decisions and target setting. A thorough review of past related research was undertaken, exploring the current understanding and knowledge linked to sociodemographic factors. Sociodemography is the study of groups of the population that share characteristics (Lenormand, et al., 2015). Here, variables that demonstrate fundamental disadvantage, both socially and demographically, can help highlight those potentially most at risk from digital inaccessibility and exclusion. Vicente and Gil-de-Bernabé (2010), alongside Epstein et al (2011) and White and Selwyn (2013), suggest a more nuanced understanding of digital accessibility issues (backed by a collective societal responsibility) could aid future government digital policies. This could also aid advocacy of the internet as a utility: a service the population cannot live without, such that short-term faults merit immediate repair. This view is popular with many (e.g. Skerratt et al (2008), Townsend et al (2013) and Philip et al (2015)) and is already established in countries such as Sweden and Finland.
Selwyn et al (2005) carried out an adult-focussed internet usage interview study which found that sociodemographic factors influence internet-derived knowledge and in turn affect employment applications, job progression, health enquiries, social communication, business operations and more. Blank et al (2020) also analysed multiple demographic factors (age, education, income, functional literacy, gender, employment status, marital status, social grade, ethnicity, children in household, disability) to determine nationwide internet use in the 2019 Oxford Internet Survey. The most common digital access barrier identified in past work was educational qualifications/attainment. Such qualifications indicate likely digital skills, ambition and opportunities. A Dutch Internet Benefits survey by van Deursen and Helsper (2017) found well-educated individuals are likely to be connected and regular internet users with advanced internet skills and a greater quantity of positive digital experiences. Dutton and Blank (2013) support this in a UK context through the Oxford Internet Survey, with 95% of university graduates online compared to 40% of those with no educational qualifications. Those with graduate-level qualifications, in line with National Vocational Qualification (NVQ) 4, and higher are more adept at grasping online opportunities than those less qualified (Blank & Lutz, 2016). The type of internet usage differs by qualification. Drawing from the Bourdieu (1977) Cultural and Social Reproduction theory, Weber and Becker (2019) found well-educated European adolescents use the internet more for school and educational work than entertainment. Additionally, well-educated parents (particularly those using IT at work (Mesch & Talmud, 2011)) encourage higher level IT activities (e.g. website creation) than their less educated, less supported peers. A supportive environment promotes internet exploration, experimentation and IT skills gain (Weber & Becker, 2019).
Townsend et al (2013) noted UK-wide influences on internet access, finding educated adults also gain from online employment opportunities and career development. This is fundamental in a fluctuating economy where increasing human capital through adaptability and retraining is vital to remain employable. Longley et al (2008), through the UK e-Society National Classification, noted basic IT skills (defined in the van Deursen et al (2016) internet skills framework as the ability to find, select, and evaluate online information) are considered the employee's responsibility, and that IT skills are as important as having higher educational qualifications. Without internet, those who are less educated are restricted from educational, employment and career development opportunities and limited in their employment potential (typically gaining unskilled, manual work) (Townsend, et al., 2013). A further factor identified as influencing digital access is employment status, often linked to education and income. Higher qualifications often lead to higher salaried employment with more disposable income for broadband connectivity and internet-enabled devices. High-income users tend to carry out capital-enhancing activities and are more expressive internet users (Blank & Lutz, 2016). In contrast, those residing where education inequality exists tend to experience income differentials, and social disparities are perpetuated (Holsinger & Jacob, 2009). Blank and Lutz's (2016) research on internet benefits and harms in Great Britain built on the Blumler and Katz (1974) 'Uses and Gratifications' theory to reveal internet use satisfaction. Findings showed young, highly educated, high-income users benefit most from digital access, followed by the elderly. Although the benefits are subjective, the Uses and Gratifications theory categorises benefit levels (e.g. goal-oriented use, fulfilment of needs and self-awareness of reasons for use).
Education-income benefits are not uniform, as different socioeconomic groups have different needs. Some benefit educationally, others by income, and many through both (Blank & Lutz, 2016). Xiang et al (2018) challenged these findings, suggesting education-income benefits do provide equal benefit, with education acting as a poverty reduction catalyst that promotes social mobility and produces a skilled workforce. Their paper focussed on education inequality in central Beijing. Other research on the link between employment, education and internet accessibility includes Milanovic (2016), who built on earlier work by Piketty and Saez (2003) but took a different approach, proposing the Kuznets waves theory, in which technology developments, globalisation and public policy cause income inequality fluctuations. Milanovic (2016) provided evidence that the transfer from manufacturing to skill-heterogeneous services (e.g. Big Data) causes rising inequality. Links between employment, education, and internet accessibility are established (Blank & Lutz, 2016; Milanovic, 2016). However, the majority of research evidences these variables as providing varying benefit. Age can measure different life experiences, skills and knowledge. Longley et al (2008) noted young adults gained internet experience when susceptible to learning and able to afford technology. Livingstone and Helsper (2009) developed this further in their teenager-based internet skills and self-efficacy study. Internet introduction in one context, such as work or education, boosts spare-time internet usage and exploration. Older community members and all disadvantaged groups tend to have fewer internet-enabled devices and lower broadband connectivity (Townsend, et al., 2014; Blank, et al., 2020). Most did not grow up using digital devices and so have less internet experience, lowering online skills (instrumental rather than experimental use) (Longley, et al., 2008).
Fewer opportunities are grasped, potentially adding financial and health burdens. Web products have efficient supply chains, cutting costs (saving the UK £18 billion in 2009) (Kalapesi, et al., 2010). Online official health advice can promote late-life wellness (Hargittai, et al., 2019). Online social or work network exclusion can marginalise those unable to keep updated (Longley, et al., 2008). Multiple global ethnicity-focused internet access studies, including Chen and Wellman (2004) and Mesch and Talmud (2011), found ethnic minorities tend to report less internet access. More minority workers are employed in manual jobs, where internet exposure and learning IT skills are deemed less important and are unsupported. Blank et al (2020) reinforced these findings, adding that UK minorities are more likely to be in disadvantaged groups. Scheerder et al (2017) researched determinants of internet skills from 126 global journal articles and found that preconceived negative judgements of minority groups, resulting from their disadvantaged position, were a leading factor in lack of internet confidence and a disincentive to internet usage.

This paper adopts a seven-step structure to developing a geodemographic area-level classification, similar to that proposed by Gibson and See (2006) and Burns et al. (2018). Figure 1 summarises each of the seven replicable phases and the discussion that follows provides additional contextual information.

Figure 1. Geodemographic System Framework, adapted from Gibson and See (2006, p. 214) and Burns et al (2018, p. 421).

The classification put forward in this research is the first solely sociodemographic measure of digital accessibility for Great Britain. Guided by literature intelligence, the variables presented in Table 1 were determined as being most effective at showing districts 'at risk' from digital inaccessibility. These can be divided into two broad categories: 'Demographic' and 'Social'. Ofcom is the UK regulator for communications.
Performance data comprises broadband and mobile 4G, 3G and 2G networks, from the four mobile network operators (EE, O2, Three and Vodafone) with the largest UK coverage (Ofcom, 2019b). Annual data releases are published alongside a report analysing the current state of the UK communications infrastructure (Ofcom, 2019c). The ONS internet usage data is from January to March 2019 (ONS, 2019c). Use of open-source data alongside the following clear, detailed methodology enables scientifically reproducible research to be open to scrutiny (Singleton & Longley, 2009). Non-census data allows frequent area analysis between decadal censuses and post-2021 (Leventhal, 2016). Recent and regular data releases allow funding and resource targeting of the most spatially and temporally relevant results (Singleton, et al., 2016). Specifically, administrative data can generally provide a wide variety of data that can be mined to extract potentially useful information (Singleton & Spielman, 2014). Data are aggregated to Local Authority District (317 in England), Unitary Authority (22 in Wales) and Council Area (32 in Scotland) level (as of December 2019) (Blank, et al., 2017). Local Authority Districts vary between 2,300 and 1.1 million people, and include metropolitan districts, London boroughs, non-metropolitan districts and unitary authorities. Welsh Unitary Authority populations vary between 90,000 and 370,000 people, and Scottish Council Areas vary between 22,000 and 650,000 people (Blank, et al., 2017). This aggregation level is referred to as districts hereafter. Following variable identification from a thorough review of contemporary academic literature, 11 variables influencing digital accessibility were collated. The number of variables was limited to 11 to avoid noise, prevent misrepresentation of districts and reduce inaccuracies from poorly fitting districts in clusters (Vickers & Rees, 2007).
Prior to analysis, multiple pre-processing steps were undertaken to ensure data suitability, using the open-source software R. Where the number of survey responses was below 500, data was omitted to preserve confidentiality. This was only the case for identifying the number of survey respondents who were of minority ethnicity status. However, total ethnicity was available so missing values could be calculated in most cases. Where multiple data were missing, averages were calculated following subtraction of the known data. Data were transformed onto a continuous scale suitable for classification and polarity determined (Leventhal, 2016; Riekkinen & Burns, 2018). At this stage, data were assessed for suitability with regards to their inclusion in the classification algorithm. Multicollinearity was evaluated (Figure 2), with highly correlating variables further explored and consequently removed if variable impact was deemed less important than any compounding impact on results. Retaining just one variable of a highly correlating pair enables each variable to contribute its own unique dimension to the geodemographic classification (Lucy & Burns, 2017). An arbitrary multicollinearity threshold of >±0.7 was selected in line with past academic research (Judge, et al., 1982; Halkos & Tsilika, 2018). Variables undergo normalisation by Z-scores, allowing direct comparability (Vickers & Rees, 2007). Percentage values are transformed into Z-scores by $Z = \frac{x - \bar{x}}{s}$, where $Z$ indicates the standard score, $x$ is the observed value, $\bar{x}$ is the sample mean and $s$ is the sample standard deviation (Milligan & Cooper, 1988). Z-scores do not produce normalised data with consistently the same scale. Adjustment of the scale enables an effective spread of outliers and non-outlier data to be presented (Shinwell & Cohen, 2020).
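The standardisation and multicollinearity screening described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the variable names and percentage values are hypothetical, and only the Z-score definition and the ±0.7 threshold are taken from the text.

```python
import math

def z_scores(values):
    """Standardise one variable: Z = (x - sample mean) / sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length variables."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(table, threshold=0.7):
    """Return variable pairs whose |r| exceeds the multicollinearity threshold."""
    names = list(table)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(table[a], table[b])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical district-level percentages (five districts), for illustration only.
districts = {
    "no_qualifications_pct": [10.0, 20.0, 30.0, 40.0, 50.0],
    "unemployed_pct": [2.0, 4.1, 5.9, 8.2, 9.8],
    "aged_65_plus_pct": [30.0, 12.0, 25.0, 14.0, 21.0],
}
standardised = {name: z_scores(col) for name, col in districts.items()}
flagged = correlated_pairs(standardised)
print(flagged)  # pairs that would be candidates for dropping one variable
```

In this toy data only the first two variables exceed the threshold, so one of that pair would be reviewed for removal before clustering.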
For this nationwide research, where there is a large dataset and outliers are likely, Z-scores tend to perform most effectively when compared to alternative approaches, such as Principal Components Analysis or Min-Max (0-1) scaling. This research opted to use K-Means classification as the route to partition the multidimensional dataset. K-Means is an unsupervised, hard (crisp) partitioning clustering algorithm that uses machine learning to group large volumes of data based on variable similarity (MacQueen, 1967; Hartigan & Wong, 1979). K-Means has its starting seeds and the number of clusters predetermined (Major, et al., 2018). Starting seeds highly influence the final cluster solution, therefore repeat-clustering (with randomised seeds) ensures more accurate and valid results (Burns, 2017; Xiang, et al., 2018). K-Means is computationally fast and accurate, although sensitive to outliers (Cardot, et al., 2012; Gupta & Panda, 2018). Guided by the literature, the statistical R package clValid (Brock, et al., 2008), and similar previous commercial and academic success, K-Means was deemed the most appropriate algorithm for the dataset. Following K-Means selection, the number of clusters required identification. Statistical algorithms, including the Gap Statistic and Clustergram, determine cluster numbers in R. All evaluate the whole dataset (globally) rather than analysing individual pairs of clusters (locally) to test if amalgamation improves clusters (Gordon, 1999), and all test a range of different cluster numbers and are well suited to the 9,646 data points (de Amorim & Hennig, 2015). The Gap Statistic explores partitions in the dip of a normalised performance plot. A suitable number of clusters is achieved when the smallest number of clusters (where gain is not higher than expected on the normalised performance curve) is identified (Tibshirani, et al., 2002). The Gap Statistic works well alongside K-Means.
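For reference, the Gap Statistic is usually written as follows. This is the standard formulation from Tibshirani et al. rather than anything specific to this paper, and the symbols ($W_k$, $C_r$, $n_r$, $B$, $s_k$) are introduced here only for illustration.

```latex
% Pooled within-cluster dispersion for a partition into k clusters C_1, ..., C_k
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i, i' \in C_r} d_{ii'}

% Gap between the expected log-dispersion under a reference (null) distribution,
% estimated from B simulated reference datasets, and the observed log-dispersion
\mathrm{Gap}_n(k) = E_n^{*}\left[\log W_k\right] - \log W_k

% Selection rule: the smallest k satisfying
\mathrm{Gap}_n(k) \ge \mathrm{Gap}_n(k+1) - s_{k+1},
\qquad s_k = \mathrm{sd}_k \sqrt{1 + 1/B}
```

Here $d_{ii'}$ is the squared Euclidean distance between cases $i$ and $i'$, $n_r$ is the size of cluster $C_r$, and $\mathrm{sd}_k$ is the standard deviation of $\log W_k$ across the $B$ reference datasets. The selection rule corresponds to picking the smallest number of clusters beyond which the gain is no higher than expected.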
Clustergram, also effective for non-hierarchical clustering, plots a series of potential cluster frequency values alongside the weighted mean of their first principal component (Wierzchoń & Kłopotek, 2017). The resultant graph shows the relative separation of clusters. Distinctive and well-separated clusters are deemed most suitable given their homogeneous nature (Schonlau, 2002). In this research, the number of clusters was determined using the Gap Statistic approach (Tibshirani, et al., 2002) (Figure 3) and Clustergram (Schonlau, 2002) (Figure 4). The Gap Statistic was run 500 times and Clustergram repeated 100 times, both with different cluster 'starting points' each time, ensuring initial seeds were randomly allocated and different combinations could be created (Singleton, et al., 2020). After 500 runs, the Gap Statistic (Figure 3) suggested 6 clusters as most suitable for those specific variables. Seven clusters also appear viable, with a very similar gap statistic (~0.975) and confidence intervals. A seven-cluster classification was deemed most statistically suitable. Greater cluster numbers decrease the strength of association between clusters and cases, meaning cases are less representative of their clusters (Harris, et al., 2005). K-Means involves an iterative process of moving one district (hereafter case) from one cluster to another to evaluate if a move improves the sum of squared deviations within each cluster (Aldenderfer & Blashfield, 1984; Burns, et al., 2018). Cases are allocated (or re-allocated) to clusters until all cases are stable in clusters and provide maximum improvement to the cluster. Cases with similar variables group together and cases with dissimilar variables fall in different clusters (Kaufman & Rousseeuw, 1990). Clusters created should represent discrete categories and reflect similar districts (Spielman & Thill, 2008).
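The iterative allocation described above, combined with repeated random starts, can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it uses the batch (Lloyd-style) update rather than the one-case-at-a-time moves described in the text, and the function names, default number of starts and toy points are assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One K-Means run from randomly chosen starting seeds."""
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Allocate every case to its nearest cluster centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:  # all cases are stable: stop
            break
        labels = new_labels
        # Move each centre to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, centres, wss

def kmeans_multistart(points, k, n_starts=50, seed=0):
    """Repeat K-Means with different seeds; keep the lowest within-cluster sum of squares."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        result = kmeans_once(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best

# Toy standardised cases (two variables), for illustration only.
cases = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centres, wss = kmeans_multistart(cases, k=2)
print(labels, wss)
```

Keeping the run with the smallest within-cluster sum of squares is one common way to choose among repeated starts; other selection criteria are possible.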
Starting cluster centres and initial seeds can produce different solutions depending on data order; therefore, obtaining the most accurate cluster solution requires running the algorithm multiple times (here 1,000) with different initial cluster centres each time (Singleton, et al., 2016). K-Means attempts to minimise 'within' cluster variability and maximise 'between' cluster variability (Vickers & Rees, 2007). Cluster centres and Analysis of Variance (ANOVA) F values gauge the distinctiveness and robustness of cluster-cases fit (Everitt, et al., 2011). Final cluster centres show all classification variables. Values above zero show a variable is above the population mean in the districts within that particular cluster. Values below zero, and so below the population mean, show variables that are less prevalent in the districts for that particular cluster. ANOVA F values reflect the variables which provide the greatest contributions to resultant clusters. A boxplot showing distance of cases from cluster centres against cluster numbers shows the relative allocation of districts into clusters. The boxplot (Figure 5) is the final visualisation of SPSS cluster effectiveness. Data points residing further from the mean (or outliers) show districts that fit less suitably into clusters. In such clusters, variables are likely to be misaligned with the majority of districts in that cluster. However, a district can be placed there as it is the 'best fit' of the clusters available. A classification containing many outliers across multiple clusters may suggest that re-running the K-Means algorithm with more clusters would be beneficial. Upon K-Means completion, district codes and cluster numbers were mapped in a geographical information system (using open-source QGIS). London was mapped separately to Great Britain, showing its distinct geographical patterns. Clusters were described using pen portraits: short summaries of distinctive sociodemographic and spatial features in each cluster.
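The two cluster diagnostics discussed above, per-variable ANOVA F values and case-to-centre distances, can be computed as in the following Python sketch. It assumes a standard one-way ANOVA and Euclidean distances; the function names and toy values are illustrative and do not reproduce the SPSS output used in the paper.

```python
import math

def anova_f(values, labels):
    """One-way ANOVA F for one variable, using cluster labels as groups.

    Assumes at least two clusters and some within-cluster variation.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g)
                    for g in groups.values())
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def distances_to_centres(points, labels):
    """Euclidean distance from each case to the centre of its own cluster."""
    centres = {}
    for lab in set(labels):
        members = [p for p, m in zip(points, labels) if m == lab]
        centres[lab] = [sum(col) / len(members) for col in zip(*members)]
    return [math.dist(p, centres[lab]) for p, lab in zip(points, labels)]

# Toy example: one variable across six cases in two clusters.
f_value = anova_f([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], [0, 0, 0, 1, 1, 1])
dists = distances_to_centres([(0, 0), (2, 0), (10, 0), (10, 4)], [0, 0, 1, 1])
print(f_value, dists)
```

A larger F value indicates a variable that separates the clusters more strongly, while large distances flag cases that fit their cluster less well, which is what the boxplot visualises.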
The classification was validated by matching case (district) codes to Ofcom (2019a) broadband performance dataset and the ONS Internet Users (2019b) dataset. Nationwide Ofcom upload and download data were compared against the Great Britain classification results to see actual areas where low upload and download speeds were present. ONS (2019b) Internet Usage data validated specific case studies representing each of the clusters. Linking the final classification to other datasets also corroborates classification success given how additional data can improve discrimination between districts (Longley, et al., 2008) . Areas where download or upload speeds are slow or where internet usage is already low could identify at-risk internet inaccessibility areas. A K-Means geodemographic classification was deemed most suitable for the 11 literatureguided sociodemographic variables. Overall statistics, the Gap Statistic and Clustergram, assessed the optimum number of clusters as 7. Greater London, known as being geographically and socio-demographically distinctive, was mapped separately to Great Britain. Methods were run hundreds of times to ensure reliable results. Findings were mapped to show spatial extent and allow validation with datasets from well-regarded organisations, Office for National Statistics and Ofcom. London is known to be socio-economically and demographically distinct to the remainder of the country, hence why London is mapped separately. Dean et al (2012) found London and the South East to be the most prosperous and 'wired' parts of the country with regards to connectivity, and later research by Dutton and Blank (2013) supported this, finding that regional Internet use varies from 60% in the North East, 71% in Wales to 86% in London and 83% in the South East, in the 2013 Oxford Internet Survey. 
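The validation step of matching district codes to performance data can be sketched as a simple join and per-cluster average. The district codes, cluster numbers and speeds below are hypothetical placeholders, not Ofcom or ONS values; only the 10 Mbit/s download and 1 Mbit/s upload thresholds come from the Universal Service Obligation figures quoted earlier.

```python
from collections import defaultdict

# 'Decent broadband' thresholds quoted in the text (Mbit/s).
USO_DOWNLOAD = 10.0
USO_UPLOAD = 1.0

def mean_speed_by_cluster(cluster_by_district, speeds_by_district):
    """Join cluster labels to (download, upload) speeds by district code
    and return the mean pair per cluster. Districts lacking speed data are skipped."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for code, cluster in cluster_by_district.items():
        if code not in speeds_by_district:
            continue
        down, up = speeds_by_district[code]
        entry = totals[cluster]
        entry[0] += down
        entry[1] += up
        entry[2] += 1
    return {c: (t[0] / t[2], t[1] / t[2]) for c, t in totals.items()}

def below_uso(means):
    """Clusters whose mean download or upload speed falls below the USO thresholds."""
    return sorted(c for c, (down, up) in means.items()
                  if down < USO_DOWNLOAD or up < USO_UPLOAD)

# Hypothetical codes and speeds, for illustration only.
clusters = {"D1": 1, "D2": 1, "D3": 2, "D4": 2}
speeds = {"D1": (40.0, 8.0), "D2": (60.0, 12.0), "D3": (8.0, 0.8)}
means = mean_speed_by_cluster(clusters, speeds)
print(means, below_uso(means))
```

Comparing cluster-level means in this way is only one possible summary; the paper's own validation also compares mapped speeds against the classification visually.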
In Figure 6, London is very clearly distinct in the classification, containing the majority of 'Indian Metropolitan Living' and 'Ethnically Diverse Career Climbers' clusters. The Great Britain population is a mix of different sociodemographic groupings and divides (ONS, 2019a), and thus the classification provides a good district level representation. District level data also matches the level of UK Government statistics; therefore, results are easily transferable into policy (Grupp & Mogee, 2004) and match census results, potentially providing further future validation. District level is a finer spatial resolution than coarser scales, e.g. Nomenclature of Territorial Units for Statistics (NUTS) level, where large cities with significantly different socioeconomic and demographic characteristics are merged (Longley, 2012). However, it is important to note that no spatial scale is ever representative of every individual. Presuming the characteristics of individuals from area-level results for where they live constitutes the ecological fallacy (Voas & Williamson, 2002). Classification outputs repositioned out of their original spatial context fall into the modifiable areal unit problem (Openshaw & Wymer, 1995). Policy leaders must be aware of aggregated data, underlying variables or spatial patterns at the original spatial extent when creating policies, to avoid potentially inaccurate presumptions (Riekkinen & Burns, 2018). Presented below are pen portraits (qualitative descriptions) of each of the seven clusters as shown in Figure 6. All groups are based on literature-guided sociodemographic variables from Table 1. When analysing cluster outputs, the ANOVA F number, number of cases in each cluster and boxplots are known determiners of cluster effectiveness. Similarities in mean cluster centre distances (particularly clusters 4, 6 and 7) and extreme values exist (Figure 5).
Table 3 highlights where differences exist between clusters in terms of sociodemographic characteristics and distribution of individual districts within clusters. Cluster 2 has the greatest number of districts, 147 more than Cluster 4, which has the fewest. The first two clusters contain the largest quantity of districts. Although an even spread of districts in clusters is optimum (Sarstedt & Mooi, 2019), districts across Great Britain are likely to have variation and some characteristics will be present in more districts than others due to the changing sociodemographic nature of this research. All 370 districts across England, Scotland and Wales are accounted for (Everitt, et al., 2011).

Figure 5. Digital Accessibility Classification boxplot.

Figure 5 (boxplots) denotes overall cluster effectiveness. The mean cluster centre distances vary between 1.5 and 3.5. Three of the seven clusters have mean cluster centre distances between 2.1 and 2.5. Cluster centres differ as every district in each cluster is likely to have differing levels of each of the 11 sociodemographic variables. Clusters form when districts have sufficiently similar variable levels to group (Kaufman & Rousseeuw, 1990). Clusters 1, 2, 6 and 7 have outliers; clusters 3, 4 and 5 do not. Cluster 2 has the most outliers and the highest overall outlier at 6.7. Clusters 1 and 6 contain the most variation, with the highest upper extreme and lowest lower extreme values, perhaps unsurprising since retirees and students are likely a diverse mix of people. Despite these variations, overall mean distances, extremes and outliers tend to be relatively low, a sign of better district-cluster suitability. To evaluate the digital accessibility classification and validate pen portraits, analysis of Great Britain-wide Ofcom (2019a) Telecommunications Operator Performance data was undertaken, followed by cluster case study-specific analysis with ONS (2019b) Internet Users data.
Ofcom data refers to wireless mobile internet access, a good access indicator where wired access is unavailable. Download and upload speeds, used in calculating internet speed, were used at district level across all internet line types for validation. Download speed indicates the speed at which data is obtainable from a server (e.g. video streaming), whereas upload speed determines how fast data is sent to others (e.g. sending emails) (Riddlesden & Singleton, 2014). Greater London upload speeds vary. London is geographically and demographically distinct (Dean, et al., 2012). Therefore, analysis and validation are focussed on the nationwide results rather than London-centric differences. Great Britain's average download speed is 'superfast' at 58 Mbit/s (Ofcom, 2019c). The ONS Internet Users data used for the case studies is at NUTS level, a coarser spatial scale than district (Longley, 2012). Generally, this would not be usable as validation. However, the cluster case studies have the same physical geographic boundaries, thus results are relatively comparable. There are 174 total areas in the NUTS scale. In contrast, at-risk areas are ranked lower in internet usage over the last 3 months. Wolverhampton, a 'Metropolitan Minority Struggle' cluster, is ranked lowest for internet usage in the last 3 months against the other case study areas and ranked 138th out of 174 UK areas. Wolverhampton is ranked highest regionally and nationwide in the at-risk clusters for those who have never accessed the internet or not accessed it in over 3 months. Leicester, an 'Indian Metropolitan Living' cluster, has the next lowest 3-month internet usage, followed by Bradford. Clear differences exist between case study clusters. At-risk clusters have more residents who have never used the internet and the lowest level of internet users in the last 3 months. The statistics used to determine the number of clusters, the Gap Statistic and Clustergram, were run 500 and 100 times respectively.
The number of iterations was determined from previous geodemographic classification literature (Xiang, et al., 2018; Singleton, et al., 2020). For the K-Means classification this could be increased; however, doing so is unlikely to change the number of clusters identified as most suitable, the K-Means classification F number, the number of cases per cluster or any other outcomes.

Upload speed validation supports overall classification findings. Greater London variations do not consistently match cluster divisions; however, London is known to be demographically distinct (Dean, et al., 2012). Rerunning the classification to analyse specific locations, such as London, may further distinguish differences. Additional rerunning at a smaller spatial scale (e.g. postcode) could enable more specific digital access and resource targeting. Ofcom validation data is freely available at postcode level; however, ONS data is not. Accessing secure ONS data at postcode level would lead to more specific spatial analysis; however, without all data being freely and publicly available, the classification cannot be scrutinised in depth. Yet if postcode-level ONS data were released publicly, the most socially disadvantaged and vulnerable individuals could potentially be identified and be at risk of targeting by commercial and criminal organisations. This is the overarching reason for keeping data at district level, which allows a high spatial resolution over Great Britain while still highlighting specific districts at risk of digital inaccessibility.

Download speed validation supports classification findings less definitively. There is support for a physical digital access divide, with higher download speeds in Metropolitan clusters compared to generally rural-based clusters. However, because download speeds are low across the majority of Great Britain, specific differences between digitally accessible and inaccessible sociodemographic clusters are difficult to determine.
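The speed-based validation described above amounts to summarising district-level speeds by cluster and comparing clusters. A minimal sketch follows, assuming one record per district with a cluster label and mean speeds. The record layout, the function name and every value in the example are hypothetical and are not Ofcom figures. The thresholds are the 'decent broadband service' levels quoted in the introduction (10 Mbit/s download, 1 Mbit/s upload).

```python
from collections import defaultdict
from statistics import mean

# 'Decent broadband service' thresholds quoted in the introduction.
USO_DOWNLOAD_MBITS = 10.0
USO_UPLOAD_MBITS = 1.0


def summarise_by_cluster(districts):
    """Mean speeds per cluster and the count of districts below either threshold."""
    grouped = defaultdict(list)
    for district in districts:
        grouped[district["cluster"]].append(district)
    summary = {}
    for cluster, rows in grouped.items():
        summary[cluster] = {
            "n_districts": len(rows),
            "mean_download": mean(r["download"] for r in rows),
            "mean_upload": mean(r["upload"] for r in rows),
            "below_uso": sum(
                r["download"] < USO_DOWNLOAD_MBITS or r["upload"] < USO_UPLOAD_MBITS
                for r in rows
            ),
        }
    return summary


# Hypothetical districts and speeds (Mbit/s), for illustration only.
districts = [
    {"name": "district_a", "cluster": "A", "download": 60.0, "upload": 10.0},
    {"name": "district_b", "cluster": "A", "download": 40.0, "upload": 6.0},
    {"name": "district_c", "cluster": "B", "download": 8.0, "upload": 0.5},
    {"name": "district_d", "cluster": "B", "download": 12.0, "upload": 1.5},
]

summary = summarise_by_cluster(districts)
for cluster, stats in sorted(summary.items()):
    print(cluster, stats)
```

Comparing the per-cluster means shows whether at-risk clusters sit lower than the others. When speeds are similar across most of the country, as the text notes for download speed, the cluster means are close together and the comparison becomes less conclusive.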
Specific case study validation strengthens findings. The case study validation used data from a NUTS-level Internet Users survey. Generally, this would not be comparable with the district-level ONS data used for the classification, because NUTS has a coarser spatial scale than district. However, for these case studies, the same physical geographic boundary exists. Other case studies may not be comparable at the same geographic level, so care must be taken if this validation is required for other areas.

The Digital Accessibility research identifies at-risk clusters and their spatial extent to district level, which should aid future government policy. Districts requiring additional help could be targeted. Locations of inaccessibility are places where future infrastructure planning (e.g. 5G masts) or computer skills training would be beneficial. The sociodemographic variables selected are multidimensional, capturing key parts of society. The open-source, publicly available data and the clear, transparent methodology have enabled this in-depth analysis and breakdown of the classification. Methodology transparency allows the classification to be updated when new data releases are available and to be rerun at finer spatial scales using finer sociodemographic variable data, if local classifications are required to identify at-risk postcode zones for local councils.

Data backlog and resource reallocation due to the Coronavirus pandemic have already seen ONS (2020) data releases delayed. This delay, alongside what is expected to be the final nationwide census in 2021, means that this Digital Accessibility classification, a measure of social disadvantage and digital inaccessibility, is an important asset. It may remain one of the most up-to-date sociodemographic-focussed data classifications for a prolonged period, at a time when those who are already disadvantaged require the greatest support.
The Digital Accessibility Classification successfully pinpoints districts across Great Britain requiring additional help to get connected to the internet or to gain computer skills. Skill and knowledge improvement allow the internet to be used effectively to provide benefit and [...] and qualifications should be able to access the internet and be taught the skills to use it.

Cluster Analysis OXIS 2019: Digital Divides in Britain are Narrowing But Deepening Local Geographies of Digital Inequality Benefits and Harms from Internet Use: A Differentiated Analysis of Great Britain The Uses of Mass Communications: Current Perspectives on Gratifications Research The Global Digital Divide: Within and Between Countries clValid: An R Package for Cluster Validation Creating a Health/Deprivation Geodemographic Classification System using K-means Clustering Methods Developing an Individual-level Geodemographic Classification A Fast and Recursive Algorithm for Clustering Large Datasets with K-medians The Internet Galaxy: Reflections on the Internet, Business, and Society The Global Digital Divide: Within and Between Countries Recovering the Number of Clusters in Datasets with Noise Features using Feature Rescaling Factors The Connected World: The Internet Economy in the G-20 Cultures of the Internet: Five Clusters of Attitudes and Beliefs among Users in Britain Who's Responsible for the Digital Divide European Commission, 2020.
Digital Economy and Society Index (DESI) 2020: United Kingdom Cluster Analysis Using Geodemographics and GIS for Sustainable Development Classification Indicators for National Science and Technology Policy: How Robust are Composite Indicators A Comparison of K-Means Clustering Algorithm and CLARA Clustering Algorithm on Iris Dataset Programming Correlation Criteria with Free Cas Software Second-Level Digital Divide: Differences in People's Online Skills From Internet Access to Internet Skills: Digital Inequality among Older Adults Geodemographics, GIS and Neighbourhood Targeting Algorithm AS 136: a K-means Clustering Algorithm Inequality in Education: Comparative and International Perspectives The Technology Underpinning 5G Introduction to the Theory and Practice of Econometrics The Connected Kingdom: How the Internet is Transforming the Finding Groups in Data: An Introduction to Cluster Analysis Geodemographics for Marketers: Using Location Analysis for Research and Marketing Balancing Opportunities and Risks in Teenagers' Use of the Internet: the Role of Online Skills and Internet Self-Efficacy Geodemographics and the Practices of Geographic Information Science The UK Geography of the e-Society: A National Classification Devising a Composite Index to Analyze and Model Loneliness and Related Health Risks in the United Kingdom Some Methods for Classification and Analysis of Multivariate Observations SNAPScapes: Using Geodemographic Segmentation to Classify the Food Access Landscape Is 5 Just What Comes After 4? Ethnic Differences in Internet Access: The Role of Occupation and Exposure. 
Information Global Inequality: A New Approach for the Age of Globalization A Study of Standardization of Variables in Cluster Analysis Huawei Advice: What you need to know Data Downloads: Connected Nations Connected Nations Connected Nations Area Classifications Annual Population/ Labour Force Survey Dataset Internet Users Internet Users Quality and Methodology Information Delay to GDP, Trade and Productivity Release Dates Classification and Regionalization, Census Users' Handbook GeoInformation International Two-Speed' Scotland: Patterns and Implications of the Digital Divide in Contemporary Scotland Income Inequality in the United States, 1913-1998 Broadband Speed Equity: A New Digital Divide Creating and Evidencing a Sustainable Commuting Index for London, United Kingdom Cluster Analysis Determinants of Internet Skills, Uses and Outcomes: A Systematic Review of the Second-and Third-Level Digital Divide The Clustergram: A Graph for Visualizing Hierarchical and Nonhierarchical Cluster Analyses Reconsidering Political and Popular Understandings of the Digital Divide Whose Internet is it Anyway?: Exploring Adults' (Non)Use of the Internet in Everyday Life Measuring Countries' Progress on the Sustainable Development Goals: Methodology and Challenges Mapping the Geodemographics of Digital Inequality in Great Britain: An Integration of Machine Learning into Small Area Estimation Geodemographics, Visualisation, and Social Networks in Applied Geography The Stability of Geodemographic Cluster Assignments over an Intercensal Period The Past, Present, and Future of Geodemographic Research in the United Kingdom The Persistence of Place: The Importance of Shared Participation Environments when deploying ICTs in Rural Areas Targeting Customers: How to Use Geodemographics and Lifestyle Data in Your Business Social Area Analysis, Data Mining, and GIS. 
Computers, Environment and Urban Systems Estimating the Number of Clusters in a Dataset via the Gap Statistic Enhanced Broadband Access as a Solution to the Social and Economic Problems of the Rural Digital Divide Stuck Out Here': The Critical Role of Broadband for Remote Rural Places Full-Fibre Broadband in the UK Huawei to be removed from UK 5G networks by 2027 The Compoundness and Sequentiality of Digital Inequality Collateral Benefits of Internet Use: Explaining the Diverse Outcomes of Engaging with the Internet Development and Validation of the Internet Skills Scale (ISS). Information Internet Skills and the Digital Divide Assessing the Broadband Gap: From the Penetration Divide to the Quality Divide Creating the UK National Statistics 2001 Output Area Classification The Diversity of Diversity: A Critique of Geodemographic Classification The Predictive Postcode: The Geodemographic Classification of British Society Browsing the Web for School: Social Inequality in Adolescents' School-Related Use of the Internet Moving On-Line? An Analysis of Patterns of Adult Internet Use in the UK Cluster Quality Versus Choice of Parameters A Geodemographic Classification of Sub-Districts to Identify Education Inequality in Central Beijing