key: cord-0871465-w2osqpi9
authors: Soltan, A. A.; Yang, J.; Pattanshetty, R.; Novak, A.; Rohanian, O.; Beer, S.; Soltan, M. A.; Thickett, D. R.; Fairhead, R.; CURIAL Translational Collaborative,; Zhu, T.; Eyre, D. W.; Clifton, D. A.
title: Real-world evaluation of AI driven COVID-19 triage for emergency admissions: External validation & operational assessment of lab-free and high-throughput screening solutions
date: 2021-08-31
journal: nan
DOI: 10.1101/2021.08.24.21262376
sha: eff1b227d1c3ea63f93e523068ad609dbc87b725
doc_id: 871465
cord_uid: w2osqpi9

Background Uncertainty in patients' COVID-19 status contributes to treatment delays, nosocomial transmission, and operational pressures in hospitals. However, typical turnaround times for batch-processed laboratory PCR tests remain 12-24h. Although rapid antigen lateral flow testing (LFD) has been widely adopted in UK emergency care settings, sensitivity is limited. We recently demonstrated that AI-driven triage (CURIAL-1.0) allows high-throughput COVID-19 screening using clinical data routinely available within 1h of arrival to hospital. Here we aimed to determine operational and safety improvements over standard-care, performing external/prospective evaluation across four NHS trusts with updated algorithms optimised for generalisability and speed, and deploying a novel lab-free screening pathway in a UK emergency department. Methods We rationalised predictors in CURIAL-1.0 to optimise separately for generalisability and speed, developing CURIAL-Lab with vital signs and routine laboratory blood predictors (FBC, U&E, LFT, CRP) and CURIAL-Rapide with vital signs and FBC alone. Models were calibrated during training to 90% sensitivity and validated externally for unscheduled admissions to Portsmouth University Hospitals, University Hospitals Birmingham and Bedfordshire Hospitals NHS trusts, and prospectively during the second-wave of the UK COVID-19 epidemic at Oxford University Hospitals (OUH). Predictions were generated using first-performed blood tests and vital signs and compared against confirmatory viral nucleic acid testing. Next, we retrospectively evaluated a novel clinical pathway triaging patients to COVID-19-suspected clinical areas where either model prediction or LFD results were positive, comparing sensitivity and NPV with LFD results alone. 
Lastly, we deployed CURIAL-Rapide alongside an approved point-of-care FBC analyser (OLO; SightDiagnostics, Israel) to provide lab-free COVID-19 screening in the John Radcliffe Hospital's Emergency Department (Oxford, UK), as trust-approved service improvement. Our primary improvement outcome was time-to-result availability; secondary outcomes were sensitivity, specificity, PPV, and NPV assessed against a PCR reference standard. We compared CURIAL-Rapide's performance with clinician triage and LFD results within standard-care. Results 72,223 patients met eligibility criteria across external and prospective validation sites. Model performance was consistent across trusts (CURIAL-Lab: AUROCs range 0.858-0.881; CURIAL-Rapide 0.836-0.854), with highest sensitivity achieved at Portsmouth University Hospitals (CURIAL-Lab: 84.1% [95% Wilson's score CIs 82.5-85.7]; CURIAL-Rapide: 83.5% [81.8-85.1]) at specificities of 71.3% (95% Wilson's score CIs: 70.9-71.8) and 63.6% (63.1-64.1). For 3,207 patients receiving LFD-triage within routine care for OUH admissions between December 23, 2020 and March 6, 2021, a combined clinical pathway increased sensitivity from 56.9% for LFDs alone (95% CI 51.7-62.0) to 88.2% with CURIAL-Rapide (84.4-91.1; AUROC 0.919) and 85.6% with CURIAL-Lab (81.6-88.9; AUROC 0.925). 520 patients were prospectively enrolled for point-of-care FBC analysis between February 18, 2021 and May 10, 2021, of whom 436 received confirmatory PCR testing within routine care and 10 (2.3%) tested positive. Median time from patient arrival to availability of CURIAL-Rapide result was 45:00 min (32-64), 16 minutes (26.3%) sooner than LFD results (61:00 min, 37-99; log-rank p<0.0001), and 6:52 h (90.2%) sooner than PCR results (7:37 h, 6:05-15:39; p<0.0001). Sensitivity and specificity of CURIAL-Rapide were 87.5% (52.9-97.8) and 85.4% (81.3-88.7), therefore achieving high NPV (99.7%, 98.2-99.9). 
CURIAL-Rapide correctly excluded COVID-19 for 58.5% of negative patients who were triaged by a clinician to COVID-19-suspected (amber) areas. Impact CURIAL-Lab & CURIAL-Rapide are generalisable, high-throughput screening tests for COVID-19, rapidly excluding the illness with higher NPV than LFDs. CURIAL-Rapide can be used in combination with near-patient FBC analysis for rapid, lab-free screening, and may reduce the number of COVID-19-negative patients triaged to enhanced precautions (amber) clinical areas.

Reducing nosocomial transmission of SARS-CoV-2 has been identified as a priority in safeguarding patient and healthcare staff safety, particularly as individuals with existing medical conditions are at greatest risk of severe illness and death [1] [2] [3] [4] [5] . However, as the early clinical course of infection is often characterised by weakly specific symptoms, and can be asymptomatic, viral testing is necessary to identify cases and is mandated for all UK hospital admissions 4 . The mainstay of testing is batch-processed laboratory polymerase chain reaction (PCR), which is imperfectly sensitive and requires specialist equipment [6] [7] [8] . Turnaround times have shortened throughout the pandemic, typically to within 12-24h in hospitals in high- and middle-income countries, but the interim uncertainty around patients' COVID-19 status may contribute to treatment delays and postpone transfer to wards, thereby contributing to nosocomial transmission and operational strain. Novel rapid testing solutions have been adopted, including point-of-care (POC) PCR, loop mediated isothermal amplification, and lateral flow antigen testing (LFD), despite limitations in throughput and sensitivity 9, 10 . Where POC PCR is available, use is typically constrained by supply to time-critical decisions, such as those surrounding emergency or transplant surgery 11, 12 . Moreover, although LFD is lab-free and highly specific (>99.5%) 13, 14 , allowing for a role in community case-finding 15 , multiple reports show more limited sensitivity (~40%-70%) [16] [17] [18] leading up to the U.S. Food and Drug Administration issuing a Class 1 recall on June 10, 2021 19 . A recent study evaluating performance amongst unscheduled hospital admissions confirmed high specificity (99.6%), but relatively low sensitivity (62%) 10 .

We recently demonstrated that an artificial-intelligence (AI) screening test (CURIAL-1.0) can rapidly detect COVID-19 amongst patients being admitted to hospital, by recognising SARS-CoV-2-induced abnormalities in routinely collected data 20 . A strength of our approach is the use of readily available blood test, blood gas and physiological measurements, which are typically collected within 1h of presentation to hospitals in high- and middle-income countries, without requiring patient exposure to ionising radiation 21, 22 . Explainability analyses revealed that the features most informative to predictions were components of the Full Blood Count (FBC) and vital signs (Basophil count, Eosinophil count and Oxygen requirements), offering promise for clinically-guided optimisation to reduce prediction time.

Whereas many studies have investigated AI applications for diagnosis and prognosis during the pandemic [23] [24] [25] [26] , key reviews highlight sector-wide methodological and reporting concerns that threaten generalisability, questioning the suitability of many models to date for clinical use [27] [28] [29] . Reviewing the contribution of AI to the COVID-19 response, a recent editorial highlighted the promise of CURIAL-1.0 amongst other solutions to support patients during the pandemic, discussing the importance of high-quality validation studies inclusive of diverse patient populations 30 . Moreover, additional work quantifying benefits in the real-world clinical setting would demonstrate the clinical added-value of such approaches.

Accordingly, in this study we investigate efficacy, generalisability and real-world operational benefits of AI-driven COVID-19 screening in emergency departments, using insight from explainability analyses to improve generalisability and exploit advances in near-patient diagnostics for lab-free COVID-19 screening 31 . First, we externally validate two models with rationalised predictors, optimised separately for throughput (CURIAL-Lab; vital signs & routine blood tests) and speed (CURIAL-Rapide; vital signs & FBC only), across three independent UK NHS hospital trusts and prospectively for the second wave of the UK COVID-19 epidemic at Oxford University Hospitals. Next, we propose and investigate a novel clinical triage pathway using our AI models to enhance sensitivity of rapid antigen test (LFD)-based triage for unscheduled admissions. Lastly, we deploy CURIAL-Rapide alongside an approved point-of-care FBC analyser (OLO; SightDiagnostics, Israel) to provide rapid lab-free COVID-19 screening in the John Radcliffe Hospital's Emergency Department (Oxford, UK), evaluating real-world performance and operational characteristics at a time of falling COVID-19 community prevalence in the UK 32 .

medRxiv preprint doi: https://doi.org/10.1101/2021.08.24.21262376; this version posted August 31, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

We updated our previously described model, designed to identify patients presenting to hospital with COVID-19 using vital signs, blood gas and routine laboratory blood tests (CURIAL-1.0 20 ), with additional training data. The model was trained previously using clinical data from patients presenting to emergency and acute medical services at Oxford University Hospitals (OUH) between Dec 1, 2017 and April 19, 2020; additional data on all COVID-19-positive patient presentations to June 30, 2020 were added (Appendix B) 20 . This was performed to encompass all COVID-19 cases presenting to OUH during the 'first wave' of the COVID-19 pandemic (Supplementary Figure S1). OUH consists of four teaching hospitals, serves a population of 600 000, and provides tertiary referral services to the surrounding region. Routine blood tests comprised full blood count (FBC), urea, creatinine and electrolytes (U&Es), liver function tests (LFTs), coagulation and C-reactive protein (CRP), selected for their ubiquity within existing emergency care pathways and rapid results, typically within 1 h.

Next, we eliminated predictors with lower relative feature importances to improve generalisability across hospitals. CURIAL-Lab uses a focussed subset of routinely performed blood tests (FBC, U&Es, LFTs, CRP) and vital signs (Table 1), eliminating the coagulation panel and blood gas, which are not universally performed and are less informative 20 . Separately, we optimised for result-time, developing a minimalist model (CURIAL-Rapide) considering only predictors that can be obtained at the patient bedside (FBC and vital signs). We selected the FBC due to recent approval of a point-of-care haematology analyser (OLO, SightDiagnostics, Israel) with a result-time of 10 minutes, and as explainability analyses showed FBC components were most informative 31 . Models were trained using the OUH first-wave dataset described above.

We externally validated CURIAL-Rapide and CURIAL-Lab, calibrated during training to a sensitivity of 90%, across three independent UK National Health Service (NHS) hospital Trusts by comparing model predictions to confirmatory molecular testing (SARS-CoV-2 laboratory PCR, SAMBA-II and Panther). Participating hospital trusts were University Hospitals Birmingham NHS Foundation Trust (UHB), Bedfordshire Hospitals NHS Foundation Trust (BH), and Portsmouth University Hospitals NHS Trust (PUH), serving a total population of ~3.5 million. We evaluated the models for all patients aged over 18 who had an unscheduled admission via emergency or acute medical pathways and received a blood draw on arrival during the specified date ranges. Screening against eligibility criteria, followed by anonymisation, was performed by the respective NHS trusts. Patients who dissented to EHR research, did not have confirmatory molecular testing for COVID-19, or had only an invalid confirmatory test result with no subsequent valid result, were excluded. For trusts where blood-gas results were available for electronic extraction, we also evaluated CURIAL-1.0. Evaluation periods and confirmatory testing methods are listed in Appendix C.
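The text states only that models were calibrated during training to a sensitivity of 90%; it does not describe the procedure. One common way to do this, shown here purely as an assumed sketch, is to pick the highest decision threshold on training-set scores that still flags at least 90% of PCR-positive cases. The function name and the toy scores are hypothetical.

```python
import math


def threshold_for_sensitivity(scores, labels, target=0.90):
    """Return the highest threshold whose sensitivity is at least `target`.

    A case is called positive when its score is >= the threshold.
    This is an illustrative assumption, not the authors' documented method.
    """
    positives = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("at least one positive case is required")
    # Number of positive cases that must be flagged to reach the target.
    needed = max(1, math.ceil(target * len(positives) - 1e-12))
    return positives[needed - 1]


# Hypothetical training scores: ten positives followed by two negatives.
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.25]
toy_labels = [1] * 10 + [0, 0]
toy_threshold = threshold_for_sensitivity(toy_scores, toy_labels, target=0.90)
```

A threshold fixed this way on training data will not necessarily yield 90% sensitivity at other sites, which is consistent with the lower sensitivities reported in external validation.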

To investigate suitability of CURIAL-Rapide, CURIAL-Lab and CURIAL-1.0 as rapid screening tests for unscheduled admissions, we compared sensitivities and negative predictive values [...] Figure S2). From December 23, 2020, patients admitted to OUH from acute and emergency care settings (Emergency Department, Ambulatory Medical Unit, Medical Assessment Unit) had LFDs performed routinely alongside PCR testing. Swabs of the nose and throat were collected for both tests by trained nursing or medical staff and LFDs were performed in the emergency/acute departments. Positive, negative and invalid LFD results were documented in the EHR. Swabs for PCR were transferred to the clinical laboratory in viral transport medium and tested by PCR (ThermoFisher TaqPath), forming the reference standard for evaluating model predictions and LFDs.

A combined algorithm to enhance the sensitivity of LFD testing

Next, we investigated whether CURIAL-Rapide, CURIAL-Lab and CURIAL-1.0 could enhance the sensitivity of LFDs for identifying COVID-19 amongst patients being admitted to hospital. We proposed and retrospectively evaluated a novel clinical triage pathway (Figure 1) labelling patients as COVID-19-suspected where they had either a positive CURIAL result or a positive LFD result. Because LFDs are highly specific, patients with positive LFDs can be streamed directly to a COVID-19-positive clinical area in our pathway, while patients with a negative LFD but a positive CURIAL result would be managed in an enhanced-precautions area pending PCR adjudication. The pathway aimed to enhance negative predictive value for patients receiving both negative LFD and CURIAL results, reducing the false-negative rate and thereby supporting safe and rapid triage directly to a 'green' zone.
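The pathway's "either test positive" rule can be sketched as below, together with a helper that computes sensitivity and NPV against PCR. The stream labels, function names, and the six-patient toy cohort are hypothetical and serve only to show why an OR rule can raise sensitivity relative to LFD alone.

```python
def triage_stream(lfd_positive: bool, curial_positive: bool) -> str:
    """Assign a stream under the combined pathway (labels are illustrative)."""
    if lfd_positive:
        return "covid-positive"          # LFD positive: stream directly
    if curial_positive:
        return "enhanced-precautions"    # LFD negative, CURIAL positive
    return "green"                       # both negative


def sensitivity_npv(flags, pcr):
    """Sensitivity and NPV of boolean `flags` against boolean PCR results."""
    tp = sum(f and p for f, p in zip(flags, pcr))
    fn = sum((not f) and p for f, p in zip(flags, pcr))
    tn = sum((not f) and (not p) for f, p in zip(flags, pcr))
    return tp / (tp + fn), tn / (tn + fn)


# Hypothetical toy cohort of six patients (not study data).
lfd = [True, False, False, False, False, False]
curial = [True, True, False, True, False, False]
pcr = [True, True, True, False, False, False]

suspected = [triage_stream(l, c) != "green" for l, c in zip(lfd, curial)]
lfd_only = sensitivity_npv(lfd, pcr)
combined = sensitivity_npv(suspected, pcr)
```

In this toy cohort the combined rule catches one PCR-positive patient that the LFD missed, at the cost of one additional false positive held in the enhanced-precautions area.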

We assessed performance of this novel pathway retrospectively for all unscheduled admissions to OUH where patients received LFD testing, from introduction on December 23, 2020 to March 6, 2021. We compared the performance of each model alone (CURIAL-Lab, CURIAL-Rapide and CURIAL-1.0) with that of its integrated clinical pathway using McNemar's chi-square test.
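McNemar's chi-square test compares two classifiers applied to the same patients using only the discordant pairs. The sketch below assumes the uncorrected form of the statistic (no continuity correction), which the text does not specify; the counts passed in are hypothetical.

```python
import math


def mcnemar_chi_square(b: int, c: int):
    """McNemar's test without continuity correction.

    b: patients classified correctly by method A only.
    c: patients classified correctly by method B only.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical discordant counts: model alone vs. model plus LFD pathway.
stat, p = mcnemar_chi_square(b=15, c=5)
```

Concordant pairs do not enter the statistic, so the test is sensitive only to cases where the two approaches disagree.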

To prospectively assess operational and predictive performance of CURIAL-Rapide in a lab-free setting, we deployed two OLO rapid haematology analysers [SightDiagnostics, Tel Aviv] in the Emergency Department (ED) at the John Radcliffe Hospital, Oxford, as part of an OUH-approved service evaluation (Ulysses ID: 6907) 31 . We simultaneously aimed to improve routine clinical care by reducing the time for routine blood test results to become available in ED. The analysis plan and data requirements for the CURIAL-Rapide evaluation were determined prospectively and registered with the Trust service evaluation database.

We estimated a suitable review-point using Buderer's standard formulae 35 . Predicting a sensitivity of CURIAL-Rapide of 80% (matching model calibration), specificity of 75%, and prevalence of COVID-19 at 15% amongst patients in ED, we estimated a minimum sample size of 410 enrolled patients to determine sensitivity and 85 patients to determine specificity (95% confidence, precision 10% 36 ). We therefore planned to review model performance once 500 patients had been enrolled, to allow for missing or invalid confirmatory tests.
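The stated sample sizes can be reproduced with Buderer's formulae, which divide the binomial sample size for a proportion by the prevalence (for sensitivity) or by one minus the prevalence (for specificity). The sketch below assumes z = 1.96 for 95% confidence and rounds up; the function name is hypothetical.

```python
import math


def buderer_sample_size(sensitivity, specificity, prevalence, precision, z=1.96):
    """Minimum total sample sizes to estimate sensitivity and specificity.

    n_sens = z^2 * Se * (1 - Se) / (d^2 * prevalence)
    n_spec = z^2 * Sp * (1 - Sp) / (d^2 * (1 - prevalence))
    """
    n_sens = math.ceil(
        z**2 * sensitivity * (1 - sensitivity) / (precision**2 * prevalence)
    )
    n_spec = math.ceil(
        z**2 * specificity * (1 - specificity) / (precision**2 * (1 - prevalence))
    )
    return n_sens, n_spec


# Inputs taken from the text: Se 80%, Sp 75%, prevalence 15%, precision 10%.
n_sens, n_spec = buderer_sample_size(0.80, 0.75, 0.15, 0.10)
print(n_sens, n_spec)  # 410 85
```

Because the sensitivity estimate depends on the number of positive cases, the required total grows as prevalence falls, which is why the lower-than-expected prevalence during the evaluation widened the sensitivity interval.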

The service evaluation operated from February 18, 2021 to May 10, 2021 between 8am and 8pm for patients meeting the eligibility criteria. Patients eligible were aged over 18, attending the ED with an acute illness and streamed to a bedded clinical area, and had consented to receive full blood count analysis and vital signs as part of their emergency care plan. We selected patients allocated to bedded clinical areas as non-ambulatory patients typically have higher acuity of illness, therefore being more likely to benefit from faster blood test results and having a higher probability of admission. Patients were identified on arrival using the FirstNet system (Cerner Millennium, Cerner, UK).

Eligible patients were enrolled for additional near-patient, lab-free full blood count analysis using the OLO, which in conjunction with vital signs were used to generate CURIAL-Rapide predictions. OLO FBC results were uploaded immediately to the EHR, making results available to clinicians and supporting routine care. We excluded patients with an invalid OLO result and no subsequent successful result, thereby ensuring data completeness. Routine COVID-19 testing was performed in line with trust policies, with LFDs (Innova SARS-CoV-2 Antigen Rapid Qualitative Test) performed in the department and paired multiplex PCR on-premises in a dedicated laboratory (ThermoFisher TaqPath). Patients who did not have confirmatory testing for COVID-19 within routine care were excluded from performance evaluation.

We recorded patients' arrival time to the hospital, measurement time of vital signs, and result times for LFD, PCR, OLO and laboratory FBC analysis. We also recorded the first-attending clinician's triage impression of COVID-19 status, using the locally adopted Green/Amber/Blue categorization system (Green representing a patient whose illness has no features of COVID-19, Amber representing an illness with features potentially consistent with COVID-19, and Blue representing laboratory confirmed COVID-19 infection) 37, 38 . Where COVID-19 triage category had not been documented by the first-assessing physician, adjudication was performed through clinical review of notes using rules-based determination. Patients having documentation of a new continuous cough, temperature ≥37.8°C, or loss or change in sense of smell or taste were adjudicated as an 'Amber' (COVID-19-suspected) stream, matching UK Government guidance on definition of a possible COVID-19 case 8 . Patients with PCR-confirmed COVID-19 in the 10 days preceding attendance were adjudicated to the 'Blue' (COVID-19-confirmed) stream. Patients with no features of COVID-19 and no documented clinical suspicion were adjudicated to the 'Green' stream.
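The rules-based adjudication described above can be expressed as a small function. The field names are hypothetical, and the order of precedence (Blue checked before Amber) is an assumption, since the text does not say how a patient meeting both sets of criteria was handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Presentation:
    """Findings abstracted from the clinical notes (field names are illustrative)."""
    pcr_positive_within_10_days: bool = False
    new_continuous_cough: bool = False
    temperature_c: Optional[float] = None
    smell_or_taste_change: bool = False
    documented_clinical_suspicion: bool = False


def adjudicate_stream(p: Presentation) -> str:
    """Return 'blue', 'amber' or 'green' following the rules in the text."""
    # Assumption: confirmed infection takes precedence over symptom rules.
    if p.pcr_positive_within_10_days:
        return "blue"
    febrile = p.temperature_c is not None and p.temperature_c >= 37.8
    if (
        p.new_continuous_cough
        or febrile
        or p.smell_or_taste_change
        or p.documented_clinical_suspicion
    ):
        return "amber"
    return "green"
```

For example, `adjudicate_stream(Presentation(temperature_c=38.2))` yields `"amber"`, while a presentation with no recorded features yields `"green"`.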

We generated CURIAL-Rapide predictions using OLO results and vital signs, comparing predictions against results of confirmatory PCR testing. CURIAL-Rapide predictions were not made available to the attending clinician so as not to influence the clinical triage category or decisions to proceed to confirmatory testing for patients being discharged. Availability time for CURIAL-Rapide was the later of OLO result time and vital signs recording time as both are required to generate a prediction. The time-to-result for PCR, lateral flow, and CURIAL-Rapide tests were calculated as the time from a patient's first arrival in the ED to the time of a test result being available on the EHR.
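The availability rule above amounts to taking the later of two timestamps and subtracting the arrival time. A minimal sketch follows, with hypothetical function names and timestamps.

```python
from datetime import datetime, timedelta


def curial_time_to_result(
    arrival: datetime, olo_result: datetime, vitals_recorded: datetime
) -> timedelta:
    """Time from ED arrival until a CURIAL-Rapide prediction can be generated.

    Both inputs are required, so availability is the later of the two times.
    """
    available_at = max(olo_result, vitals_recorded)
    return available_at - arrival


# Hypothetical timestamps for a single patient (not study data).
arrival = datetime(2021, 3, 1, 9, 0)
ttr = curial_time_to_result(
    arrival,
    olo_result=datetime(2021, 3, 1, 9, 40),
    vitals_recorded=datetime(2021, 3, 1, 9, 45),
)
```

Here the vital signs were recorded after the OLO result, so they determine the availability time.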

We selected time-to-result as our primary outcome, recognising the role of rapid test results in reducing nosocomial transmission. Our secondary outcomes were sensitivity, specificity, PPV and NPV for CURIAL-Rapide & LFDs, and AUROC for CURIAL-Rapide, assessed against PCR results. Further detail is provided in Appendix D.
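The secondary outcomes are proportions from a 2x2 table against PCR, and the abstract reports Wilson score intervals for sensitivity and specificity. The sketch below computes each metric with a 95% Wilson interval, assuming z = 1.96 and applying the same interval to PPV and NPV for illustration. All counts are hypothetical; 7 of 8 is used only because it gives a point estimate of 87.5%.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int):
    """Point estimates and Wilson intervals for the four secondary outcomes."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in pairs.items()}


# Hypothetical 2x2 counts (not the study's table).
metrics = diagnostic_metrics(tp=7, fp=15, tn=85, fn=1)
```

With few positive cases the interval for sensitivity is wide even when the point estimate is high, whereas specificity and NPV, which rest on many more negatives, are estimated more tightly.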

The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the manuscript.

Table 2: Summary population characteristics for (a) OUH pre-pandemic and COVID-19-cases training cohorts, (b) prospective validation cohort of patients attending OUH during the second wave of the UK COVID-19 epidemic, (c) independent validation cohorts of patients admitted to three independent NHS Trusts, (d) admissions to OUH during the second-wave receiving LFD testing, (e) patients enrolled to the CURIAL-Rapide lab-free service evaluation at OUH. The derivation of OUH cohorts is shown in Supplementary Figure S2. * indicates merging for statistical disclosure control.

Our updated training set (Supplementary Figure S2) [...] 46.7%; chi-square p<0.0001) and reported a South Asian ethnicity (13.2% versus 0.5% and 2.0%; chi-square p<0.0001).

Prevalence of COVID-19 was higher in the Bedfordshire cohort owing to the evaluation period matching the timeline of the second wave of the UK COVID-19 epidemic (11.1% versus 5.29% (PUH) and 4.27% (UHB); Fisher's exact test p<0.0001 & <0.0001). Summaries of vital signs, index routine blood tests and blood gases are presented in Supplementary Tables S2-S4. A sensitivity analysis to assess susceptibility of our models to imputation strategy demonstrated consistent performance across multiple imputations (Table 3). We therefore performed subsequent evaluation using a model trained with a single imputation strategy (population median).

[...] survival analyses (Figure 4a) showed CURIAL-Rapide results were available sooner than LFDs (log-rank test, p<0.0001) and PCR results (p<0.0001). The median time-to-result for full blood count analysis was shorter with near-patient OLO analysis (44:00 min, 31:00-63:00 min) than laboratory analysis (76:00 min, 58:00-100:00 min; p<0.0001), confirming an improvement to routine care. CURIAL-Rapide results had a negative predictive value of 99.7% (98.2-99.9), specificity of 85.4% (81.3-88.7), and AUROC of 0.907 (0.803-1.00). The point estimate of CURIAL-Rapide's sensitivity was 87.5%; however, the 95% CI was wide (52.9-97.8) owing to the lower-than-expected prevalence of COVID-19.

In one presentation, a patient given a 'negative' CURIAL-Rapide prediction went on to have a positive SARS-CoV-2 PCR test, although they had a negative LFD result and were triaged to a COVID-19-free ('green') clinical area. The patient did not have COVID-19 symptoms on this presentation. We noted that the patient had also been enrolled to the service evaluation 10 days prior; on that occasion having a positive CURIAL-Rapide prediction, positive Lateral Flow Test and a positive PCR test. This raises the possibility of a latent positive PCR result, detecting non-infectious residual viral fragments, on the date of the second presentation 39, 40 .

Rates of COVID-19 status misclassification were comparable between CURIAL-Rapide and clinician judgement (McNemar's Exact test; p=0.91). Moreover, of the 53 patients who were triaged to a 'COVID-19-suspected' (amber) pathway by the attending clinician but went on to test negative by PCR, 31 patients (58.5%) had a negative CURIAL-Rapide prediction demonstrating that the AI system could reduce operational strain by expediting clinical exclusion of infection. As all patients with positive LFD results also had positive CURIAL-Rapide predictions, a combined CURIAL-Rapide/LFD pathway did not impact classifier performance in this evaluation.

National health policy recognises that effective in-hospital triage is necessary to safeguard patient and staff safety, mandating COVID-19 testing for all admissions 4 . However, despite significant innovation leading to new near-patient testing options, alongside reduced PCR result-times to typically within 12-24 h, there remain significant performance and logistical limitations that contribute to nosocomial transmission and operational strain. While many hospitals have adopted LFDs within acute admissions pathways 10 , our study confirms a limited sensitivity (56.9%) indicating a clinically-meaningful false negative rate (Figure 3 , Supplementary Table  S6 ) [14] [15] [16] 41 .

In this study we demonstrate generalisability, efficacy, and real-world operational benefits of AI-driven COVID-19 screening in the acute care setting. Whereas rapid molecular testing options are frequently rationed 11,12 , we show that a high-throughput AI-solution, CURIAL-Lab, rapidly excludes COVID-19 using routine data and generalises across three independent hospital groups ( Figure 2 ). Moreover, we improve upon the speed of existing rapid testing solutions, demonstrating a median result-time of 45 minutes (32-64 min; CURIAL-Rapide) from patients' first arrival in an emergency department using near-patient haematology analysis. This decentralised approach may support time-critical decision making and assist triage in remote and primary care settings where laboratory facilities are less readily available.

In our external and prospective validation of CURIAL-Lab & CURIAL-Rapide, model performance was consistently high across four UK hospital groups (CURIAL-Lab: AUROC range 0.858-0.881; CURIAL-Rapide 0.836-0.854), with high negative predictive values confirming suitability as tests-of-exclusion for COVID-19. As expected, CURIAL-Lab achieved marginally superior performance to CURIAL-Rapide, representing a trade-off between result-time and specificity which would favour different clinical use-cases. Strengths of the validation include geographic breadth, including over 72,000 patients across three regions of the UK (Midlands, South East, and East of England), and temporal breadth across both waves of the UK COVID-19 epidemic, therefore including vaccinated patients and patients with Coronavirus variants, across a range of prevalences (4.27%-12.2%). Moreover, the validation study considers the broad range of confirmatory COVID-19 tests (PCR, SAMBA-II & Panther) utilised across different centres, and addresses sector-wide concerns highlighted in key reviews of COVID-19 diagnostic and prognostic models by using external, representative cohorts of all unscheduled adult admissions 27, 28 . Notable limitations are that the external validation is solely UK-based and that the confirmatory testing methods are themselves limited, with PCR testing having been shown to be imperfectly sensitive 6, 7 . We were unable to quantify the number of vaccinated patients as we could not link our de-identified hospital datasets with vaccination records.

Our study finds that CURIAL-Rapide & CURIAL-Lab achieve significantly higher sensitivity and NPV than LFDs, improving upon standard-care by reducing risk of a COVID-19-positive patient being streamed to a COVID-19-free clinical area. Moreover, our study is the first to validate an application of an AI test to enhance sensitivity of LFDs in a real-world clinical setting, with our combined clinical pathway (Figure 1; CURIAL-Lab/LFD) achieving both high sensitivity (85.6%, 81.6-88.9) and high overall classification performance (AUROC 0.925, 0.905-0.945). A significant beneficiary population includes patients streamed to COVID-19-free ('green') clinical areas on receiving negative LFD and CURIAL results, which are performed in parallel and available much sooner than PCR results. Moreover, our pathway identifies an enriched subpopulation at greater risk of testing positive for COVID-19 (negative LFD & positive CURIAL-Lab/Rapide compared to negative LFD alone), therefore enabling prioritisation for rapid confirmatory testing where availability is limited. Limitations of this analysis include that patients may have been inadvertently excluded if LFDs were recorded incorrectly on the EHR, and the analysis was performed only for a single trust as other participating sites did not electronically record LFD results.

We report the fastest result-time to date for AI-driven COVID-19 screening in a hospital emergency department, using lab-free haematology analysis to achieve a median reduction of 16 minutes (26.3%) over LFDs. Significantly, by demonstrating that CURIAL-Rapide correctly excluded COVID-19 for 58.5% of negative patients who were triaged by a clinician to a 'COVID-19-suspected' (amber) clinical area, we show a role for AI screening in reducing delays in transfers to wards. A strength of our service evaluation is its real-world context and operational focus, assessing time from first-arrival to result availability, and demonstrating added clinical value by comparison to LFD results and clinician impressions. In this study, we address the need for evidence of clinical utility that AI tools such as CURIAL offer to the pandemic response 30 ; demonstrating both operational and safety improvements to standard-care.

A significant limitation of our service evaluation is that, although the a priori target sample size was achieved, the desired precision and power levels were not achieved for the metric of sensitivity due to sharply falling prevalence of COVID-19 in the UK, associated with the national vaccination programme and public health measures 32, 42. The evaluation was, however, adequate to determine specificity. As a service evaluation, we used a convenience series, with OLO operation limited to daytime and evening hours (8am-8pm) for logistical reasons. Moreover, although routine LFD testing was hospital policy, 33% of enrolled patients did not have a coded result in their EHR using the pre-specified form, raising the possibility that these results may have been recorded elsewhere or communicated verbally. As all patients who were LFD positive also had a positive CURIAL-Rapide result in our study, a larger evaluation is needed to assess whether integrating LFD results could further improve performance of CURIAL-Rapide in this context. Further evaluation would assess performance as a clinical decision support aid and for sensitivity across coronavirus variants.

A major strength of the CURIAL-Rapide and CURIAL-Lab solutions is the use of clinical data that is readily available and routinely collected for all patients admitted to hospital. Our approach optimises generalisability with CURIAL-Lab, applicable to virtually all unscheduled patient admissions to hospital, thereby facilitating COVID-19 screening without significant additional cost. Where faster exclusion of COVID-19 is helpful for operational or treatment reasons, CURIAL-Rapide can provide faster results by eliminating the need for blood sample transportation and laboratory processing, at an approximate cost of £9 (~$12.50; inclusive of device rental and consumables). By contrast, alternative strategies for AI-assisted COVID-19 diagnosis largely focus on chest imaging 23, 28, 43, which involves patient exposure to ionising radiation and has higher costs. Following successful COVID-19 vaccination programmes, falling community prevalence may reduce cost-effectiveness of universal PCR-testing for unscheduled admissions. Our results suggest that CURIAL-Lab could deliver significant cost savings by reducing the number of routine PCR tests by >85% (where prevalence <2%) while achieving high NPV, utilising data that would be collected within the routine course of a patient's care.
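The dependence of NPV and of the share of patients screened negative on prevalence follows from Bayes' rule, and a short calculation makes it concrete. The sketch below is illustrative only: the function name is hypothetical, the sensitivity and specificity values are assumed rather than taken from our results, and it supposes that confirmatory PCR would be reserved for model-positive patients.

```python
def screen_summary(sensitivity, specificity, prevalence):
    """Return (npv, fraction_screened_negative) for a screening test.

    Applies Bayes' rule to a population with the given disease prevalence.
    """
    true_negative = specificity * (1.0 - prevalence)
    false_negative = (1.0 - sensitivity) * prevalence
    fraction_negative = true_negative + false_negative
    npv = true_negative / fraction_negative
    return npv, fraction_negative


# Assumed operating point (illustrative values, not study results).
ASSUMED_SENSITIVITY = 0.90
ASSUMED_SPECIFICITY = 0.87

for prevalence in (0.01, 0.02, 0.05, 0.10):
    npv, frac_neg = screen_summary(ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  NPV={npv:.4f}  screened negative={frac_neg:.3f}")
```

Under these assumptions, the fraction screened negative approximates the proportion of routine PCR tests that could be avoided, and NPV rises as prevalence falls.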

Our work demonstrates generalisability, efficacy, and real-world operational benefits of AI-driven COVID-19 screening for patients attending hospital. Future work would assess international generalisability, evaluate clinician-model interactions, and assess sensitivity of model performance across vaccination types and infection with variants of concern.


As previously, we included all patients attending acute and emergency care settings at Oxford University Hospitals NHS Foundation Trust who received routine blood tests on arrival, considering presentations before December 1, 2019, and thus before the pandemic, as the COVID-19-negative (control) cohort. We considered presentations during the 'first wave' of the UK COVID-19 pandemic (December 1, 2019 to June 30, 2020) with PCR-confirmed SARS-CoV-2 infection as the COVID-19-positive (cases) cohort. We excluded patients who opted out of electronic health record (EHR) research and those who did not receive laboratory blood tests or were younger than 18 years of age. Due to incomplete coverage of testing during the first wave of the pandemic, and imperfect sensitivity of the PCR test, there is uncertainty in the viral status of patients presenting during the pandemic who were untested or tested negative. We therefore selected a pre-pandemic control cohort during training to ensure absence of disease in patients labelled as COVID-19-negative.

Data normalisation was implemented to mitigate overfitting and to avoid the reliance of the model on measurement units. Categorical data were handled by one-hot encoding. Where a lab value was reported as being below the detection threshold of the laboratory assay, the value was replaced with zero. Where values were reported as being above the detection threshold, clinically appropriate values were selected to maintain the significance of the high result. A summary of first-performed blood tests, vital signs, and blood gases on arrival to hospital is shown in Supplementary Tables S2-S4.
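The preprocessing steps described above can be sketched in a few lines. This is a minimal illustration rather than the study code: the helper names are hypothetical, the "<" and ">" prefixes for out-of-range results are an assumed reporting format, the replacement value for an above-range result is an arbitrary placeholder, and z-score scaling is assumed as the form of normalisation.

```python
from statistics import mean, pstdev


def parse_lab_value(raw, above_limit_value):
    """Convert a reported lab result to a float.

    A result reported as '<x' (below detection) becomes 0.0.
    A result reported as '>x' (above detection) becomes a caller-supplied,
    clinically chosen value.
    """
    text = str(raw).strip()
    if text.startswith("<"):
        return 0.0
    if text.startswith(">"):
        return float(above_limit_value)
    return float(text)


def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 indicators."""
    return [1 if value == category else 0 for category in categories]


def standardise(values):
    """Scale values to zero mean and unit variance (assumed normalisation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical example values.
crp = [parse_lab_value(v, above_limit_value=500) for v in ["<0.2", "12.5", ">400", "48"]]
print(crp)
print(one_hot("B", ["A", "B", "C"]))
print(standardise([1.0, 2.0, 3.0]))
```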

Three imputation strategies (population median, population mean, and age-based imputation) were initially used separately to impute missing data. As a sensitivity analysis to assess for effects of imputation strategy on model performance, we assessed performance of models trained using each imputation method prospectively for all patients attending emergency departments and acute medical services across OUH during the second wave of the COVID-19 pandemic (October 1, 2020 to March 6, 2021; Table 2). Mean performance was reported alongside SD in Table 2, with narrow standard deviations in all performance metrics demonstrating resilience to imputation method. We therefore subsequently only used models trained with missing data imputed using population median, reporting results alongside 95% confidence intervals (CIs).
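The three imputation strategies can be illustrated as follows. This is a sketch under stated assumptions, not the study implementation: the function and field names are hypothetical, age-based imputation is assumed to use the mean within 10-year age bands, and the example records are invented.

```python
from statistics import mean, median


def impute(rows, feature, strategy, age_band_width=10):
    """Return a copy of rows with missing (None) values of `feature` filled.

    strategy 'median' or 'mean' fills with the population statistic.
    strategy 'age_mean' fills with the mean of the patient's age band,
    falling back to the population mean if the band has no observations.
    """
    observed = [r[feature] for r in rows if r[feature] is not None]
    population_fill = median(observed) if strategy == "median" else mean(observed)

    # Group observed values by age band for the age-based strategy.
    band_values = {}
    for r in rows:
        if r[feature] is not None:
            band_values.setdefault(r["age"] // age_band_width, []).append(r[feature])

    filled = []
    for r in rows:
        new_row = dict(r)
        if new_row[feature] is None:
            band = band_values.get(r["age"] // age_band_width)
            if strategy == "age_mean" and band:
                new_row[feature] = mean(band)
            else:
                new_row[feature] = population_fill
        filled.append(new_row)
    return filled


# Hypothetical records with missing CRP values.
patients = [
    {"age": 34, "crp": 4.0},
    {"age": 38, "crp": 6.0},
    {"age": 71, "crp": 90.0},
    {"age": 75, "crp": None},
    {"age": 36, "crp": None},
]
for strategy in ("median", "mean", "age_mean"):
    print(strategy, [r["crp"] for r in impute(patients, "crp", strategy)])
```

Training one model per strategy and comparing held-out performance, as in the sensitivity analysis described above, then indicates whether the choice of strategy matters.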

We repeated training and optimisation of our eXtreme Gradient BOOSTed tree model (XGBoost) to discriminate COVID-19-positive cases from pre-pandemic COVID-19-negative controls, for each of the three feature-sets (Table 1) 44. During training with 'first wave' cases, controls were matched for age, gender, and ethnicity at a ratio of 1:20. For missing data, we initially used three independent imputation methods during training (median, mean and age-based mean) and assessed sensitivity of model performance to imputation strategy during testing. Thresholds were calibrated to achieve sensitivities of 80% and 90% during training, using stratified 10-fold cross-validation.
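Calibrating a decision threshold to a target sensitivity can be sketched as below. The example assumes that out-of-fold predicted probabilities from cross-validation are already available; the function names and the scores are hypothetical, and the selection rule shown (the highest threshold that still meets the target) is one reasonable choice rather than a description of the study code.

```python
import math


def threshold_for_sensitivity(scores, labels, target):
    """Highest threshold t such that flagging score >= t as positive
    achieves sensitivity >= target on the supplied data."""
    positive_scores = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positive_scores:
        raise ValueError("at least one positive case is required")
    # Number of positive cases that must be flagged to reach the target.
    needed = max(math.ceil(target * len(positive_scores) - 1e-9), 1)
    return positive_scores[needed - 1]


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical out-of-fold scores: ten cases followed by ten controls.
scores = [0.95, 0.90, 0.80, 0.75, 0.70, 0.60, 0.55, 0.40, 0.35, 0.30,
          0.65, 0.50, 0.45, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02]
labels = [1] * 10 + [0] * 10

for target in (0.80, 0.90):
    t = threshold_for_sensitivity(scores, labels, target)
    sens, spec = sensitivity_specificity(scores, labels, t)
    print(f"target={target:.2f} threshold={t:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

A higher sensitivity target lowers the threshold, so specificity can only stay the same or fall.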

XGBoost is an implementation of gradient boosting, a generalisation of boosting to arbitrary differentiable loss functions. Gradient-boosted trees are relatively robust to outliers and have high predictive power. The scikit-learn (v0.23.2), LIBLINEAR (v2.41) and XGBoost (v1.2.0) modules for Python were used during model development and classifier evaluation.

For trusts where blood-gas results were available for electronic extraction, we also evaluated CURIAL-1.0. Patients meeting inclusion criteria had an unscheduled acute or emergency care admission during the specified periods, received a blood draw on arrival, and were aged over 18. We excluded patients who did not have a valid confirmatory test result within a prespecified period, or who had opted out of EHR research. Screening against eligibility criteria, followed by anonymisation, was performed by the respective NHS Trusts.

Evaluation at Portsmouth Hospitals NHS Foundation Trust (PUH) considered all patients admitted to the Queen Alexandra Hospital, which serves a population of 675,000 and offers tertiary referral services to the surrounding region, between March 1, 2020 and February 28, 2021. Confirmatory COVID-19 testing was by laboratory SARS-CoV-2 RT-PCR assay, considering any positive PCR result within 48hrs of admission as a true positive. As blood gas results were not available for electronic extraction, we evaluated only CURIAL-Rapide and CURIAL-Lab at Portsmouth.

We report sensitivity, specificity, positive and negative predictive values (PPV and NPV), AUROC and F1 alongside 95% CIs (Supplementary Table S5 & Figure 2), comparing model predictions to results of confirmatory viral testing (laboratory PCR and SAMBA-II). Confidence intervals for sensitivity, specificity and predictive values were computed using Wilson's method 33, and for AUROC with DeLong's method 34.
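For reference, the Wilson score interval for a proportion can be computed as in the sketch below, which also derives sensitivity, specificity, PPV and NPV from confusion-matrix counts. The function names and the counts are hypothetical and purely illustrative.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if total == 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1.0 - p) / total + z * z / (4.0 * total * total)
    )
    return centre - half_width, centre + half_width


def diagnostic_metrics(tp, fp, tn, fn):
    """Point estimates with Wilson intervals from confusion-matrix counts."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in pairs.items()}


# Hypothetical counts, not study data.
for name, (estimate, (low, high)) in diagnostic_metrics(tp=90, fp=150, tn=850, fn=10).items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f}-{high:.3f})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves sensibly for proportions close to 0 or 1, which matters when NPV is high.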

We considered any positive lateral flow test which was followed by a positive PCR test within a +/-48hr window of a patient being admitted to hospital to represent a true positive infection. As previously, model predictions were generated using blood tests performed from the first blood draw on arrival and first-recorded vital signs. In the integrated clinical pathway (Figure 1), patients were considered COVID-19-suspected if they had either a positive LFD result or CURIAL prediction. Results are shown in Figure 3 and Supplementary Table S6.

Figure S3: Instructions to trained operators, specifying eligibility criteria for the service evaluation, sample handling and processing techniques.
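The integrated pathway rule (COVID-19-suspected if either the LFD result or the CURIAL prediction is positive) can be sketched as follows, together with a comparison of sensitivity and NPV against LFD alone. The function names and the ten example records are hypothetical and are not study data.

```python
def pathway_positive(lfd_positive, curial_positive):
    """Integrated pathway: COVID-19-suspected if either test is positive."""
    return bool(lfd_positive or curial_positive)


def sensitivity_and_npv(predictions, pcr_results):
    """Sensitivity and NPV of binary predictions against PCR results."""
    tp = sum(1 for p, y in zip(predictions, pcr_results) if p and y)
    fn = sum(1 for p, y in zip(predictions, pcr_results) if not p and y)
    tn = sum(1 for p, y in zip(predictions, pcr_results) if not p and not y)
    return tp / (tp + fn), tn / (tn + fn)


# Hypothetical records: (LFD positive, CURIAL positive, PCR positive).
records = [
    (True, True, True), (False, True, True), (False, True, True),
    (False, False, True), (False, True, False), (False, False, False),
    (False, False, False), (False, False, False), (False, False, False),
    (False, False, False),
]
pcr = [r[2] for r in records]
lfd_alone = [r[0] for r in records]
combined = [pathway_positive(r[0], r[1]) for r in records]

print("LFD alone:", sensitivity_and_npv(lfd_alone, pcr))
print("Pathway:  ", sensitivity_and_npv(combined, pcr))
```

Because the rule is a logical OR, the pathway flags every patient the LFD flags, so its sensitivity cannot be lower than that of LFD alone; the cost is a possible loss of specificity.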

Confirmatory testing of patients enrolled in the OLO/CURIAL-Rapide service evaluation, and LFD comparison, followed OUH trust policies. Swabs of the nose and throat were routinely performed in the emergency department for all patients being admitted to OUH. Lateral Flow Testing (Innova SARS-CoV-2 Antigen Rapid Qualitative Test) was performed in the department, by trained nursing or medical staff, and results were documented on the electronic record. Swabs for PCR were transferred to the clinical laboratory in viral transport medium and tested by PCR (ThermoFisher TaqPath). Where patients were not tested for COVID-19 by confirmatory PCR, or did not receive blood tests or vital signs as part of routine care, we excluded the patients from the CURIAL-Rapide evaluation. We also excluded patients with an invalid OLO result and no subsequent successful result, thereby ensuring data completeness.

Binary CURIAL-Rapide triage predictions (COVID-19-Suspected and COVID-19-Negative) were generated using a custom Python 3.0 application. Libraries used included scikit-learn, pandas, and NumPy. No other clinical data was made available to the algorithm. CURIAL-Rapide predictions were not made available to clinicians in this study, so as not to influence the clinical triage category or decisions to proceed to confirmatory testing.

We compared CURIAL-Rapide predictions, lateral flow results, and the clinical triage category assigned by the first-assessing clinician against a PCR reference standard. We determined and report sensitivity, specificity, PPV, NPV and accuracy, alongside 95% confidence intervals. We calculated the time-to-result for each test, presenting mean with standard deviation for normally distributed data, and median with interquartile range for data with a skewed distribution (Table 6). Laboratory FBC samples were not processed for 2 of the 520 patients, owing to sample or labelling errors. For paired samples, we compared time-to-result between each test using a one-tailed Wilcoxon Signed Rank test. We additionally performed a Kaplan-Meier survival analysis (Figure 4).
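The paired comparison of time-to-result can be illustrated with a small, self-contained version of the one-tailed Wilcoxon signed-rank test. This sketch is not the study code: it uses a normal approximation without tie or continuity correction (statistical packages typically offer exact or corrected variants), the function name is hypothetical, and the paired times are invented.

```python
import math
from statistics import median


def wilcoxon_signed_rank_greater(x, y):
    """One-tailed Wilcoxon signed-rank test of H1: x tends to exceed y.

    Drops zero differences, assigns average ranks to tied absolute
    differences, and uses a normal approximation for the p-value.
    Returns (W_plus, one_sided_p).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4.0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean_w) / sd_w
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail probability
    return w_plus, p_value


# Hypothetical paired time-to-result values in minutes.
lfd_minutes = [62, 58, 71, 66, 80, 55, 69, 74, 60, 65]
rapide_minutes = [45, 47, 50, 52, 61, 46, 48, 55, 49, 43]

w_plus, p_value = wilcoxon_signed_rank_greater(lfd_minutes, rapide_minutes)
reduction = median(a - b for a, b in zip(lfd_minutes, rapide_minutes))
print("median paired reduction (minutes):", reduction)
print("W+ =", w_plus, "one-sided p =", round(p_value, 4))
```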

We report our study in compliance with the "Standards for Reporting Diagnostic accuracy studies" (STARD) standards 46, 47.


The Guardian UK. 40,600 people likely caught Covid while hospital inpatients in England

Factors associated with deaths due to COVID-19 versus other causes: population-based cohort analysis of UK primary care data and linked national death registrations within the OpenSAFELY platform. The Lancet Regional Health

Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study

Chief People Officer & National Director for Emergency and Elective Care

Living risk prediction algorithm (QCOVID) for risk of hospital admission and mortality from coronavirus 19 in adults: national derivation and validation cohort study

Estimating the false-negative test probability of SARS-CoV-2 by RT-PCR

Variation in False-Negative Rate of Reverse Transcriptase Polymerase Chain Reaction-Based SARS-CoV-2 Tests by Time Since Exposure

COVID-19: investigation and initial clinical management of possible cases

Performance evaluation of the SAMBA II SARS-CoV-2 test for point-of-care detection of SARS-CoV-2

Use of lateral flow devices allows rapid triage of patients with SARS-CoV-2 on admission to hospital

University Hospitals Birmingham NHS Foundation Trust. Pathway for processing urgent COVID-19 samples

Home-based SARS-CoV-2 lateral flow antigen testing in hospital workers

Lateral flow device specificity in phase 4 (post marketing) surveillance

Rapid evaluation of Lateral Flow Viral Antigen detection devices (LFDs) for mass community testing

Covid-19: Safety of lateral flow tests questioned after they are found to miss half of cases

Covid-19: Mass population testing is rolled out in Liverpool

Covid-19: MHRA is concerned over use of rapid lateral flow devices for mass testing

Covid-19: US regulator raises significant concerns over safety of rapid lateral flow tests

Rapid triage for COVID-19 using routine clinical data for patients attending hospital: development and prospective validation of an artificial intelligence screening test

The COvid-19 Multi-omics Blood ATlas (COMBAT) Consortium. A blood atlas of COVID-19 defines hallmarks of disease severity and specificity. medRxiv

Presenting Characteristics, Comorbidities, and Outcomes among 5700 Patients Hospitalized with COVID-19 in the

Artificial intelligence-enabled rapid diagnosis of patients with COVID-19

Artificial intelligence driven assessment of routinely collected healthcare data is an effective screening test for COVID-19 in patients presenting to hospital

Prognostication of patients with COVID-19 using artificial intelligence based on chest x-rays and clinical data: a retrospective study

COVID-19 Artificial Intelligence Diagnosis Using Only Cough Recordings

Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans

Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal

Cautions about radiologic diagnosis of COVID-19 infection driven by artificial intelligence

Artificial intelligence for COVID-19: saviour or saboteur? The Lancet Digital Health

An Artificial Intelligence-Assisted Diagnostic Platform for Rapid Near-Patient Hematology. medRxiv (2021)

Over half of UK adults vaccinated with second dose

Proportions and their differences

Fast implementation of DeLong's algorithm for comparing the areas under correlated receiver operating characteristic curves

Statistical Methodology: I. Incorporating the Prevalence of Disease into the Sample Size Calculation for Sensitivity and Specificity

Sample size calculator

Effective strategies to prevent in-hospital infection in the emergency department during the novel coronavirus disease 2019 pandemic

University Hospitals Plymouth NHS Trust. Ward and Department - Coronavirus Infectious Disease-19 Zone Status

Clarifying the evidence on SARS-CoV-2 antigen rapid tests in public health responses to COVID-19

Duration and key determinants of infectious virus shedding in hospitalized patients with coronavirus disease-2019 (COVID-19)

Rapid, point-of-care antigen and molecular-based tests for diagnosis of SARS-CoV-2 infection

Coronavirus (COVID-19) Infection Survey

Thoracic imaging tests for the diagnosis of COVID-19

XGBoost: A scalable tree boosting system

Point of Care Nucleic Acid Testing for SARS-CoV-2 in Hospitalized Patients: A Clinical Validation Trial and Implementation Study

An updated list of essential items for reporting diagnostic accuracy studies

We express our sincere thanks to all patients and staff across We thank all healthcare professionals and students who have supported the CURIAL-Rapide Service evaluation. In particular, we wish to acknowledge: 

Appendix C:

Appendix D: CURIAL-Rapide lab-free service evaluation

Service evaluation of the OLO haematology analyser/CURIAL-Rapide operated between February 18, 2021 and May 10, 2021, between 8am and 8pm.

We specified that clinical staff carrying out the service evaluation must ordinarily be employed by OUH, participate in the care of patients as part of their usual duties, have completed all statutory & mandatory training required by the trust for their role including for electronic health record systems, and be familiar and competent in using these systems as part of their usual role. We permitted student doctors meeting the above requirements to participate. Training to operate the OLO was provided by in-person device training, supported by demonstration and documentation from the device manufacturers, and a supporting online training video (made available at https://youtu.be/UofBAL7sAzc). Weekly quality-control checks were performed on the OLO analysers.

OUH sites for eligibility: John Radcliffe Hospital

Inclusion: Adult patients (aged >18)

Clinical areas for sampling eligibility were ED Assessment area, ED Majors Beds and ED Resus. Patients who were not receiving blood tests on presentation to the emergency department as part of their care were not eligible.

Eligible patients were identified to take part in the service evaluation using the locally-adopted Cerner FirstNet system. Vital signs and blood draws were performed on arrival to the emergency department by healthcare professionals as part of routine care. Following trust procedures, vital signs were documented on the trust electronic health record [SEND; Sensyne Health], and blood bottles were labelled using printed labels from the electronic record. Two drops of venous blood (27 µL) from a routinely-collected EDTA blood tube were extracted using a single-use sampling device, and prepared for OLO analysis by trained operators directed by onscreen instructions 31. OLO results were uploaded immediately to the electronic medical record using the POCcelerator Data Management System [Siemens Healthineers GmbH, Erlangen, Germany], making results available to clinicians and

n=3207 prevalence 11.1%