key: cord-0823689-125o0o7x authors: Liu, Qibin; Fang, Xuemin; Tokuno, Shinichi; Chung, Ungil; Chen, Xianxiang; Dai, Xiyong; Liu, Xiaoyu; Xu, Feng; Wang, Bing; Peng, Peng title: Prediction of the clinical outcome of COVID-19 patients using T lymphocyte subsets with 340 cases from Wuhan, China: a retrospective cohort study and a web visualization tool date: 2020-04-11 journal: nan DOI: 10.1101/2020.04.06.20056127 sha: 0f721af678b7d9b8abb4b6446af908e6f19705d2 doc_id: 823689 cord_uid: 125o0o7x Background Wuhan, China was the epicenter of the 2019 coronavirus outbreak. As a designated hospital, Wuhan Pulmonary Hospital has received over 700 COVID-19 patients. With COVID-19 becoming a worldwide pandemic, we aim to share our epidemiological and clinical findings with the global community. Methods In this retrospective cohort study, we studied 340 confirmed COVID-19 patients from Wuhan Pulmonary Hospital, including 310 discharged cases and 30 death cases. We analyzed their demographic, epidemiological, clinical and laboratory data and implemented our findings in an interactive, free-access web application. Findings Baseline T lymphocyte subsets differed significantly between the discharged cases and the death cases in two-sample t-tests: Total T cells (p < 2.2e-16), Helper T cells (p < 2.2e-16), Suppressor T cells (p = 1.8e-14), and TH/TS (Helper/Suppressor ratio, p = 0.0066). A multivariate logistic regression model with death or discharge as the outcome identified the following significant predictors: age (OR 1.05, p 0.04), underlying disease status (OR 3.42, p 0.02), Helper T cells on the log scale (OR 0.22, p 0.00), and TH/TS on the log scale (OR 4.80, p 0.00). The McFadden pseudo R-squared for the logistic regression model is 0.35, suggesting the model has fair predictive power.
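As a reading aid for the odds ratios and the pseudo R-squared reported in the Findings, the following minimal Python sketch shows how such figures are conventionally interpreted. It is not the authors' R code. It assumes that "log scale" means the natural logarithm, which the text does not state, and the two log-likelihood values passed to `mcfadden_r2` are hypothetical numbers chosen only to illustrate the formula.

```python
import math

# Odds ratios as reported in the Findings (per one-unit increase of each predictor).
REPORTED_OR = {
    "age_years": 1.05,
    "underlying_disease": 3.42,
    "log_helper_t": 0.22,
    "log_th_ts": 4.80,
}


def odds_multiplier(odds_ratio: float, delta: float) -> float:
    """Factor by which the odds of death change when the predictor moves by `delta` units."""
    return odds_ratio ** delta


def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden pseudo R-squared: 1 - lnL(fitted model) / lnL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


# Ten additional years of age, all other predictors held fixed.
ten_years = odds_multiplier(REPORTED_OR["age_years"], 10)

# ASSUMPTION: natural log. Doubling the Helper T cell count then adds ln(2)
# to the log-scale predictor.
doubling_helper = odds_multiplier(REPORTED_OR["log_helper_t"], math.log(2))

# HYPOTHETICAL log-likelihoods, used only to show how a value of 0.35 can arise.
example_r2 = mcfadden_r2(loglik_model=-65.0, loglik_null=-100.0)

print(f"odds multiplier for +10 years of age: {ten_years:.2f}")
print(f"odds multiplier for doubling Helper T cells: {doubling_helper:.2f}")
print(f"McFadden pseudo R-squared (hypothetical inputs): {example_r2:.2f}")
```

Under these assumptions, ten additional years of age multiply the odds of death by about 1.63, and doubling the baseline Helper T cell count multiplies them by about 0.35.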
Interpretation While age and underlying diseases are known risk factors for poor prognosis, patients with a less damaged immune system at the time of hospitalization had a higher chance of recovery. Close monitoring of the T lymphocyte subsets might provide valuable information on changes in the patient's condition during the treatment process. Our web visualization application can be used as a supplementary tool for the evaluation.
2020 received treatment according to the fifth edition of the Ministry of Health guidelines [8] [9] : 1. Bed rest and supportive treatment; 2. Regular examination of blood routine, CRP, biochemistry, coagulation function, myocardial enzymes, and lung CT; 3. Oxygen therapy, including nasal catheter oxygen, transnasal high-flow oxygen therapy, and ventilator support; 4. Antiviral therapy, including Lopinavir/Ritonavir tablets, Arbidol Hydrochloride tablets, etc. If the patient maintained a normal body temperature for at least three days, tested negative for COVID-19 (including throat swab, stool, and sputum) at least twice with the tests 24 or more hours apart, and showed significant improvement in lung CT, the patient was considered recovered and was discharged from the hospital. The discharged patients were transported by the Chinese government in designated vehicles and were further isolated for at least two weeks.
A 2 mL venous blood sample was obtained from each patient to measure T cell subsets, and all analyses were completed within 4 hours of sampling. A FACSCalibur flow cytometer (BD Biosciences, San Jose, CA) was used for flow cytometry acquisition and analysis. The absolute count of each lymphocyte subset was determined using CD3/CD4/CD8/CD45 BD Multitest reagents according to the manufacturer's protocol (BD Biosciences).
From January 31 to March 8, 2020, a total of 721 patients who tested positive for COVID-19 were admitted to Wuhan Pulmonary Hospital.
Among these patients, 430 completed the treatment and were discharged, 62 died, and 229 remained hospitalized. After excluding four patients whose direct cause of death was not COVID-19 infection, and selecting patients who had at least one T cell subsets test available, we had a total of 340 patients in the study, including 310 discharged cases and 30 death cases. We reviewed the laboratory test results and chest CT examinations of these 340 patients, and collected all of the T lymphocyte subsets test data. If multiple T lymphocyte subsets tests were performed, we chose the earliest one as the baseline. Two researchers independently reviewed the collected data to ensure data accuracy.
There have been a number of descriptive analyses of the epidemiological, clinical, laboratory, and radiological characteristics of COVID-19 patients 10-14 . Little is yet known about how these characteristics can be used to guide the practice of healthcare providers 15 . As the world faces the expanding COVID-19 pandemic, with the number of infected cases increasing exponentially every day, any quick and easy method of understanding the patient's condition can be valuable. With this in mind, we sought to build a statistical model with only a few strongly predictive characteristics. As several other research teams have pointed out, the potential risk factors of older age, high SOFA score, and d-dimer greater than 1 µg/mL could help clinicians identify patients with poor prognosis at an early stage 12 . Older patients (>65 years) with comorbidities and ARDS are at increased risk of death 14 . Additionally, in our experience treating these patients, we noticed that the T lymphocyte subsets are closely correlated with the patient's progress. All of the patients showed varying degrees of decline in T lymphocyte subsets at hospital admission, and the patient's condition improved or worsened with the rise or fall of the T lymphocyte subsets.
We also considered other immune and inflammatory indicators, as well as the blood routine and lung CT, but they all have certain limitations. Immune indicators and inflammatory indicators such as thyrotropin and white blood cells were not as sensitive as the T lymphocyte subsets; the blood routine is not specific enough to differentiate the condition of the patients 11 ; and although lung CT is an accurate way to assess the patient's condition, it cannot be repeated too often, especially for critical patients who need oxygen and a ventilator. We therefore decided to focus our research on the patient's age, underlying disease status, and the T lymphocyte subsets measures: Helper T cells, Suppressor T cells, and TH/TS.
This medRxiv preprint (https://doi.org/10.1101/2020.04.06.20056127), which was not peer-reviewed, is made available under a CC-BY-ND 4.0 International license. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
Among the 310 discharged cases, 155 were male and 155 were female. The average age was 56·4 years (SD 14·0). 107 (34·5%) of them had underlying diseases (the most common was hypertension, 23·9% for male and 20·0% for female patients). The average duration of hospitalization was 11·1 days (SD 6·94), and it took 15·1 days (SD 10·3) on average for them to reach our hospital after their first symptoms (Table 1). Elderly patients tended to take longer to recover and they also had a (Table 3) . Among the 30 death cases, 17 were male and 13 were female. The average age was 69·0 years (SD 7·87). 23 (76·7%) of them had underlying diseases (the most common was again hypertension, 64·7% for male and 38·5% for female patients). The average duration of hospitalization was 15·1 days (SD 8·78), and it took 11·7 days (SD 8·20) on average for them to reach our hospital after their first symptoms (Table 2).
The death event occurred more quickly among elderly patients ( We also fitted a multivariate logistic regression model using age, underlying disease status, and the baseline T lymphocyte subsets test as the predictors of patient outcome (death or hospital discharge). The significant predictors are age (OR 1·05, p 0·04), underlying disease status (OR 3·42, p 0·02), Helper T cells on the log scale (OR 0·22, p 0·00), and TH/TS on the log scale (OR 4·80, p 0·00). The McFadden pseudo R-squared 16 for the logistic regression model is 0·35, suggesting the model has fair predictive power (The McFadden pseudo R-squared measure ranges from 0 to below 1, Total Lymphocyte did not turn out to be significant predictors in our logistic regression model (Table 5) .
We believe that looking at some of the patient's basic characteristics, such as age and underlying diseases, together with the T lymphocyte subsets measures, could be a quick way to shed light on the patient's prognosis in times of pressure and emergency. To make our findings more applicable for public health workers on the frontline, we have developed an interactive web data visualization tool that implements the algorithm and made it freely accessible at the following web address: https://rpubs.com/mindyfang/covid19. We did not use other lab tests, such as the regular blood test items, in our analysis because they are less discriminating than the T cell subset measures; and 3. We hope our interactive web tool can be used quickly, so keeping the input items to a minimum seems to be the more practical choice.
Figure 6b shows the k-means clustering result using the Wuhan Pulmonary Hospital data.
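To make the clustering step concrete, here is a minimal sketch of two-cluster k-means (Lloyd's algorithm) in plain Python. It is an illustration only, not the authors' R and shiny implementation. The coordinates in `TOY_POINTS` are synthetic values standing in for transformed patient features, the starting centroids are picked by hand, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty collection of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = list(init)
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # An empty cluster keeps its previous centroid.
            updated.append(mean(members) if members else centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# SYNTHETIC toy coordinates (not patient data): two well-separated groups.
TOY_POINTS: List[Point] = [
    (1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
    (5.0, 5.2), (5.3, 4.8), (4.9, 5.1),
]

centroids, labels = kmeans(TOY_POINTS, init=[TOY_POINTS[0], TOY_POINTS[3]])

# A new observation is placed with whichever centroid lies closer.
new_point_cluster = nearest((4.5, 4.7), centroids)
print(labels, new_point_cluster)
```

In a tool of this kind, a new patient's transformed measurements would be compared with the two centroids in the same way as `new_point_cluster` above. Overlap between groups, as the text goes on to describe, shows up as points lying close to the boundary between the centroids.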
After multi-dimensional data transformation, the algorithm separates the death group and the discharged group as shown in the graph. A proportion of the discharged cases had profiles similar to those of the death cases, which made it difficult for the algorithm to tell them apart. However, by using this algorithm, it is possible to identify a large number of patients with relatively good prognosis. We have also uploaded a dummy data set with de-identified and randomly modified patient data, together with all the source code used for the current analysis and for the interactive web application, to: https://github.com/mindy-fang/COVID-19. All of our source code was written in the R programming language and the interactive web application was developed with the shiny package [19] [20] [21] (Rstudio Version 1.2.1335, R version 3.6.0). Fellow researchers can substitute the dummy data with their own data, or modify the source code to make their own applications. Meanwhile, we will keep updating our reference panel as we include more patients, so that the algorithm gains statistical power over time.
To our best knowledge, the current research is the largest retrospective study of COVID-19 patients with known clinical outcomes so far. Significant reductions in T cells are very common in severe COVID-19 patients. Age-dependent deficits in T cell and B cell functions and overproduction of type 2 cytokines may cause inadequate control of viral replication and longer pro-inflammatory responses, which may lead to poor prognoses 22 . Lymphopenia is a prominent part of SARS-CoV infection and lymphocyte counts may be useful in predicting the severity and clinical outcomes [23] [24] [25] .
The virus first known as 2019-nCoV has since been named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) [26] [27] . The level and the speed of T cell recovery are important factors in assessing disease prognoses and guiding early intervention in critically ill patients. In this study we have identified that older age, underlying diseases, and low T cell counts may be risk factors for poor clinical outcomes in COVID-19 positive patients. We have also implemented an interactive web tool to visualize a new patient's risk using these risk factors. When a new patient's data is entered through the web UI, the tool shows where the patient falls relative to the 310 patients included in this study. It should be noted that our intention is not for the web tool to be considered a gold standard for determining the final clinical outcome of the patients. The visualization result simply suggests whether the new patient is more likely to recover or to have adverse outcomes. Another important use of our tool is monitoring the patient's progress in real time during treatment. For example, a patient's longitudinal profile gradually shifting towards the red centroid (poor outcome) or the green centroid (good outcome) may provide insight into the patient's progress (Figure 5). However, the actual final prognosis of the patient depends on many other factors, including the starting time of the treatment, treatment compliance, degree of treatment, and so on.
Our algorithm has the following limitations. First, it may not predict accurately for younger patients, or for patients with no symptoms, since our training data contains relatively old and severe patients. Second, the model was built on a sample of 340 patients, which is not a large number; however, we will keep updating our web application as we collect more data. In addition, the reference panel used in our analysis was drawn from the infected population in Wuhan, China.
Although there is no clear evidence on whether the underlying mechanism of T cell depletion under COVID-19 infection differs across ethnic groups, it is natural to assume that the baseline values and the degree of T cell depletion would differ slightly. We found that T-lymphocytes, B-lymphocytes, and NK cells did differ among people in different regions [28] [29] [30] . We encourage researchers around the world to download the source code and customize it with their own data, as they accumulate more experience and knowledge with patients from their own hospitals or regions.
To end our writing with the most recent update (March 13, 2020), the epidemic in Wuhan has gradually passed its peak. All the cabin hospitals have been closed, and other hospitals in Wuhan have started to return to normal operation. Medical volunteer teams have returned to their home cities. We are grateful for all the support that we received during our most difficult time and hope the situation improves quickly in other parts of the world.
Clinical management of severe acute respiratory infection when novel coronavirus (nCoV) infection is suspected
Coronavirus disease 2019
Novel Coronavirus in the United States
The novel coronavirus originating in Wuhan, China: challenges for global health governance
Persistence and clearance of viral RNA in 2019 novel coronavirus disease rehabilitation patients
Clinical management of severe acute respiratory infection when Novel coronavirus (nCoV) infection is suspected: interim guidance
National Health Commission of the People's Republic of China. Chinese management guideline for COVID-19 (version 6.0)
Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study
Pathological findings of COVID-19 associated with acute respiratory distress syndrome
The assessment of fit in the class of logistic regression models: A pathway out of the Jungle of Pseudo-R²s
Application of K-Means Algorithm for Efficient Customer Segmentation: A Strategy for Targeted Customer Services
Points of Significance: Principal component analysis
Web Application Development with R Using Shiny 1st edn
R: A language and Environment for Statistical Computing (R Foundation for Statistical Computing)
RStudio Inc. shiny: Web Application Framework for R
IL-23 and PSMA-targeted duo-CAR T cells in Prostate Cancer Eradication in a preclinical model
Effects of severe acute respiratory syndrome (SARS) coronavirus infection on peripheral blood lymphocytes and their subsets
Cellular Immune Responses to Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV CD4+ T Cells Are Important in Control of SARS-CoV Infection
Long-lived memory T lymphocyte responses against SARS coronavirus nucleocapsid protein in SARS-recovered patients
Longitudinal Intra- and Inter-individual variation in T-cell subsets of HIV-infected and uninfected men participating in the LA Multi-Center AIDS Cohort Study. Medicine (Baltimore)
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
Normal Values of T, B and NK Lymphocyte Subpopulations in Peripheral Blood of Healthy Cuban Adults
Our research was supported by medical teams from Shanxi, Inner Mongolia and other parts of China. The authors would like to thank Prof Li Ming, Prof Zhu Qi, Prof Yang Chengqing, Prof Guo Guangyun, Prof Du Juan, and Prof Du Ronghui for their technical support and Prof Chen Xianxiang for guidance in interpretation of the results.