key: cord-1017142-y9waeq8g
authors: Patterson, Bruce K; Guevara-Coto, Jose; Yogendra, Ram; Francisco, Edgar; Long, Emily; Pise, Amruta; Rodrigues, Hallison; Parikh, Purvi; Mora, Javier; Mora-Rodríguez, Rodrigo A
title: Immune-Based Prediction of COVID-19 Severity and Chronicity Decoded Using Machine Learning
date: 2020-12-17
journal: bioRxiv
DOI: 10.1101/2020.12.16.423122
sha: 90f378b7542a98f763ee926a49dd9c98e23742c1
doc_id: 1017142
cord_uid: y9waeq8g

Individuals with systemic symptoms long after COVID-19 has cleared represent approximately 10% of all COVID-19-infected individuals. Here we present a bioinformatics approach to predict and model the phases of COVID-19 so that effective treatment strategies can be devised and monitored. We investigated 144 individuals, including normal individuals and patients spanning the COVID-19 disease continuum. We collected plasma and isolated PBMCs from 29 normal individuals, 26 individuals with mild-moderate COVID-19, 25 individuals with severe COVID-19, and 64 individuals with chronic COVID-19 symptoms. Immune subset profiling and a 14-plex cytokine panel were run on all patients. The data were analyzed using machine learning methods to predict and distinguish the groups from each other. Using a multi-class deep neural network classifier to better fit our prediction model, we obtained 100% precision, 100% recall and an F1 score of 1 on the test set. Moreover, a first score, specific for chronic COVID-19 patients, was defined as S1 = (IFN-γ + IL-2) / CCL4-MIP-1β. Second, a score specific for severe COVID-19 patients was defined as S2 = (10*IL-10 + IL-6) - (IL-2 + IL-8). Severe cases are characterized by excessive inflammation and by dysregulated T cell activation, recruitment, and counteracting activities. Chronic patients, in contrast, are characterized by a profile able to induce the activation of effector T cells with pro-inflammatory properties and capable of generating an effective immune response to eliminate the virus, but without the proper recruitment signals to attract activated T cells.
Summary: Immunologic Modeling of Severity and Chronicity of COVID-19

Chronic COVID-19 refers to a group of previously infected individuals, so-called "Long Haulers", who experience a multitude of symptoms from several weeks to months after recovering from their acute illness and presumably months after viral clearance. These symptoms include joint pain, muscle aches, fatigue, "brain fog" and others. They can commonly resemble rheumatic diseases such as rheumatoid arthritis, autoimmune disorders, and other conditions such as fibromyalgia and chronic fatigue syndrome (1). Many of these common disorders are caused by inflammation, hyper- and/or autoimmunity, and some, such as chronic fatigue, are associated with viral persistence after an acute infection with pathogens such as Epstein-Barr virus and cytomegalovirus (2).

Recent studies, including those from our laboratory, have suggested that (CC) may be

The deep neural network (DNN) classifier was constructed from layers of neurons. Each layer transformed its inputs using the rectified linear activation function (ReLU).

The DNN model was constructed with 1 input layer and 3 hidden layers of 10 neurons each, followed by a layer of 6 neurons. Finally, the output layer consists of 3 neurons for the outputs (classes), with a softmax (multi-class) or sigmoid (binary) activation function. This architecture was used for both the multi-class model and the binary models.
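As a rough illustration of the architecture described above, the following pure-Python sketch runs a forward pass through dense layers of 10, 10, 10, 6 and 3 units, with ReLU on the hidden layers and a softmax on the output. The input width (`N_FEATURES = 14`), the random weight initialization and all helper names are hypothetical; the text does not report the number of input features or how the network was implemented.

```python
import math
import random

N_FEATURES = 14                   # hypothetical input width (not stated in the text)
LAYER_SIZES = [10, 10, 10, 6, 3]  # 3 hidden layers of 10, a layer of 6, 3 outputs


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Compute W·x + b for one fully connected layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, z) for z in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def build_network(seed=0):
    """Hypothetical randomly initialized network with the layer sizes above."""
    rng = random.Random(seed)
    sizes = [N_FEATURES] + LAYER_SIZES
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(x, layers):
    """ReLU on every hidden layer, softmax on the 3-unit output layer."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))


net = build_network()
probs = forward([0.1 * i for i in range(N_FEATURES)], net)
print(probs)  # three class probabilities that sum to 1
```

The untrained network only shows the data flow; in practice the weights would be learned during training.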

The results of the long hauler binary models revealed differences of ~5% between the metrics of the training and test sets (Table 3). A difference of this size is too small to attribute to overfitting to the training set. In contrast, the severe binary model showed marked differences between the performance metrics of the training and test sets (Table 3). This is evident in the precision score, 98% on the training set and 75% on the test set, and consequently in the F1 score, which differs by 0.20 (0.99 on the training set and 0.79 on the test set). A potential explanation is the limited number of data points in the severe class, although our random forest classifier for the severe class performed well. These results suggest that the best approach is a multi-class predictor.
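The overfitting argument above rests on the gap between training and test metrics. A small sketch of that comparison follows, using the standard F1 definition (harmonic mean of precision and recall) and the severe-model precision and F1 values quoted above. The 0.10 tolerance used to flag a gap is a hypothetical choice for illustration; the text does not define a numeric cut-off.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metric_gaps(train, test, tolerance=0.10):
    """Return {metric: (gap, flagged)} where gap = train - test.

    `tolerance` is a hypothetical threshold: a gap larger than it is
    flagged as a possible sign of overfitting.
    """
    report = {}
    for name, train_value in train.items():
        gap = train_value - test[name]
        report[name] = (round(gap, 4), gap > tolerance)
    return report


# Severe binary model, values quoted in the text (Table 3).
severe_train = {"precision": 0.98, "f1": 0.99}
severe_test = {"precision": 0.75, "f1": 0.79}

for metric, (gap, flagged) in metric_gaps(severe_train, severe_test).items():
    print(f"{metric}: gap={gap:.2f} flagged={flagged}")
```

With this tolerance, a ~5% gap such as the one reported for the long hauler models would not be flagged, while both severe-model gaps would be.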


The multi-class DNN implemented using the full feature set had good metrics (Table 3).

It achieved a precision, recall and F1 score of 100%, 100% and 1.00, respectively, on the test split. This indicates that the model is not overfitting and supports our notion that it would generalize better than the binary models. The model's performance is supported by its confusion matrix (true vs. predicted class), which shows how well it predicts each of the three classes (Figure 3).
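To make the confusion-matrix reading concrete, the sketch below builds a 3 x 3 matrix (rows are the true class, columns the predicted class) and derives per-class and macro-averaged precision, recall and F1 from it. The class names follow the three targets used in this study (Severe, Long Hauler, Other); the label lists in the example are made up for illustration and are not the study's data.

```python
CLASSES = ["Severe", "Long Hauler", "Other"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1))
    return results


def macro_average(results):
    """Unweighted mean of each metric across classes."""
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))


# Hypothetical labels for illustration only.
y_true = ["Severe", "Other", "Long Hauler", "Other", "Severe"]
y_pred = ["Severe", "Other", "Long Hauler", "Other", "Severe"]
cm = confusion_matrix(y_true, y_pred)
print(cm)                                    # [[2, 0, 0], [0, 1, 0], [0, 0, 2]]
print(macro_average(per_class_metrics(cm)))  # (1.0, 1.0, 1.0)
```

A classifier with precision, recall and F1 of 1.00 places every count on the diagonal of this matrix.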

The strength of a DNN classifier is that it adjusts many parameters to transform the inputs into outputs. This is important because the large number of parameters allows the model to better identify hidden signals in the data. DNNs also require hyperparameter tuning, such as the learning rate, the number of hidden layers and neurons per hidden layer, the optimizer and the activation function, all of which affect the performance of the model. By adjusting these hyperparameters and creating a model capable of finding the hidden relationships in the data, we were able to achieve such high results and construct a predictive multi-class system.
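The point about the number of trainable parameters can be made concrete for the architecture used here (hidden layers of 10, 10, 10 and 6 units and a 3-unit output). A dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. The input width is not reported, so `n_features` is left as an argument; the value 14 in the example is hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def count_parameters(n_features, layer_sizes=(10, 10, 10, 6, 3)):
    """Total trainable parameters for a stack of dense layers."""
    sizes = [n_features, *layer_sizes]
    return sum(dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))


print(count_parameters(14))  # 457
```

For this layout the total is 10 * n_features + 317, so the count grows linearly with the number of input features.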

The results of the DNN indicated that the multi-class model had the highest performance.

Based on this, we constructed a DNN using the 6 most important features identified by the random forest variable importance. This model, known as the minimal DNN or mDNN, was constructed using the same architecture as the full-feature-set DNN. This model's performance in the training set and the test set (Table 4)

Moreover, we simplified our prediction model by feature engineering two classification scores based on the top informative features. First, a "Long Hauler Score" was defined as S1 = (IFN-γ + IL-2) / CCL4-MIP-1β. Second, a "Severe Score" was defined as S2 = (10*IL-10 + IL-6) - (IL-2 + IL-8). Using a combined heuristic to first classify the Long Haulers (S1 > 0.4) and then the severe COVID-19 patients (S2 > 0), we obtained a sensitivity of 97% with a specificity of 100% for Long Haulers, and a sensitivity of 88% with a specificity of 96% for severe patients (Figure 4B).

The immune response to SARS-CoV-2 induces the release of different molecules with inflammatory properties, such as cytokines and chemokines. This event, known as cytokine storm, is an immunopathological feature of COVID-19 and has been associated with the severity of the disease. Increased blood concentrations of different cytokines and chemokines, such as IL-6, IL-8, IL-10, TNF-α, IL-1β, IL-2, IP-10, MCP-1, CCL3, CCL4, and CCL5, have been described in COVID-19 patients (5). Some of these molecules have been proposed as biomarkers to monitor the clinical evolution of COVID-19 patients and to guide treatment selection. Nevertheless, some of these molecules function in a context-dependent manner, so the clinical relevance of analyzing changes in a single cytokine is limited.
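The two-step heuristic defined earlier in this section can be written out directly: compute the Long Hauler Score S1 = (IFN-γ + IL-2) / CCL4 (MIP-1β), classify as Long Hauler if S1 > 0.4, and otherwise classify as Severe if the Severe Score S2 = (10 * IL-10 + IL-6) - (IL-2 + IL-8) is above 0. The dictionary keys, the "Other" fallback label and the example concentrations are hypothetical, and the text does not state units or whether values were transformed before scoring. Sensitivity and specificity are computed one class at a time (that class vs. all others).

```python
def long_hauler_score(s):
    """S1 = (IFN-gamma + IL-2) / CCL4 (MIP-1beta); assumes CCL4 > 0."""
    return (s["IFN-gamma"] + s["IL-2"]) / s["CCL4"]


def severe_score(s):
    """S2 = (10 * IL-10 + IL-6) - (IL-2 + IL-8)."""
    return (10 * s["IL-10"] + s["IL-6"]) - (s["IL-2"] + s["IL-8"])


def classify(s, s1_cut=0.4, s2_cut=0.0):
    """Step 1: Long Hauler if S1 > 0.4; step 2: Severe if S2 > 0; else 'Other' (hypothetical label)."""
    if long_hauler_score(s) > s1_cut:
        return "Long Hauler"
    if severe_score(s) > s2_cut:
        return "Severe"
    return "Other"


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Hypothetical cytokine values (arbitrary units), for illustration only.
sample = {"IFN-gamma": 5.0, "IL-2": 3.0, "CCL4": 10.0, "IL-10": 1.0, "IL-6": 4.0, "IL-8": 2.0}
print(long_hauler_score(sample), severe_score(sample), classify(sample))  # 0.8 9.0 Long Hauler
```

Because the Long Hauler test runs first, a sample that also has a positive Severe Score is still labelled Long Hauler; the order of the two steps is part of the heuristic.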

One of the most important challenges during the pandemic is to avoid saturating health systems; therefore, identifying predictive biomarkers that allow better stratification of patients is paramount. Even though cytokines such as IL-6 and IL-8 have been proposed as indicators of disease severity, and in some studies were strong and independent predictors of patient survival (6), their predictive value when analyzed alone is debatable (7). Generating scores from blood levels of cytokines and chemokines with different immunological functions takes into account the context-dependent function of these molecules.

To predict severe cases, a score was generated from IL-10, IL-6, IL-2, and IL-8 blood concentrations. In this classification, severe cases are characterized by high IL-6 and IL-10 levels; both cytokines have previously been implicated in the immunopathogenesis of COVID-19 and have shown predictive value in severe cases (6, 8). In different settings, IL-6 has been associated with oxidative stress, inflammation, endothelial dysfunction, and thrombogenesis (9-12), which are characteristic features of severe COVID-19 cases caused by excessive myeloid cell activation (13). Consistently, increased IL-10 levels interfere with appropriate T-cell responses, inducing T-cell exhaustion and regulatory T cell polarization, which lead to an evasion of the antiviral immune response (14). Furthermore, besides its anti-inflammatory function on T cells, in some settings IL-10 induces STAT1 activation and a pro-inflammatory response in type I IFN-primed myeloid cells (15, 16). Therefore, elevated levels of IL-6 and IL-10 promote myeloid cell activation, oxidative stress, and endothelial damage, and dampen adequate T cell activation. Additionally, to strengthen the classification, the score presented here differentiates severe cases by subtracting IL-2 and IL-8, cytokines related to proper T cell activation (IL-2) and recruitment (IL-8).

According to the score generated for distinguishing LH, these patients are characterized by increased IFN-γ and IL-2 and reduced CCL4 production. In the context of a viral

Regarding the immunopathological effects of the LH immune profile, it has been shown in murine models that high IFN-γ levels could affect the kinetics of the resolution of inflammation-induced lung injury as well as thrombus resolution (23, 24), which could be related to long-lasting symptoms of LH associated with pulmonary coagulopathy and

For the construction of both models, the targets must be separated to reflect the question: can a predictor discriminate between the Severe, Long Hauler and Other States?

To build the models that answer this question, we grouped the M-M and Normal labels into a new class, distinct from the Severe and Long-Hauler states. We then applied filters based on the task (binary or multi-class classification). For the Severe binary predictor, we conditioned the targets to be exactly Severe; otherwise they were assigned to Not-Severe. The same was done for Long-Haulers, where an instance was either labelled exactly Long-Hauler or else it would be assigned to the

is connected to the previous one. Each layer transformed its inputs using the rectified linear activation function (ReLU). The DNN models were constructed with 1 input layer and 3 hidden layers of 10 neurons each, followed by a layer of 6 neurons. Finally, the output layer consists of 3 neurons for the outputs (classes), with a softmax (multi-class) or sigmoid (binary) activation function.
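The target relabeling described in this section (M-M and Normal merged into a single class for the multi-class task; "exactly Severe or not" and "exactly Long Hauler or not" for the binary tasks) can be sketched as follows. The raw label strings ("Normal", "M-M", "Severe", "Long Hauler"), the "Other" and "Not-<class>" names for the negative classes, and the function names are assumptions about how the labels were encoded, not the authors' code.

```python
def to_multiclass(label):
    """Collapse M-M and Normal into one 'Other' class; keep Severe and Long Hauler."""
    if label in ("Severe", "Long Hauler"):
        return label
    if label in ("M-M", "Normal"):
        return "Other"
    raise ValueError(f"unexpected label: {label!r}")


def to_binary(label, positive):
    """One-vs-rest target: the positive class if the label matches exactly, else 'Not-<positive>'."""
    return positive if label == positive else f"Not-{positive}"


labels = ["Normal", "M-M", "Severe", "Long Hauler"]
print([to_multiclass(x) for x in labels])        # ['Other', 'Other', 'Severe', 'Long Hauler']
print([to_binary(x, "Severe") for x in labels])  # ['Not-Severe', 'Not-Severe', 'Severe', 'Not-Severe']
```

Raising an error on unknown labels, rather than silently mapping them to the negative class, makes encoding mistakes visible before training.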

For a DNN to generate the best possible predictions, we minimized the loss function (the error of the model) using the Adam optimizer to search for the optimal set of model weights. When setting the optimizer, we set the learning rate to 1e-3. The loss function was set to categorical cross-entropy because the targets are one-hot encoded.
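The training setup above pairs one-hot targets with categorical cross-entropy and the Adam optimizer at a learning rate of 1e-3. The sketch below shows the loss for one sample and a single Adam update on a flat list of parameters. The decay rates beta1 = 0.9, beta2 = 0.999 and epsilon = 1e-8 are the defaults proposed in the original Adam paper and are assumed here; the text only reports the learning rate.

```python
import math

CLASSES = ["Severe", "Long Hauler", "Other"]


def one_hot(label, classes=CLASSES):
    return [1.0 if c == label else 0.0 for c in classes]


def categorical_cross_entropy(target, probs, eps=1e-12):
    """-sum(t * log(p)); with a one-hot target only the true-class term survives."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


class Adam:
    """Minimal Adam optimizer for a flat list of parameters (assumed default betas/eps)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimates of the gradients
        self.v = None  # second-moment (uncentred variance) estimates
        self.t = 0     # step counter used for bias correction

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected mean
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)  # bias-corrected variance
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


target = one_hot("Severe")                                 # [1.0, 0.0, 0.0]
loss = categorical_cross_entropy(target, [0.7, 0.2, 0.1])  # -ln(0.7) ≈ 0.357
params = Adam().step([0.5, -0.5], [0.2, -0.4])             # each moves ~1e-3 against its gradient
```

On the first step the bias correction makes each update roughly the learning rate times the sign of the gradient, which is why 1e-3 directly sets the initial step size.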

Zhao. Inflammatory responses and inflammation-associated diseases in organs

Chronic viral infections in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS)

SARS-CoV-2 viral RNA shedding for more than 87 days in an individual with an impaired CD8+ T-cell response

HCV-Infected, Monocyte Lineage Reservoirs Differ in Individuals with or without HIV Co-Infection

CCR5 inhibition in Critical COVID-19 Patients Decreases Inflammatory Cytokines, Increases CD8 T-Cells, and Decreases SARS-CoV2 RNA in Plasma by Day 14

An inflammatory cytokine signature predicts COVID-19 severity and survival

Biosensors for Managing the COVID-19 Cytokine Storm: Challenges Ahead

IL-6 and IL-10 as predictors of disease severity in COVID 19 patients: Results from Meta-analysis and Regression

Roles of IL-6-gp130 Signaling in Vascular Inflammation

Interaction of IL-6 and TNF-α contributes to endothelial dysfunction in type 2 diabetic mouse hearts

Interleukin-6, endothelial activation and thrombogenesis in chronic atrial fibrillation

The Role of Cytokines including Interleukin-6 in COVID-19 induced Pneumonia and Macrophage Activation Syndrome-Like Disease

IL-10: A Multifunctional Cytokine in Viral Infections

Pro-Inflammatory Signaling by IL-10 and IL-22: Bad Habit Stirred Up by Interferons? Front Immunol, 4, 18

IFN-alpha priming results in a gain of proinflammatory function by IL-10: implications for systemic lupus erythematosus pathogenesis

CTL- vs Treg lymphocyte-attracting chemokines, CCL4 and CCL20, are strong reciprocal predictive markers for survival of patients with oesophageal squamous cell carcinoma

The differential immune responses to COVID-19 in peripheral and lung revealed by single-cell RNA sequencing

CTLA-4) determines CD4 T cell migration in vitro and in vivo

Migration of Th1 lymphocytes is regulated by CD152 (CTLA-4)-mediated signaling via PI3 kinase-dependent Akt activation

High-dimensional single-cell analysis reveals the immune characteristics of COVID-19

Effects of IFN-γ on immune cell kinetics during the resolution of acute lung injury

Variants of CCR5, which are permissive for HIV-1 infection, show distinct functional responses to CCL3, CCL4 and CCL5

Chemokine receptor gene polymorphisms and COVID-19: Could knowledge gained from HIV/AIDS be important?

A guide to chemokines and their receptors

Highly potent HIV inhibition: engineering a key anti-HIV structure from PSC-RANTES into MIP-1β/CCL4

A pharmacological interactome between COVID-19 patient samples and human sensory neurons reveals potential drivers of neurogenic pulmonary dysfunction

Up-regulation of CCR1 and CCR3 and induction of chemotaxis to CC chemokines by IFN-γ in human neutrophils

Acknowledgments: The authors would like to acknowledge the work of Christine Meda in coordinating the study and interacting with the patients.

Funding: None

Author contributions: R.Y. organized the clinical study and actively recruited patients

wrote the draft of the manuscript and all authors contributed to revising the manuscript prior to submission.

All requests for materials and data should be addressed to the corresponding author.