key: cord-0115086-3wob432t authors: Ma, Liantao; Ma, Xinyu; Gao, Junyi; Zhang, Chaohe; Yu, Zhihao; Jiao, Xianfeng; Ruan, Wenjie; Wang, Yasha; Tang, Wen; Wang, Jiangtao title: CovidCare: Transferring Knowledge from Existing EMR to Emerging Epidemic for Interpretable Prognosis date: 2020-07-17 journal: nan DOI: nan sha: 7148d0cd08bd528a0728b2df270a0baedc7ea254 doc_id: 115086 cord_uid: 3wob432t

Due to the characteristics of COVID-19, the epidemic develops rapidly and overwhelms health service systems worldwide. Many patients suffer from systemic, life-threatening problems and need to be carefully monitored in ICUs. Intelligent prognosis is therefore urgently needed to help physicians intervene early, prevent adverse outcomes, and optimize the allocation of medical resources. However, in the early stage of an epidemic outbreak, the data available for analysis are limited due to the lack of effective diagnostic mechanisms, the rarity of cases, and privacy concerns. In this paper, we propose a deep-learning-based approach, CovidCare, which leverages existing electronic medical records to enhance the prognosis of inpatients with emerging infectious diseases. It learns to embed COVID-19-related medical features from massive existing EMR data via transfer learning. The transferred parameters are further trained to imitate the teacher model's representation behavior via knowledge distillation, where the teacher embeds the health status more comprehensively on the source dataset. We conduct length-of-stay prediction experiments on a real-world COVID-19 dataset. The experimental results indicate that our proposed model consistently outperforms the comparative baseline methods. CovidCare also reveals that 1) hs-cTnI, hs-CRP and Platelet Counts are the most fatal biomarkers, whose abnormal values usually indicate an imminent adverse outcome, and 2) normal values of gamma-GT, AP and eGFR indicate an overall improvement of health. The medical findings extracted by CovidCare are empirically confirmed by human experts and the medical literature.

The whole world is now facing the unprecedented crisis brought by COVID-19. The exponential growth of COVID-19 patients has put massive pressure on health systems, overwhelming national health services and exhausting intensive care units (ICUs). It is essential to personalize the prognosis for each individual patient according to her/his specific health condition, so as to enable timely and early medical intervention, as shown in Figure 1. Accurate prediction of the remaining length of stay for inpatients is critical for scheduling and optimizing limited hospital resources [14]. However, for newly emerged infectious diseases (e.g., COVID-19, SARS) and rare diseases, the prognosis performed by human physicians may not meet the clinical demand, especially in rural areas and developing countries. Moreover, precise risk prediction requires a high level of clinical expertise and experience [28], and the accumulation of such clinical experience is time-consuming and difficult at the early outbreak of a new emerging infectious disease (EID). It is thus difficult for human physicians to comprehensively evaluate the health of patients and accurately identify the key factors, especially since the early-stage deterioration of some EIDs is usually not evident [20].
Thus, when treating COVID-19, it is not rare for physicians to overlook ominous signs and miss the chance of early intervention, especially when clinical resources are insufficient. As a result, intelligent prognosis is urgently needed for EIDs and rare diseases. It can not only assist physicians in performing early diagnosis, selecting personalized treatments, and preventing adverse outcomes, but also optimize the allocation of medical resources and reduce medical costs [37]. Recently, many deep-learning-based models have been developed to enable intelligent prognosis by analyzing electronic medical records (EMRs), including mortality prediction [24, 25], disease diagnosis prediction [22], and patient phenotype identification [1]. To enrich feature extraction and health status representation, most existing works rely on sophisticated modules that require a large amount of labeled training data. However, the quantity of labeled clinical data available for prognosis may be far from ideal in the early stage of an EID outbreak [16], for the following reasons. 1) A precise diagnostic mechanism has not yet been established at the early outbreak. Before the introduction of nucleic acid testing, it was difficult to confirm whether a patient was really infected with COVID-19; for example, only 41 patients were diagnosed with COVID-19 at the early outbreak in Wuhan due to the lack of valid testing methods [16]. 2) The disease is still progressing in patients, and collecting enough outcomes takes a long time. 3) There are serious privacy concerns about electronic medical records, so data-sharing mechanisms for EIDs across multiple hospitals worldwide usually cannot be established in time. Therefore, scarce labeled data will degrade the performance of deep learning models due to potential over-fitting. Recently, some researchers have tried to exploit additional information to deal with the scarcity of clinical data. On the one hand, some studies encode ontology resources and structured relationships among medical codes (e.g., a diagnosis of diabetes) into the network to enhance representation learning. For example, GRAM [5] and KAME [23] introduce external well-organized ontology information (e.g., International Classification of Diseases codes) to represent a medical concept as a combination of its ancestors in the ontology via an attention mechanism. However, for new EIDs like COVID-19, such relationship and ontology information is also difficult to acquire. On the other hand, some researchers try to make full use of existing time series data through transfer learning. For instance, Doctor AI [4] and Gupta [12] train deep models at one hospital and transfer them to another. However, these methods can only be adapted to the same tasks with similar clinical features between the source and target datasets. TimeNet [13] is trained on several non-clinical time series datasets via an RNN autoencoder in an unsupervised manner to extract generic features for patient phenotyping. However, the extracted general-purpose features may not be suitable for a specific clinical task, leading to underperformance. Therefore, for the prognosis of EIDs with limited data, a research challenge remains: how can we make full use of existing EMR data to learn robust health status representations when tackling tasks with different clinical feature sets?
In this paper, we propose a novel healthcare predictive approach, CovidCare, based on transfer learning from existing EMR data (i.e., the source dataset) to the new dataset (i.e., the target dataset) with knowledge distillation. To improve compatibility between source and target datasets with different feature sets, CovidCare evaluates the health status of patients mainly from the perspective of clinical features rather than visits. The time series of each feature is embedded separately by a GRU. When training on the target dataset, the features shared by both datasets are encoded by pre-trained GRUs. The model trained on the source dataset with its private features is treated as a teacher network that guides the embedding behavior of the shared features, which further explores and leverages the information stored in the source dataset. Finally, feature-wise attention is deployed to abstract the biomarkers and adaptively identify the critical features for patients in diverse health conditions. In summary, CovidCare contributes to the community in the following aspects: • We propose a transfer-learning-based medical feature embedding approach, CovidCare, to perform clinical prediction for EIDs with limited data. A multi-channel architecture is developed to improve compatibility between source and target datasets with different feature sets. By jointly optimizing the prediction loss and the similarity loss, the student model with shared features learns to imitate the encoding behavior of the teacher model with full features on the source dataset. We further use a feature re-calibration module to provide interpretability, which explicitly highlights high-risk features. • We conduct length-of-stay prediction experiments for inpatients with COVID-19. The results show that CovidCare significantly and consistently outperforms the baseline approaches on all evaluation metrics. • CovidCare can extract valuable medical findings for COVID-19: -Hypersensitive Cardiac Troponin, Hypersensitive C-Reactive Protein and Platelet Counts are the most fatal biomarkers, whose abnormal values usually indicate an imminent adverse outcome. -Normal values of γ-Glutamyl Transpeptidase, Alkaline Phosphatase and estimated Glomerular Filtration Rate indicate an overall improvement of health. We invited medical practitioners to evaluate the extracted medical knowledge and prognosis cases; human experts positively confirmed its clinical significance in terms of early prediction, key biomarker extraction, and clinical resource management. • Beyond COVID-19, we also conduct mortality risk prediction for outpatients with end-stage renal disease (ESRD) to verify the applicability of CovidCare to other diseases with limited EMR. The extensive experiments demonstrate that CovidCare can significantly benefit the prognosis of future pandemics and rare diseases. The outbreak of the COVID-19 epidemic has been causing worldwide health concerns and was officially declared a pandemic by the World Health Organization (WHO) on March 11, 2020. Although the ultimate impact of COVID-19 is uncertain, it has significantly overwhelmed health care infrastructure. All emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services [27]. Limited health-care resource availability increases the chance of being infected while waiting for treatment, and also the mortality rate [18].
This eventually leads to an increase in the severity of the pandemic. The rapidly growing imbalance between supply and demand for medical resources in many countries raises an inherent normative question: how can we make early and accurate risk predictions to allocate medical resources effectively during a pandemic? Many COVID-related studies focus on the severity of the disease rather than the clinical outcome of mortality [8, 10, 36]. These studies answer key clinical questions about COVID-19 evolution and outcomes, as well as potential risk factors leading to hospital and ICU admission. However, they cannot make individualized risk predictions for patients. Recently, Li et al. [38] used machine-learning-based methods such as decision trees to make risk predictions for COVID-19 patients. However, as discussed above, many challenges, such as data scarcity and model interpretability, have not been adequately addressed. To optimize patient care and appropriately deploy health care resources during this pandemic, effective and reliable early risk prediction remains an essential and urgent problem. With the prevalence of electronic healthcare information systems in various healthcare institutions, a large volume of Electronic Medical Records (EMR) has been accumulated over time [21, 32]. EMR is a type of multivariate time series data that records patients' visits in hospitals (e.g., diagnoses and lab tests, as shown in Figure 2), and it provides essential healthcare information for data-driven clinical status prediction. Deep-learning-based models have shown the capability to perform mortality prediction [9, 11, 15, 34], patient subtyping [1], and diagnosis prediction [1, 6, 22, 26, 29, 31]. In most of these studies, extracting advanced clinical features and learning a compressed representation of the sparse EMR data are fundamental procedures of clinical healthcare prediction. EMR is longitudinally complex [7, 40]; extracting advanced clinical representations introduces more parameters into the model, making it more complex and harder to train. For EIDs and some rare diseases, the quantity of labeled data is much smaller and cannot support thorough training of such a model. To deal with this issue, some studies introduce additional information about the data. On the one hand, GRAM [5] and KAME [23] incorporate external medical information (e.g., ontologies of the medical codes), which allows the model to be trained more sufficiently. They exploit medical knowledge throughout the prediction process by using a given medical ontology (i.e., a knowledge graph), such as the International Classification of Diseases (ICD), to learn the representations of medical codes and obtain the embeddings of the medical codes' ancestors. MIME [7] learns a multi-level embedding of the data according to knowledge about the inherent EMR structure (e.g., the multi-level relationship among medical codes).
However, such external structured information and extra knowledge about the data are often not easy to access or use in clinical practice for EIDs. Ontology information is usually designed to handle medical codes; it is not suitable for dealing with numerical lab tests, which are also essential clinical features for capturing health status. On the other hand, some researchers try to exploit the existing EMR data. Choi [4] empirically confirms that RNN models possess great potential for transfer learning across different medical institutions. Gupta [12] trains a deep RNN to identify several patient phenotypes on time series from the MIMIC-III database, and then uses the features extracted by that RNN to build classifiers for identifying previously unseen phenotypes. However, these methods can only be utilized for the same tasks with the same clinical feature sets between source and target datasets. TimeNet [13] is pre-trained on non-medical time series in an unsupervised manner and further utilized to extract features for clinical prediction. Nevertheless, the extracted general-purpose features may not be suitable for the specific clinical task, which leads to limited performance.

Many patients suffering from COVID-19 face severe life threats and need careful health monitoring and medical treatment in the ICU. Typically, some biomarkers, such as Hypersensitive C-Reactive Protein, are recorded along the treatment trajectory and are further taken into consideration for the prognosis. Accurate prediction of health status can help with assessing the severity of illness and determining the value of novel treatments, interventions, and health care policies [30]. Besides, due to the characteristics of COVID-19, large numbers of sick people appear for treatment during peak illness periods, and clinics and hospitals are overwhelmed. Predicting the remaining time spent in the ICU (i.e., length of stay) for an admission is therefore also vital for scheduling and hospital resource management. Below we define the data and task studied in this work; the notations used in CovidCare are listed in Table 1.

Table 1. Notations used in CovidCare:
ŷ_{T,tar}: prediction result of LOS at the T-th admission on the target dataset
y_{T,tar}: ground truth of LOS prediction at the T-th admission on the target dataset
ŷ_{T,src}: prediction result at the T-th admission on the source dataset
y_{T,src}: ground truth of the prediction at the T-th admission on the source dataset
R_src: the whole source dataset
R_tar: the whole target dataset
R̃_src: source dataset restricted to the features shared with the target dataset
f*_i: embedding of the i-th medical feature after self-attention
s: overall representation of a patient
X_tea, X_stu, X_tar: models, embeddings, or parameters of the teacher, student, and target model, respectively

Definition: Electronic Medical Records (EMR). EMR data are patient observations routinely collected by hospitals through clinical admissions, including discrete time-series data (e.g., medications, diagnoses) and continuous multivariate data (e.g., vital signs, laboratory measurements), as shown in Figure 2. The admissions generate N features, such as different lab test results, denoted as r_i ∈ R^T (i = 1, 2, ..., N). Each medical feature contains T timesteps, as shown in Figure 2. As a result, such a clinical sequence can be formulated as a "longitudinal patient matrix", where one dimension represents the medical features and the other denotes the admission timestamps [21].
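To make the data layout concrete, the following minimal sketch builds such a longitudinal patient matrix in PyTorch (the paper's implementation framework); the feature count, timestep count, and values are illustrative assumptions, not taken from the paper's datasets.

```python
import torch

# Illustrative sketch of a "longitudinal patient matrix": one row per medical
# feature (e.g., a lab test), one column per admission timestep.
N_FEATURES = 5        # hypothetical number of recorded features
T_STEPS = 10          # hypothetical number of admissions/timesteps

patient_matrix = torch.randn(N_FEATURES, T_STEPS)   # placeholder values; each row is r_i in R^T

r_1 = patient_matrix[0]       # time series of the first medical feature, shape (T_STEPS,)
print(patient_matrix.shape)   # torch.Size([5, 10])
```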
Problem: Length-of-stay prediction. The prediction problem in this paper can be formulated as follows: given the historical EMR data of a patient, i.e., (r_1, ..., r_N), predict the patient's remaining time spent in the ICU, ŷ (i.e., the length of stay). We frame length-of-stay (LOS) prediction as a classification problem with 12 classes (discharge in 1/2/3/5/10/10+ days, adverse outcome in 1/2/3/5/10/10+ days). Figure 3 shows the framework of the proposed CovidCare, which comprises the following procedures; the whole model training process is shown in Algorithm 1 in the Appendix. • The multivariate time series with all features is fed into the healthcare representation learning module as a teacher model to build a comprehensive embedding on the source dataset. • The student model on the source dataset learns to embed the proper health status based on the features shared with the target dataset, by imitating the teacher model's embedding behavior. • The learned feature-embedding parameters are transferred to the healthcare representation learning model for the target dataset and further fine-tuned to perform the task-specific prediction. In this subsection, we introduce the patient health status embedding module, which is based on ConCare [25], a healthcare context representation learning method. This module consists of three layers: a feature extraction layer, a self-attention layer, and a prediction layer. We utilize a multi-channel GRU mechanism in the feature extraction layer to capture the different patterns of each medical feature individually, which is also designed to improve the scalability and compatibility of the feature-specific model transfer. Specifically, we apply N different GRUs to embed the N medical features. Each feature i can be described as a time series r_i = (r_{i1}, r_{i2}, ..., r_{iT}) and is fed into the corresponding GRU_i to generate the feature embedding f_i: f_i = GRU_i(r_{i1}, r_{i2}, ..., r_{iT}). The feature embedding matrix is F = (f_1, f_2, ..., f_N)^T. In the self-attention layer, we employ the multi-head self-attention mechanism to obtain information from the health context and better capture the correlations between medical features. This mechanism lets each feature adaptively interact with all other features and combine information from the related ones according to the self-attention weights. Mathematically, the self-attention weight matrix of head i is A_i = Softmax(Q_i K_i^T / sqrt(d_k)), where Q_i = F · W_i^Q, K_i = F · W_i^K, and V_i = F · W_i^V. The output of head i is A_i · V_i, and the embedding matrix F* after feature interaction is obtained by concatenating the outputs of all heads and applying a linear projection. We also utilize an attention mechanism to integrate the embeddings of all features f*_i into an overall patient representation s, which at the same time interprets the importance of the medical features. Eventually, in the prediction layer, we apply a fully connected layer to conduct the corresponding prediction task, and we use cross-entropy as the loss term L_pred. Firstly, we introduce the feature-specific transfer learning mechanism. As mentioned before, a small dataset may restrict the training of a deep learning model. If the target task only has relatively few data items, it is necessary to make use of an existing EMR dataset with a larger data volume to help train the model, because the information and patterns extracted from a larger dataset are more stable and general, and are therefore also useful to the target model.
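As a rough illustration of the embedding module described above, the sketch below wires together one GRU per medical feature, multi-head self-attention across the feature embeddings, attention pooling into a patient representation s, and a prediction head. It is a minimal re-implementation under our own assumptions (hidden size, number of heads, and the pooling layer are ours), not the authors' released code.

```python
import torch
import torch.nn as nn

class HealthStatusEncoder(nn.Module):
    """Sketch of the embedding module: multi-channel GRUs, self-attention,
    feature-level attention pooling, and a classification head."""

    def __init__(self, n_features, hidden_dim=32, n_heads=4, n_classes=12):
        super().__init__()
        # One GRU channel per medical feature (multi-channel GRU layer)
        self.grus = nn.ModuleList(
            [nn.GRU(input_size=1, hidden_size=hidden_dim, batch_first=True)
             for _ in range(n_features)]
        )
        # Multi-head self-attention lets each feature embedding attend to the others
        self.self_attn = nn.MultiheadAttention(hidden_dim, n_heads)
        # Feature-level attention used to pool F* into the patient vector s
        self.pool_attn = nn.Linear(hidden_dim, 1)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):
        # x: (batch, n_features, T) multivariate time series
        feats = []
        for i, gru in enumerate(self.grus):
            seq = x[:, i, :].unsqueeze(-1)        # (batch, T, 1)
            _, h = gru(seq)                        # h: (1, batch, hidden_dim)
            feats.append(h.squeeze(0))             # f_i: (batch, hidden_dim)
        F_mat = torch.stack(feats, dim=1)          # F: (batch, n_features, hidden_dim)

        attn_in = F_mat.transpose(0, 1)            # (n_features, batch, hidden_dim)
        F_star, _ = self.self_attn(attn_in, attn_in, attn_in)
        F_star = F_star.transpose(0, 1)            # F*: (batch, n_features, hidden_dim)

        alpha = torch.softmax(self.pool_attn(F_star), dim=1)   # feature importance weights
        s = (alpha * F_star).sum(dim=1)            # patient representation s
        return self.head(s), s                     # logits and representation

# Usage: logits, s = HealthStatusEncoder(n_features=20)(torch.randn(8, 20, 10))
# loss = nn.CrossEntropyLoss()(logits, labels)     # L_pred
```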
Based on the patient health status embedding module introduced above, we conduct feature-specific transfer learning on the feature extraction layer, since this layer mainly captures the general patterns of the medical features, which are independent of patient cohorts and prediction tasks. Concretely, we transfer the GRUs of the shared features from the source model to the target model, so that we can make up for the small data volume by transferring knowledge obtained from a larger dataset. However, this does not yet fully exploit the source dataset, since some private features remain unused, and these unused features can also provide significant information. In other words, with the complete source dataset, we can capture the correlations between features more thoroughly and thus generate a more comprehensive representation of patients. Therefore, we propose a knowledge distillation method to construct a more comprehensive transfer source model. We divide the source model into two parts, a teacher model and a student model. The student model is trained on the source dataset restricted to the shared features (R̃_src) and will be transferred to the target model, while the teacher model is trained on the complete source dataset with all features (R_src), serves only as an auxiliary to the student model, and will not be transferred. Here, 'knowledge distillation' means that we want the student model to imitate the behavior of the teacher model so that it learns a comprehensive representation of patients, just as the teacher model does. We design an additional loss term for this purpose, which encourages the student model to restore the representation learned by the teacher model. In detail, we first train the teacher model to generate a representation s_tea for every patient with the loss term L_tea = L_pred. We then train the student model, whose representation s_stu should serve two purposes: predicting the corresponding task labels, and imitating s_tea, through a linear layer, as closely as possible. We use the KL divergence to measure the similarity of the two representations. The loss of the student model (L_stu) is accordingly the sum of the two parts, L_stu = L_pred + L_emb. Finally, we transfer the GRUs from the student model to the target model and fine-tune the target model on the target dataset (R_tar) using the loss term L_tar = L_pred.

We conduct an experiment that leverages data from the PhysioNet source dataset [33] to enhance LOS prediction for COVID-19 [38]. The statistics of the COVID-19 target dataset are presented in Table 5. Without loss of generality, we perform length-of-stay prediction for patients at the 10th admission in this paper. The distribution of days to the outcome of the admissions is shown in Figure 5. The medical features recorded in the COVID-19 target dataset are listed in Table 6. We take the PhysioNet dataset [33] as the source dataset and pre-train the medical feature embedding on the sepsis prediction task. This dataset is sourced from ICU patients in two separate U.S. hospital systems. The data were collected over the past decade with approval from the appropriate institutional review boards; the statistics of this dataset are presented in Table 2. The medical features recorded in the PhysioNet source dataset are listed in Table 6. Due to the limited amount of data, 5-fold cross-validation is employed for the prediction task, and we do not hold out a separate independent test set.
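The distillation objective described above can be sketched as follows. The loss names (L_pred, L_emb, L_stu) follow the paper; the projection layer, the dimensions, and the `grus` attribute are assumptions carried over from the earlier encoder sketch rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ce = nn.CrossEntropyLoss()
project = nn.Linear(32, 32)   # linear layer mapping the student representation towards s_tea

def student_loss(logits_stu, y_src, s_stu, s_tea):
    """L_stu = L_pred + L_emb, where L_emb is the KL divergence between the
    (softmaxed) student and teacher representations."""
    l_pred = ce(logits_stu, y_src)
    s_hat = project(s_stu)                             # imitate the teacher via a linear layer
    l_emb = F.kl_div(F.log_softmax(s_hat, dim=-1),
                     F.softmax(s_tea.detach(), dim=-1),  # teacher is frozen during distillation
                     reduction="batchmean")
    return l_pred + l_emb

# Transfer the GRUs of the shared features from the student to the target model
# (assuming both models expose a ModuleList `grus` with consistent shared-feature indices).
def transfer_shared_grus(student, target, shared_idx_stu, shared_idx_tar):
    for i_stu, i_tar in zip(shared_idx_stu, shared_idx_tar):
        target.grus[i_tar].load_state_dict(student.grus[i_stu].state_dict())
```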
We assess the performance of the multi-class classification using the area under the receiver operating characteristic curve (AUROC, micro/macro), the area under the precision-recall curve (AUPRC), and the minimum of precision and sensitivity, min(Se, P+). Note that the micro average computes metrics globally by considering each element of the label indicator matrix as a label, whereas the macro average computes the metric for each label and takes the unweighted mean, which does not take label imbalance into account. We introduce several deep-learning-based models as baseline approaches that use no additional labeled data or external ontology resources. • GRU is the basic Gated Recurrent Unit network. • MC-GRUs embeds each clinical feature via a separate GRU. • MC-GRUs_t is pre-trained on the source dataset to obtain the parameters of the corresponding GRUs. • ConCare (AAAI 2020) [25] embeds the feature sequences separately and uses self-attention to model dynamic features and static baseline information. • TimeNet (IJCAI 2018) [13] maps variable-length clinical time series to fixed-dimensional feature vectors separately and acts as an off-the-shelf feature extractor; it is pre-trained on the UCR time series repository. • CovidCare_stu is the proposed CovidCare without knowledge distillation from the teacher model. As shown in Table 3, CovidCare consistently outperforms both the transfer-based and the non-transfer-based baselines, demonstrating its ability to learn a robust representation. CovidCare achieves a 6.8% relatively higher AUPRC and a 1.5% higher min(Se, P+) compared to the best state-of-the-art models, ConCare and TimeNet. Comparing all methods with and without the transfer mechanism, we can see that utilizing knowledge pre-trained on existing EMR significantly improves the prediction performance of all models, indicating the effectiveness of the transfer learning mechanism. Moreover, CovidCare outperforms TimeNet; both models employ feature-level transfer, but CovidCare performs a more adaptive and reasonable transfer, which explains its superior performance. The knowledge distillation mechanism is effective as well: compared to the reduced CovidCare_stu model, CovidCare achieves higher performance on all metrics. This indicates that knowledge distillation on top of leveraging existing EMR can further enhance healthcare prediction. To quantitatively verify the reasonableness of the feature re-calibration from an overall perspective and to extract useful medical knowledge, we calculate the average importance weights of the biomarkers for patients in diverse conditions. As shown in Figure 4, some essential medical knowledge learned by CovidCare can be summarized; much of it has been proved or mentioned in the COVID-19-related medical literature, and we also invited medical practitioners to evaluate the findings empirically. • Hypersensitive Cardiac Troponin shows the most distinct difference between discharge and death cases, which means that CovidCare considers it a significant mortality risk indicator. According to Chapman et al. [2], troponin is elevated in one in five patients with confirmed COVID-19, and the presence of elevated troponin in COVID-19 may be associated with a higher mortality risk. COVID-19 patients with elevated troponin may also be more likely to require ventilation and to develop acute respiratory distress syndrome.
For these patients, if clinicians are reluctant to measure cardiac troponin, they may overlook the plethora of ischaemic and non-ischaemic causes of myocardial injury related to COVID-19, which leads to a higher mortality rate. • γ-Glutamyl Transpeptidase (GGT) and Alkaline Phosphatase are two other key features identified by CovidCare. They are significant biomarkers related to liver injury. According to a recent study published in the Lancet [39], both are related to the health status of COVID-19 patients. Liver damage in mild cases of COVID-19 is often transient and can return to normal without any special treatment; however, when severe liver damage occurs, liver-protective drugs are usually given to such patients. In their case studies, GGT was elevated in 30 (54%) of 56 patients with COVID-19 during hospitalization in their center, and elevated alkaline phosphatase levels were observed in one (1.8%) of the 56 patients during hospitalization. • Estimated Glomerular Filtration Rate (eGFR), Urea and Creatinine are kidney-injury-related biomarkers, and their differences between survival and death cases are also distinct. According to Cheng et al. [3], for COVID-19 patients on admission, creatinine and urea are elevated in 14.4% and 13.1% of patients, respectively, and an eGFR < 60 ml/min per 1.73 m^2 is reported in 13.1% of patients. Compared with patients with normal creatinine, those who entered the hospital with elevated creatinine were older and more severely ill. The incidence of in-hospital death in patients with elevated baseline serum creatinine was 33.7%, significantly higher than in those with normal baseline serum creatinine (13.2%). In order to further verify the generality of CovidCare, we also conduct an additional experiment on an end-stage renal disease (ESRD) dataset. We take the ESRD dataset as the target dataset and perform mortality prediction. Currently, many people worldwide suffer from ESRD [17, 35]. They face severe life threats and need lifelong treatment with periodic hospital visits for multifarious tests (e.g., blood routine examinations). The whole procedure requires dynamic patient health risk prediction, based on the medical records collected along with the visits, to help patients recover smoothly and prevent adverse outcomes. The core task of CovidCare is to learn the health status representation of the patient and perform the healthcare prediction. In this study, all ESRD patients who received therapy from January 1, 2006, to March 1, 2018, in a real-world hospital are included to form this dataset. During and after data collection and analysis, individual participants were not identified by name; patients' names were replaced by patient IDs. This study was approved by the Medical Scientific Research Ethical Committee. We drop patients for whom all entries of a feature are missing, and we select the features observed in more than 60% of the patients' records. For missing values, we carry the most recent previous observation forward to prevent future information leakage; if a patient has no previous observation for a feature, we impute it with the patient's first observed value of that feature. The cleaned dataset consists of 662 patients and 13,108 visits. The statistics of the ESRD dataset are presented in Table 7, and the medical features recorded in the ESRD target dataset are listed in Table 8.
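A small sketch of the imputation rule described above, assuming per-patient visit tables in pandas; the column names and values are made up for illustration and are not from the ESRD dataset.

```python
import pandas as pd
import numpy as np

def impute_patient(df: pd.DataFrame) -> pd.DataFrame:
    """df: one patient's visits in time order (rows) by medical features (columns)."""
    df = df.ffill()   # forward-fill: never uses future values, so no information leakage
    df = df.bfill()   # only leading NaNs remain; fill them with the first observed value
    return df

visits = pd.DataFrame({"creatinine": [np.nan, 92.0, np.nan, 101.0],
                       "urea": [6.1, np.nan, np.nan, 7.4]})
print(impute_patient(visits))
```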
The mortality prediction task on the ESRD dataset is defined as a binary classification task: predicting the death of a patient within one year. For this binary classification task, we assess performance using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the minimum of precision and sensitivity, min(Se, P+). According to Table 4, CovidCare consistently outperforms the other baseline approaches on all metrics. The experimental results verify the applicability of our proposed framework: CovidCare can not only predict LOS for a new EID, but also perform mortality prediction for ESRD, a disease with limited EMR. In this paper, we propose a transfer-learning-based prognosis solution, CovidCare, to perform length-of-stay prediction for patients with COVID-19. In order to embed the medical features robustly, the model is trained to imitate the teacher model's medical embedding behavior via knowledge distillation. The experimental results on the real-world COVID-19 dataset show that CovidCare consistently outperforms several competitive baseline methods. More importantly, CovidCare identifies several key indicators (e.g., hs-cTnI, hs-CRP and Platelet Counts) for patients in critical conditions, whose abnormal values indicate a potential imminent adverse outcome. CovidCare also reveals that normal values of γ-GT, AP and eGFR indicate an overall improvement of health and a possible early discharge. The medical findings extracted by CovidCare are empirically validated and confirmed by human experts and the medical literature. We believe the proposed model, CovidCare, will significantly benefit intelligent prognosis for tackling future emerging infectious diseases such as COVID-19.

Algorithm 1: Training procedure of CovidCare.
1: Randomly initialize the parameters of the Teacher Model CovidCare_tea
2: while not converged do:
3:   Compute ŷ_{T,src}, s_tea = CovidCare_tea(R_src)
4:   Compute L_pred = CE(ŷ_{T,src}, y_{T,src})
5:   Compute L_tea = L_pred
6:   Update the parameters of CovidCare_tea by optimizing L_tea using back-propagation
7: end while
8: Randomly initialize the parameters of the Student Model CovidCare_stu
9: while not converged do:
10:   Compute ŷ_{T,src}, ŝ_tea = CovidCare_stu(R̃_src)
11:   Compute L_pred = CE(ŷ_{T,src}, y_{T,src}), L_emb = D_KL(Softmax(ŝ_tea) || Softmax(s_tea))
12:   Compute L_stu = L_pred + L_emb
13:   Update the parameters of CovidCare_stu by optimizing L_stu using back-propagation
14: end while
15: Transfer the parameters of the shared GRUs from CovidCare_stu to the Target Model CovidCare_tar, and randomly initialize the other parameters of CovidCare_tar
16: while not converged do:
17:   Compute ŷ_{T,tar} = CovidCare_tar(R_tar)
18:   Compute L_pred = CE(ŷ_{T,tar}, y_{T,tar})
19:   Compute L_tar = L_pred
20:   Update the parameters of CovidCare_tar by optimizing L_tar using back-propagation
21: end while

The experiment environment is a machine equipped with an Intel Xeon E5-2630 CPU, 256 GB of RAM, and an Nvidia RTX 8000 GPU. The code is implemented with PyTorch 1.5.0. To train the model, we use Adam [19] with a batch size of 256, and the learning rate is set to 1e-3. To compare the different approaches fairly, the hyper-parameters of the baseline models are tuned via grid search.
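To tie the pieces together, the condensed sketch below mirrors the three training stages of Algorithm 1, reusing the hypothetical encoder and losses from the earlier sketches; the data loaders, stopping criterion, and model instances are assumptions, not the authors' code.

```python
import torch

def train_stage(model, loader, loss_fn, epochs=10, lr=1e-3):
    """Generic training stage: Adam with lr=1e-3 as stated in the paper;
    a fixed epoch count stands in for the 'while not converged' loop."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            opt.zero_grad()
            loss = loss_fn(model, batch)
            loss.backward()
            opt.step()

# Stage 1: teacher on the full source dataset (L_tea = L_pred)
#   train_stage(teacher, src_loader_all_features, teacher_loss)
# Stage 2: student on the shared source features, distilling from the frozen teacher
#   (L_stu = L_pred + L_emb, via student_loss from the earlier sketch)
#   train_stage(student, src_loader_shared_features, distill_loss)
# Stage 3: transfer the shared GRUs and fine-tune the target model on R_tar (L_tar = L_pred)
#   transfer_shared_grus(student, target, shared_idx_stu, shared_idx_tar)
#   train_stage(target, tar_loader, target_loss)
```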
[1] Patient subtyping via time-aware LSTM networks
[2] High-sensitivity cardiac troponin can be an ally in the fight against COVID-19
[3] Kidney disease is associated with in-hospital death of patients with COVID-19
[4] Doctor AI: Predicting Clinical Events via Recurrent Neural Networks
[5] GRAM: Graph-based attention model for healthcare representation learning
[6] RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism
[7] MiME: Multilevel medical embedding of electronic health records for predictive healthcare
[8] Prevalence of underlying diseases in hospitalized patients with COVID-19: a systematic review and meta-analysis. Archives of Academic Emergency Medicine
[9] Predicting clinical events by combining static and dynamic information using recurrent neural networks
[10] Clinical characteristics of coronavirus disease 2019 (COVID-19) in China: a systematic review and meta-analysis
[11] StageNet: Stage-aware neural networks for health risk prediction
[12] Transfer learning for clinical time series analysis using recurrent neural networks
[13] Using features from pre-trained TimeNet for clinical predictions
[14] Multitask learning and benchmarking with clinical time series data
[15] Uncertainty-aware attention for reliable interpretation and prediction
[16] Clinical features of patients infected with 2019 novel coronavirus in Wuhan
[17] Fibroblast growth factor 23 and risks of mortality and end-stage renal disease in patients with chronic kidney disease
[18] Potential association between COVID-19 mortality and health-care resource availability
[19] Adam: A method for stochastic optimization
[20] The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application
[21] Big healthcare data analytics: Challenges and applications
[22] Diagnosis prediction via medical context attention networks using deep generative modeling
[23] KAME: Knowledge-based attention model for diagnosis prediction in healthcare
[24] AdaCare: Explainable clinical health status representation learning via scale-adaptive feature extraction and recalibration
[25] ConCare: Personalized clinical feature embedding via capturing the healthcare context
[26] Health-ATM: A deep architecture for multifaceted patient health record representation and risk prediction
[27] Pandemic influenza plan
[28] A prospective study of mortality associated with anaesthesia and surgery: risk indicators of mortality in hospital
[29] DeepCare: A deep dynamic memory model for predictive medicine
[30] Benchmark of deep learning models on large healthcare MIMIC datasets
[31] Pairwise-ranking based collaborative recurrent neural networks for clinical event prediction
[32] Healthcare data analytics
[33] Early prediction of sepsis from clinical data: the PhysioNet/Computing in Cardiology Challenge
[34] Learning tasks for multitask learning: heterogenous patient populations in the ICU
[35] Determining factors that predict technique survival on peritoneal dialysis: application of regression and artificial neural network methods
[36] Comorbid chronic diseases and acute organ injuries are strongly correlated with disease severity and mortality among COVID-19 patients: a systemic review and meta-analysis
[37] Locally informed simulation to predict hospital capacity needs during the COVID-19 pandemic
[38] An interpretable mortality prediction model for COVID-19 patients
[39] Liver injury in COVID-19: management and challenges
[40] Resolving the bias in electronic medical records