key: cord-0264013-7tb82xtm authors: Murata, G. H.; Murata, A. E.; Campbell, H. M.; Mao, J. T. title: BASELINE METABOLIC PROFILING AND RISK OF DEATH FROM COVID-19 date: 2022-01-25 journal: nan DOI: 10.1101/2022.01.22.22269691 sha: a7856bb46e25b75c3152b9384bdd62d1f21476b0 doc_id: 264013 cord_uid: 7tb82xtm Objective: To derive a predicted probability of death (PDeathLabs) based upon complete value sets for 11 clinical measurements (CM) obtained on patients prior to their diagnosis of COVID-19. PDeathLabs is intended for use as a summary metric for baseline metabolic status in multivariate models for COVID-19 death. Methods: Cases were identified through the COVID-19 Shared Data Resource (CSDR) of the Department of Veterans Affairs. The diagnosis required at least one positive nucleic acid amplification test (NAAT). The primary outcome was death within 60 days of the first positive test. We retrieved all values for systolic blood pressure (SBP), diastolic blood pressure (DBP), O2 saturation (O2SAT), body mass index (BMI), estimated glomerular filtration rate (EGFR), alanine aminotransferase (ALT), serum albumin (ALB), hematocrit (HCT), LDL cholesterol (LDL) hemoglobin A1c (A1C), and HDL cholesterol (HDL) if they were done at least 14 days prior to the NAAT. Clinicians evaluate several attributes of CM that are of critical importance: metabolic control, disease burden, chronicity, refractoriness, tendency to relapse, temporal trends, and lability. We derived 1-3 parameters for each of these attributes: the most recent value (metabolic control); time-weighted average and abnormal area under a severity versus time curve (disease burden); time and number of readings above or below goal (chronicity); longest abnormal cluster and time/number of consecutive readings above goal if the last value was abnormal (refractoriness); number of abnormal clusters (tendency to relapse); long- and short-term changes (temporal trends); and coefficient of variation and mean deviation between consecutive readings (lability). We created computer programs to derive cumulative values for these 13 parameters for all 11 CM as each new value is added. A fitted logistic model was developed for each CM to determine which of the 13 parameters contributed to the risk of death. A main logistic model was developed to determine which of the 13 * 11 = 143 metabolic parameters were independently predictive of death. The resulting model was used to derive PDeathLabs for each patient and the area under its receiver operating characteristic (ROC) curve calculated. Single variable logistic models were also derived for age at diagnosis, the Charlson 2-year (Charl2Yr) and lifetime (CharlEver) scores, and the Elixhauser 2-year (Elix2Yrs) and lifetime (ElixEver) scores. Stata was used to compare the ROCs for PDeathDx and each of the other metrics. Results: On September 30, 2021 there were 347,220 COVID-19 patients in the CSDR. 329,491 (94.9%) patients had CM performed at least 14 days prior to the COVID-19 diagnosis and form the basis for this report. 17,934 (5.44%) died within 60 days of the diagnosis. On the subset regressions, the number of significant parameters ranged from all 13 for SBP to 7 for HDL. 239,393 patients had complete sets of data for developing the main model. Of 143 candidate predictors, 49 parameters were identified as statistically significant, independent predictors of death. The most influential domains were the most recent value, disease burden, temporal trends, and tendency to relapse. The ROC area for PDeathLabs was 0.785 +/- 0.002. No difference was found in the ROC areas of PDeathLabs and age at diagnosis (0.783 +/- 0.002; P = NS). However, the ROC area for PDeathLabs was significantly greater than that of Charl2Yrs (0.704 +/- 0.002; P < 0.001), CharlEver (0.729 +/- 0.002; P < 0.001), Elix2Yrs (0.675 +/- 0.002; P < 0.001), and ElixEver (0.707 +/- 0.002; P < 0.001). A poor prognosis was found for chronic systolic hypertension. On the other hand, a higher BMI was protective once SBP, DBP, HDL, LDL and A1C were considered. Conclusions: Our study confirms that parameters derived for 11 CM are significant determinants of COVID-19 death. The most recent value should not be selected over other parameters for multivariate modeling unless there is a physiologic basis for doing so. PDeathLabs has the same discriminating power as age at diagnosis and outperforms comorbidity indices as a summary metric for pre-existing conditions. If validated by others, this approach provides a robust approach to handling CM in multivariate models. infection (1) (2) (3) (4) (5) . They have gained popularity because they are very useful for managing patients and allocating scarce resources. Most are composed of demographic traits, co-morbidity scores, and pre-existing conditions grouped under headings like "malignancy". Clinical measurements (CM) have not been prominent features of these models. CM provide valuable information about the metabolic status of patients at baseline and include vital signs and routine laboratory tests. Some models have none (1, 5) , while others include only one (usually the most recent) of the hundreds of measurements that might be available for each type (2) (3) (4) . However, there are convincing arguments to expand the use of CM in these prediction rules: a) A diagnosis informs us that the patient has a certain disease, while CM tell us the burden imposed by its risk factors or the disease itself. The two are not synonymous. For example, one patient with hypertension might have had severely elevated blood pressures (BPs) for decades, while another might be normotensive because of successful treatment. The first would be considered at high risk while the second is "normal". On the other hand, a patient may not have been given the diagnosis of hypertension even though her BPs are abnormal for a term pregnancy. Classification errors such as these are very common in clinical practice. b) CM provide some insight into the mechanism of injury. For example, suppose chronic hypoxia is more closely associated with death than acute hypoxia. This observation suggests that cumulative injury over time (such as with secondary pulmonary hypertension) is more influential than the acute metabolic burden imposed by low oxygen levels at presentation. c) CM add explanatory variables to a prediction model. These components are critical for identifying targets for interventions to reduce the risk. For example, causality might be inferred by a dose-response effect between the CM and risk of the death. The model for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 25, 2022. ; https://doi.org/10.1101/2022.01.22.22269691 doi: medRxiv preprint becomes "hypothesis-generating" and could lead to other studies confirming the mechanism of injury or clinical trials to reduce the risk. Prediction rules rarely define causality in a way that points to a further course of action. CM are routinely included in periodic health examinations and available for testing in prediction models. They include systolic (SBP) and diastolic blood pressure (DBP), oxygen level (O2Sat), body mass index (BMI), estimated glomerular filtration rate (EGFR), alanine aminotransferase (ALT), serum albumin (ALB), hematocrit (HCT), hemoglobin A1c (A1C), and low-(LDL) and high-density lipoprotein cholesterol (HDL). The major problem is the lack of a systematic approach to selecting the appropriate parameter for each type. There is no evidence that the most recent value of any type is the most influential measurement even though it is common practice to include them in models as a representative value. For example, if microvascular disease mediates the risk posed by diabetes, the most recent A1c is the least suitable measure of tissue glycation. The purpose of this manuscript is to validate a method for summarizing the effects of hundreds of measurements done on each patient for each CM and to evaluate the effects of these parameters on prognosis. Our approach emulates the way that clinicians interpret CM to reach conclusions about the patient's metabolic status. Unlike prediction rules, they rarely interpret individual values for CM out of context. In fact, conclusions are often based upon a review of neighboring values or even other domains of the medical record. Moreover, clinicians often render judgements about several attributes when assessing a CM. These attributes include metabolic control, chronicity, disease burden, temporal changes, refractoriness, tendency to relapse, and lability. Each concept is independent of the others and has clinical implications of its own. For example, a diabetic patient may have an A1C that meets a metabolic target, but the entire value set could represent a substantial glycemic burden, indicate a tendency to relapse, and demonstrate a steep, upward trend. In our study, a parameter is a variable synthesized from the complete value set for a CM that represents one attribute for that CM. To reach conclusions about for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 25, 2022. ; https://doi.org/10.1101/2022.01.22.22269691 doi: medRxiv preprint metabolic status requires extensive processing of all data for a CM, deriving parameters for each attribute, and testing the association between derived parameters and COVID-19 death. Our computer programs not only derive these parameters but also generate a running summary whenever a new value is added to the set. Thus, the patient's status relative to any CM can be assessed in detail at any point in time. Cases were identified through VA's COVID-19 Shared Data Resource (CSDR). Membership in this registry requires at least one positive nucleic acid amplification test. The primary outcome was death within 60 days of the diagnosis. The outcome was retrieved from the CSDR, which assigns a 1 to those who died and 0 otherwise. Data domains in the Corporate Data Warehouse (CDW) were interrogated for all measurements entered ≥ 14 days before the diagnosis of COVID-19. This precaution excludes readings that may have been taken during the pre-symptomatic phases of illness. We retrieved SBP, DBP, O2Sat, heights and weights from the file Vital.VitalSign. The latter were used to calculate BMI. We also retrieved EGFR, ALB, HCT, ALT, A1C, HDL and LDL from the file Chem.PatientLabChem. The following parameters were derived for each of the 11 CMs (above): a) Metabolic control requires that the measurement be compared to an external standard; that both be standardized to a common scale; that the value is timely, that it has not already been treated, that enough time has elapsed since the last treatment change to reach a plateau value; and that it is not rapidly increasing or decreasing. Metabolic control is reflected in the most recent value (Value1). Our programs provide the option of raw or standardized values and require the user to specify a treatment target. They can be modified to exclude remote values but, for now, include all Value1s in the medical record regardless of their timeliness. No attempt was made to assess prior treatments. Temporal changes are handled by another module. b). Chronicity is defined as the total number of days or measurements above target. Each reading was paired with the preceding one. Days between successive abnormal readings were for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 25, 2022. ; https://doi.org/10.1101/2022.01.22.22269691 doi: medRxiv preprint considered above goal; days between normal readings as at goal; days between a normal and abnormal reading as worsening; and days between an abnormal and normal reading as improving. Days above goal were summarized across all successive pairs in the patient's record (FUDaysAboveGoal). NumAboveGoal is the total number of measurements exceeding the target value in the patient's record. c) Disease burden is the degree to which an abnormality has caused end-organ damage. For chronic diseases, it is almost always a function of severity and exposure time and expressed as the area under the severity * time curve (AUC). This concept applies to SBP, DBP, BMI, A1c, LDL and LDL. Our software uses a simple trapezoidal estimation technique without interpolation, extrapolation, or curve-fitting and includes all readings from the first to last value. AUC can be large just because the patient is elderly. To eliminate this possibility, our software derives AUC only above the target specified by the user (AbnAUC). Time-weighted averaging eliminates the sampling bias that arises from a tendency to measure a CM more frequently when it is abnormal. The raw average is an over-estimate of severity when abnormal readings are more closely spaced than normal ones. TimeWtAvg is defined as: AUC/FUDays d) Refractoriness is the repeated failure to achieve goal and is represented by consecutive abnormal values (a "cluster"). Each cluster begins with a change from normal to abnormal value and terminates with the next normal value. The assumption is that an abnormal reading should have triggered an intervention that normalized the measurement. Refractoriness is worse when there are many readings in the cluster, or its duration is long. It can result from the failure to treat, failure of the patient to adhere, failure of the disease to respond, or intolerable side-effects. Our program defines the number of clusters for each patient (NumCluster) and a start date, consecutive days above goal (TimeAboveGoal), and consecutive readings above goal (CtAboveGoal) for each cluster. The least refractory episode has a CtAboveGoal of one, and refractoriness increases as CtAboveGoal increases. TimeAboveGoal and CtAboveGoal for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 25, 2022. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Lag3Dev. These values were extracted for each patient from the running summary associated with the most recent value. The default was to use all available readings in the medical record so that metabolic status was defined over the patient's lifetime. Because there are 4 vital signs and 7 laboratory tests of interest, the list of covariates consists of 13 * 11 = 143 items. Group differences in categorical variables were tested by chisquare analysis. Group differences in continuous variables were analyzed by student's t-test and Mann-Whitney U-test. Because there were so many comparisons, univariate analysis is not included in this report but is available upon request. For each CM, a subset logistic regression was done to identify which of the 13 parameters was predictive of death. A model was fitted to all patients with complete data. A variable was considered significant if the P-value associated with the coefficient was < 0.05. The subset model was then used to assign a predicted probability of death to each patient. Receiver operating characteristic curve (ROC) analysis was done to determine if that predicted probability discriminated between those who live and died. All 143 variables were used to generate a main predictive model. Set-wise logistic regression was applied to all patients with a complete data set. The dependent variable was death within 60 days of diagnosis. Modeling started with the set of most recent values. Sets related to different criteria were added in succession, and previous variables were removed if they became insignificant. A variable was considered significant if the P-value associated with its coefficient was < 0.05. The final model was used to assign a predicted probability of death for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Subset regressions were done for each CM to identify which of the 13 parameters were determinants of the outcome (Appendix). The model did not converge for ALT. As expected, single CM did not distinguish between surviving and dying patients very well, although some ROCs were close to those of co-morbidity scores. Nevertheless, the number of significant parameters ranged from all 13 for SBP to 7 for HDL. Value1 was significant for 7 CM, FUDaysAboveGoal for 9, NumAboveGoal for 9, AbnAUC for 9, TimeWtAvg for 9, NumClust for 10, TimeAboveGoal for 6, CtAboveGoal for 7, MaxClustDays for 5, CoeffVar for 10, MeanValDiff for 10, NetChange for 9, and Lag3Dev for 7. 239,393 patients (or 70.5% of the cohort) had complete sets of data for developing the main model. Table 1 shows the components of this model. Of 143 candidate predictors, 49 parameters were identified as statistically significant and independent predictors of death. The most influential domains were the most recent value, disease burden, temporal trends, and tendency to relapse. The main model was used to calculate a predicted probability of death (PDeathLabs) for each subject. The ROC area for PDeathLabs was 0.785 ± 0.002. For comparison, single factor logistic models were developed for age at diagnosis, the 2-year for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Charleson Co-morbidity Index score (Charl2Yrs), lifetime Charleson score (CharlEver), the Elixhauser 2-year score (Elix2Yrs), and lifetime Elixhauser score (ElixEver). Their predicted probabilities were used to derive an ROC area for each. No difference was found in the ROC areas of PDeathLabs and age at diagnosis (0.783 ± 0.002; P > 0.05). However, the ROC area for PDeathLabs was significantly greater than that of Charl2Yrs (0.704 ± 0.002; P < 0.001), CharlEver (0.729 ± 0.002; P < 0.001), Elix2Yrs (0.675 ± 0.002; P < 0.001), and ElixEver (0.707 ± 0.002; P < 0.001). Thus, baseline metabolic measurements outperform co-morbidity scores for pre-existing conditions in predicting COVID-19 deaths. In this manuscript, we propose a novel method for handling pre-existing findings from vital signs and laboratory tests in models predicting COVID-19 death. Our observations confirm that they should be included because they have independent prognostic significance, generate insights into the mechanism of action, and may even be targets for interventions to mitigate the risk. By including explanatory variables, models can become hypothesis-generating and lead to future studies validating the suspected pathogenesis or trials of interventions targeting the abnormality itself. All 13 parameters were identified as predictors of death in the subset models. These observations provide some justification to the approach that clinicians use to evaluate metabolic status. It is comforting to know that our parameters and the approach used in clinical practice converge upon a common set of criteria. Our main model showed that many parameters besides the most recent value were significant contributors to COVID-19 death. For 2 CM, disease burden contributed to the prediction while the most recent value did not. For 6 others, both current metabolic control and the long-term parameters were independently predictive of death. This finding suggests that CM have chronic and acute effects on the likelihood of recovery from COVID-19 infection and that the former should routinely be considered for prediction models. Our results confirm the importance of CM parameters for COVID-19 for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. protective. This paradoxical effect might be explained by the fact that, for a given level of risk posed by SBP, a higher DBP is associated with a lower pulse pressure, reduced systolic wall stress, and better coronary perfusion. In a previous study, we showed that hypertension is the most important independent risk factor for COVID-19 death (Murata GH, doi pending). This analysis suggests that the most important mechanism is long-term elevation of SBP. BMI and obesity are considered risk factors for COVID-19 death (2, 5) . However, we found that higher values for BMIValue1 in the main model lowered the risk of death. This discrepancy might be explained by the fact that we controlled for the mediators of injury in obesity (SBP, DBP, A1C, HDL, and LDL). Once these factors are considered, a higher BMI may only be a marker of better nutrition. This finding illustrates the hazards of drawing conclusions from incompletely specified models. Serum albumin and hematocrit are indicators of nutritional status or catabolism and are powerful predictors of recovery from acute illness. On the other hand, they do not produce cumulative organ injury over years. Accordingly, ALBValue1 and HCTValue1 were included in the main model but not their long-term parameters. A high value for A1CValue1 was identified as a risk factor while A1CTimeWtAvg and A1CAbnAUC were not. Thus, recent hyperglycemia is deleterious in COVID-19 infection but microvascular injury developing over decades may not play that much of a role. Likewise, HDLValue1 had a for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 25, 2022. ; https://doi.org/10.1101/2022.01.22.22269691 doi: medRxiv preprint protective effect while the long-term parameters did not. Finally, we found that time-weighted averages and abnormal AUC were both included as measures of disease burden for EGFR, ALT, LDL, and O2SAT. For the last 3, the effects were in opposite directions. These results imply that abnormal AUC provides information about risk beyond the time weighted average. This study shows that there is no justification for selecting one arbitrary parameter of a CM for modeling over the others unless there is a physiologic basis for doing so. We also found that temporal instability in a CMwhether reflected in long-or short-term trends, coefficient of variation, or tendency to relapsehad significant effects on prognosis. The direction and magnitude of this effect varied from CM to CM and, at times, were unexpected. Future studies should be done to identify the mechanisms by which fluctuations in CM affect recovery from COVID-19 infection. Until then, it is reasonable to include temporal changes of CM in models of COVID-19 death. This recommendation is consistent with the way that clinicians prioritize treatment of abnormal values. For example, many would consider a patient whose A1c rapidly increased from 5.0 to 8.5 to have a poorer prognosis than one whose A1c decreased from 11.0 to 9.0. As expected, measures of chronicity added little to the model once disease burden was considered. The minimal contribution from refractoriness suggests that clustering adds very little to the prognosis compared to the same abnormalities scattered over the patient's timeline. Nevertheless, it remains a useful clinical concept because it identifies patients for whom further treatment may be futile. We calculated a predicted probability of death (PDeathLabs) based upon complete value sets for 11 CM. The ROC area indicated that vital signs and laboratory findings are powerful predictors of outcomes. In fact, the ROC areas for PDeathLabs and age at diagnosis were equivalent, while PDeathLabs was consistently superior to Charl2Yrs, CharlEver, Elix2Yrs, and ElixEver. In summary, this term was a very convenient way to summarize millions of for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 25, 2022. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 25, 2022. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted January 25, 2022. ; https://doi.org/10.1101/2022.01.22.22269691 doi: medRxiv preprint Development and validation of a 30-day mortality index based on pre-existing medical administrative data from 13,323 COVID-19 patients: The Veterans Health Administration COVID-19 (VACO) Index. PLoS One Development of COVIDVax model to estimate the risk of SARS-CoV-2-related death among 7.6 million US veterans for use in vaccination prioritization Developing a COVID-19 mortality risk prediction model when individual-level data are not available Risk prediction of covid-19 related death and hospital admission in adults after covid-19 vaccination: national prospective cohort study SARS-CoV-2 vaccine protection and deaths among US veterans during 2021