key: cord-0015292-cq31gx7z
authors: Zheng, Hua; Ryzhov, Ilya O.; Xie, Wei; Zhong, Judy
title: Personalized Multimorbidity Management for Patients with Type 2 Diabetes Using Reinforcement Learning of Electronic Health Records
date: 2021-02-11
journal: Drugs
DOI: 10.1007/s40265-020-01435-4
sha: cf9d24300849e473f68c78a1ed4aee52e1f5dda0
doc_id: 15292
cord_uid: cq31gx7z

BACKGROUND: Comorbid chronic conditions are common among people with type 2 diabetes. We developed an artificial intelligence algorithm, based on reinforcement learning (RL), for personalized diabetes and multimorbidity management, with strong potential to improve health outcomes relative to current clinical practice. METHODS: We modeled glycemia, blood pressure, and cardiovascular disease (CVD) risk as health outcomes, using a retrospective cohort of 16,665 patients with type 2 diabetes from New York University Langone Health ambulatory care electronic health records in 2009–2017. We trained an RL prescription algorithm that recommends a treatment regimen optimizing patients’ cumulative health outcomes using their individual characteristics and medical history at each encounter. The RL recommendations were evaluated on an independent subset of patients. RESULTS: The single-outcome optimization RL algorithms, RL–glycemia, RL–blood pressure, and RL–CVD, recommended prescriptions consistent with those of clinicians in 86.1%, 82.9%, and 98.4% of the encounters, respectively. For patient encounters in which the RL recommendations differed from the clinician prescriptions, significantly fewer encounters showed uncontrolled glycemia (A1c > 8% in 35% of encounters), uncontrolled hypertension (blood pressure > 140 mmHg in 16% of encounters), and high CVD risk (risk > 20% in 25% of encounters) under RL algorithms compared with those observed under clinicians (43%, 27%, and 31% of encounters, respectively; all p < 0.001). CONCLUSIONS: A personalized RL prescriptive framework for type 2 diabetes yielded high concordance with clinicians’ prescriptions, and substantial improvements in glycemia, blood pressure, and CVD risk outcomes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s40265-020-01435-4) contains supplementary material, which is available to authorized users.

Comorbid chronic conditions are common among people with type 2 diabetes (T2DM) [1]. Hypertension (HTN) and atherosclerotic cardiovascular disease (CVD) are the two most common multimorbidities for T2DM patients [2]; therefore, the need to address comorbid chronic conditions, in addition to patients' diabetes-specific treatment goals [3],

Artificial intelligence (AI) prescription algorithms have been successfully applied to single disease problems, but previous applications have not considered comorbid conditions, pharmacological treatments, treatment histories, and other individual characteristics that are important for personalized diabetes management.

We trained and evaluated a series of AI algorithms to optimize patients' glycemia, blood pressure, and CVD risk outcomes, either individually or jointly, using a retrospective cohort of type 2 diabetes patients from an ambulatory care electronic health records database (2009–2017).

When optimizing glycemia, blood pressure, and CVD risk individually, the algorithms' recommendations were consistent with clinicians' prescriptions in 86.1%, 82.9%, and 98.4% of patient encounters, respectively. In cases where the AI recommendation differed from the clinicians' prescriptions, health outcomes were significantly improved.

The RL algorithm can be integrated into electronic health record platforms to assist physicians with dynamic real-time suggestions on personalized treatment paths.

pharmacological treatments, individual treatment histories, and other individual characteristics that may inform treatment selection.

We provide an artificial intelligence (AI) prescription algorithm, based on reinforcement learning (RL), which is able to dynamically suggest personalized optimal treatments for patients with T2DM to manage their multimorbidity based on evidence from patients' electronic health records (EHRs). RL has been successfully applied in the past to single disease problems, such as blood glucose control [11], HIV therapy [12], cancer treatment [13], anemia treatment in hemodialysis patients [14], treatment strategies for sepsis in intensive care [15], and a personalized regimen of sedation dosage and ventilator support for patients in intensive care units (ICUs) [12]. Prescriptive algorithms using regression trees and k nearest neighbors (kNN) have previously shown great potential in personalized diabetes management [16, 17].

Our approach leverages the power of RL and abundant data in the EHR system to dynamically recommend treatment prescriptions, which are personalized based on patient characteristics, including age, sex, race, body mass index (BMI), blood pressure (BP), laboratory tests, duration of T2DM, and treatment history. In our setting, we first applied RL to optimize glycemic control, BP control, and CVD prevention separately, and then studied the potential of RL for multimorbidity management by optimizing all three outcomes jointly. We evaluated the effectiveness of the personalized treatment recommendations made by RL against the observed clinicians' treatment by estimating patient outcomes based on the outcomes of similar patients in the EHR database.

We used ambulatory care EHR samples for T2DM patients from New York University Langone Health (NYULH-EHR) to derive and validate the RL algorithm. Eligible patients had at least one encounter with an NYULH ambulatory primary care physician between 2009 and 2017 and had been selected by a rule-based T2DM phenotyping algorithm, defined by the following criteria: (1) had at least two encounters with an International Classification of Diseases, Tenth Revision (ICD-10) code for T2DM; (2) had two or more abnormal hemoglobin A1c values (A1c ≥ 6.5%) and at least one encounter with an ICD-10 code for T2DM; or (3) had a prescription for a T2DM medication, excluding metformin and acarbose. We excluded patients seen for consultation only and patients in emergency department, inpatient, or specialist settings, as these lacked consistent documentation of T2DM across encounters. We randomly selected 60% of the eligible patients as the training cohort to develop the RL algorithm, and reserved the remaining 40% of patients as the test cohort to evaluate the performance of the RL algorithm. This study was approved by the NYULH Institutional Review Board, and the data were de-identified to ensure anonymity.

For each patient, we had access to demographic data, including age, sex, race, ethnicity, and smoking status, as well as the following biomarkers: systolic BP (SBP), diastolic BP (DBP), BMI, HbA1c, total cholesterol (TC), low-density lipoprotein (LDL), high-density lipoprotein (HDL), creatinine, triglycerides, and estimated glomerular filtration rate (eGFR). In NYULH-EHR, 1% of samples had missing vitals, including BPs and BMI, 8% had missing HbA1c, 5-32% had missing renal function biomarkers, and 13% had missing lipid biomarkers. Following on from the work of Lundberg et al. [18], we imputed patients' missing biomarkers based on the observed values measured in previous encounters.

Medication prescriptions were first grouped by therapeutic class codes of antihyperglycemic, antihypertensive, and lipid-lowering, then analyzed by pharmacologic subclass. The antihyperglycemic therapeutic class contains nine pharmacologic subclasses, including the peroxisome proliferator-activated receptor (PPAR) agonist thiazolidinedione (PPARg), insulin-release stimulant type (INSR), incretin mimetic (glucagon-like peptide 1 receptor agonist; GLP1), dipeptidyl peptidase-4 (DPP4) inhibitor and biguanide (DPP4-BIG), DPP-4 inhibitors (DPP4), biguanide type (BIG), insulin-release stimulant and biguanide (INSR-BIG), sodium-glucose cotransporter-2 inhibitors (SGLT2), and insulins (INSO). The antihypertensive therapeutic class contains 10 pharmacologic subclasses, including angiotensin receptor antagonists (ARAs), potassium-sparing diuretics in combination (PSD), α/β-adrenergic blocking agents (ABAB), ACE inhibitor with thiazide or thiazide-like diuretic (ACE-TD), ARAs with thiazide diuretic (ARA-TD), ACE inhibitors (ACE), thiazide and related diuretics (TD), β-adrenergic blocking agents (BAB), calcium channel blocking agents (CCB), and ARAs with CCBs (ARA-CCB). The antihyperlipidemic therapeutic class contains five pharmacologic subclasses, including bile salt sequestrants (BSS), HMG-CoA reductase inhibitors (HMG), HMG-CoA reductase inhibitors and cholesterol absorption inhibitors (HMG-CA), proprotein convertase subtilisin/kexin type 9 inhibitors (PCSK9), and lipotropics (LIP).
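The data description states that missing biomarkers were imputed from values observed in previous encounters, without giving the exact rule. A minimal sketch of one plausible reading, last observation carried forward within a patient, is shown here; the function name `carry_forward`, the dictionary layout, and the example values are all illustrative assumptions rather than the authors' implementation.

```python
from typing import Dict, List, Optional

Encounter = Dict[str, Optional[float]]


def carry_forward(encounters: List[Encounter]) -> List[Encounter]:
    """Fill missing biomarkers with the most recent earlier observation.

    `encounters` holds one patient's records ordered from earliest to latest.
    A value with no earlier observation stays None (assumption: no backfill).
    """
    last_seen: Dict[str, float] = {}
    filled: List[Encounter] = []
    for enc in encounters:
        row: Encounter = {}
        for name, value in enc.items():
            if value is None:
                value = last_seen.get(name)  # None if never observed before
            else:
                last_seen[name] = value
            row[name] = value
        filled.append(row)
    return filled


# Hypothetical three-encounter history for a single patient.
history = [
    {"a1c": 7.9, "sbp": 135.0},
    {"a1c": None, "sbp": 128.0},
    {"a1c": 7.2, "sbp": None},
]
imputed = carry_forward(history)
```

The input list is left unchanged, so the raw and imputed histories can be compared side by side.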

RL algorithms model the course of patients' EHR histories, which include prescriptions, biomarkers, and health outcomes changing over time, using a Markov decision process with three key elements: state, action, and reward [15, 19]. In this setting, 'state' refers to the observed patient demographics, laboratory test results at the current encounter, and histories of laboratory tests and prescriptions. 'Action' refers to the treatment regimen prescribed at the current encounter, which is a pharmacologic subclass or a combination of subclasses. The result of an action is a numerical reward representing the improvement in health outcomes compared with the previous encounter. The cumulative reward is defined as the sum of the rewards along the course of EHR encounter records. RL is a well-established approach for maximizing cumulative reward by selecting an optimal action at each encounter; here, the optimal action is learned with Deep Q Networks [20, 21], a learning algorithm based on a multilayer (deep) neural network. An important advantage of RL is that the action in every encounter is personalized to the patient's individual characteristics as they are observed, in a way that optimizes the cumulative reward. In this paper, we focus on glycemia control (lowering A1c towards 6.5%), BP control (lowering SBP towards 120 mmHg), and CVD prevention (minimizing CVD risk). We first optimized each outcome individually using three separate RL algorithms, referred to as RL-glycemia, RL-BP, and RL-CVD. We then trained a multimorbidity management RL algorithm (RL-multimorbidity) to optimize glycemia, BP and CVD risk simultaneously. The details of state, action, and reward are described as follows:
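To make the Q-learning idea behind Deep Q Networks concrete, the sketch below computes the standard one-step learning target, reward plus the best estimated value at the next encounter, and picks the greedy action. It is an illustration only: plain arrays stand in for the neural network, and the discount factor `gamma` is an assumption (the text defines cumulative reward as a plain sum, which corresponds to `gamma=1.0`; the value used in training is not stated here).

```python
import numpy as np


def q_learning_targets(rewards, next_q_values, terminal, gamma=1.0):
    """One-step Q-learning targets: r + gamma * max over a' of Q(s', a').

    rewards:       shape (n,)   reward observed after each encounter
    next_q_values: shape (n, A) estimated Q-values at the next encounter
    terminal:      shape (n,)   True if the encounter is the patient's last
    gamma:         discount factor (assumption; 1.0 gives an undiscounted sum)
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)
    terminal = np.asarray(terminal, dtype=bool)
    best_next = next_q_values.max(axis=1)
    # No future value is added after a patient's final recorded encounter.
    return rewards + gamma * np.where(terminal, 0.0, best_next)


def greedy_action(q_values):
    """Index of the regimen with the highest estimated cumulative reward."""
    return int(np.argmax(q_values))


# Hypothetical batch of two encounters with two candidate regimens each.
targets = q_learning_targets(
    rewards=[0.5, -0.2],
    next_q_values=[[1.0, 2.0], [0.3, 0.1]],
    terminal=[False, True],
    gamma=0.9,
)
```

In a Deep Q Network, the network's predicted Q-value for the action actually taken is regressed towards these targets.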

• State: A list of observed patient characteristics, including age, sex, race, smoking status; vitals and laboratory test values at the current encounter and in the past 6 months, including BMI, weight, SBP, DBP, triglycerides, TC, HDL, LDL, A1c, and creatinine; prescription history in the past 6 months; and encounter histories, including days since the previous encounter and days since the first encounter.
• Action: The action space consists of the pharmacologic subclasses and their combinations, referred to as the treatment regimen. The action space of RL-glycemia contains nine pharmacologic subclasses in the antihyperglycemic therapeutic class, or their combinations; the action space of RL-BP contains 10 pharmacologic subclasses in the antihypertensive therapeutic class, or their combinations; the action space of RL-CVD contains five pharmacologic subclasses in the antihyperlipidemic therapeutic class, or their combinations; and the action space of RL-multimorbidity contains pharmacologic subclasses in all three therapeutic classes, or their combinations.
• Reward: The reward of a prescription is a numeric measure of treatment efficacy between two consecutive encounters. For RL-glycemia, if A1c < 5.6% in both encounters, the reward is zero; otherwise, the reward is defined by the reduction in A1c. For RL-BP, if patients have no HTN symptoms (SBP < 120 mmHg) in both encounters, the reward is zero; otherwise, it is equal to the decrease in SBP. For RL-CVD, the reward is the reduction in global CVD Framingham Risk Score (FRS) [22], which is a function of age, TC, HDL, SBP, treatment for HTN, smoking, and T2DM status (all yes). Sex-specific risk equations were applied to males and females separately. For RL-multimorbidity, the reward is defined as the average of the standardized reward values of RL-BP, RL-glycemia, and RL-CVD (model and training details are shown in the electronic supplementary materials).
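The reward definitions above can be written down directly. The sketch below follows the stated thresholds (A1c 5.6%, SBP 120 mmHg); the standardization step is an assumption, since the text defers those details to the supplementary materials. Here it is taken to be a z-score using a mean and standard deviation per outcome, and all function names and example statistics are hypothetical.

```python
from statistics import mean

A1C_NORMAL = 5.6    # %, threshold stated in the reward definition
SBP_NORMAL = 120.0  # mmHg, threshold stated in the reward definition


def glycemia_reward(a1c_prev, a1c_next):
    """Zero if A1c is below 5.6% at both encounters, else the A1c reduction."""
    if a1c_prev < A1C_NORMAL and a1c_next < A1C_NORMAL:
        return 0.0
    return a1c_prev - a1c_next


def bp_reward(sbp_prev, sbp_next):
    """Zero if SBP is below 120 mmHg at both encounters, else the SBP decrease."""
    if sbp_prev < SBP_NORMAL and sbp_next < SBP_NORMAL:
        return 0.0
    return sbp_prev - sbp_next


def cvd_reward(risk_prev, risk_next):
    """Reduction in CVD risk score between two consecutive encounters."""
    return risk_prev - risk_next


def multimorbidity_reward(raw, stats):
    """Average of standardized single-outcome rewards.

    raw:   {"glycemia": r, "bp": r, "cvd": r}
    stats: {"glycemia": (mean, sd), ...}  assumed z-score standardization
    """
    keys = ("glycemia", "bp", "cvd")
    return mean((raw[k] - stats[k][0]) / stats[k][1] for k in keys)


# Hypothetical consecutive encounters and hypothetical training statistics.
raw = {
    "glycemia": glycemia_reward(8.0, 7.4),
    "bp": bp_reward(130.0, 124.0),
    "cvd": cvd_reward(15.0, 14.0),
}
stats = {"glycemia": (0.1, 0.5), "bp": (1.0, 5.0), "cvd": (0.0, 2.0)}
joint = multimorbidity_reward(raw, stats)
```

Standardizing before averaging keeps an outcome measured in mmHg from dominating outcomes measured in percentage points.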

We evaluated the RL-recommended therapy by comparing its effect with the observed clinicians' prescriptions on the test cohort of NYULH-EHR samples. In each encounter, the RL algorithm recommends a treatment regimen for the patient. If the recommendation is the same as the observed clinicians' prescription in the data, we noted that RL is 'consistent' with the clinicians' prescriptions. When RL is discrepant with the clinicians' prescriptions, the efficacy of the RL-recommended treatment is not directly observed. For this reason, we imputed the outcome of the RL-recommended treatment using kNN regression, an approach commonly used for causal inference in observational studies [23]. In short, the imputation works by averaging the outcomes of the k most similar patient encounters, in terms of patient characteristics, in which the RL-recommended therapy had been administered by clinicians. The similarity between patient encounters was estimated by Euclidean distance, as in the study by Bertsimas et al. [16]. To assess the performance of the imputation, we first compared imputed outcomes with observed outcomes under clinicians' treatments, and found 87-95% correlation between them, indicating that the imputation algorithm can effectively estimate unobserved health outcomes (Table 1). We varied the number k of nearest neighbors and found that the performance of the imputation (for any of the three health outcomes) was insensitive to k for values between 8 and 10. We estimated the efficacy of the recommendations made by RL, first in the whole set of test samples, and then for individual sex, racial, and age subgroups.
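The kNN imputation step described above can be sketched as follows: restrict to encounters where clinicians actually gave the RL-recommended regimen, rank them by Euclidean distance to the query encounter, and average the outcomes of the k closest. The function name, argument layout, and toy data are illustrative assumptions; feature scaling before computing distances is not described in the text and is left out here.

```python
import numpy as np


def knn_impute_outcome(query, features, treatments, outcomes, recommended, k=8):
    """Estimate the outcome of `recommended` for one encounter.

    query:       feature vector of the encounter being evaluated
    features:    (n, d) feature matrix of observed encounters
    treatments:  length-n regimen labels actually prescribed by clinicians
    outcomes:    length-n observed outcomes (e.g. A1c at the next encounter)
    recommended: regimen label suggested by the RL algorithm
    k:           number of neighbors (the text reports stable results for 8-10)
    """
    features = np.asarray(features, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    mask = np.asarray(treatments) == recommended
    if not mask.any():
        raise ValueError("no observed encounter received the recommended regimen")
    candidates = features[mask]
    distances = np.linalg.norm(candidates - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]  # uses fewer than k if not enough exist
    return float(outcomes[mask][nearest].mean())


# Toy data: four observed encounters with two features each (hypothetical).
estimate = knn_impute_outcome(
    query=[0.0, 0.0],
    features=[[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 1.0]],
    treatments=["BIG", "BIG", "BIG", "INSO"],
    outcomes=[7.0, 7.4, 9.0, 8.0],
    recommended="BIG",
    k=2,
)
```

Applying the same function to the regimen that clinicians actually prescribed gives imputed values that can be correlated with observed outcomes, which is the kind of check summarized in Table 1.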

To better understand which features have the most impact on treatment recommendations, we used SHAP (SHapley Additive exPlanations) [24, 25] to estimate and rank the contributions of clinical features in explaining RL and clinician prescriptions.
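SHAP attributions are built on Shapley values from cooperative game theory. As a hedged illustration of the underlying quantity, the sketch below computes exact Shapley values for a toy scoring function by enumerating feature coalitions. This is not the SHAP library and not the estimator used in the study; the linear scorer, its coefficients, and the feature ordering (A1c, BMI, age) are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this is only practical for toy examples.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(v):
    """Hypothetical linear scorer over (A1c, BMI, age)."""
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]


patient = [8.0, 30.0, 60.0]
reference = [7.0, 28.0, 55.0]
phi = shapley_values(toy_score, patient, reference)
```

For a linear scorer each attribution reduces to the coefficient times the feature's deviation from the baseline, and the attributions sum to the difference between the patient's score and the baseline score; ranking features by mean absolute attribution across encounters gives an importance ordering.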

To understand when and how RL makes different prescriptions from clinicians, Table 4 compares consistent and discrepant encounters by patient demographics and clinical characteristics. The factor most significantly associated with discrepancy was disease severity at the time of the encounter. For RL-glycemia, encounters with higher A1c were more likely to have different recommendations (average A1c 8.1% for discrepant encounters vs. 7.5% for consistent encounters, p < 0.001). For RL-BP, encounters with higher SBP were more likely to have different recommendations (average SBP 132.85 vs. 131.00 mmHg, p < 0.001).

The efficacy of the RL prescriptive algorithms was consistently observed across T2DM patients overall and within sex, racial, and age subgroups (Tables 5, 6, 7). The improvement over clinicians' prescriptions was larger for African American patients than for White patients: A1c under RL-glycemia was 0.39% lower than under clinicians' treatment for African American patients, compared with 0.28% lower for White patients. By age, patients aged 60 years and younger saw a larger improvement (A1c under RL-glycemia 0.47% lower than under clinicians' treatment) than patients older than 60 years (A1c 0.19% lower).

The patterns of different treatment recommendations, along with the resulting differences in health outcomes, for RL-glycemia, RL-BP, and RL-multimorbidity, are illustrated in Fig. 1. In the case of RL-glycemia, the most frequently observed discrepancy (1167 encounters) was that clinicians prescribed insulin monotherapy (INSO) while RL prescribed biguanide type (BIG). On these encounters, RL-glycemia achieved, on average, 1.22% lower A1c than clinicians. In the case of RL-BP, the most frequently observed discrepancy (1010 encounters) was that clinicians prescribed ACE inhibitors (ACE), while RL prescribed BABs. On these encounters, RL-BP achieved a 6.78 mmHg lower SBP. The most frequently observed discrepancy between RL-multimorbidity and clinicians' prescriptions was biguanide type (BIG) prescribed by clinicians, and HMG-CoA reductase inhibitors (HMG) prescribed by RL-multimorbidity, observed in 1272 patient encounters. On these discrepant encounters, RL-multimorbidity achieved a 0.15% higher A1c but 2.42% lower CVD risk and 0.30 mmHg lower SBP. Overall, RL algorithms tended to prescribe fewer medications than clinicians (Fig. 2). Figure 3 shows the importance of features associated with the RL-multimorbidity algorithm and clinicians' prescriptions. In general, there was reasonable agreement between the feature importance estimates of RL-multimorbidity and those of the clinicians. A1c was the most important feature for clinicians, while RL-multimorbidity was most influenced by recent therapies, age, BMI, and A1c. One difference is that creatinine was important in the clinicians' prescriptions but less important for RL-multimorbidity. Another difference is the reduced role of the time since first encounter in RL-multimorbidity compared with clinicians' prescriptions.

Table 4 Comparison of RL and clinicians for glycemic control, BP, and CVD prevention. Demographic characteristics of patients having encounters at which RL and clinicians prescribed consistently versus differently. Categorical variables are expressed as frequency (%), and continuous variables are expressed as the mean (SD) of biomarkers. RL reinforcement learning, BP blood pressure, CVD cardiovascular disease, T2DM type 2 diabetes mellitus, HTN hypertension, SBP systolic blood pressure, DBP diastolic blood pressure, BMI body mass index, C cholesterol, LDL low-density lipoprotein, HDL high-density lipoprotein, SD standard deviation

To the best of our knowledge, this is the first RL-assisted prescriptive algorithm for personalized single and multimorbidity outcome management for patients with T2DM. Using an EHR database, the developed RL algorithm can efficiently recommend treatment regimens that optimize patient health outcomes while incorporating patients' individual demographics and treatment history. Compared with other machine-learning methods, the RL approach has a particular advantage as it can efficiently learn complex dynamic drug-disease and drug-drug interactions in the presence of high temporal variation, uncertain outcomes, and long-term treatment effects [15, 19]. RL recommendations showed high levels of concordance with clinicians' prescriptions for single-outcome optimizations of glycemia, BP, and CVD risk control. This demonstrates the feasibility of using RL for T2DM management, and indicates that clinicians make near-optimal decisions with regard to single-outcome management. RL-multimorbidity recommendations differed more frequently from clinicians' prescriptions, as well as from the recommendations of the single-outcome RL algorithms. This provides data-driven evidence that optimizing multimorbidity management is different from optimizing single outcomes in parallel. For example, in the 1272 patient encounters with the most frequently observed discrepancy between RL-multimorbidity and clinicians, the average A1c was 7.0%, SBP was 127.2 mmHg, and CVD risk was 12.6%. For these encounters, clinicians prescribed BIG to prioritize glycemic control, while RL-multimorbidity prescribed HMG for lipid-lowering. This indicates the challenges and uncertainties of multimorbidity management for patients with borderline and balanced levels of severity in multiple chronic conditions [26, 27]. RL-multimorbidity showed overall improvements in managing the three outcomes simultaneously, significantly reducing the number of encounters with uncontrolled glycemia, uncontrolled HTN, and high FRS CVD risk.

Although both clinicians and RL-multimorbidity place high importance on similar factors, these factors are ranked differently. RL algorithms gave less weight than clinicians to features that were not included in the reward functions, such as creatinine, which clinicians consider an important renal function biomarker. This indicates a potential challenge of the RL algorithms using single-directed reward outcomes as the optimization goal. Ideally, a comprehensive reward function should incorporate domain knowledge and adverse events, such as hypoglycemia and kidney comorbidity, to achieve optimized outcomes while balancing the risks of adverse events [28].

Typical limitations with EHR data are their unobserved medication adherence, partially observed clinical data at each encounter, and uncontrolled time span between encounters [29] . However, the RL algorithms were designed to incorporate these uncertainties under real-world scenarios. In particular, if there were observable patient characteristics that were associated with higher non-adherence to a certain treatment leading to lower levels of efficacy, RL would be able to identify this and prescribe different treatments for patients with those characteristics.

Although our evaluation methodology controls for several confounding factors that could explain differences in treatment effects, we can only estimate counterfactual outcomes under RL recommendations for patients with discrepant prescriptions. In addition, the T2DM patient population from NYULH ambulatory care may not be representative of the United States T2DM population. To ultimately validate the efficacy of the RL algorithms, randomized clinical trials in which patients are randomly assigned to RL-recommended or clinician-selected prescriptions would be needed.

In this study, we demonstrated the feasibility of using RL prescriptive algorithms for patients with T2DM to manage their multimorbidity based on test data from an ambulatory care center. The RL-glycemia, RL-BP, and RL-CVD algorithms showed high concordance (83-98%) with clinicians' prescriptions, while RL-multimorbidity showed relatively low concordance (71%) for multimorbidity management. For patient encounters in which the RL recommendations differed from the clinician prescriptions, RL prescriptions showed significantly improved health outcomes compared with clinicians' prescriptions. Potentially, the algorithm can be integrated into EHR platforms to assist physicians in T2DM management with dynamic real-time suggestions of personalized treatment paths.

The online version of this article (https://doi.org/10.1007/s40265-020-01435-4) contains supplementary material, which is available to authorized users.

Fig. 3 Feature importance of (a) RL-multimorbidity and (b) clinician prescription. RL reinforcement learning, BMI body mass index, HDL high-density lipoprotein, LDL low-density lipoprotein

1. Comparing the national economic burden of five chronic conditions

2. Diabetes, hypertension, and cardiovascular disease: clinical insights and vascular mechanisms

3. The impact of comorbid chronic conditions on diabetes care

4. Rising to the challenge of multimorbidity

5. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study

6. Challenges of managing people with multimorbidity in today's healthcare systems

7. Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review

8. Managing patients with multimorbidity: systematic review of interventions in primary care and community settings

9. ACC/AHA guideline on the primary prevention of cardiovascular disease: a report of the American College of Cardiology/American Heart Association Task Force on clinical practice guidelines

10. Prevalence of hypertension in the US adult population. Results from the Third National Health and Nutrition Examination Survey

11. Reinforcement learning application in diabetes blood glucose control: a systematic review

12. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach

13. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer

14. Optimization of anemia treatment in hemodialysis patients via reinforcement learning

15. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care

16. Personalized diabetes management using electronic medical records

17. Optimal prescriptive trees

18. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery

19. Reinforcement learning: an introduction

20. Playing atari with deep reinforcement learning

21. Human-level control through deep reinforcement learning

22. General cardiovascular risk profile for use in primary care: the Framingham heart study

23. Causal inference for statistics, social, and biomedical sciences: an introduction. Cambridge: Cambridge University Press

24. A unified approach to interpreting model predictions

25. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery

26. How to measure comorbidity: a critical review of available methods

27. Measures of multimorbidity and morbidity burden for use in primary care and community settings: a systematic review and guide

28. Reward functions for accelerated learning. Machine learning proceedings 1994

29. Medication adherence: a call for action

Acknowledgements Hua Zheng, Ilya O. Ryzhov, Wei Xie, and Judy Zhong report no conflicts of interest. Judy Zhong is funded by NIA R01AG054467 and NIA R01AG065330.

Author Contributions WX, JZ, and IOR initiated the study. WX, IOR, and HZ designed the data analyses, algorithm, and experiments. JZ provided the EHR data, clinical assessment, interpretation of subgroup efficacy, model performance and feature importance, and connections with clinicians' workflow. HZ wrote the paper in conjunction with JZ. All authors have read and approved the final manuscript, contributing edits where applicable. WX and JZ take full responsibility for the work, including the study design, access to data, and the decision to submit and publish the manuscript.

Funding JZ is funded by NIA R01AG054467 and NIA R01AG065330.

The authors have no relevant financial or non-financial interests to disclose.