key: cord-0920695-ryw2w1ye authors: Linn, Kristin A.; Underhill, Kristen; Dixon, Erica L.; Bair, Elizabeth F.; Ferrell, William J.; Montgomery, Margrethe E.; Volpp, Kevin G.; Venkataramani, Atheendar S. title: The design of a randomized controlled trial to evaluate multi-dimensional effects of a section 1115 Medicaid demonstration waiver with community engagement requirements date: 2020-10-07 journal: Contemp Clin Trials DOI: 10.1016/j.cct.2020.106173 sha: 66e8eab4f8d83769032085d384c83117ad8caddf doc_id: 920695 cord_uid: ryw2w1ye

Section 1115 demonstration waivers provide a mechanism for states to implement changes to their Medicaid programs. While such waivers are mandated to include evaluations of their impact, randomization, the gold standard for assessing causality, has not typically been a consideration. In a critical departure, the Commonwealth of Kentucky opted to pursue a two-arm randomized controlled trial (RCT) for its controversial 2018 Medicaid Demonstration waiver, which included work requirements as a condition for the subset of beneficiaries deemed able-bodied to maintain eligibility for benefits. Beneficiaries were randomized 9:1 to the new waiver program or a control group who would retain their current benefits as part of the existing Medicaid expansion program. To address potential bias from differential attrition from the Medicaid program that would accrue from solely analyzing administrative data, our team designed a rich, prospective, longitudinal survey to collect primary and secondary outcomes from six categories of interest to policymakers: insurance coverage, health care utilization and quality, health behaviors, socioeconomic measures, personal finances, and health outcomes. At baseline, a subset of survey participants was invited to participate in the collection of biometric samples via in-person follow-up visits, and a cross-section was also invited to participate in qualitative interviews.
While the demonstration waiver was terminated before the program began, our study design illustrates that it is possible for other researchers and state agencies seeking to evaluate Medicaid demonstration waivers and other demonstration policies to work together to implement high-quality randomized trials, even for controversial policies.

States are increasingly using Section 1115 waivers to implement changes to their Medicaid programs. Consistent with the experimental imperative of demonstration waivers, the Centers for Medicare and Medicaid Services (CMS) stipulates that states must conduct and report the results of evaluations of their waiver programs. Unfortunately, to date these evaluations have yielded limited understanding of whether a given waiver program has achieved its objectives. A 2018 report by the Government Accountability Office (GAO) found that §1115 waiver evaluation designs have typically lacked rigor and generally fail to provide actionable, policy-relevant information. 1 One major reason for this massive gap in evidence is that the majority of waivers have not employed an experimental strategy that randomizes beneficiaries to the new program or an appropriate control. Instead, universal implementation of waiver programs has forced researchers to rely on descriptive snapshots of Medicaid access and beneficiary health outcomes over time. Such analyses cannot reliably measure the impacts of a program, since any observed changes in beneficiary health outcomes could be attributed to other policy changes in the state or societal trends in health and economic opportunities. 2, 3 This ambiguity underscores the need for evaluation designs to include carefully considered comparison groups. 4, 5 condition for continued eligibility for adults considered able-bodied.
The Commonwealth hypothesized that the program would improve beneficiary health as a result of "able-bodied, working age adults [experiencing] the dignity of a job, of contributing to their own care, and gaining a foothold on the path to independence." 6 An alternate possibility, however, was that the program would lead to coverage losses among beneficiaries who did not meet or report their required activities, as well as a reduction in program access among potential future beneficiaries. 7, 8 The types of research designs thus far used in evaluations of Medicaid waivers would not allow policymakers to credibly distinguish these two possibilities. 1, 9 In this report, we describe the RCT portion of the evaluation that would have been conducted had Kentucky HEALTH been implemented as planned on July 1, 2018. Program implementation was delayed several times due to litigation, and Kentucky HEALTH was ultimately ended by executive order on December 16th, 2019. We discuss design and implementation challenges and general lessons that may be relevant to evaluations of Medicaid waiver programs, or other public programs, in other states. In light of implementation delays before the ultimate removal of the waiver, we also include a description of beneficiary departure from Medicaid by arm in the time between notice of randomized group assignment and October 2018. The Kentucky HEALTH demonstration waiver aimed to introduce community engagement (work requirements) and cost-sharing requirements, as well as remove vision and Figure 1 contains details about the number of individuals in the population who met exclusion criteria and the number of eligible individuals who were randomized. Primary outcome measures for the study were identified after considering the Kentucky HEALTH program is shown with the associated two-tailed evaluation hypothesis. All outcomes were beneficiary-level measures, given that beneficiary health was the primary goal of the waiver.
We determined the total sample size for prospective data collection based on the six primary outcomes. The non-bolded outcomes in Table 1 comprise our secondary outcomes. Included in the secondary outcomes were biomarker measurements, described further below, which were collected from a subset of high-risk individuals who indicated a diagnosis of diabetes and/or hypertension in the baseline survey. Built into the evaluation plan was the opportunity to identify, (pre-)specify, and conduct additional analyses in future years. This is critical, as new hypotheses of interest may have emerged from surveys and qualitative interviews or from changes in demonstration waivers or implementation. Information obtained from analysis of data for individuals entering the waiver at the time of first implementation would be useful for structuring hypotheses, data collection efforts, and research designs for future randomizations to examine waiver impacts among individuals who first entered the program after initial implementation, when program features and implementation strategies had stabilized. In this context, the goal of the current document was to balance pre-specification (which minimizes prospects of data mining) and the opportunity to continually learn from the data in a policy relevant manner.

Banking status. Beneficiaries moved into Kentucky HEALTH will have significantly different health outcomes, compared to traditional Medicaid beneficiaries. Physical health days; Self-reported mental health; Mental health days; Self-reported dental health; Self-reported changes in health status; Mortality; Biometrics b

Table 1: Kentucky HEALTH program goals, primary hypotheses, and primary and secondary outcomes for the RCT. a All tests of evaluation hypotheses will consider "beneficiaries" to include all who are beneficiaries in each group at baseline.
That is, beneficiaries who transition off the Medicaid program during the 5-year waiver period will be included in analyses, for both the Kentucky HEALTH group and the traditional Medicaid control group.
b Compared in a high-risk sample that included all individuals who indicated they carried a diagnosis of diabetes and/or hypertension.
c The evaluation team suggested this category, so there is no associated Kentucky HEALTH Program Goal.
Abbreviations: HEALTH, Helping to Engage and Achieve Long-Term Health; RCT, Randomized Controlled Trial; SUD, Substance Use Disorder

The primary data source would have been a prospective, longitudinal survey of individuals sampled from both the waiver and control arms of the RCT population: the Kentucky HEALTH Experiment Survey (KHES). We opted for primary data collection for two reasons. First, Kentucky, like many other states, does not have an all-payer claims database capable of tracking health utilization by individuals across changing sources of insurance. Furthermore, among states that do have all-payer claims databases, many are limited to inpatient care. As it is critical to follow beneficiaries who leave the Medicaid program to understand both positive and negative program effects, and to mitigate bias from differential attrition, reliance on Medicaid claims alone could have provided biased estimates of health and utilization effects. Second, administrative data generally have blind spots, including the lack of validated self-reported physical and mental health measures and information on labor force participation. These topics have been reliably interrogated in a number of large-scale surveys. Survey activities (sampling, fielding the survey, and providing de-identified data to the evaluation team) were contracted out to NORC.
The initial KHES assessment, which was planned as a baseline for the original implementation date, attempted to contact 34,191 individuals from the intervention group and 22,556 from the traditional Medicaid control group from April to August 2018, in order to obtain a target sample of 5,400 intervention and 3,600 control group completed surveys at baseline. In total, NORC obtained 9,396 completed baseline surveys, including 5,590 from the intervention group and 3,806 from the control group (Figure 1). Although an equal number of samples from the two arms would have been preferred to maximize power for hypothesis testing, NORC anticipated the need to contact a large portion of the 10% randomized to the control arm in order to obtain an adequate number of completed surveys over time. This prompted our team to prescribe a 60/40% composition of intervention/control completed surveys in the survey study design. Notably, planning an RCT requires careful sample size calculations that assume realistic survey response rates. For the initial KHES survey, the yield rate was 16.7%; using the various response rate formulas compiled by the American Association of Public Opinion Research (AAPOR), the response rate for this study ranges from 29.1% (definition 1) to 48.9% (definition 4). 11, 12 The yield rate is calculated from the total number of cases that the survey attempted to reach, while the AAPOR response rate formulas vary the denominator by dropping cases for various factors such as changed address with no forwarding, disconnected phone, or ineligibility for the survey. In some cases, the number of unreachable cases that would have been ineligible had they been reached is estimated and removed from the denominator in the response rate calculation.
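The distinction between a yield rate and a response rate with a reduced denominator can be illustrated with a minimal Python sketch. The contact and completion counts are the ones quoted above; the helper name `rate` and the count `hypothetical_dropped` are assumptions made for illustration only (the text does not report case dispositions), and the adjusted rate is not an AAPOR-defined figure.

```python
# Counts quoted in the text for the baseline KHES fielding.
attempted = {"intervention": 34_191, "control": 22_556}
completed = {"intervention": 5_590, "control": 3_806}


def rate(numerator: int, denominator: int) -> float:
    """Completed surveys as a fraction of the cases kept in the denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


total_attempted = sum(attempted.values())
total_completed = sum(completed.values())

# Yield rate: every case that was attempted stays in the denominator.
yield_rate = rate(total_completed, total_attempted)

# Share of completed surveys from the intervention arm (design target: 60%).
intervention_share = completed["intervention"] / total_completed

# HYPOTHETICAL: number of cases dropped from the denominator (for example,
# disconnected phones or ineligible cases). This value is made up purely to
# show the direction of the effect; it is not taken from the study.
hypothetical_dropped = 20_000
adjusted_rate = rate(total_completed, total_attempted - hypothetical_dropped)

print(f"yield rate:         {yield_rate:.3f}")
print(f"intervention share: {intervention_share:.3f}")
print(f"adjusted rate:      {adjusted_rate:.3f}  (hypothetical denominator)")
```

With these counts the yield comes out at roughly 16 to 17 percent, in line with the reported yield rate, and the completed surveys split close to the intended 60/40 composition. Shrinking the denominator can only raise the rate, which is why the AAPOR-style response rates quoted above are higher than the yield rate.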
Whether they remained in Medicaid or not, our design specified that baseline survey respondents would be re-surveyed at six months after Kentucky HEALTH implementation to capture immediate waiver effects, one year after implementation, and yearly thereafter for a total of five years. The five-year follow up would allow for an unprecedented long-run examination of the health and socioeconomic trajectories of a low-income population in the Medicaid program and the opportunity to evaluate health outcomes that may require years to develop (e.g., chronic disease severity or mortality). [17] [18] [19] Randomized individuals who were 60 years of age or older as of July 1, 2018 were not eligible for participation in the longitudinal survey. The upper age limit was chosen so that all survey participants had the potential to be exposed to the intervention or control for the full Drawing on the survey sample, we also purposively recruited a cohort of 127 individuals to complete a one-on-one qualitative interview by phone, which discussed healthcare utilization, health status, experiences with Medicaid, labor force and volunteering, financial circumstances, and perceptions of the Kentucky HEALTH program features. We planned to contact these individuals once per year during the duration of the waiver to discuss experiences with the program and any changes in insurance status. Qualitative data collection was planned for additional groups of beneficiaries, healthcare providers, and program staff in later years. 
Administrative data collected and maintained by the Commonwealth was expected to serve as a secondary data source, and it includes basic demographic characteristics (e.g., age, gender, race, level of education, location), family income (continuous percentage as a function

Given the projection of N=7,250 completed surveys at the conclusion of the study, we report here the minimum detectable effect size for a simple comparison of binary outcomes across the two arms (Kentucky HEALTH vs. traditional Medicaid) at a single wave of follow-up using the KHES. Of our six primary outcomes, insurance status, annual wellness visit, current smoking status, and labor force participation are naturally binary. The primary analysis would dichotomize debt as none or more than $0 USD. Physical and mental health would each be dichotomized at 14 or fewer poor health days in the past 30 days. In the effect size calculation, we wanted to achieve at least 90% power to reject the null hypothesis using a two-sided test of two independent proportions. Conservatively, we used a Bonferroni-adjusted Type I error to control the family-wise error (FWE) for six tests, i.e., one for each of the primary outcomes. Under these specifications, we would have over 90% power to detect a difference in proportions of a primary outcome between arms of approximately 0.05 if the proportion in one group is 0.5 (i.e., the most conservative setting) in year five. Differences in proportions of 0.05 would constitute policy-meaningful effects; for example, a reduction of the percentage of Medicaid beneficiaries who respond "Every Day" or "Some Days" to the question, "Do you now smoke cigarettes every day, some days, or not at all?" from the baseline rate of approximately 68% to 63% would correspond to 20,000 beneficiaries quitting smoking.
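The power statement above can be checked approximately with a short Python sketch. It rests on several assumptions that are not stated in the text: a normal approximation with unpooled variance, a family-wise error level of 0.05 before the Bonferroni division by six, and a 60/40 intervention/control split of the projected 7,250 completed surveys. The function name `power_two_proportions` is hypothetical, and this is not necessarily the calculation the evaluation team ran.

```python
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n1: int, n2: int,
                          alpha: float) -> float:
    """Approximate power of a two-sided z-test comparing two independent
    proportions (normal approximation, unpooled variance)."""
    std_normal = NormalDist()
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p2) / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


n_total = 7_250                            # projected completes (from the text)
n_intervention = n_total * 60 // 100       # ASSUMPTION: 60/40 split persists
n_control = n_total - n_intervention
alpha_bonferroni = 0.05 / 6                # ASSUMPTION: family-wise level 0.05

power = power_two_proportions(
    p1=0.50, p2=0.45,                      # difference of 0.05 around 0.5
    n1=n_intervention, n2=n_control,
    alpha=alpha_bonferroni,
)
print(f"n = {n_intervention} vs {n_control}, "
      f"alpha = {alpha_bonferroni:.4f}, power = {power:.3f}")
```

Under these assumptions the approximate power comes out above 0.90, consistent with the claim in the text. Moving the proportions away from 0.5 shrinks the variance term and raises power.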
The subsequent reductions in heart disease, stroke, and cancer would likely translate into reduced costs and burden on the medical system. Furthermore, we would have greater power for testing outcomes with baseline proportions greater or less than 0.5, as well as greater power in waves prior to year five when the expected level of attrition would be lower than 20%. Effect size calculations for the bio-measure study are provided in the Supplementary Material. Here we provide an overview of the statistical analysis plan for the KHES data. The full waiver evaluation plan, which includes analysis specifications for observational administrative data and the qualitative surveys, can be found in the draft evaluation plan that was submitted to CMS (Section 5) that is provided in the Supplementary Material. Our primary approach specified an intent-to-treat (ITT) analysis that would target the effect of being randomized to the waiver group relative to control, regardless of each beneficiary's level of exposure to The RCT is built into the roll-out of the Section 1115 Medicaid demonstration project, which was being implemented by Kentucky and overseen by the Secretary of HHS. The waiver was considered to be a "demonstration project . . . subject to the approval of department or agency heads . . . that [is] designed to study, evaluate, improve, or otherwise examine public benefit or service programs." As such, the randomized implementation of the waiver was exempt from In any Medicaid population, employment, income, and other life changes can cause a portion of beneficiaries to transition in and out of the program, known as churning. We compared rates of departure from Medicaid by arm of the RCT and with respect to several key subgroups; departure was defined as absence of active enrollment as of August 2019. 
As of August 19th, 2019, 26.6% of the randomized study cohort had unenrolled in Medicaid, including 26.7% of the arm assigned to Kentucky HEALTH and 26.1% of the control arm. Of the survey sample, 23.4% had unenrolled overall. The proportion of beneficiaries who had unenrolled was only slightly lower in the control arm at 23.2% versus 23.5% in the intervention arm. Table 2 displays demographic characteristics of the RCT cohort separately by Medicaid enrollment status and treatment arm. Overall, former beneficiaries who were no longer enrolled in Medicaid tended to be younger and more likely to be male and employed. We do not report p-values because of the large sample sizes both in the full randomized cohort and survey sample, noting that statistical significance of the differences may not reflect practical or meaningful differences between arms. With the potential for meaningful differences in mind, we designed the KHES to include beneficiary interviews regardless of whether they were enrolled or unenrolled in Medicaid throughout the follow-up period to allow for an unbiased comparison of the intervention and control arms.

have been proposed to minimize non-response. 13 Low-income populations tend to have higher rates of illiteracy and lower reading comprehension skills than the general population, which can influence response rates and the quality of the data collected from responders. 14, 15 Low-income populations also tend to be more mobile than the general population and often live in non-standard housing.
In light of these factors, our team partnered with NORC to implement several evidence-based measures 16 that aimed to maximize survey response rates: 1) a presurvey mailer notified beneficiaries that they would be contacted at a later time and given the opportunity to participate in the KHES; 2) the mailer also explained the value of the data that would be collected for informing future decisions about Kentucky HEALTH and assured beneficiaries that all individual-level responses would be confidential and not shared with the Commonwealth; 3) the survey was designed to take no longer than 20 minutes on average; and 4) individuals could participate by phone or by filling out the survey questions online. Despite planning similar measures for future waves of the KHES, loss to follow-up due to out-of-date contact information would have been a concern. NORC planned to use the Accurint locating service to obtain updated contact information for all baseline respondents. Since Accurint relies on credit records as a primary source of contact information, this could have introduced a potential source of bias if contact information was found to be less complete for lower-income survey participants.

Although Kentucky HEALTH was ultimately not implemented, our study design can serve as a resource to other states and research teams planning evaluations of Medicaid waivers. In the following section, we briefly describe several alternative designs our team considered during the planning stage. First, we sought to randomize specific components of Kentucky HEALTH, so as to more reliably isolate the causal consequences of each component. We considered using a factorial design as an efficient approach to estimating main effects and interactions among the waiver components.
22 However, after discussions with the Commonwealth, such a design was ultimately deemed too complicated logistically from an administrative standpoint and likely to increase burden and confusion among beneficiaries who would be randomized to different subsets of the program components. We also contemplated using cluster randomization to assign households as opposed to individuals to the intervention or control. Although this design may have benefitted from higher social acceptability among beneficiaries by avoiding discordant assignment to intervention and control within households, it would have been challenging to correctly identify and track members of dynamic household units. Prior to the Commonwealth's decision to randomize beneficiaries, we also considered a stepped wedge study design where roll out of Kentucky HEALTH would be staggered across areas. [23] [24] [25] However, we ultimately concluded that the small number of workforce areas in Kentucky would not provide sufficient power and that the RCT would be more robust to confounding by changes and trends over time in program administration, labor force, and other factors.

In March 2019, more than a year after we randomized Medicaid beneficiaries in Kentucky, CMS issued a document recommending that evaluations "be rigorous, incorporate baseline and comparison group assessments, as well as statistical significance testing." 26 The guidelines call for thoughtful evaluation plans that clearly detail how the proposed design will ensure hypotheses about waiver performance can be tested. The prescribed structure for evaluation plans broadly mimics the typical format of an RCT protocol, requiring specification of the main hypotheses of interest, study design, target population and control group, primary and secondary outcomes including time points at which they will be tested, data sources, and analytic strategy.
The guidance recommends a discussion of study limitations and how the chosen design will minimize them. Following the section on limitations, CMS requires that states justify any analysis plan that does not include a comparison group or baseline data analysis. The implication is that less-rigorous, non-randomized study designs for new, untested program modifications should be the exception, not the rule. By encouraging the use of randomization when possible and emphasizing research design best practices, CMS is inviting high-quality waiver evaluations that will help policymakers design evidence-based modifications to Medicaid programs in the future. The study design presented in this paper can be used by other researchers as an example of how to meet these new CMS requirements.

During the 12-month planning phase prior to July 2018, our team participated in biweekly calls with state health officials to finalize the design of the waiver evaluation and work out the logistical details of implementation. In addition to biweekly calls, we also traveled as a team to Frankfort, Kentucky to meet with policymakers in person in September 2017, January 2018, and May 2018. Early on, our team invested time building an understanding of the context and motivation for Kentucky HEALTH, which included the twin challenges of poor health and labor market outcomes among Medicaid beneficiaries and a devastating opioid epidemic. We also spent time in the initial months solidifying our knowledge of various logistical aspects that would shape our approach to the evaluation, such as the algorithm that would be used to classify beneficiaries as medically frail and the structure and location of different databases containing individual-level beneficiary information and claims.
After learning about the goals and logistics of Kentucky HEALTH, we identified policymakers' main hypotheses about the effects of the waiver and began to suggest possible study designs that would enable us to test those hypotheses. Throughout these early discussions, we conveyed the benefits of a randomized, controlled experiment, emphasizing that it would provide higher-quality evidence about the success or failure of the waiver than a set of observational study designs tailored to the various outcomes and hypotheses. This was particularly important since many other states in the region were engaged in significant concurrent policy interventions, making it likely that 'difference in differences' type analyses would be confounded by these concomitant interventions. Importantly, our team benefitted from working with a group of scientifically minded policymakers in Kentucky who understood the value and rigor that randomization would bring to the evaluation. As a result, state health officials enthusiastically supported our recommendation to do an RCT. We also obtained buy-in from the state for other design recommendations by iterating frequently with policymakers and involving them in all steps of the decision-making process. As one example, we presented several options for the timing and frequency of follow-up biomeasure collection that would require different levels of investment, allowing the state to choose the option that best balanced their scientific and budgetary priorities. Flexibility to work within some of the state's logistical and budgetary constraints helped build rapport between our respective teams, facilitating trust and progress. For instance, though 1:1 randomization is common in clinical trials, policymakers in Kentucky wanted as many individuals as possible to be in the intervention group and requested a 9:1 allocation ratio between intervention and control.
5 With only 10% of the Medicaid population assigned to the control group, our survey sampling partner NORC alerted our team that they would not be able to achieve equal group sizes for the KHES when accounting for projected response rates. Given the state's preference for the 9:1 allocation, we modified our plan for the KHES data collection to survey beneficiaries using a 60/40% breakdown of intervention/control participants. Altogether, our experiences point to several lessons for researchers and policymakers seeking to evaluate Medicaid demonstration waivers using RCTs:

1. Start early with pre-implementation collaboration between researchers and state officials: we utilized a lead time of twelve months to settle on design features with the Commonwealth and plan data collection with NORC.
2. Find well-placed leaders within state government who are supportive of scientific approaches to policy evaluation.
3. Align study design with programmatic goals. For example, while researchers might desire a 50/50 randomization, policy priorities might dictate that a higher proportion be part of the intervention group.
4. Collect data on Medicaid beneficiaries even after they exit the program.
5. Consider survey response rate and power: states with smaller Medicaid populations may need to assign more than 10% of the beneficiary population to the control arm, or risk inadequate power to compare study arms.
6. Account for delays: delayed implementation may have important effects on study design and may necessitate additional methods to keep the original cohort engaged, such as postcard mailings or additional follow-up phone calls.

Randomized evaluations of Medicaid waiver programs, and other state policies, are more feasible to implement than policymakers and researchers may realize.
In conjunction with the Commonwealth of Kentucky, we designed an RCT that randomized a cross-sectional cohort of all enrolled Medicaid members in the Commonwealth of Kentucky in February 2018 to continue to receive traditional Medicaid or to receive benefits according to Kentucky HEALTH, a Section 1115 Medicaid waiver demonstration. As part of our research design, we designed a survey to obtain longitudinal data from a subset of the randomized individuals over five years of follow-up, as well as qualitative interviews with beneficiaries and providers. Longitudinal follow-up of a representative, randomized sample allows for comparison of long-term health and labor outcomes between individuals assigned to the waiver and those assigned to receive traditional Medicaid benefits. Results from RCTs would provide actionable information to CMS and other states designing Section 1115 Medicaid waivers to promote the health and well-being of the citizens who interact with the Medicaid system. Our experiences underscore that it is possible for other researchers and state agencies seeking to evaluate Medicaid demonstration waivers and other demonstration policies to work together to implement high-quality randomized trials, even for controversial policies.

Medicaid Demonstrations: Evaluations Yielded Limited Results, Underscoring Need for Changes to Federal Policies and Procedures.
United States Government Accountability Office
Causal Inference in Statistics: An Overview
The State of Applied Econometrics: Causality and Policy Evaluation
Statistics and Causal Inference
Moving Toward Evidence-Based Policy: The Value of Randomization for Program and Policy Implementation
SPEECH: Remarks by Administrator Seema Verma at the
The Problem With Work Requirements for Medicaid
Medicaid Work Requirements - Results from the First Year in Arkansas
Fulfilling States' Duty to Evaluate Medicaid Waivers
Health Insurance Coverage and Health - What the Recent Evidence Tells Us
A Revised Review of Methods to Estimate the Status of Cases with Unknown Eligibility
Assessment of Medicaid Beneficiaries Included in Community Engagement Requirements in Kentucky
Optimal design features for surveying low-income populations
Literacy skills and communication methods of low-income older persons
Adult Literacy in America: A First Look at the Findings of the National Adult Literacy Survey
Increasing response rates to postal questionnaires: systematic review
The Oregon experiment--effects of Medicaid on clinical outcomes
Lessons from early Medicaid expansions under health reform: interviews with Medicaid officials
Exemption of Certain Research and Demonstration Projects from Regulations for Protection of Human Research Subjects 48 Fed. Reg. 9266 In:1983.
21. NCT03602456: Evaluation of the Health and Economic Consequences of Kentucky's Section 1115 Demonstration Waiver
Implementing Clinical Research Using Factorial Designs: A Primer
The stepped wedge trial design: a systematic review
Design and analysis of stepped wedge cluster randomized trials
The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting
Section 1115 Demonstrations Developing the Evaluation Design

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Dr. Volpp has received research support from