key: cord-0812091-3629rlai authors: nan title: 12th Meeting of the International Academy of Health Preference Research date: 2021-06-18 journal: Patient DOI: 10.1007/s40271-021-00532-0 sha: 3befbe9c41f4b5e204e60188e7db3429392f2cc4 doc_id: 812091 cord_uid: 3629rlai

Abstract: The QALY assumption that health preferences are additively separable across a sequence of health states is inconsistent with diminishing marginal utility, which predicts that the quality-adjusted value of a sequence of health improvements depends on timing. We show under what conditions the sum of linear health-state utilities (HSUs) can distort nonlinear time-equivalent values (TEVs) and provide examples of short-term (≤ 1 year) nonlinearity from published DCE studies. Linear distortion depends on the combined effect of the degree of curvature in the utility function and the pattern of clinical health-state improvements over the treatment period. We vary the curvature of a utility function of the CARA (constant absolute risk aversion) form with curvature parameter a, which is linear for a = 0. We compare 3 utility-function curvature patterns with 3 possible treatment-effect timing patterns having the same average slope (and thus the same QALYs): linear, initial slow, and initial fast speeds of onset of action. The TEVs for combinations of timing-specific marginal treatment effects and marginal-utility weights are calculated from H_mt, the health index for treatment-effect timing pattern m in period t; U_nt, the utility of a utility function with curvature n in period t; and T, the number of treatment periods. We identify cases where differences in the timing of health-status improvements can result in linear-additive QALYs overstating, understating, or equaling nonlinear TEVs, depending on how health changes and nonlinear-curvature weights interact. Unlike QALYs, general TEVs do not require eliciting time/quality trade-off preferences in one-year longevity increments. DCEs are increasingly replacing traditional methods for estimating HSUs.
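The interaction between curvature and timing described above can be illustrated with a minimal sketch. It does not reproduce the authors' TEV formula, which is not given in this text. Everything here is an assumption made for illustration: the rescaled CARA form, the three hypothetical five-period health-index paths (chosen so that all have the same mean of 0.5), and the use of a simple time-averaged utility as the nonlinear value.

```python
import math


def cara_utility(h, a):
    """CARA utility rescaled so that U(0) = 0 and U(1) = 1 (assumed form).

    For a == 0 the function is linear, U(h) = h.
    """
    if abs(a) < 1e-12:
        return h
    return (1.0 - math.exp(-a * h)) / (1.0 - math.exp(-a))


# Hypothetical health-index paths over T = 5 periods.
# Each path has mean 0.5, so a linear (QALY-like) sum is identical for all.
PATTERNS = {
    "linear": [0.0, 0.25, 0.5, 0.75, 1.0],
    "initial_slow": [0.0, 0.1, 0.5, 0.9, 1.0],
    "initial_fast": [0.0, 0.4, 0.5, 0.6, 1.0],
}


def mean_utility(path, a):
    """Time-averaged utility of a health path (illustrative, not the TEV)."""
    return sum(cara_utility(h, a) for h in path) / len(path)


def compare(a):
    """Time-averaged utility of every timing pattern for curvature a."""
    return {name: mean_utility(path, a) for name, path in PATTERNS.items()}


# a < 0: convex utility, a = 0: linear, a > 0: concave utility.
for a in (-2.0, 0.0, 2.0):
    values = compare(a)
    print(a, {name: round(v, 3) for name, v in values.items()})
```

In this sketch the three patterns are indistinguishable when a = 0, while a concave utility (a > 0) ranks the fast-onset path highest and a convex utility (a < 0) ranks the slow-onset path highest. A linear sum can therefore overstate, understate, or equal the nonlinear value depending on the pattern.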
DCEs also offer a tractable means of obtaining evidence on timing-specific, short-term, nonlinear health utility. We discuss results from three published empirical DCE applications that identify diminishing marginal utility in timing for short-duration spells of ill health: migraine (24 h), EQ-5D (2 months), and Crohn's disease (12 months).

Background: DCE studies are commonly applied to elicit preferences but are perceived as being difficult for respondents. This raises the question of whether a less complex approach can give similar results. The main objective of this study was to compare two stated-preference methods, DCE and PTT, to assess differences in the preferences measured and in the ease of use of the two methods [1]. Methods: In the United Kingdom, Germany, and Romania (n = 2959), representative samples of the general public completed a DCE and a PTT in random order to elicit preferences for treatments that reduce a baseline chance of getting rheumatoid arthritis (RA) within 2 years. For each country separately, random parameters logit (RPL) models (DCE) and interval regression models (PTT) were used. Model results (relative importance, maximum acceptable risk [MAR]) and participant feedback were compared across methods. Results: Results were consistent across countries. For a 40 percentage point reduction in the chance of developing RA (from 60% to 20%), MARs did not differ between methods for serious infection or serious side effects but were dissimilar for mild side effects (e.g., 45.8% DCE vs. 15.8% PTT in the UK). A majority found both methods easy or very easy to complete, with the DCE reported as easier (p < 0.05). Respondents who completed the DCE first found both methods easier to understand and easier to answer (p < 0.05). Conclusions: Across all countries, MARs for the two relatively more important attributes did not differ across methods, but MARs differed for the relatively less important attribute.
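As a rough illustration of what a MAR expresses, the sketch below assumes a utility function that is linear in a benefit attribute and a risk attribute. Under that assumption, the MAR is the increase in risk whose disutility exactly offsets the utility of a given benefit. The function name and the coefficient values are hypothetical and are not estimates from the study, which used RPL and interval regression models.

```python
def maximum_acceptable_risk(beta_benefit, benefit_change, beta_risk):
    """Risk increase that exactly offsets the utility of a benefit change.

    Assumes utility is linear in both attributes:
        U = beta_benefit * benefit + beta_risk * risk
    so that beta_benefit * benefit_change + beta_risk * MAR = 0.
    """
    if beta_risk >= 0:
        raise ValueError("beta_risk must be negative (risk lowers utility)")
    return beta_benefit * benefit_change / -beta_risk


# Hypothetical coefficients, per percentage point of each attribute.
beta_benefit = 0.05   # utility per percentage-point reduction in chance of RA
beta_risk = -0.10     # utility per percentage-point increase in side-effect risk

# MAR for a 40 percentage-point reduction in the chance of developing RA.
mar = maximum_acceptable_risk(beta_benefit, 40.0, beta_risk)
print(f"MAR: {mar:.1f} percentage points")
```

A larger (more negative) risk coefficient shrinks the MAR, which is why a less important risk attribute can produce a large and unstable MAR.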
Both DCE and PTT were easy for a majority of participants to understand and complete, with DCE being easier. DCE has the advantage of a multi-attribute approach that considers all trade-offs simultaneously. However, PTT might be equally suitable when considering more important attributes, with a simpler design and fewer questions.

Background: In addition to asking respondents for their first choice, there is increasing interest in asking them to choose among the remaining alternatives of a choice set in discrete choice experiments (DCEs). To inform the decision on which preference method to use when moving beyond the traditional first-choice DCE, this study makes a head-to-head comparison of best-worst, best-best, and ranking discrete choice experiments. Methods: The study consisted of three arms, and respondents were randomised to one of them. Each arm involved an identical experiment and differed only in the elicitation method: best-worst, best-best, or ranking. The three methods were compared using six criteria: trade-off consistency, choice consistency, scale dynamics, efficiency, stated difficulty, and stated preference. Rank-ordered mixed logit models and respondent-reported data were used to compare the criteria between arms and between first and second choices. Results: Choices were most consistent in ranking and least consistent in best-worst, especially the second ("worst") choice. Learning effects and efficiency were largest for best-worst, especially in the second choice. Furthermore, ranking was perceived to be easiest and most preferable. However, trade-offs differed more between first and second choices within a preference elicitation method than between the three methods. Respondents were more consistent in first choices, which were also reported to be easier. Conclusions: All methods improve the efficiency of data collection relative to using first choices only.
However, even after allowing for differences in scale and scale dynamics, first choices reflected preferences that differed from those of second choices for all three preference elicitation methods. This raises doubts about whether to move beyond first-best choice preference elicitation methods.

Methods: A targeted review of recent DCE articles (2018 to Q1 2020) published in the health, marketing, and transport economics literature was used to identify the most commonly used internal validity tests. Two of these were then incorporated in four different (online) data collections. Based on the estimated respondent preferences, the achieved sensitivity and specificity were simulated and compared with the sensitivity and specificity of the most commonly used statistical test, the root-likelihood (RLH) test. Results: Dominant and repeated choice tasks are included in about one fifth of recent DCE publications and are by far the most commonly used internal validity tests. Across different datasets, their sensitivity ranges from 76 to 83%, while their specificity depends on the type of invalid response pattern (e.g., random or deterministic) and ranges from 32 to 68%. In comparison, the RLH test was found to have a sensitivity of 79-97% and a specificity of 46-94%, with 93-94% for random response patterns. Conclusions: Dominant and repeated choice tasks are relatively unreliable at distinguishing high-quality from low-quality respondents and are costly to include in terms of statistical power. The root-likelihood test, in contrast, does not require additional choice tasks to be included in the DCE design and was found to provide a superior alternative that never performs worse and often performs substantially better, particularly at identifying random response patterns.

Methods: Labelled utility functions were rewritten into a single generic utility function using a label dummy variable and indicator functions, which was used to create a PCSD with 3 alternatives in each task (out of 6).
The convergent validity of the two designs' results from conditional logit and mixed logit (MIXL) models was tested using the Swait and Louviere test and the convolution test, respectively. The PCSD's impact on choice variances was examined using a heteroscedastic conditional logit model. Results: Using data from 790 respondents, we found that preference estimates from the FCSD and PCSD differ statistically even after allowing for scale differences. These results remain different even when accounting for more flexible substitution patterns across alternatives using the MIXL model. We found that the PCSD appeared to induce smaller choice variance than the FCSD, which reflects positively on its purpose of reducing the cognitive burden. The PCSD was preferred by female respondents and by respondents who answered the survey on a phone. Conclusions: Our findings indicate that the PCSD can reduce cognitive burden, and we suggest its use for surveys accessible by mobile phone. While both a PCSD and an FCSD should capture the same behaviour, our study reveals statistically significant differences, perhaps because respondents in an FCSD were not trading off on all attributes and alternatives due to choice task complexity. Without more research on external validity, it is not possible to conclude which design type better uncovers true preferences.

Background: A dual-response none option format can provide (potentially useful) preference information for respondents who otherwise consistently choose the opt-out option. Unfortunately, there is still a lack of evidence about the quality of these additional choice data. Moreover, the different framing of the opt-out options could very well induce slightly different choice behavior, resulting in different preference estimates and potentially very different uptake predictions.
Methods: To investigate the impact of the opt-out elicitation format on preference estimates, uptake predictions, and data quality, an existing COVID-19 instrument was re-fielded using a standard and a dual-response none option format. This resulted in two nationally representative samples of approximately 1,000 respondents each. These data were analyzed using raw data tabulations and MIXL (individual-level) uptake predictions, and data quality was assessed using statistical root-likelihood (RLH) tests. Results: In the standard none sample, 24% always, 26% sometimes, and 50% never chose the opt-out, resulting in a 60% predicted uptake. In the dual-response sample, 28% always, 37% sometimes, and 36% never chose the opt-out, resulting in a 49% predicted uptake. Compared to the external benchmark of 35-40%, the dual-response format thus resulted in better external validity. Data quality was also better in the dual-response sample, with fewer respondents identified as having used a random response pattern. Conclusions: The dual-response none format resulted in improved external validity without an obvious trade-off in terms of data quality, a conclusion that also holds for respondents who consistently choose the none option. These respondents did appear to have distinctively different preferences, which implies that practitioners have to be cautious when pooling their choice data in a single (combined) statistical model.

Background: Besides the lack of evidence-based guidance on accurate health-risk communication in DCEs, modelling the impact of benefit/risk attributes on the preferences of respondents merits further deliberation. Screening or elimination of alternatives before choice (i.e., choice set formation [1, 2]) might be used in health-related DCEs that include benefit-risk attributes. This study aims to demonstrate the econometric modelling of benefit/risk-based choice set formation within health-related DCEs.
Methods: In four different case studies, a standard trade-off model (multinomial logit) was fitted first; building on this, a screening model was fitted; and finally a full choice set formation model was estimated. This final model allows attributes to be used first to screen out alternatives from choice tasks before respondents trade off attributes and choose among the feasible alternatives [1, 2]. Educational level and health literacy of respondents were accounted for in all models. Results: Model fit (e.g., log-likelihood and BIC) improved from the trade-off-only or screening-only models to the choice set formation models in three out of four studies. In those studies, significant screening behavior was identified that significantly affected trade-off inferences, rejecting the pure trade-off model and supporting the existence of screening on the basis of benefit/risk profiles. Educational level and health literacy showed significant interactions with attributes in all studies. Conclusions: Choice modelers should pay close attention to how respondents behave when benefit/risk attributes are included in a DCE. Further studies should investigate why and when respondents engage in screening behavior. Researchers should explore extensions of econometric models to reflect non-compensatory behavior. Assuming that benefit and risk attributes will only affect trade-off behavior is likely to lead to false conclusions about benefit/risk-based behavior.

Background: Accurate evaluation of treatment preferences in DCEs requires defining a clear decision context that asks respondents to consider either a self-reported or a standardized health baseline. Standardized baselines fix the context and help ensure trade-off plausibility, but they also present a double hypothetical to respondents as they make their choices.
Our objective was to compare, at the respondent level, how treatment preferences varied when respondents used their own baseline versus more severe baselines. Methods: A DCE survey was administered to patients and to parents of minors (< 18) with sickle-cell disease to evaluate willingness to pursue gene therapy. Respondents answered choice questions for both their self-reported baseline and an assumed, more severe, baseline. Latent classes were identified by cohort and baseline status. Changes in individual respondents' class-membership probabilities between actual and hypothetical baselines were used to evaluate how baseline framing affected preference estimates. Results: 174 patients and 109 parents completed the survey. Multiple classes were identified across cohorts by baseline. Class allocation was above 93% for all classes and progressed unevenly with standardized baselines. The relative value of opting out of gene therapy changed with baselines, yet this value varied less across self-reported baselines. A significant difference in preferences was found between patients with moderate symptoms and those assuming moderate symptoms; this difference was not observed among parents. Conclusions: The analysis of class progression can be an effective tool to evaluate how individual respondents react to changes in choice contexts. Our results indicated that respondent choices accounted for varying baselines. Adaptive behaviors seem plausible given unchanged values for opt-out among some respondents with current mild symptoms and some with current moderate symptoms. Overall, we found some support for the use of standardized baselines to represent clinically relevant choice contexts.

Background: There is growing interest in quantifying the degree of heterogeneity in stated preferences for health. A popular approach to investigating preference heterogeneity is split-sample analysis, which makes comparisons across subgroups. However, subgroups may differ in many observed characteristics.
Not accounting for these other characteristics may bias comparisons if they are also associated with preferences. This study explores matching and weighting approaches to identify differences in preferences. Methods: We compare simulated stated preferences of patients and the public for a hypothetical healthcare intervention, where patients are older and have lower household income. The utility function for both groups is specified to be identical (preference homogeneity), and utility is assumed to increase with health and life years and to decrease with risk and cost. Utility for cost is specified as a function of income and age. We conduct unmatched, propensity score-matched, and entropy-balanced analyses. Results: Due to differences in age and income, the unmatched analysis detects statistically significant differences in the preference for cost when comparing the public's preferences with those of patients. Both propensity score matching and entropy balancing reduce imbalance in the individual characteristics across subgroups, although the reduction is greater when using entropy balancing. Following matching or weighting, there are no significant differences in the preference weights for any attributes. Conclusions: Unweighted and unmatched analyses may produce erroneous conclusions regarding heterogeneity in preferences when making comparisons across subgroups. Matching and weighting methods may be useful for researchers seeking to compare preferences for health and health care when there are too many characteristics to feasibly incorporate with interaction terms.
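The weighting idea in the matching and weighting abstract above can be sketched for a single covariate. The example is a simplification and not the authors' simulation: it balances only the mean of age, using the exponential-tilting form of entropy balancing weights (each weight proportional to exp(lambda * x)), and solves for lambda by bisection. The group sizes, the age distributions, and the function name are hypothetical.

```python
import math
import random


def entropy_balance_weights(x, target_mean, lam_lo=-1.0, lam_hi=1.0, iters=200):
    """Weights w_i proportional to exp(lam * x_i) with weighted mean == target_mean.

    Only the first moment of one covariate is balanced (a simplification).
    """
    centre = sum(x) / len(x)  # centring keeps the exponentials well scaled

    def tilt(lam):
        raw = [math.exp(lam * (xi - centre)) for xi in x]
        total = sum(raw)
        return [r / total for r in raw]

    def weighted_mean(lam):
        return sum(w * xi for w, xi in zip(tilt(lam), x))

    if not weighted_mean(lam_lo) <= target_mean <= weighted_mean(lam_hi):
        raise ValueError("target mean is outside the search bracket")

    # The weighted mean increases with lam, so bisection finds the root.
    for _ in range(iters):
        mid = 0.5 * (lam_lo + lam_hi)
        if weighted_mean(mid) < target_mean:
            lam_lo = mid
        else:
            lam_hi = mid
    return tilt(0.5 * (lam_lo + lam_hi))


random.seed(0)
# Hypothetical samples: patients are older than the general public.
public_age = [random.gauss(45, 15) for _ in range(2000)]
patient_age = [random.gauss(60, 10) for _ in range(500)]

target = sum(patient_age) / len(patient_age)
weights = entropy_balance_weights(public_age, target)
reweighted = sum(w * a for w, a in zip(weights, public_age))

print("unweighted public mean age:", round(sum(public_age) / len(public_age), 2))
print("patient mean age:          ", round(target, 2))
print("reweighted public mean age:", round(reweighted, 2))
```

After reweighting, the public sample matches the patient sample on mean age, so a remaining difference in estimated preferences is less likely to be an artefact of age. A full analysis would balance several covariates (for example age and income) and possibly higher moments.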