A Guide to Observable Differences in Stated Preference Evidence

Benjamin Matthew Craig, Esther W. de Bekker-Grob, Juan Marcos González Sepúlveda, William H. Greene

The Patient, published 2021-10-26. DOI: 10.1007/s40271-021-00551-x

BACKGROUND AND OBJECTIVE: In health preference research, studies commonly hypothesize differences in parameters (i.e., differential or joint effects on attribute importance) and/or in choice predictions (marginal effects) by observable factors. Discrete choice experiments may be designed and conducted to test and estimate these observable differences. This guide covers how to explore and corroborate various observable differences in health preference evidence.

METHODS: The analytical process has three steps: analyze the exploratory data, analyze the confirmatory data, and interpret and disseminate the evidence. In this guide, we demonstrate the process using dual samples (where exploratory and confirmatory samples were collected from different sources) on 2020 US COVID-19 vaccination preferences; however, investigators may apply the same approach using split samples (i.e., a single source).

RESULTS: The confirmatory analysis failed to reject ten of the 17 null hypotheses generated by the exploratory analysis (p < 0.05). Apart from demographic, socioeconomic, and geographic differences, political independents and persons who have never been vaccinated against influenza are among those least likely to be vaccinated (0.838 and 0.872, respectively).

CONCLUSIONS: For all researchers in health preference research, it is essential to know how to identify and corroborate observable differences. Once mastered, this skill may lead to more complex analyses of latent differences (e.g., latent classes, random parameters). This guide concludes with six questions that researchers may ask themselves when conducting such analyses or reviewing published findings of observable differences.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s40271-021-00551-x.

Health preference research (HPR) refers to any investigation dedicated to understanding the value of health and health-related alternatives using observational or experimental methods [1]. In HPR, investigators conduct discrete choice experiments (DCEs), randomizing different choice sets to different individuals across multiple tasks to test hypotheses about the value of health and health-related alternatives [2]. Specifically, analyses of stated preference evidence can quantify the effects of the alternatives' attributes on preferential choice behaviors (i.e., attribute importance) [3, 4]. Currently, however, clear guidance is lacking on how to examine differences in attribute importance by observable factors. For example, a study might examine how individuals with different observable characteristics (e.g., age) may make different choices, or how tasks with different observable characteristics (e.g., task sequence) may elicit different choices. The objective of this paper is to provide guidance on how to explore and corroborate various observable differences in health preference evidence generally. To help readers, the guide introduces a worked example, including its decision context, definitions (see bolded terms and Glossary), and an overarching analytical process.
For more extensive coverage of choice modeling, we recommend the textbook Applied Choice Analysis by Hensher, Rose, and Greene [5].

As a worked example for this guide, we examined secondary data from the 2020 US COVID-19 vaccination preferences (CVP) study [6]. In brief, the 2020 US CVP study included a DCE with eight choice tasks as well as four kaizen tasks. The choice sets in each of the eight choice tasks (Fig. 1) included an opt-out ("no vaccination for six months") and three vaccination alternatives described using five attributes (see Glossary for their definitions):

1. Proof of vaccination (two nominal attribute levels): (1) vaccination card; (2) no vaccination card;
2. Vaccination setting (two nominal attribute levels): (1) medical setting; (2) community setting;
3. Vaccine effectiveness (two ordinal attribute levels): (1) 70%; (2) 50%;
4. Duration of immunity (two ordinal attribute levels): (1) 6 months; (2) 3 months;
5. Risk of severe side effects (four ordinal attribute levels): (1) 1 per 1,000,000; (2) 1 per 100,000; (3) 1 per 10,000; (4) 1 per 1000.

Overall, the 2020 US CVP descriptive system delineated 64 vaccination alternatives (2^4 × 4; i.e., all possible combinations of attribute levels). Based on the values from the initial analysis and for simplicity of presentation [6], each alternative may be expressed as a profile of attribute levels from the best (11111) to the worst (22224) vaccination.

Between the 9th and 12th of November 2020, the 2020 US CVP study recruited an exploratory sample of US adults from a marketing panel (Dynata®; 1153 respondents) and a confirmatory sample via crowdsourcing (MTurk®; 912 respondents). These surveys occurred simultaneously, prior to US approval of any vaccines but after clinical trial results supporting vaccine efficacy and safety were announced [7]. Further details on the 2020 US CVP, including its study protocol and experimental design, have been published elsewhere [6, 8].

As a worked example, this guide shows how the value of the COVID-19 vaccinations (described using five attributes) differs systematically by nine observable factors in concordance with random utility theory. Under random utility theory, each individual i ∈ [1, …, N] and each alternative j ∈ [1, …, J] has a utility U_ij such that U_ij = V_j + ε_ij [9]. In the worked example, we normalize each utility U_ij by subtracting the utility of the opt-out ("no vaccination for six months") and assume that the random terms ε_ij are distributed as type I extreme values [10]. Therefore, the systematic component V_j represents the value of a COVID-19 vaccination relative to no vaccination, and the probability that individual i chooses vaccination j is a conditional logit, Pr(y_ij = 1) = exp(V_j) / (1 + Σ_k exp(V_k)) [5]. Because of the normalization, the utility of the opt-out is zero by construction, which serves as a reference for the alternatives (e.g., negative values imply being worse than "no vaccination for six months"). The value of each vaccination alternative V_j represents the willingness of individual i to be vaccinated with alternative j. In this worked example, the value V_j was approximated by subtracting main-effect coefficients λ_k from the value of the best vaccination (11111). Each of the seven main-effect coefficients λ_k represents a loss in value attributed to a worse level (i.e., attribute importance) [3].
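As a concrete illustration of this normalization, the sketch below computes conditional logit choice probabilities for one choice task, with the opt-out's utility fixed at zero; the coefficient values are illustrative placeholders, not the published CVP estimates.

```python
import numpy as np

def choice_probabilities(V):
    """Conditional logit probabilities for one choice task.

    V holds the systematic values of the J vaccination alternatives,
    each already normalized against the opt-out ("no vaccination for
    six months"), whose utility is zero by construction. Returns the
    probabilities of the J alternatives plus the opt-out.
    """
    exp_v = np.exp(np.append(V, 0.0))  # opt-out contributes exp(0) = 1
    return exp_v / exp_v.sum()

# Illustrative values for the three vaccination alternatives in one task;
# a negative value implies being worse than "no vaccination for six months".
V = np.array([1.2, 0.4, -0.3])
print(choice_probabilities(V))  # [0.51 0.23 0.11 0.15]; last entry is the opt-out
```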
The first four attributes have two levels, each representing a loss (λ1, λ2, λ3, and λ4), and the fifth attribute has four levels, representing up to three losses (λ5, λ6, and λ7). For example, the worst vaccination in the CVP descriptive system (22224) has a value equal to that of the best vaccination (11111) minus the loss associated with each of its worse attribute levels. A more general value specification using alternative-specific constants (ASCs) is shown in the Electronic Supplementary Material (ESM), Online Resource 1.

In a DCE, each respondent completes multiple choice tasks t ∈ [1, …, T], and their error terms may therefore be correlated. In the worked example, the parameters of V (i.e., α, λ) were estimated using evidence on preferential choice behaviors y_ij by maximum likelihood with respondent-specific clusters [4]. In other words, choice defines value [11]. In econometric notation, the hat symbol on a parameter indicates an estimate, such as λ̂, as opposed to its true value λ, which can be hypothesized but cannot be measured perfectly. When describing the estimation results, the constant α̂ and main-effect coefficients λ̂ are known as fixed effects because each estimate is fixed across respondents and represents a causal relationship between the alternatives and preferential choice behaviors [5]. Each fixed effect may also be expressed by its effect on choice predictions p (i.e., marginal effect).

An observable difference is an estimated relationship between an observable factor Z and a fixed effect (e.g., α̂, λ̂) and represents evidence of preference heterogeneity (i.e., α, λ | Z). For example, does the value of the best COVID-19 vaccination (11111) differ by age and sex? A future guide may examine observable differences in the proportional magnitude of all attributes (i.e., scale) or the ratio of two fixed effects (e.g., willingness to pay is the ratio of a fixed effect and the fixed effect of out-of-pocket price). In this guide, each observable factor is categorical and measured explicitly, leaving little ambiguity about the groups. Although the measurement of observable factors is straightforward, the measured relationships between these factors and fixed effects depend in part on the factor distribution (e.g., multicollinearity, micronumerosity). Online Resource 2 describes a known-groups analysis that assesses the relationship between group size and statistical power given the intended effect size.

Latent factors, such as respondent attitudes, are not directly observable or reportable without the use of instruments that approximate their magnitude; therefore, the relationship between a latent factor and a fixed effect cannot be assessed directly. Estimations of latent differences may fail because of measurement error in the latent factor or a lack of clarity in its definition. For example, a respondent's attitude may extend beyond a positive-negative scale to be multidimensional, characterizing affective, behavioral, and cognitive aspects [12]. While errors may occur in objective measurement, they are more common in subjective measurement [13]. Compared with observable factors, the models of latent factors, such as risk perception, vary by purpose and context, and the groupings may seem vague. Future guides in HPR may cover how to examine differences by latent factors, such as latent classes, random effects, and correlated errors (that are violations of independence from irrelevant alternatives), and test for latent differences [5].
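Returning to the value specification above, the following sketch expresses the value of any CVP profile as the constant (the value of the best vaccination, 11111) minus a loss for each worse attribute level. It assumes dummy-coded losses relative to the best level of each attribute, and all coefficient values are illustrative, not the published estimates.

```python
# Value of a CVP profile from the constant and the seven main-effect losses.
# Assumes dummy-coded losses relative to the best level of each attribute;
# all numbers are illustrative, not the published estimates.
ALPHA = 1.5  # value of the best vaccination (11111) relative to the opt-out

# lambda_1..lambda_4: losses for level 2 of the first four attributes;
# lambda_5..lambda_7: losses for levels 2-4 of the side-effect risk attribute.
LAMBDAS = {1: 0.10, 2: 0.05, 3: 0.40, 4: 0.20, 5: 0.15, 6: 0.35, 7: 0.80}

def profile_value(profile: str) -> float:
    """Value V of a profile such as '22224' (the worst vaccination)."""
    v = ALPHA
    for attr in range(4):  # attributes 1-4 are binary
        if profile[attr] == "2":
            v -= LAMBDAS[attr + 1]
    risk_level = int(profile[4])  # attribute 5 has levels 1-4
    if risk_level > 1:
        v -= LAMBDAS[3 + risk_level]  # level 2 -> lambda_5, ..., level 4 -> lambda_7
    return v

print(round(profile_value("11111"), 3))  # 1.5 (the best vaccination)
print(round(profile_value("22224"), 3))  # 1.5 - (0.10+0.05+0.40+0.20) - 0.80 = -0.05
```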
To explore and corroborate observable differences in stated preference evidence (Fig. 2), this guide introduces a three-step analytical process, starting with an exploratory analysis that generates hypotheses to be tested using the confirmatory sample. Based on the study protocol, we intended to recruit 1000 respondents for each sample and field identical surveys simultaneously from two sources to avoid temporal and single-source effects. Using nine observable factors taken from the worked example, we demonstrate the exploratory-confirmatory process (Fig. 2) and discuss its merits and limitations for future research.

A typical analysis of preference heterogeneity explores multiple factors, which confounds the classical interpretation of statistical uncertainty (e.g., p values, 95% confidence intervals). When multiple models are estimated, the significance of any one parameter is unknown because some spurious relationships may appear significant by chance, and other substantive relationships may be hidden [14]. Presenting significant p values under one of multiple exploratory models is known as cherry picking because such "cherries" can give a false impression of statistical inference [15]. Likewise, not controlling for a relationship in a model because its p value is slightly higher than a pre-defined threshold can lead to the omission of a substantive relationship for which the experiment was simply not powered.

Instead of picking cherries, an exploratory analysis can generate hypotheses to be tested using a confirmatory sample. In this worked example, two samples were collected simultaneously from different sources so that the hypotheses generated using the exploratory sample could be tested using the confirmatory sample, potentially inferring observable differences (i.e., a dual-sample process). As described in the Acknowledgments, the study design and hypotheses of this worked example were distributed to colleagues prior to the confirmatory analysis. Alternatively, a study may register its exploratory results and hypotheses on the Health Preferences Study and Technology Registry (hpstr.org). Instead of using a dual-sample process, a researcher may split a sample from a single source into two sub-samples (exploratory and confirmatory), as sketched below; however, evidence from a split-sample process may be contaminated by the same sampling biases inherent to the single source. Obviously, it is easier to predict choices from the originating source than from an external source. Likewise, the dual-sample process implies that the results may or may not differ by source because of a sampling bias inherent to each source. Regardless of whether the process is completed using two sources or a split sample, it is important to compare the characteristics of the exploratory and confirmatory samples.

Collecting separate exploratory and confirmatory samples alone may not prevent biases in statistical inference. When researchers change their hypotheses based on confirmatory results, this again gives a false impression of statistical uncertainty (i.e., "the tail that wags the dog"). The fact that the model and hypotheses must be stated clearly prior to statistical inference is not specific to HPR [1]. To avoid such biases, the exploratory results (Table 1; Online Resources 3a and 4a) were distributed to various colleagues (see "Acknowledgments") prior to conducting the confirmatory analysis (Online Resource 4b). Alternatively, the exploratory results may be published in a peer-reviewed journal or registry.
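For teams opting for a split-sample process, a minimal sketch of the split is shown below, using pandas and a hypothetical `respondent_id` column. Splitting by respondent (rather than by choice task) keeps each person's tasks together, and a fixed seed makes the split reproducible and auditable.

```python
import pandas as pd

def split_sample(df: pd.DataFrame, seed: int = 20201109):
    """Divide respondents from a single source into exploratory and
    confirmatory halves before any modeling. The 'respondent_id' column
    name is hypothetical; adapt it to the study's data dictionary."""
    ids = df["respondent_id"].drop_duplicates().sample(frac=1.0, random_state=seed)
    half = len(ids) // 2
    exploratory_ids = set(ids.iloc[:half])
    exploratory = df[df["respondent_id"].isin(exploratory_ids)]
    confirmatory = df[~df["respondent_id"].isin(exploratory_ids)]
    return exploratory, confirmatory
```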
Simply placing the results online is not sufficient because such results may be changed at any time and without notice.

As described in the analytical process (Fig. 2), we first explore the results by strata, starting with a known-groups analysis: separating the sample by known groups and assessing the relationship between group size and statistical power given the p value (0.05) [Online Resource 2]. From its findings, we inferred that 84 respondents per group is sufficient to identify observable differences, when present, for this specific model. Assuming that each group (or stratum) is of sufficient size, the exploratory analysis proceeds by estimating the fixed effects for each stratum of the observable factor, α̂, λ̂ | Z. Like a grid search, stratification (1a) "casts a wide net" to systematically explore potential differences by each observable factor.

In this exploratory analysis, the stratified results were simplified using two identification thresholds (1b), followed by a joint Wald test. To identify potential differences in the fixed effects, we conducted a Wald test for each parameter and assessed whether its p value was less than 0.05. Unlike the likelihood ratio or Lagrange multiplier tests, Wald tests account for individual-specific clusters within the panel data [5]. To further assess the magnitude of the observable differences, we calculated the marginal effects (i.e., the effects of attributes on choice predictions) by strata and assessed whether their range (i.e., the maximum effect minus the minimum effect among the strata) was greater than 0.05. Do the marginal effects differ substantively? Some observable differences may be statistically significant but have little influence on the choice predictions, and it is prudent to focus on the meaningful differences. In the worked example, evidence that an observable difference passed these two identification thresholds (p < 0.05, range > 0.05) generated parameter-specific hypotheses (1c) for the confirmatory analysis (Table 1; Online Resource 4a). The selection of these two identification thresholds (or any alternative thresholds) was arbitrary but useful. A well-performed exploratory analysis can inform the efficient allocation of scarce scientific resources by identifying potentially significant and meaningful relationships for further investigation.

Apart from these hypothesized relationships, a researcher may conduct a joint Wald test, testing whether all parameters are identical across strata simultaneously (p < 0.05). Unlike the parameter-specific tests and their two identification criteria, a significant p value on a joint test does not generate a hypothesis regarding an observable factor. However, an insignificant p value may generate a hypothesis of no differences by the observable factor (1c), which is demonstrated in this worked example.

The confirmatory analysis begins by comparing the exploratory and confirmatory samples (Online Resource 5) and estimating the differences in fixed effects using interactions (2a), instead of stratification (Table 2; Online Resource 4b). As part of the confirmatory interaction analyses, we conducted a Wald test for each hypothesis, potentially corroborating an observable difference (2b). In addition to hypothesis testing, we compared the exploratory and confirmatory estimates (2c) to aid their interpretation. Next, we conducted the stratified analyses using the confirmatory sample (Online Resource 3b) and tested the hypotheses of no differences (Online Resource 2b).
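The two identification thresholds can be checked mechanically. The sketch below handles the two-stratum case for a single parameter, taking stratum-specific estimates with cluster-robust standard errors and stratum-specific marginal effects; the numbers are illustrative, and a full analysis would loop this over every parameter and pair of strata.

```python
import numpy as np
from scipy.stats import norm

def passes_thresholds(b, se, marginal_effects, alpha=0.05, range_min=0.05):
    """Threshold 1: Wald test that the parameter differs across two strata.
    Threshold 2: the range of marginal effects across strata exceeds 0.05.
    b, se: stratum-specific estimates and cluster-robust standard errors."""
    z = (b[0] - b[1]) / np.sqrt(se[0] ** 2 + se[1] ** 2)
    p_value = 2 * norm.sf(abs(z))
    me_range = max(marginal_effects) - min(marginal_effects)
    return (p_value < alpha) and (me_range > range_min), p_value, me_range

# Illustrative inputs: one main-effect loss estimated in two strata.
ok, p, r = passes_thresholds(b=[0.90, 0.45], se=[0.12, 0.14],
                             marginal_effects=[0.21, 0.13])
print(ok, round(p, 4), r)  # True 0.0147 0.08 -- both thresholds met
```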
The stratified analyses may corroborate the absence of any observable differences or generate new hypotheses for further study. Once corroborated, each observable difference (or its absence) was interpreted and disseminated.

A differential effect is an observable difference that is associated with an observable factor, such as respondents' age. A joint effect is an observable difference caused by an interaction of two or more randomized factors, such as task sequence. Differential effects may imply preference heterogeneity (e.g., differences in preference between groups), and joint effects may indicate a loss in internal validity, motivating improvements in experimental methods. In a DCE, experimental factors (unrelated to the alternatives and decision context) may be randomly assigned to respondents and interacted with the indicators of specific alternatives or attribute levels to estimate joint effects. In the worked example, respondents were randomly assigned to experimental designs (random, generator-developed, efficient), task sequences (first to eighth), object positions (left-middle-right), and attribute orders (first to fifth). When such experimental factors influence choices, the observable differences are unrelated to preference heterogeneity.

Even when corroborated, the relevance of an observable difference depends largely on the range of marginal effects, namely how much the factor influences the choice predictions (also known as effect size or magnitude). In the exploratory analysis, the second identification criterion is based on the range of marginal effects. Once the observable effect is corroborated, these marginal effects have more practical implications. To better understand their relevance, the observable differences were ranked by the range of marginal effects and summarized for their broader implications in future research.

In concordance with Fig. 2, we first conducted stratified analyses using the exploratory sample for all observable factors included in the 2020 US CVP survey instrument (1a). The full exploratory results of the stratified analyses are provided in Online Resource 3a. Only nine of the 14 analyses generated parameter-specific hypotheses based on the two identification thresholds (1b). Specifically, the analyses of nine observable factors (Fig. 2) generated 17 parameter-specific hypotheses (H01-H17) using the exploratory sample. Next, we conducted nine interaction analyses using the exploratory sample (1b), one for each of the nine observable factors in Fig. 2. The full results are provided in Online Resource 4a; however, Table 1 shows just the estimates of the observable differences α̂, λ̂ | Z and the predictions (P01-P17) related to the 17 hypotheses (H01-H17). Among these hypotheses (1c), seven describe a relationship between the value of the COVID-19 vaccination and the observable factor Z. The other ten hypotheses describe a relationship between the main-effect coefficients and the observable factor Z. Table 1 also shows α̂ as a choice prediction p and each λ̂ as a marginal-effect percentage. Furthermore, the joint test results of the stratified analyses (Online Resource 3a) generated two hypotheses (H18-H19; 1c): no differences by US census region (Northeast, Midwest, South, West), and no differences by marital status (married or separated, never married, divorced, other).
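To make the interaction analyses concrete, the sketch below estimates an interaction model by maximum likelihood: the design columns are interacted with a binary observable factor z, so each interaction coefficient directly estimates the difference in a fixed effect between the two groups. The data are simulated and the setup is simplified (e.g., standard errors from this fit would ignore respondent clustering), so it is a sketch of the idea rather than a reproduction of the published models.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def neg_loglik(theta, X, z, chosen):
    """Conditional logit with interactions.
    X: (tasks, J, K) design for J vaccination alternatives; z: (tasks,)
    binary factor; chosen: index 0..J of the choice (index J = opt-out)."""
    k = X.shape[2]
    beta, gamma = theta[:k], theta[k:]           # main effects, interactions
    V = X @ beta + (X @ gamma) * z[:, None]      # interacted systematic value
    V = np.column_stack([V, np.zeros(len(V))])   # opt-out utility fixed at 0
    ll = V[np.arange(len(chosen)), chosen] - logsumexp(V, axis=1)
    return -ll.sum()

# Usage with simulated data: 500 tasks, 3 alternatives, 2 design columns.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3, 2))
z = rng.integers(0, 2, 500).astype(float)
chosen = rng.integers(0, 4, 500)  # purely illustrative choices
fit = minimize(neg_loglik, x0=np.zeros(4), args=(X, z, chosen), method="BFGS")
print(fit.x)  # [beta_1, beta_2, gamma_1, gamma_2]; gammas are the group differences
```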
The stratified results of the remaining three factors did not generate any hypotheses (i.e., race and ethnicity, task sequence, and object position) but may motivate further exploration. For example, if an observable factor is related to differences in scale (i.e., heteroskedasticity), the Wald tests of specific parameters may be insignificant, but the joint test may be significant.

We first compared the exploratory and confirmatory samples (Online Resource 5) and then conducted the confirmatory analysis. Table 2 shows the confirmatory results (2a), replicating the exploratory interaction analyses (Table 1). The full results of the interaction and stratified analyses using the confirmatory sample (2b) are included in Online Resources 4b and 3b, respectively. Next, we highlight three key findings (2c) that were hypothesized by the exploratory analysis and corroborated by the confirmatory analysis.

First, vaccination uptake is associated with respondent demographics and socioeconomic status (SES). Predicted uptake is significantly lower for persons aged 35-54 years, those who reside in rural communities, those with a high school degree or less, and/or those with lower household income. The fact that age, sex, and SES are associated with lower uptake may not be surprising to some; however, it is noteworthy that these associations were corroborated, and the associations with race and ethnicity were not. The relationship with employment status was not corroborated, but this may be because of greater homogeneity in the confirmatory sample owing to its recruitment through crowdsourcing instead of a marketing panel.

Second, vaccination uptake is strongly associated with self-reported respondent behaviors, namely influenza vaccination and being unaffiliated with either political party. The associations between uptake and observable behaviors may derive from a common source; for example, some persons who are immune to the influence of political or public health authorities (i.e., naysayers) may be reluctant, regardless of the vaccination's attributes. We did not control for demographics or SES in the estimation of these behavioral associations, which may diminish after taking them into account.

Third, the confirmatory analysis found little evidence that corroborates heterogeneity in any of the main-effect coefficients λ. The effect of effectiveness (λ3) is lower among persons who reside in urban areas compared with other areas. Influenza vaccination is associated with the effects of both safety and efficacy (λ7 and λ3), such that persons who were asked to be vaccinated care less about a vaccination's merits than others. For example, healthcare professionals (and others asked to be vaccinated against COVID-19 by their employers) might care less about its merits. The rest of the hypotheses on main-effect coefficients were not confirmed but are worth further investigation.

In this worked example, only ten of the 17 hypothesized differences (H01-H17) were corroborated (p < 0.05), and each represented a differential effect. This analysis did not confirm any joint effects that would suggest a lack of internal validity. We also did not find differences by US census region (H18: p = 0.18), but we found differences by marital status (H19: p < 0.001), which may be tested in a future study.
The stratified analyses generated other new hypotheses: based on the two identification criteria, the main-effect coefficients for safety and effectiveness (λ7 and λ3) may be associated with each of the five respondent characteristics as well as the two behavioral factors (influenza vaccination and political party affiliation).

Overall, the worked example demonstrated three key findings about the heterogeneity in US COVID-19 vaccination preferences. The first result, on demographics and SES, may help target outreach programs, for example, engaging school boards and other organizations active in rural communities (3a). Although the historical disparities by race and ethnicity merit recognition, they are not associated with differential effects in either the exploratory or the confirmatory results. Instead, programmatic resources may be directed to address disparities related to SES more generally. In the worked example, political independents and persons who have never been vaccinated against influenza are among those least likely to be vaccinated (0.838 and 0.872, respectively; 3b). In response, the authors believe that the Centers for Disease Control and Prevention might create more educational programs that target groups with a high concentration of registered independents or low influenza vaccination rates (e.g., college campuses, US states like Alaska and Maine). This targeting may be particularly relevant in preparation for the 2021-2022 influenza season.

How useful is such a 2020 study when preferences on COVID-19 vaccination will likely change over time [7]? Temporal confounding motivated the simultaneous collection of the exploratory and confirmatory data in the worked example; however, it also implies that the evidence may not be generalizable to 2021 because vaccination preferences could have shifted as the context of the pandemic has evolved and people have had more exposure to the outcomes of COVID-19. If temporal confounding were not present, the confirmatory study could have been designed after the analysis of the exploratory sample to test the hypotheses it generated. Each study team must assess its own temporal confounding as well as the wisdom of allocating scarce resources toward a simultaneous or subsequent confirmatory study.

The interpretation of observable differences, like these, may seem transparent, but it can also be overly simplistic and misleading. For example, heterogeneity in main-effect coefficients is not the same as heterogeneity in marginal effects because a marginal effect summarizes both the coefficient and the constant (see the sketch below). Observable differences in main effects are usually found to be less meaningful than differences in the marginal effects. Likewise, an analyst may care about relative effects (i.e., ratios of attribute importance), such as willingness to pay or maximum acceptable risk. In some cases, attribute importance estimates may vary by an observable factor, but their ratio does not. Furthermore, interactions imply an independence of the observable factors; however, these factors are likely correlated (e.g., SES and naysayer behaviors) because of a latent process. More advanced analysts in HPR may skip the estimation of observable differences and proceed directly to more complex methods that account for preference heterogeneity, such as random parameters or latent classes. For example, the proposed analytical process (Fig. 2) does not attempt to separate taste and scale heterogeneity [16].
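The sketch below illustrates the point about coefficients versus marginal effects: two groups with an identical difference in a main-effect loss λ, but different constants α, show quite different changes in predicted uptake. It uses a simplified one-alternative-versus-opt-out prediction with illustrative numbers.

```python
import numpy as np

def uptake(alpha, loss):
    """Probability of choosing a single vaccination over the opt-out,
    whose utility is zero; alpha is the constant, loss a main-effect lambda."""
    v = alpha - loss
    return np.exp(v) / (1 + np.exp(v))

# Two groups share the same effectiveness loss (lambda = 0.4) but have
# different constants, so the marginal effect on predicted uptake differs.
for alpha in (2.5, 0.5):
    me = uptake(alpha, loss=0.0) - uptake(alpha, loss=0.4)
    print(f"alpha={alpha}: marginal effect of the loss = {me:.3f}")
# alpha=2.5: 0.033; alpha=0.5: 0.098 -- same lambda, different impact
```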
When conducting exploratory and confirmatory studies in HPR or any other field, trust and order matter. You must trust that the study team followed its protocol, particularly the order of analyses. If we had switched the order of the samples (i.e., recruited the exploratory sample via crowdsourcing and the confirmatory sample using the marketing panel), the conclusions would have been different. However, such a switch based on the results is not appropriate (i.e., the tail that wags the dog).

Overall, the primary limitation of the worked example is its sampling frames. The exploratory sampling frame was a marketing panel, and the confirmatory sampling frame was a crowdsourcing vendor. Both tend to list more educated respondents who have the means to participate in online surveys, which is not generalizable to the US general population. Any inference on observable differences must account for this sampling frame bias in its interpretation of the preference evidence. Although we could have re-weighted the results to imply gains in representativeness, this subterfuge may exacerbate existing biases. It is better to recognize the limitations of crowdsourcing, which may not be the best source to confirm results from a marketing panel.

This guide describes how to identify and corroborate observable differences using dual samples, which may or may not be affordable in other investigations. Unless underpowered (see Online Resource 2), every health preference study can split its sample and follow the analytical process described by this guide. Although advanced analyses of latent classes and random parameters are welcome, this analytical process (Fig. 2) provides a more principled approach to examining preference heterogeneity based on observable factors. When conducting such analyses or reviewing published findings of observable differences, researchers may consider the following questions:

1. How were the observable differences specified (e.g., statistical power)?
2. How were the observable differences estimated (e.g., a single interaction)?
3. How were the observable differences corroborated (e.g., dual or split samples)?
4. To aid interpretation:
   a. Was the observable factor randomized (differential vs joint effects)?
   b. Were the changes in the choice predictions meaningful (marginal effects)?
   c. What are the implications of these findings for future research?

Glossary

Alternative-specific constant (ASC): A parameter representing the value of a specific object.

Attribute importance: A parameter representing the value of an object's attribute (dummy) or difference in attribute level (incremental) [3].

Cherry picking: The presentation of significant p values under one of multiple exploratory specifications, which can give a false impression of statistical inference [15].

Choice defines value: The parameters of a value function V are estimated using empirical evidence on preferential choice behaviors y_ij [11].

Differential effect: An observable difference that is associated with an observable factor.

Discrete choice experiment (DCE): An experiment that randomly assigns different choice sets to different individuals to test hypotheses.

Dual-sample process: The use of samples from two different sources for exploratory and confirmatory analyses.

Fixed effect: A fixed parameter representing a causal relationship between alternatives and the preferential choice behaviors y_ij.
Health preference research (HPR): Any investigation dedicated to understanding the value of health and health-related alternatives using observational or experimental methods [1].

Interaction: The product of two or more independent variables.

Joint effect: An observable difference caused by an interaction of two or more randomized factors.

Known-groups analysis: An analysis that separates a sample into groups known to have observable differences. A known-groups analysis is often conducted to assess whether pre-determined differences are observed under a variety of constraints (e.g., sample sizes). Likewise, an unknown-groups analysis separates a sample into groups without known differences to assess whether pre-determined differences are absent under a variety of constraints. Each analysis may identify potential causes of spurious results (e.g., a lack of statistical power).

Latent difference: A relationship between a latent factor and a fixed effect that represents a specific form of preference heterogeneity (e.g., ASC by risk perception class).

Latent factor: A categorical variable that is not directly observable or reportable without the use of instruments that approximate its magnitude; therefore, the relationship between a latent factor and a fixed effect cannot be assessed directly.

Marginal effect: An observable difference in choice prediction.

Observable difference: A relationship between an observable factor and a fixed effect that represents a specific form of preference heterogeneity (e.g., ASC by age group).

Observable factor: A categorical variable that is measured explicitly, leaving little ambiguity about the groups; therefore, the relationship between an observable factor and a fixed effect may be assessed directly.

Power: The probability that a test will correctly reject a false null hypothesis. For the known-groups analysis (Online Resource 2), 21 respondents per block was sufficient to identify the observable differences (p < 0.05) in over 80% of the bootstrap iterations.

Preferential choice behaviors: A behavior y_ij that resolves ambiguity in preferences between objects in a set (i.e., choice set) [4].

Random utility theory: Each individual i ∈ N and each alternative j ∈ J has a utility U_ij such that U_ij = V_j + ε_ij, where V_j are the alternatives' values and ε_ij are errors clustered by individual [9].

Relative attribute importance: A ratio of two fixed effects where each represents attribute importance (i.e., the importance of one attribute relative to another).

Scale heterogeneity: A relationship between an observable or latent factor and the scale parameter, representing the proportional magnitude of all fixed effects.

Split-sample process: The separation of a sample from a single source into two sub-samples for exploratory and confirmatory analyses.

Stratification: The interaction between all variables with the same observable factor simultaneously, inherently separating the sample into groups (i.e., strata).

The tail that wags the dog: The practice of changing hypotheses based on confirmatory results, which can give a false impression of statistical inference [15].

References

Health preference research: an overview
Discrete choice experiments in health economics: past, present and future
A guide to measuring and interpreting attribute importance
A theory of data
Applied choice analysis: a primer
United States COVID-19 vaccination preferences (CVP): 2020 hindsight
COVID-19 health preference research: four lessons learned. ISPOR Value Outcomes Spotlight
QALYs for COVID-19: a comparison of US EQ-5D-5L value sets
A theory of individual choice behavior
Conditional logit analysis of qualitative choice behavior
Choice defines value: a predictive modeling competition in health preference research
Chapter Five: the ABC of ambivalence: affective, behavioral, and cognitive consequences of attitudinal conflict
Psychometric theory. Upper Saddle River: Pearson/Prentice Hall
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
Key issues and potential solutions for understanding healthcare preference heterogeneity free from patient-level scale confounds

Acknowledgments We thank Catharina (Karin) G.M. Groothuis-Oudshoorn, Deborah Marshall, Semra Ozdemir, and Joffre D. Swait, who can attest that the authors sent their hypotheses and predictions to them on 22 March 2021, declaring them openly on this date in anticipation of a future confirmatory analysis.

Author contributions As part of the original study, BMC developed the research questions, methodology, and design of the study; executed the protocol; and published the primary paper. For this extension, BMC, EWBG, JMGS, and WG conducted the secondary data analysis, interpreted the results, and wrote the final manuscript.

Supplementary information The online version contains supplementary material available at https://doi.org/10.1007/s40271-021-00551-x.

Funding Benjamin M. Craig provided all financial support for the project.

Conflicts of interest The authors have no conflicts to disclose.

Ethics approval Not applicable.

Availability of data and material The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request. The code used in the analysis of the study datasets is available from the corresponding author on reasonable request.