key: cord-0314263-rvpi5qt5 authors: Wright, D.; Wolff, S. M.; Jaspal, R.; Barnett, J.; Breakwell, G. title: The Choice of Response Alternatives in COVID-19 Social Science Surveys date: 2022-01-25 journal: nan DOI: 10.1101/2022.01.24.22269741 sha: ab05fcf167064a796d14761bdb54a983d5e835ad doc_id: 314263 cord_uid: rvpi5qt5 Social science research is key for understanding and for predicting compliance with COVID-19 guidelines, and much of this research relies on survey data. While much focus is on the survey question stems, less is on the response alternatives presented that both constrain responses and convey information about the assumed expectations of the survey designers. The focus here is on the choice of response alternatives for the types of behavioral frequency questions used in many COVID-19 and other health surveys. We examine issues with two types of response alternatives. The first are vague quantifiers, like "rarely" and "frequently." Using data from 30 countries from the Imperial COVID data hub, we show that the interpretation of these vague quantifiers (and their translations) depends on the norms in that country. If the mean amount of hand washing in your country is high, it is likely "frequently" corresponds to a higher numeric value for hand washing than if the mean in your country is low. The second type are precise numeric response alternatives and they can also be problematic. Using a US survey, respondents were randomly allocated to receive either response alternatives where most of the scale corresponds to low frequencies or where most of the scale corresponds to high frequencies. Those given the low frequency set provided lower estimates of the health behaviors. The choice of response alternatives for behavioral frequency questions can affect the estimates of health behaviors. How the response alternatives mold the responses should be taken into account for epidemiological modeling. 
We conclude with some recommendations for response alternatives for health behavioral frequency questions in surveys.

come to believe that the behavior is common. Another possibility is that the response alternatives could change what the respondents think the target event is. Wright, Gaskell, and O'Muircheartaigh (1997) describe how this can occur for vague and ambiguous terms. For example, they asked respondents how often their teeth were cleaned either with response alternatives suggesting this meant by a dentist or with response alternatives suggesting this

Finally, respondents can interpret any set of alternatives as a scale from low to high, ignoring the particular words used to compose the scale. This will be more likely when the response alternatives are vague. In these cases it is unclear what question they answer: how much they engage in the behavior compared with others; with their expectations; with their behavior before the pandemic; etc. This is discussed further at the end of the paper when we make recommendations for the choice of response alternatives.

medRxiv preprint doi: https://doi.org/10.1101/2022.01.24.22269741; this version posted January 25, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

The response alternatives for behavioral frequency questions generally have one of five formats (more elaborate approaches exist, for example having respondents list the behaviors on a calendar, and some of these are discussed in the recommendations section, e.g., Schatz, Knight, Belli, & Mojola, 2020), each with its own concerns:

1. Free recall. Respondents provide a numerical estimate, often for a specific duration. This can place heavy memory demands on respondents for high-frequency behaviors.
Respondents often use rough heuristics to guesstimate the frequency of the behavior. If they try to recall every incident this tends to lead to under-reporting. Another issue is that respondents often round their responses, giving response prototypes (Huttenlocher, Hedges, & Duncan, 1991).

happened, or if it happened since some memorable event (e.g., Loftus & Marburger, 1983). Time since an event can be used to estimate the event frequency. The difficulty is that people have trouble remembering when events occurred and tend to forward telescope these dates to be more recent than is accurate (e.g., Neter & Waksberg, 1964; Thompson, Skowronski, & Lee, 1988).

and mutually exclusive numeric response categories. The concern here, examined in Study 2, is how the choice of these sets can affect how the respondent answers the question and therefore the study's results.

The vague quantifier questions are not directly comparable to the free recall question as the latter combines the two behaviors asked about in the vague quantifier questions. We

using ln(x + .5) (the +.5 is added because some people, 1.22%, said zero) and this lessened the skewness to 0.14.
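The ln(x + .5) transformation described above can be illustrated with a small sketch. The counts below are hypothetical, not the survey data, and the moment-based skewness formula is an assumption, since the text does not say which skewness estimator was used.

```python
import math

def log_transform(counts, offset=0.5):
    """Return ln(x + offset) for each reported count.

    The offset keeps a reported count of zero finite: ln(0 + 0.5).
    """
    return [math.log(x + offset) for x in counts]

def skewness(values):
    """Moment-based skewness, m3 / m2**1.5 (an assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical free recall answers with a zero and a long right tail.
reported = [0, 2, 3, 4, 5, 5, 6, 8, 10, 12, 15, 20, 30, 50]
transformed = log_transform(reported)

print("skewness before:", round(skewness(reported), 2))
print("skewness after: ", round(skewness(transformed), 2))
```

For right-skewed counts like these, the transformed values should be noticeably closer to symmetric than the raw values, which is the purpose the transformation serves in the analysis.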
The associations between the vague quantifier responses and the transformed numeric

Boxplots for the transformed numeric response from the free recall question for the two vague quantifier questions.

transformed numeric responses and the vague quantifiers. Given the large sample size, the 25% and 75% are sufficient for our purposes. Since there are separate vague quantifier questions for hand washing and hand sanitizing, these will be examined separately.

The hand washing vague quantifier variable is treated as categorical, so with df = 4 for its five categories. When it is used to predict the transformed variable from the free recall question, the R² value was .119. Including the categorical variable of country, with its

The findings were similar comparing the hand sanitizing vague quantifier variable with the transformed free recall responses. When just the vague quantifier variable is used to predict these, the R² value was .081. Including the categorical variable of country increased this to .143, a difference of ∆R² = .058. Using the single country mean variable produced an R² = .140, or 93.97% of the possible amount (as opposed to 3%). This is shown in Figure 4 with the greener lines being below the redder lines.

In Study 2 we examine the relationships between response alternatives effects and some attitude questions. Therefore we felt it prudent to examine if an attitude variable,
Cantril's ladder and each vague quantifier question raised these to R² = .124 and to R² = .090, respectively. These increases are small and will not be considered further.

may talk about the study before the second completes it (and would in other ways also be non-independent), the duplicates (i.e., not the first one using the IP address) were excluded. Respondents who on average responded faster than two seconds for the behavioral frequency and attitude questions were also excluded. The number of people excluded for these reasons, in both conditions, is shown in Table 3.

Respondents were asked three behavior/health questions related to COVID-19:
• How often did you wash your hands?
• When you washed your hands, typically how long did you spend?
• In a typical day, how often did you apply hand sanitizer?
They were randomly assigned to have either the low or the high response alternatives as listed in Table 4.
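The two exclusion rules described above (duplicate IP addresses and very fast responding) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the field names `ip` and `seconds_per_question`, the example addresses, and the order in which the two rules are applied are all assumptions.

```python
from statistics import mean

def apply_exclusions(respondents, min_mean_seconds=2.0):
    """Keep the first response per IP address, then drop fast responders.

    `respondents` is assumed to be ordered by completion time, so the
    first record seen for an IP address is the one that is retained.
    """
    seen_ips = set()
    kept = []
    for r in respondents:
        if r["ip"] in seen_ips:
            continue  # duplicate: not the first response from this IP
        seen_ips.add(r["ip"])
        if mean(r["seconds_per_question"]) < min_mean_seconds:
            continue  # averaged under two seconds on the timed questions
        kept.append(r)
    return kept

# Hypothetical records (documentation-range IP addresses).
sample = [
    {"id": 1, "ip": "198.51.100.7", "seconds_per_question": [4.1, 3.8, 5.0]},
    {"id": 2, "ip": "198.51.100.7", "seconds_per_question": [6.0, 4.2, 3.9]},
    {"id": 3, "ip": "203.0.113.20", "seconds_per_question": [1.2, 1.5, 1.9]},
    {"id": 4, "ip": "192.0.2.55", "seconds_per_question": [2.5, 3.0, 2.2]},
]

kept_ids = [r["id"] for r in apply_exclusions(sample)]
print(kept_ids)
```

In this example, respondent 2 is dropped as a duplicate of respondent 1 and respondent 3 is dropped for responding too quickly.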
Respondents were then asked seven attitude questions about their health beliefs. Respondents used a 0–100 sliding scale. Responses were measured to the tenth, for example 29.4. Our analyses concerning these variables are exploratory and concern whether the set of response alternatives that were presented affects the responses on these variables.

Table 4
The response alternatives for the low and the high frequency conditions. The dashed lines show how the raw data can be re-coded for comparable frequencies/durations.
Low High
Hand washing frequency

Statistical Plan
The behavioral frequency questions are treated in two ways. When treating them as 1–5 rating scales, the means of these three are compared between the two groups using

Figure 5 shows the histograms for the behavior questions when treated as 1–5 rating scales. Table 5 shows the group means, the differences in means, effect sizes for these differences (Cohen's d), and the 95% confidence intervals for these effect sizes. The values of

Table 4. Table 6 shows the proportions for each of these categories for the two conditions.
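As a rough illustration of the kind of effect size reported in Table 5, the sketch below computes Cohen's d for two groups of 1–5 responses with an approximate 95% confidence interval. It assumes a pooled standard deviation and a common large-sample normal approximation for the standard error of d; the text does not state how the reported intervals were computed, and the ratings here are made up.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def d_confidence_interval(d, n_a, n_b, z=1.96):
    """Approximate interval for d from a large-sample standard error."""
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d - z * se, d + z * se

# Hypothetical raw 1-5 category choices in each condition.
low_condition = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3]
high_condition = [1, 1, 2, 2, 2, 3, 3, 3, 4, 2]

d = cohens_d(low_condition, high_condition)
ci = d_confidence_interval(d, len(low_condition), len(high_condition))
print(round(d, 2), tuple(round(x, 2) for x in ci))
```

With samples this small the interval is wide; the approximation is better suited to the sample sizes typical of online survey experiments.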
Table 5
Descriptive statistics for the raw values, 1–5, for the three behavior answers. This coding ignores the verbal labels, so these differences show some people pay attention to the labels.

The re-coded behavior questions are analyzed individually using multinomial logistic

correlations of these transformed variables are shown in Table 7.

To increase statistical power, the pre-registered plan was to combine the items into a single dimension if they were correlated. The eigenvalues for the correlation matrix are:

using vague quantifiers is difficult, but differences among the countries can almost completely be accounted for by the norms in that country. This provides strong support for the hypothesis put forth in Wright et al. (1994). While this stresses a difficulty making these

Behavioral frequency questions require respondents to answer questions about past events.
We make recommendations for two types of behaviors: rare and frequent. Whether an event type is rare or frequent will depend on the sample and there will be overlap between them, so survey designers should consider all the suggestions below where appropriate.

Consider what is hopefully a rare event for the respondent, like a hospitalization, catching COVID-19, or being laid off. It is likely that for most respondents these are rare. It is also likely that answers to these questions will be important for the survey flow (e.g., on a COVID-19 survey if you answer YES to having COVID-19, you might be asked further questions) and that the estimates for these will be important for epidemiological models.

Frequent Events
By frequent events we mean those that, for most respondents, are likely to occur multiple times each week (e.g., handwashing, eating a piece of fruit). As with rare events it is important to consider the cognitive limitations of the respondent, and in particular whether the respondent is likely to use some estimation heuristic or try to recall and count all episodes. Each of these has potential biases (e.g., people are more likely to not remember an event than to create a false memory for a non-existent event, e.g., Wright, Loftus, and Hall,
for sociologists and psychologists constructing theories of why people behave in the ways that they do, and for many other purposes. Behavioral frequency questions have a special place within survey methods as researchers often translate responses into numeric estimates for the behaviors, and sometimes the precision of these estimates is critical (e.g., to epidemiologists predicting trends for the COVID-19 pandemic). While people often focus on the way the event itself is described in the question stem, there is less focus on the response alternatives. We focus on the response alternatives.

Our first study examined how people, across thirty countries, answered questions about hand washing and hand sanitizing. We found that people in different countries interpreted the vague quantifiers used as response alternatives differently. We were able to account for most of the differences among countries using estimates of the behavior in the countries from a different set of respondents. We recommend against using vague quantifiers with relatively well-defined events like hand washing and instead recommend allowing people to provide numerical estimates. Our second study showed that care is still necessary when doing this. Using different sets of response alternatives produced different estimates of the behavior.
One consequence of this is that comparisons between studies that use different sets of response alternatives should be done cautiously, if at all.

One conclusion from our studies is that the choice of response alternatives should be carefully considered and that deciding how to construct them may be difficult. It may require careful pilot research and techniques like cognitive interviewing and in particular think-aloud protocols (Willis, 2005). We provide a list of recommendations to allow researchers to start thinking about their choices of response alternatives for behavioral frequency questions.
How to determine the number of factors to retain in exploratory factor analysis: A comparison of extraction methods under realistic
Public health surveillance and knowing about health in the context of growing sources of health data
Translating the statistical representation of the effects of education interventions into more readily interpretable forms
Cognitive psychology meets the national survey
Since the eruption of Mt
A study of response errors in expenditures data from household interviews
Hardly ever or constantly? Group comparisons using vague quantifiers
Advances in the science of asking questions
Assessing the feasibility of a life
What respondents learn from questionnaires: The survey interview and the logic of conversation
Measurement as cooperative communication: What research participants learn from questionnaires
What response scales may tell your respondents: Informative functions of response alternatives
Response scales: effects of category range on reported behavior and comparative judgments
The range of response alternatives may determine the meaning of the question: further evidence on informative functions of response alternatives
Trust in numbers
Effects of time and memory factors on response in surveys
Telescoping in dating naturally occurring events
Household pulse survey: measuring social and economic impacts during the Coronavirus pandemic. Retrieved from www.census.gov/programs-surveys/household-pulse-survey.html
van der Waerden Indagationes Mathematicae (Proceedings)
Construct explication through factor or component analysis: a review and evaluation of alternative procedures for determining the number of factors or components
Modern applied statistics with S
Eyewitness recall: Regulation of grain size and the role of confidence
Cognitive interviewing: tools for improving questionnaire design
Meaning and relevance
(under review). A robust alternative to Pearson's correlation for testing associations and use in latent variable models
How much is 'quite a bit'? Mapping between numerical values and vague quantifiers
How response alternatives
Now you see it; now you don't: inhibiting recall and recognition of scenes
Examining engagement in context using experience-sampling method with mobile technology
Dynamic documents with R and knitr