authors: Olsho, Alexis; Smith, Trevor; Brahmia, Suzanne White; Eaton, Philip; Zimmerman, Charlotte; Boudreaux, Andrew
title: Effect of administration method on student performance on multiple-choice/multiple-response items
date: 2021-06-03

Multiple-choice/multiple-response (MCMR) items (i.e., multiple-choice questions for which there may be more than one correct response) can be a valuable tool for assessment. Like traditional multiple-choice/single-response questions, they are easy to grade, but MCMR items may provide more information about student reasoning by probing multiple facets of reasoning in a single problem context. Because MCMR items are infrequently used, best practices for their implementation are not established. In this paper, we describe the administration of MCMR items on an online, research-based assessment. We discuss possible differences in performance on MCMR items that may result from differences in administration method (in-person vs. online). This work is presented as a potential first step toward establishing best practices for the administration of MCMR items on online assessments.

Research-based assessments are now widely used to examine student understanding and learning in physics instruction. In an ongoing, collaborative project, we are developing the Physics Inventory of Quantitative Literacy (PIQL) to assess students' quantitative reasoning in introductory physics contexts. Our instrument includes several "multiple-choice/multiple-response" (MCMR) items: multiple-choice items for which there may be more than one correct response, and for which students are encouraged to choose as many answers as they feel appropriate. MCMR items may be a useful tool for probing multiple facets of student understanding in a given context, providing more insight into student reasoning than standard multiple-choice/single-response items when using free-response or open-ended questions is not an option [1]. MCMR items may therefore be especially useful for research-based assessments administered online or electronically. However, while best practices exist for online administration of research-based assessments [2], little work has been done to establish valid methods for administering MCMR items online.

Wilcox and Pollock report that student performance on coupled multiple-response items on the Colorado upper-division electrostatics (CUE) diagnostic was similar across online and in-person administration methods, but an examination of possible differences was not a focus of their work [1]. We also note that the coupled multiple-response items on the CUE are a particular style of MCMR item, and that the CUE diagnostic (as its name suggests) is used with upper-division students. It is possible that their findings do not apply to all MCMR items, or to all students enrolled in college-level physics courses.

In this paper, we describe our experiences administering MCMR items online. In a natural experiment arising from the advent of online learning due to the COVID-19 pandemic, we compare student performance and response patterns on the PIQL's six MCMR items when administered online and in-person. Significant differences in student performance on a subset of the PIQL's MCMR items suggest that certain types of questions elicit different response patterns depending on administration method.
The work described in this paper was done during the development of the Physics Inventory of Quantitative Literacy (PIQL) [3]. The PIQL is a reasoning inventory intended to assess the development of mathematical sensemaking over the course of instruction in introductory physics classes. Many items on the PIQL were developed using the framework of conceptual blending theory (CBT) [4]. CBT provides a framework for understanding the integration of mathematical and physical reasoning that is often a goal of introductory physics instruction. According to CBT, development of expert mathematization in physics would occur not through a simple addition of new elements (physics quantities) to an existing cognitive structure (arithmetic or algebra), but rather through the creation of new, independent cognitive spaces. These spaces, in which creative, quantitative analysis of physical phenomena can occur, involve a continuous interdependence of thinking about the mathematical and physical worlds. MCMR items can provide an effective way to assess the reasoning engendered by these blended reasoning spaces.

An example of an MCMR item from the PIQL is shown in Fig. 1:

    Consider the following statements about this situation. Select the statement(s) that must be true. Choose all that apply.
    a. The work done by the hand is in the negative direction.
    b. The force exerted by the hand is in the negative direction.
    c. The displacement of the block is in the negative direction.
    d. The force exerted by the hand is in the direction opposite to the block's displacement.
    e. The force exerted by the hand is in the direction parallel to the block's displacement.
    f. Energy was added to the block system.
    g. Energy was taken away from the block system.

This item probes multiple facets of a blended reasoning space about mechanical work as a signed quantity. The correct answers are d and g: choice d relates to a more mathematical understanding of the calculation and interpretation of a negative scalar product, while choice g relates to an interpretation of negative net work as causing a decrease in the mechanical energy of a system.

The PIQL was originally developed as an in-person, proctored assessment, but shifted to online administration in early 2020 due to the COVID-19 pandemic. Though the shift was not planned, online administration of research-based assessments offers many advantages (notably, a smaller investment of time and resources by those administering the test) [5]. We therefore began to collect evidence of the PIQL's validity as an online assessment, with an eye toward dissemination of the PIQL as an online assessment instrument for widespread use. Best practices exist for online administration of low-stakes assessments such as the PIQL, and we adhered to those practices as much as possible by using a generous time limit, sending multiple reminder emails to students to increase the participation rate, and offering course credit for participation (but not grading responses for correctness). In addition, we constructed the online version of the instrument to discourage copying or saving of test items: each item was shown on its own in a browser window, and students were not able to backtrack in the PIQL and were not shown a summary of their work or given the correct answers after completion.
We are not aware of established best practices specifically for MCMR items, but we had no reason to believe that administration of MCMR items would be substantially different from that of multiple-choice/single-response (MCSR) items. Our primary concern with the MCMR items was that students might not recognize that they are free to choose as many answer choices as they feel are appropriate. When the instrument was administered in-person, there were multiple opportunities to remind students that they could choose more than one response on these items. These reminders were given both in writing on the instrument itself and verbally by the proctor. Validation interviews suggested that multiple reminders were necessary, as this variety of question is relatively rare on the assessments typically encountered by students.

Because we recognized that a majority of students completing the survey for the first time would have little-to-no experience with MCMR items, we wanted to increase the likelihood that students would recognize that they could select multiple responses for those items when encountering them online (unproctored). All of the MCMR items were moved to the end of the PIQL for online administration. After answering the last MCSR item, students saw a page with no instrument item, but rather a statement that the remaining questions on the survey might have more than one correct response, and that students should choose all answers that they feel are correct. At the top of the page for each of the remaining items (all MCMR), students saw a reminder that the question might have more than one correct response. We also prompted students to "choose all that apply" in the question stem for each MCMR item.

As noted, our initial interest in the results from MCMR items administered online was motivated by a concern that some students, largely unfamiliar with MCMR items, would not recognize that they could choose multiple responses for some of the items on the PIQL. As such, our preliminary analyses of online responses to MCMR items focused on the effectiveness of the measures taken to inform students about the nature of those items. The preliminary analysis found that the measures were effective: not only were students choosing more than one response on the MCMR items administered online, they were choosing more than one response at a higher rate than students seeing the same items in-person [5]. Because we grouped the MCMR items together at the end of the PIQL on the online version (a change from the in-person version), we were not sure whether this was due to the change in administration method from in-person to online, or to the change in question order.

In a small follow-up study, we presented the PIQL online to a class of students (N = 109) enrolled in the first quarter of the calculus-based introductory physics sequence. Approximately half the students (N = 59) saw the MCMR items grouped at the end of the PIQL, while the other half saw the MCMR items interspersed with the MCSR items. We found no significant difference between the two groups in the number of answers chosen for MCMR questions. These informal results suggested that further study was warranted to investigate the difference in response patterns for MCMR items when administered online and in-person. In particular, we were interested in knowing whether students seeing MCMR items online were more likely to choose more responses than students seeing the same items on a paper version of the PIQL.
Using data collected with the PIQL over eight academic quarters (four using paper copies of the PIQL and in-person administration methods, and four using the online version of the PIQL and remote administration methods), we performed a more in-depth analysis of student responses to six stable MCMR items. In each of these quarters, the PIQL was administered at the beginning of the academic term, before significant instruction had occurred, to students enrolled in any of the three courses of the calculus-based introductory physics sequence at the University of Washington. The data presented below represent students from all three courses. The total number of sets of responses from in-person administration is N = 3825; we have N = 2689 sets of responses from online administration. We note that some students provided data for multiple courses at various times, including a small number of students who appear in both the in-person and online data sets.

Because we were interested in whether students were choosing more responses when viewing the items online, rather than in determining whether students were choosing more than one response, we calculated the average number of responses for each of the six MCMR items for both online and in-person administrations. The results are shown in Fig. 2. For each of the six items, the difference in the average number of answer choices is statistically significant (Welch two-sample t-test, p < .001). This difference does not seem to be due to a handful of students choosing all possible answers on MCMR items administered online; Fig. 3 shows the distributions of the number of answer choices for each of the six items and suggests that the differences in averages are due to students choosing one or two more responses online compared to in-person.

For MCMR items, dichotomous scoring methods require a student to choose all correct responses and only correct responses to be considered correct. For example, MCMR item 4 on the PIQL (the Work item shown in Fig. 1) has two correct answer choices: d and g. In a dichotomous scoring scheme, a student who picks only answer d would be scored the same way as a student who chooses answers e and f (both incorrect). This ignores the nuance and complexity of students' response patterns within (and between) items. In an effort to move beyond the constraints of dichotomous scoring for MCMR items, we have developed a four-level scoring scale in which we categorize students' responses as Completely Correct, Some Correct (at least one, but not all, of the correct response choices are chosen, and no incorrect choices), Both Correct and Incorrect (at least one correct and at least one incorrect response choice are chosen), and Completely Incorrect [9, 10].

Only two of the MCMR items on the PIQL have more than one correct response. Therefore, an increase in the number of answers chosen is not necessarily associated with an improvement in performance. Indeed, for all four MCMR items with a single correct answer choice, student performance decreased substantially with online administration when items were scored using dichotomous scoring methods. Fig. 4 shows how the CTT difficulty changes with administration method for the PIQL's four MCMR items with a single correct response. (CTT difficulty is equal to the percentage of students answering correctly; therefore, a decrease in an item's difficulty value corresponds to a decrease in student performance on that item.) For all four items, the difference in difficulty is statistically significant (binomial test, p < .001).
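To make these per-item comparisons concrete, the sketch below shows one way the statistics described above could be computed. It is a minimal illustration only: the data layout (a list of chosen-answer sets per student), the toy response sets, and all variable names are our own assumptions, not the authors' analysis code or data, and the binomial test shown reflects one plausible reading (online fraction correct tested against the in-person fraction as the reference proportion).

```python
# Minimal sketch of the per-item comparisons described above.
# Assumed data layout and names; toy numbers, not the study's data.
from scipy import stats  # requires SciPy >= 1.7 for binomtest

# Hypothetical response sets for one MCMR item: each entry is the set of
# answer choices a single student selected.
online = [{"d"}, {"d", "g"}, {"d", "e", "g"}, {"e", "f"}]
in_person = [{"d"}, {"d", "g"}, {"g"}]
key = {"d", "g"}  # correct choices for the Work item in Fig. 1

# Average number of responses chosen, compared with a Welch two-sample
# t-test (the test reported in the text for Fig. 2).
n_online = [len(r) for r in online]
n_in_person = [len(r) for r in in_person]
t_stat, p_val = stats.ttest_ind(n_online, n_in_person, equal_var=False)
print(f"mean responses: online {sum(n_online) / len(n_online):.2f}, "
      f"in-person {sum(n_in_person) / len(n_in_person):.2f}, p = {p_val:.3g}")

# Dichotomous (all-or-nothing) scoring: a response set is correct only if
# it matches the key exactly.  CTT difficulty is the fraction scored correct.
difficulty_in_person = sum(r == key for r in in_person) / len(in_person)
k_correct_online = sum(r == key for r in online)

# Binomial test of the online count correct against the in-person fraction,
# treated here as the reference proportion (an assumption for illustration).
result = stats.binomtest(k_correct_online, n=len(online), p=difficulty_in_person)
print(f"in-person difficulty {difficulty_in_person:.2f}, "
      f"online {k_correct_online / len(online):.2f}, p = {result.pvalue:.3g}")
```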
To investigate more thoroughly how the increase in the number of answer choices associated with online administration was affecting student performance, we used the four-level scoring scale described above. The results are shown in Fig. 5. We note that the dark purple "completely correct" bars in Fig. 5 represent the percentage of students who would be scored as answering a given item correctly if a dichotomous scoring method were used.

The results shown in Fig. 5 suggest that the percentage of students choosing no correct responses does not change substantially when the PIQL's MCMR items are administered online. Decreases in CTT item difficulty (i.e., in the percentage of students answering completely correctly when a dichotomous scoring method is used) are instead associated with students choosing incorrect responses in addition to correct responses. This effect is particularly apparent for items 1, 4, 5, and 6. All of these items focus on reasoning about the meaning of the sign associated with various quantities or quantitative relationships. Items 2 and 3, for which this effect is less pronounced, focus on proportions and scaling; while quantitative reasoning is required for these items, the answer choices themselves do not ask students to identify correct reasoning. PIQL items about sign and signed quantities generally have answer choices that involve explicit reasoning or interpretation, whereas PIQL items assessing proportional and covariational reasoning typically do not include explicit reasoning in the answer choices. Thus it is difficult to tell whether the increase in answer choices following this pattern is associated with the topic (sign and signed quantities) or with the answer-choice type (explicit reasoning); however, it seems unlikely that the effect is limited to items that probe student reasoning about sign.

The data presented in this paper suggest that students choose more responses to PIQL MCMR items when the items are presented online, compared to items given on paper. This effect seems to be more pronounced for MCMR items that include explicit reasoning or interpretation in the answer choices. It is unknown whether online or in-person administration of MCMR items provides a more accurate picture of student reasoning. It is possible that the ease of choosing multiple responses on the online version of the assessment allows students to choose more responses that represent their reasoning about a physics context; this would indicate that the online version provides a more complete picture of their thinking. It is also possible that the ease of clicking responses on the online version inspires students to choose responses in which they have less confidence, which may suggest that the paper version of the PIQL's MCMR items gives a truer assessment of student reasoning. The effect may be due to a combination of these reasons, or to other reasons we have not yet considered. Although additional analyses of student response patterns on the MCMR items from online and in-person administrations of the PIQL may provide some insight into the reasons for the observed differences, we expect that student interviews will be necessary to understand how students are interacting with MCMR items online. Data collected from such interviews could be compared to existing data from interviews in which MCMR items were presented on paper. In the meantime, instructors should be aware that data collected via online administration of MCMR items may not be comparable to data collected in-person.
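For readers who wish to apply the four-level scale underlying Fig. 5 to their own data, the sketch below categorizes each response set and tabulates the fraction of students in each category. It is an illustrative sketch under our reading of the scale (in particular, that "Some Correct" excludes incorrect choices); the function and variable names, data layout, and toy data are assumptions, not the authors' scoring code.

```python
# Minimal sketch of the four-level scoring scale described above.
# Assumed names and data layout; toy data, not the study's data.
from collections import Counter

def four_level_score(response: set, key: set) -> str:
    """Categorize one response set against an item's correct choices."""
    correct_chosen = response & key
    incorrect_chosen = response - key
    if correct_chosen == key and not incorrect_chosen:
        return "Completely Correct"
    if correct_chosen and not incorrect_chosen:
        return "Some Correct"
    if correct_chosen and incorrect_chosen:
        return "Both Correct and Incorrect"
    return "Completely Incorrect"

# Toy response sets for the Work item (correct choices d and g).
key = {"d", "g"}
responses = [{"d", "g"}, {"d"}, {"d", "e"}, {"e", "f"}, {"a", "d", "g"}]

# Tabulate the fraction of students in each category (a Fig. 5-style summary
# for a single item and administration method).
counts = Counter(four_level_score(r, key) for r in responses)
total = len(responses)
for category in ("Completely Correct", "Some Correct",
                 "Both Correct and Incorrect", "Completely Incorrect"):
    print(f"{category}: {100 * counts[category] / total:.0f}%")
```

Note that a student who selects all correct choices plus an incorrect one falls in "Both Correct and Incorrect" rather than "Completely Correct", which is the distinction that drives the differences between dichotomous and four-level scoring discussed above.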
Our data indicate that student performance on MCMR items is likely to decrease with online administration when the items are scored using dichotomous scoring methods; this decrease may not necessarily be associated with a decrease in understanding. Also, when developing scoring methods that go beyond dichotomous scoring, it may be necessary to take the increased number of responses into account. These results indicate that even MCMR items with evidence of validity when administered in-person should be subjected to validity checks for online use.

References
[1] Validation and analysis of the coupled multiple-response Colorado upper-division electrostatics diagnostic.
[2] Administering research-based assessments online.
[3] The Physics Inventory of Quantitative Literacy: A tool for assessing mathematical reasoning in introductory physics (2021).
[4] The Way We Think: Conceptual Blending and the Mind's Hidden Complexities.
[5] Online administration of a reasoning inventory in development.
[6] An Introduction to the Bootstrap.
[7] An introduction to the bootstrap with applications in R, Statistical Computing & Statistical Graphics Newsletter.
[8] ggplot2: Elegant Graphics for Data Analysis.
[9] Developing a reasoning inventory for measuring physics quantitative literacy.
[10] Developing a reasoning inventory for measuring physics quantitative literacy.