Carleschi, Emanuela; Chrysostomou, Anna; Cornell, Alan S.; Naylor, Wade. "Does transitioning to online classes mid-semester affect conceptual understanding?" European Journal of Physics (2021-01-25). DOI: 10.1088/1361-6404/ac41d9

The Force Concept Inventory (FCI) can be used as an assessment tool to measure conceptual gains in a cohort of students. The FCI uses a conceptions/"misconceptions" lens rather than a context-dependent perspective, such as "knowledge-in-pieces". In this study it was given to first year students ($N=256$) pre- and post-mechanics lectures at the University of Johannesburg. From these results we examine the effect of switching mid-semester from traditional classes to online classes, as imposed by the COVID-19 lockdown in South Africa. Overall results indicate no appreciable difference in gain when benchmarked against previous studies using this assessment tool. When compared with 2019 grades, the 2020 semester grades do not appear to have been greatly affected. Furthermore, statistical analyses also indicate a gender difference in mean gains in favour of females at the 95% significance level (for paired data, $N=48$).

Many studies through the years have advocated for various changes away from the so-called traditional lecturing pedagogy, such as flipped classrooms and peer-assessment [1, 2, 3, 4, 5, 6], to name but a few. Many of these studies have used assessment tools, such as the Force Concept Inventory (FCI) [7, 8, 9], to assess the efficacy of these changes, where such changes have typically been evaluated over many years of study. In the following subsections we discuss some of the history behind the FCI, the South African context, and the plan for this article.
Any physics department that decides to use an assessment tool has to go through the decision-making process of choosing which of the many possible assessments to use, and for what reasons [10]. As discussed in Ref. [10], one must choose a particular topic to focus on, so we decided to look at conceptual issues in classical mechanics, because this is one of the core foundations taught not just in physics but also in engineering and teaching degrees at the University of Johannesburg (UJ). We have chosen the FCI as the assessment of first year students' understanding of mechanics concepts. The current version of the FCI was released in 1995 and is known as v95. This version has 30 questions, fewer ambiguities, and a smaller likelihood of false positives than the original version [11]. The original version had 29 questions [8] and was itself a revision of an earlier test called the Mechanics Diagnostic Test (MDT) [7]. One motivation for choosing the FCI is that it has not only been used by many other institutions internationally [11], but has also been under immense scrutiny in the decades since its release. For example, the issue of false positives has been investigated further by Yasuda et al. [12], where systematic errors generated by false positives were found to be statistically significant for questions Q.6, Q.7 and Q.16, but not for Q.5 (N = 1110). However, the authors' recommendation was not to change the question structure, given that the current version (v95) has been used so extensively in the literature. Other issues with how the FCI questions - and other assessments - are written, in terms of "none of the above" (NOTA) and "zero" distractors, have also been discussed in Ref. [13] (and references therein). There have also been extensive studies on gender, as we discuss in Sec. 3.3, and finally the FCI has been translated into a number of languages. We come back to this issue for South Africa in the next subsection.
Given the above considerations we may consider the FCI as a de facto standard for assessing the conceptual basis of Newtonian mechanics, for which the gain between pre- and post-test results serves as a benchmark for standard performance on a course. The v95 version of the FCI was given at the start of a course on mechanics as a "pre-test" and then at the end in the form of a "post-test". The normalised gain G [11] is defined as:

G = (%S_f - %S_i) / (100% - %S_i),

where %S_f and %S_i are the final and initial scores, respectively. The reported gain associated with students enrolled in an introductory mechanics course is approximately G = 25% [11, 14]. With this established tool and quantifiable indicator of performance, we may probe the question of how a forced transition from in-person to online teaching affected the conceptual understanding of Newtonian mechanics in a first-year cohort at what might be considered a relatively low-resource higher-education institution. To further enrich our analysis of students' comprehension, we compare student performance in the pre- and post-tests on a question-by-question basis. Such in-depth analyses have recently been performed in Refs. [15, 16, 17], where a breakdown of the responses to individual questions can reveal a polarisation between the correct answer and one predominantly chosen incorrect answer. † This can also be useful in circumstances where only a pre- or post-test may be administered. Less than one tenth of the students enrolled at UJ are first-language English speakers. Administering the FCI in English to these first year students might therefore introduce performance biases due to English-language proficiency rather than actual understanding of the underlying physics concepts. This is, however, the only feasible way to conduct the FCI, because English is the most commonly spoken language in the country.
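As a concrete illustration of the normalised gain G = (%S_f - %S_i)/(100% - %S_i), the following minimal Python sketch (our own illustration, not the authors' code; the 40%/55% scores are invented) computes G for a single student or cohort average:

```python
def normalised_gain(pre_pct, post_pct):
    """Hake's normalised gain: G = (%S_f - %S_i) / (100% - %S_i)."""
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# Illustrative scores only: pre-test 40%, post-test 55%.
G = normalised_gain(40.0, 55.0)
print(f"G = {G:.2f}")  # prints G = 0.25, close to the typical gain cited in [11, 14]
```

Note that G normalises the raw improvement by the maximum improvement available, so two cohorts with different starting points can be compared on the same scale.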
The influence of English proficiency on FCI performance within the specific South African context is something the authors shall investigate in future studies. † See Refs. [18, 19, 20, 12] for a discussion of other ways to analyse and interpret individual question responses in the FCI. As for the relevance of the FCI within the South African context specifically, it is worthwhile taking a closer look at the diagnostic reports on learners' performance in the National Senior Certificate (i.e. their final year of high school) exams in the Physical Sciences paper [21], compiled by the National Department of Basic Education of South Africa for the academic year 2019 [22] (the year of school completion for the first year cohort of physics students under investigation in this study), as well as for 2020 and previous years [21]. With regard to mechanics-specific questions, all the reports highlight recurrent challenges experienced by learners in properly identifying the direction of forces applied to objects, as well as in drawing and labelling free-body diagrams correctly. This is very interesting in light of the findings we report in this work: the questions in the FCI that specifically deal with forces - i.e. Q.5, Q.11, Q.13, Q.18, Q.29 and Q.30 - are those showing polarisation of the answers. This hints at the fact that, despite its intrinsic challenges, the FCI can be used as a good indicator of students' "misconceptions" in the specific South African context. ‡ Whilst this study has been conducted on only one year's worth of data (whereas, as mentioned above, analyses of such changes in pedagogy are usually conducted over several years), we believe that the unprecedented nature of the 2020 lockdown experienced in South Africa in the wake of the COVID-19 pandemic warranted study. This work may then be considered a useful indicator of the short-term effects of a forced transition to online learning on first-year mechanics students.
As such, our study was conducted with a very diverse group of first year students enrolled in the Faculty of Engineering and the Built Environment (FEBE) at UJ, whose demographics (average 2015-2019) are as follows: African 92.8%; White 3.8%; Indian 2.3%; Coloured 1.1% [23]. Yet despite this diverse background, the gains appear to have remained comparable to the benchmarked studies of the last three decades. In seeking to unpack the fact that a change in pedagogy did not affect the reported gain (G), we shall look at a number of factors, including the previously studied Gender Gap [24, 25, 26, 27, 28, 29, 15], where previous studies have found a gender difference in favour of males in student performance on standardised assessments such as the FCI. However, our results indicate this does not seem to be the case here. Another aspect at play could be increased peer scaffolding (along the lines of Mazur's peer evaluation [30]), as there was an increased reliance on discussion groups with peers due to the COVID lockdown. This persisted into the second semester, where a range of discussion boards, interactive tutorials, and WhatsApp groups were used. As presented in Table 1, the average Semester 2 marks for 2019 and 2020, which include a combination of coursework, practicals, and exams, were not appreciably different. The only noticeable difference was in the number of students who passed after the November exam, i.e. the course throughput; as explained in the table caption, in 2020 a slightly lower semester mark was required to gain entrance to the final exam. ‡ In this article we shall use the word "misconceptions" to mean "prior-conceptions", as we shall briefly elaborate on in Sec. 2.
Given these motivations our paper is presented as follows: in section 2 we detail the methodology of our study; in section 3 we present the analysis tools and techniques used, along with supporting appendices; and finally we conclude in section 4. Table 1: Comparison of the 2019 and 2020 marks for engineering physics 1 in the second semester. Note that: 1) the course throughput is calculated after the main exam only, excluding the results of the supplementary exams; 2) in 2020 the entrance requirement for the exam was lowered to 30% for the theory part of the course (instead of the usual 40% in 2019 and previous years) in light of the COVID-19 pandemic. Our study sought to explore the "misconceptions" in Newtonian mechanics carried by early undergraduates in South African institutions of higher education. To study this we have used a conceptions/"misconceptions" lens [31] rather than a context-dependent perspective such as "knowledge-in-pieces" (KiP) [32]. This is because the FCI is premised on the notion of "(mis)conceptions", and because it is simpler to use this terminology. The subjects for our testing were the 2020 first year cohort of engineering students at the University of Johannesburg. The class consisted of approximately 400 students (only four students dropped out during the semester transition, from 404 to 400). The course was initially taught as a traditional lecture-based course, with a weekly online assessment, fortnightly tutorials, and fortnightly practicals (the latter done in person in groups of approximately 30 students, with graduate students acting as tutors and demonstrators for the practicals). The academic year had begun in early February 2020, and the pre-mechanics-course FCI test was conducted in late February on N = 256 students. § We found that for this cohort the average gain (for paired data, N = 48) was 24%. We will comment further on these results in Sec. 3.1.
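The style of comparison used in the next section, a paired-samples t statistic on pre-/post-test scores together with Pearson's correlation coefficient, can be sketched as follows. This is a standard-library Python illustration of the same textbook formulas the authors evaluated in R, and the ten pairs of scores are invented, not the UJ data:

```python
from math import sqrt
from statistics import mean, stdev

# Invented % scores for ten paired students (not the UJ data).
pre  = [30, 43, 37, 50, 27, 60, 33, 47, 40, 53]
post = [50, 57, 43, 70, 40, 73, 47, 60, 57, 63]

# Paired-samples t statistic: mean of the differences over its standard error.
diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

# Pearson's correlation coefficient between pre- and post-test scores.
mx, my = mean(pre), mean(post)
num = sum((x - mx) * (y - my) for x, y in zip(pre, post))
den = sqrt(sum((x - mx) ** 2 for x in pre) * sum((y - my) ** 2 for y in post))
r = num / den

print(f"paired t = {t_stat:.2f} (t_crit is about 2.26 for df = 9 at the 95% level)")
print(f"Pearson r = {r:.2f}")
```

Converting the t statistic to a two-tailed p-value requires the t-distribution CDF, which is what a package such as R's `t.test` supplies on top of the arithmetic shown here.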
§ From the cohort of 400 students, 256 sat the FCI: there were 144 pre-test and 166 post-test sitters, and after removing several redundant attempts there were 256 data subjects in total. Those who took both tests (paired) numbered 48.

First we should note that South Africa was placed in a hard lockdown in mid-March 2020, and teaching was switched within a few weeks to a purely online format. Lectures were replaced with recorded video content, such that the number of contact hours was unchanged. To provide additional support, online platforms for engagement with students were employed (such as consultations using BlackBoard Collaborate Ultra, and WhatsApp discussion groups). As the easing of the lockdown occurred towards the end of the mechanics course, it was possible to administer a post-FCI test to a smaller voluntary group of students (N = 166), of which N = 48 had done both the pre-test and post-test. The methodology employed to unpack the collected data relied primarily on standard statistical parameters, including the mean, standard deviation, percent differences, p-values for the t-test difference of means, and correlations, computed through R [33, 34] and a spreadsheet. ¶ We use the data from this 2020 cohort in Sec. 3.

¶ Interested readers who wish to familiarise themselves with the basics of statistical analysis, including the t-test, correlation, ANOVA, and measures of variation (e.g. standard deviation and standard error of the mean), among others, may want to consult Refs. [33, 35].

Table 2: Mean, standard deviation (SD), minimum and maximum % marks as displayed in Fig. 1. The paired data consisted of a subset (N = 48) of the N = 256 who sat either the pre- or post-test.

In this section we first present some analyses of the distribution of the scores for students in the pre- and post-tests (N = 256), starting in the left panel of Fig. 1 (the right panel is for the paired data, N = 48). The differences in the distributions (pre- compared to post-test) show well defined shifts indicating a gain, particularly for the paired data. In Table 2 the results for the pre- and post-test scores can be seen. The gain of G = 0.24 was calculated from the paired data and matches the expected gain for a standard physics course [11]. The difference in mean scores was checked via the paired-samples t-test (N = 48) and was not due to random fluctuations at the 95% confidence level (p-value = 0.000002614, two-tailed), implying that we found a statistically significant difference between the pre- and post-test means. This relationship between pre- and post-test scores can also be seen in Fig. 2, where the correlation was found to be moderate and positive. It should be noted that the gain is not determined from the difference between the pre-test and post-test means [11], although we have performed a question-by-question breakdown. (The difference in means for the pre- and post-test groups was also checked with an independent-samples t-test and found not to be due to random fluctuations at the 95% confidence level; p-value = 0.00001182, two-tailed.) In terms of numbers, Pearson's correlation coefficient, shown in Fig. 2, gives a moderate positive correlation of r = 0.504 with p-value = 0.000261 < α at the 95% significance level (α = 0.05). As student ideas may not be clearly understood, we can identify them through analyses of question-by-question responses. Fig. 3 indicates the percentages of students who correctly answered each question in the pre- and post-tests, with the differences also shown (diagonal hatching/blue). A similar schematic is shown for the paired data (see bottom panel of Fig. 3) for comparison. We note here the presence of negative gains for certain questions, which are indicative of poorer performance in the post-test. This "loss" is especially pronounced in Fig.
3, top panel, for questions 14, 20, 22, and 24; it also arises in the paired data of Fig. 3 for question 12, bottom panel; the negligible gains in question 21 for the former and questions 14 and 24 for the latter are also worth mentioning. The concepts assessed by these questions are standard topics such as projectile motion (questions 12 and 14), as well as kinematics and Newton's second law (questions 20-24). However, these questions possibly exploit scenarios with which students are unlikely to have had personal experience (i.e. objects fired from cannons and motion in deep space), and visual tools like ticker-tapes and displacement-time graphs. In doing so, they ensure that students respond based on intuition gained from their mechanics course rather than empirical evidence gathered from daily life. Finally, poorer post-test performance in these more conceptual questions may demonstrate that students are not confident in their ability to apply their knowledge to unfamiliar scenarios. This may be a consequence of superficial learning or of dependence on preconceived ideas rather than physics. The presence or development of "misconceptions" may also have come into play. Additionally, we note that these questions might indicate an issue with language ability, a concern shared by other studies conducted in regions where English is not the first language of most students [16, 19]. As discussed in the introduction, we can also analyse particular sets of questions which can conceptually lead to polarising choices [16]. In Figs. 4 and 5 we can see the effect of the polarising questions 5, 11, 13, 18, 29 and 30 in the FCI, e.g., see Ref. [16]. A similar pattern emerges for the cohort at UJ, where there clearly appears to be a subset of questions for which, aside from the correct answer, there is one other polarising choice; apart from Q18 in the "paired" data, Fig. 5, we find the same dominant incorrect (polarised) response as Ref. [16].
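The question-by-question breakdown described above can be sketched as follows. This is a hypothetical Python illustration, not the authors' pipeline: rows are students, columns are FCI items, True marks a correct answer, and the tiny answer matrices are invented rather than the UJ data:

```python
def per_item_rates(answers):
    """Percentage of students answering each item correctly."""
    n = len(answers)
    return [100.0 * sum(col) / n for col in zip(*answers)]

# Invented answer matrices for four students on three items.
pre_answers = [
    [True,  False, True],
    [False, False, True],
    [True,  True,  False],
    [False, False, False],
]
post_answers = [
    [True,  True,  True],
    [True,  False, False],
    [True,  True,  False],
    [False, True,  False],
]

pre_rates = per_item_rates(pre_answers)
post_rates = per_item_rates(post_answers)
for q, (a, b) in enumerate(zip(pre_rates, post_rates), start=1):
    flag = "  <- negative gain" if b < a else ""
    print(f"Q{q}: pre {a:.0f}%, post {b:.0f}%{flag}")
```

The same per-item tallies, applied to the distribution of answer choices rather than just correctness, are what reveal the polarised incorrect options discussed above.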
It may be that certain "misconceptions" drive this polarisation. For example, consider question 5, where the dominant answer of C can be read from the pre-test data of Figs. 4 and 5. This answer claims that the motion of a ball is driven by gravity as well as "a force in the direction of motion", which indicates the common misconception that motion requires an active force. The presence of a force in the direction of motion is also favoured in answers 11C, 13C, and 18D, and implied in 30E. Though there is a general decrease from pre- to post-test in the selection of these erroneous answers, the post-test data of Figs. 4 and 5 suggest that these "misconceptions" can be difficult to alleviate, as we have found at UJ. Such observations connect well to the work of Bani-Salameh [20] and others; we shall interrogate these ideas in greater depth in a future work with data from 2020-2021, see Sec. 4. It can certainly be inferred that there are subsets of incorrect answers where "misconceptions" in students' understanding consistently lead to the same kind of wrong answer [16]. Table 3: Paired FCI results (in percent) for female and male participants (N = 48). Figure 6: Correlations for combined scores in terms of gender. In this section we analyse the differences between the male and female participants who sat both pre- and post-tests (paired). In Table 3 the means for female and male participants are presented, and it clearly appears that female participants performed better in the FCI than male participants. Interestingly, as was also found by Alinea & Naylor [15], the table shows that although male participants did better in the pre-test, female participants had a higher average in the post-test.
To verify these results further, given that the average number of participants in each group was only 24, we have (in Appendix A) performed multiple statistical tests to confirm whether this difference in means is statistically significant. From this we have found that at the 95% significance level we can reject the null hypothesis. We emphasise that besides parametric tests for normal distributions we have also performed a non-parametric Wilcoxon test, which indicated a statistical difference in the medians, see App. A. Besides the means, in comparison to Fig. 2, in Fig. 6 the correlations for the female and male groups were found to be mild and positive, with r_F = 0.417, p-value = 0.05418 and r_M = 0.662, p-value = 0.00001816, respectively. Clearly, there is a more reliable correlation for the male cohort, with p-value < 0.05. Finally, this can be compared to the combined correlation (independent of gender, N = 48), where r = 0.504; p-value = 0.02757 in Fig. 2. The apparently higher gain for the female part of the cohort might be related to the finding of Sadler and Tai [36] (see also Adams et al. [37]) that the matching of professor gender to student gender was second only to the quality of the high-school physics course in predicting students' performance in college physics. It may be worth mentioning that at UJ, during the 2020 academic year, a female instructor was the senior academic for the mechanics course. As we discuss in Sec. 4, we leave these preliminary results for a follow-up work with more years of data. In this article we have used the Force Concept Inventory (FCI) to look at the conceptual understanding of a large cohort of physics/engineering students at the University of Johannesburg (UJ) during the 2020 academic year, see Sec. 3.1.
Mid-semester, UJ went into lockdown and students switched from a traditional lecture format to online platforms, yet this led to the very informative scenario in which we found no overall drop in conceptual gains (G = 0.24). This is reminiscent of what happened with the Christchurch earthquakes (2010-2011), which led to the closure of various high schools: although a minority of students experienced negative impacts, there were many positives [38]. ** This was further established through the comparison of 2019 and 2020 semester marks at UJ (see Table 1), where we found no appreciable drop in marks. ** In terms of physics performance during COVID, see Refs. [39, 40, 41]. The extent to which the teaching style used in UJ's Physics Department, which involves active learning such as in-class discussions and problem-solving, weekly tutorials, etc., contributed to this consistency is worth probing through deeper comparisons with other investigations into forced transitions to online learning. In Sec. 3.2 we looked at a subset of questions where a polarisation of choices occurred, in that either the correct answer or one main incorrect answer dominated the post-test responses [16]. The results for the students at UJ followed a very similar pattern to the data found in Refs. [16, 15, 17]. The importance of these questions relates to the fact that they ask the student to understand certain particular concepts in physics, such as circular motion and motion requiring a force. In Sec. 3.3 we looked at a possible out-performance by female students on the overall gain in the FCI. As was also found by Alinea & Naylor [15], although the male group started with a higher average pre-test score, their gain was smaller. As mentioned earlier, the main course lecturer was female, which may point to professor gender matching in this cohort [36].
Although we rigorously checked that the difference in means was statistically significant (at the 95% confidence level, see App. A), we will report on a larger cohort inclusive of 2021 in forthcoming work, where we hope to investigate whether other factors may be related, such as the diligence of female students and socioeconomic factors. The article has raised some questions, such as why the general performance of the group of paired students was higher than that of those who took only one of the pre- or post-tests. This is often attributed to the fact that more diligent students are more likely to take both pre- and post-tests. Usually, overall gains are taken only from paired data, which is then used to compare to other cohorts and institutions. However, the question of using "unpaired" pre- and/or post-test data sets in some form has not really been investigated in the literature (see, however, Ref. [19]), and we also intend to comment more on this issue in future work. As for other possible directions to investigate, besides extending the FCI to further years, which have also been disrupted by more COVID lockdowns, we intend to look at matriculation results in order to establish correlations between the FCI and high school exit grades in physics, maths and English. In the case of UJ, the first languages of the students enrolled in the UJ FEBE (average 2015-2019) were: English 14.3%; isiXhosa 5.9%; African 1.5%; Other 78.3% [23]. This relates to comments by Bani-Salameh [19], and also to work performed by Alinea & Naylor [16], in relation to performance on the English version of the FCI where English is not necessarily the student's first language. It might also be interesting to look at correlations between the FCI and cognitive reflection tests [17].
Finally, it is important to recognise the limitations of the FCI: as a multiple-choice tool designed to probe "misconceptions" rather than context-based understanding [32], much of the insight obtained through its use depends on the researcher's interpretation of students' answers. It would be interesting to deploy the FCI with the added expectation that students justify each of their answers. This would allow for a more nuanced and less biased analysis that goes beyond understanding "misconceptions". An additional means of expanding our analysis of first year physics pedagogy would be to integrate principles such as "knowledge-in-pieces", which rests upon the notion that "knowledge depends on context" [32]. On a related note, and to fully unpack the issue of whether or not the negative (or zero) gains could be due to limitations in the FCI, it would be worthwhile to get students to give written explanations for their choices. This again advocates the use of technology in conceptual testing. Hence, given the sometimes necessary switch to online platforms, such as Blackboard [42] etc., as well as obtaining question explanations, we would also be able to study the engagement of students within this online learning environment (through their attendance and marks on continuous assessments). This would also include the time taken to complete various assessment tasks. It may be worth mentioning that the two-way ANOVA with replication was unbalanced, as the two group sizes were different (16 and 32, respectively). However, we were able to double-check the results by converting gender to a dichotomous variable (Female = 1, Male = 0) and using a linear regression. At the 95% significance level (α = 0.05) we can reject the null hypothesis whenever the p-value < 0.05.
In conjunction with the t-test, a two-way ANOVA and a linear regression analysis both agree at the 95% significance level, and suggest that there is a statistically significant difference between the means of female participants and male participants. This was further confirmed in item (d), where we performed a non-parametric test (using medians) and found the critical value to be p = 0.03223 < α. These findings indicate a real gender difference (for this group), with female participants having better gains than male participants, even though male participants started with a higher average pre-test score.

References

[1] Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving
[2] Peer instruction: Results from a range of classrooms
[3] The effect of interactive engagement teaching on student understanding of introductory physics at the Faculty of Engineering, University of Surabaya, Indonesia
[4] Why peer discussion improves student performance on in-class concept questions
[5] Retaining students in science, technology, engineering, and mathematics (STEM) majors
[6] Active learning increases student performance in science, engineering, and mathematics
[7] The initial knowledge state of college physics students
[8] Force concept inventory
[9] Who needs physics education research!?
[10] Best practices for administering concept inventories
[11] Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses
[12] Analyzing false positives of four questions in the force concept inventory
[13] Examining the effects of testwiseness in conceptual physics evaluations
[14] Interpreting FCI scores: Normalized gain, preinstruction scores, and scientific reasoning ability
[15] Gender gap and polarisation of physics on global courses
[16] Polarization of physics on global courses
[17] Cognitive reflection test and the polarizing force-identification questions in the FCI
[18] Enhancing force concept inventory diagnostics to identify dominant misconceptions in first-year engineering physics
[19] How persistent are the misconceptions about force and motion held by college students?
[20] Using the method of dominant incorrect answers with the FCI test to diagnose misconceptions held by first year college students
[21, 22] National Department of Basic Education of South Africa
[23] University of Johannesburg Faculty of Engineering and the Built Environment 2019 annual report
[24] Gender differences in both force concept inventory and introductory physics performance
[25] On the vague meaning of "gender" in education research: The problem, its sources, and recommendations for practice
[26] Gender gap on concept inventories in physics: What is consistent, what is inconsistent, and what factors influence the gap?
[27] Gender differences in conceptual understanding of Newtonian mechanics: A UK cross-institution comparison
[28] Reducing the FCI gender gap
[29] The role of stereotype threats in undermining girls' and women's performance and interest in STEM fields
[30] Peer Instruction: A User's Manual. Series in Educational Innovation
[31] Misconceptions reconceived: A constructivist analysis of knowledge in transition
[32] A friendly introduction to "knowledge in pieces": Modeling types of knowledge and their roles in learning
[33] Statistics: An Introduction Using R
[34] R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing
[35] Statistics in Plain English
[36] Success in introductory college physics: The role of high school preparation
[37] Investigating the factors influencing professional identity of first-year health and social care students
[38] The impact of the Canterbury earthquakes on successful school leaving for adolescents
[39] Teaching general physics in a COVID-19 environment: insights from Taiwan
[40] Studying physics during the COVID-19 pandemic: Student assessments of learning achievement, perceived effectiveness of online recitations, and online laboratories
[41] Teaching labs during a pandemic: Lessons from spring 2020 and an outlook for the future
[42] Blackboard Collaborate

Acknowledgements. EC and ASC are supported in part by the National Research Foundation of South Africa (NRF). AC is grateful for the support of the National Institute for Theoretical Physics (NITheP), South Africa. This study was done in compliance with the South African Protection Of Personal Information (POPI) Act, where all student data (including personal data) was anonymised and collected as part of the University of Johannesburg's physics course's online assessment platform. We would like to thank all staff and students who took part in this study. WN would like to thank Margaret Marshman (University of the Sunshine Coast) for useful discussions. The authors are also grateful to Allan L. Alinea (University of the Philippines) for his useful comments.

Appendix A. In this appendix we look at the differences in gender means for the paired data (N = 48, comprising 16 female participants and 32 male participants). We found that the mean of the gains for female participants (µ_FG = 0.38) was greater than that for male participants (µ_MG = 0.17).
However, to clarify whether the difference is purely a random fluctuation, we performed the following tests, using R [34], at the 95% significance level (α = 0.05):

(a) The t-test for independent samples (and unequal variances) gave a p-value of 0.02757 < α for a single tail (directional difference µ_FG − µ_MG > 0).
(b) A two-way ANOVA with replication led to an F-statistic of 4.5107 and a p-value of 0.03909 < α.
(c) A linear regression analysis also led to an F-statistic of 4.511 and a p-value of 0.03909 < α.
(d) A non-parametric two-sample Wilcoxon test led to medians of 0.2496296 and 0.1602564 for female participants and male participants, respectively, with a W-statistic of W = 341 and p-value = 0.03223 < α.
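The equivalence exploited in items (b) and (c), namely that with a single dichotomous predictor the ordinary-least-squares slope reproduces the difference in group means, can be sketched as below. The individual gains are invented, chosen only so that the group means match the reported µ_FG = 0.38 and µ_MG = 0.17; this is our own Python illustration, not the authors' R code:

```python
from statistics import mean

# Invented gains whose group means equal 0.38 (female) and 0.17 (male).
female_gains = [0.45, 0.30, 0.40, 0.37]
male_gains   = [0.20, 0.10, 0.25, 0.13]

# Dichotomous predictor: Female = 1, Male = 0.
x = [1] * len(female_gains) + [0] * len(male_gains)
y = female_gains + male_gains

# OLS slope = cov(x, y) / var(x).
xbar, ybar = mean(x), mean(y)
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))

# With a 0/1 predictor the slope equals mu_F - mu_M (here 0.21).
print(f"OLS slope       = {slope:.4f}")
print(f"mean difference = {mean(female_gains) - mean(male_gains):.4f}")
```

Because the slope is exactly the group-mean difference, the regression F-test and the difference-of-means t-test interrogate the same effect, which is why the two R analyses in items (a)-(c) agree.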