key: cord-0267268-418uu0go
authors: Brooks, Tessa Durham; Burks, Raychelle; Doyle, Erin; Meysenburg, Mark; Frey, Tim
title: Digital Imaging and Vision Analysis in Science Project improves the self-efficacy and skill of undergraduate students in computational work
date: 2020-10-26
journal: bioRxiv
DOI: 10.1101/2020.10.26.353987
sha: 5df23b8e0bd11431345b7db021f69e38ca445624
doc_id: 267268
cord_uid: 418uu0go

In many areas of science, the ability to use computers to process, analyze, and visualize large data sets has become essential. The mismatch between the ability to generate large data sets and the computing skill to analyze them is arguably the most striking within the life sciences. The Digital Image and Vision Applications in Science (DIVAS) project describes a scaffolded series of interventions implemented over the span of a year to build the coding and computing skill of undergraduate students majoring primarily in the natural sciences. The program is designed as a community of practice, providing support within a network of learners. The program focus, images as data, provides a compelling 'hook' for participating scholars. Scholars begin the program with a one-credit spring semester seminar where they are exposed to image analysis. The program continues in the summer with a one-week, intensive Python and image processing workshop. From there, scholars tackle image analysis problems using a pair programming approach and finish the summer with independent research. Finally, scholars participate in a follow-up seminar the following spring and help onramp the next cohort of incoming scholars. We observed promising growth in participant self-efficacy in computing that was maintained throughout the project as well as significant growth in key computational skills. DIVAS program funding was able to support seventeen DIVAS scholars over three years, with 76% of DIVAS scholars identifying as women and 14% of scholars being members of an underrepresented minority group.
Most scholars (82%) entered the program as freshmen, with 89% of DIVAS scholars retained for the duration of the program and 100% of scholars remaining a STEM major one year after completing the program. The outcomes of the DIVAS project support the efficacy of building computational skill through repeated exposure of scholars to relevant applications over an extended period within a community of practice.

Introduction

Science, technology, engineering, and mathematics (STEM) professions, even those not traditionally steeped in quantitative models and data analysis, increasingly require computational competence [1]. In particular, the natural sciences have experienced significant increases in the amount of data generated by increased computing power, cheaper and more rapid sequencing technologies, and the rise of interdisciplinary fields such as personalized medicine, phenomics, digital agriculture, and climate science. Computation has become so ubiquitous and necessary across the natural and physical sciences that it has been referred to as the "third pillar of the scientific method," along with theory and experimentation [2]. A career in the natural sciences increasingly requires that professionals are comfortable with basic computational skills and quantitative analysis [3-5]. Beyond this, modern scientific exploration may require the design of new software by developers with both specific content knowledge and computational skills. As a potential "end user", a biologist, chemist, physicist, etc. has the content knowledge, but may need computational skills training [6, 7]. Across the broad range of STEM disciplines, too few students are being trained in computational and quantitative skills that would enable them to develop useful software.
In particular, undergraduate students in the life sciences may be resistant to developing quantitative or computational skills due to previous negative experiences or a perception that they "aren't good at" mathematics or computers [8]. The result of these factors is a mismatch between the skills needed for success in research or industry positions and the skills possessed by graduates and young professionals starting these

Project models the reality of the modern computational work environment, which is soundly a team-based endeavor. This counters the stereotype that such work is largely solitary.

The general hypothesis of the DIVAS Project is that gradual, scaffolded exposure to, and practice with, computational tools, centered on accessible and relevant applications and implemented in both simulated and authentic supportive professional environments, will impact student self-efficacy, computational competency, and career path interest. We have taken the approach of emphasizing growth in self-efficacy toward computing as the first necessary indicator of growth in computational skill [25-27]. We also posit that as participants become more familiar with computational tools, they will show more interest in career paths that use those tools. Though our pilot program was restricted in size, its positive impact on participants suggests that DIVAS program elements are well-suited to our broader goals of fostering computational skills within a community of practice. We describe our approach here both as a guide and an invitation. We hope to form new DIVAS partnerships to broaden the DIVAS community and enable additional study on the efficacy of the approach we have taken.

To explore our hypothesis, a pathway of interventions was designed that comprises our programmatic 'onramp' (Fig 1).
Each cohort of DIVAS scholars was introduced to our community of practice via a one-credit, spring semester seminar (DIVAS Seminar I) and engagement with the DIVAS Slack team. Work continued in the summer with a week-long coding workshop, followed by a four-week-long paired-programming session that allows DIVAS scholars to put their recently acquired skills to use. DIVAS Scholars can participate in an additional three weeks of research with DIVAS faculty to conclude their summer activities. In the

the cohort takes DIVAS Seminar II. As with other team science endeavors, the DIVAS community is offline and online, with Slack and Zoom playing significant roles in communication, project management, co-working sessions, team meetings, etc. In the sections that follow, the basic design of each intervention is detailed and intervention resources can be found at the DIVAS Program Resources website [28].

popular way to build coding and data analysis skills [29]. On average, participants report increased self-efficacy in coding and coding skills, based on pre- and post-workshop surveys and on longitudinal surveys [29, 30]. However, workshops like those offered through The

[29]. We designed a one-week coding workshop that includes two days of basic coding in Python and three days of image processing using OpenCV libraries. The two-day introduction to Python was modeled on an existing Carpentries workshop and can be found at GitHub [31]. The overall design of the three-day image processing workshop was informed by Adrian Rosebrock's 2016 book on the topic [32]. To keep students engaged with Python basics, examples used during this section of the workshop were tailored toward image processing projects. Students were also presented with two authentic and "solvable" research problems at the beginning of the image processing portion of the workshop. For the first problem,
For the first problem, 163 participants were asked to count bacterial colonies on a plate image. For the second, 164 participants were asked to track the progress of an acid-base titration captured on video. Our 165 workshop design provides students an opportunity to immediately apply their recently acquired 166 Python skills to write code to perform analysis tasks to address these two authentic problems. 167 The image processing portion of the workshop was adopted by The Carpentries in 2019 [30, 33] . industry in which two programmers work together, with one person assuming the role of the 174 "driver" who writes the code, and the other taking the role of "observer" who reviews the code 175 and makes suggestions. In introductory computer science courses, the use of pair programming 176 results in higher quality code, increased student enjoyment, improved pass rates for courses, 177 and improved retention in computer science majors for both men and women [17,34-36]. Also, 178 pair programming has been shown to increase the confidence of women in the programming the workshop to the completion of two consecutive two-week pair programming projects. Each 182 year, one project was morphometric in nature while the other was colorimetric. Image data sets 183 were found from public repositories or from the research of the faculty team. The project was 184 presented by a faculty member at the beginning of each project. DIVAS Scholars were randomly 185 divided into pairs. For pairs composed of students at different institutions, pair programming was 186 conducted virtually using Zoom. This arrangement allowed us to explore the feasibility of a fully To assess CT ability before any formal instruction in coding, participants were given a handout 287 that described a hypothetical cup stacking robot that could be given simple instructions to 288 achieve different configurations of cups. 
The exercise was adapted from the Hour of Code lesson "Programming Unplugged: My Robotic Friends" [44]. Participants were asked to create a series of commands to achieve a particular cup stacking arrangement. After writing their initial set of commands, participants were asked to simplify their 'code', possibly by writing one or more new commands. A different cup-stacking prompt was used after the DIVAS Seminar I. After each subsequent intervention, the code developed during that intervention was used to assess CT ability. The cup stacking prompts are under 'Assessment Tools' at the DIVAS Program

Data Analysis. To investigate changes in student self-efficacy and career interest, scores within each category were summed to determine a composite score for each individual. A paired-samples t-test was performed (alpha = 0.05) to determine whether composite scores before and after a given intervention differed significantly. For significant changes, the effect size was determined by calculating Cohen's d. The change in CT scores within a year was determined by calculating a total score for an individual artifact from each rater and determining the median value. A paired-samples t-test was performed to determine whether CT scores had changed after an intervention. Subscores within each of the areas of the rubric (Recognize, Analyze, Design, Implement) were also calculated and evaluated using a paired-samples t-test to determine whether significant changes within each area were observed. Effect sizes for significant differences were described by calculating Cohen's d.

(Table 2).

Scholars start the program with a one-credit seminar in the spring and end the program with a one-credit seminar the following spring, thereby participating in the full DIVAS pipeline as presented in Fig 1 (above).
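
The scoring and testing steps described under Data Analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function names and all example numbers are hypothetical, and because the text does not say which variant of Cohen's d was used, the sketch assumes the paired form (mean difference divided by the standard deviation of the differences, often written d_z).

```python
import math
import statistics


def composite_score(item_scores):
    """Sum one participant's item responses within a survey category."""
    return sum(item_scores)


def median_rater_total(rater_subscores):
    """Total each rater's rubric subscores for one artifact, then take the median."""
    totals = [sum(subscores.values()) for subscores in rater_subscores]
    return statistics.median(totals)


def paired_t_and_cohens_d(pre, post):
    """Return (t statistic, degrees of freedom, Cohen's d_z) for paired scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pre/post scores")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff  # assumed variant of Cohen's d for paired data
    return t_stat, n - 1, d_z


if __name__ == "__main__":
    # Hypothetical composite scores for five participants (not study data).
    pre = [10, 12, 9, 14, 11]
    post = [13, 15, 11, 15, 14]
    t_stat, df, d_z = paired_t_and_cohens_d(pre, post)
    print(f"t({df}) = {t_stat:.2f}, Cohen's d_z = {d_z:.2f}")

    # Hypothetical rubric subscores from three raters for one artifact.
    raters = [
        {"Recognize": 3, "Analyze": 2, "Design": 3, "Implement": 2},
        {"Recognize": 2, "Analyze": 2, "Design": 3, "Implement": 2},
        {"Recognize": 3, "Analyze": 3, "Design": 3, "Implement": 3},
    ]
    print("median CT total:", median_rater_total(raters))
```

The sketch stops at the t statistic and degrees of freedom; to compare against alpha = 0.05, the statistic can be checked against a t table, or `scipy.stats.ttest_rel(post, pre)` can be used where SciPy is available, since it returns the statistic together with a two-sided p-value.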
An additional source of self-efficacy information came from the voluntary completion of an IDEA Student Ratings of Instruction system survey [45], which is conducted at Doane University at the end of each course and which we utilized in the DIVAS project. We analyzed self-reported learning gains in the IDEA-defined learning objectives for the eleven scholars who completed the survey (year 1 = 5, year 2 = 4, year 3 = 2). We found that scholars self-reported strong gains in the objectives "Acquiring skills in working with others as a member of a team" and "Learning appropriate methods for collecting, analyzing, and interpreting numerical information." The three cohorts rated both objectives at an average score of 4.45 out of 5 points. The Doane institutional averages over the period of this project on these learning goals are 3.72 and 3.56, respectively. Overall, DIVAS Seminar I was effective in improving the self-efficacy of Scholars toward computing, and positively influencing their intended career path. Observationally, the seminar was important in building rapport and a shared experience between all members (faculty and students) in the community of practice. In year three, the DIVAS cohort completed their photo diary project in tandem with 200-level graphic design students (Fig 2). This

There were no significant changes in intended career path over the three-year period. However, we did observe an increase in the standard deviation of the mean score. In looking at individual responses, this increase in standard deviation seems to indicate that scholars moved toward either extreme in their interest in incorporating computational skills in their future careers after this intervention. We did not find this concerning since this divergence in interest was paired with a significant increase in self-efficacy.

At the end of each day of the workshop, we also asked participants to rate the percentage of the day's material they felt they had mastered. Data was compiled for all participants, including those who were not DIVAS scholars. We found a high average perceived mastery for the Python/Bash/git portion of the workshop, and then a drop for the first two days of the computer vision portion (Table 4). We believe this is due to the increased complexity in the subject matter. By the third day, this metric rose as participants were able to use their newfound skills to complete the challenge questions successfully.

We found the coding workshop format to be effective as it immersed scholars in an enriching skill development environment. Though coding training was intensive, the participants' self-reported improvements in mastery support the observation that scholars see tangible benefits from their persistence. The workshop also provided two cycles of challenge, learn, and achieve (in the spirit of Challenge Based Learning [46]), giving participants multiple opportunities to struggle with new concepts and see the payoff.

surprising because students' self-efficacy was already high and near the ceiling of the instrument, on average, following the coding workshop (Fig. 4). However, given that students were asked to solve a number of challenging problems largely independently, we see the maintenance of self-efficacy throughout this programmatic period as significant.

other's ideas and expertise to develop approaches to solving a variety of problems. We found that scholars tended to work amongst themselves before seeking input from one of the faculty mentors. We considered this a healthy development of both independence and teamwork, one that reflected the increased confidence scholars gained in their individual and collective skill sets.
We also observed cases where one or more scholars would be given special authority by the group. While this was often productive, we also observed that it sometimes contributed to an over/underfunctioning dynamic within pairs. Because of this, we were especially mindful of giving praise for taking risks and highlighting the specific strengths of each project and each scholar separately. We also worked to minimize this over/underfunctioning dynamic when selecting pairs for each project so as to maximize each student's engagement.

"Learning appropriate methods for collecting, analyzing, and interpreting numerical information" (4.33 ± 0.52). Scholars also responded positively to the statement, "My background prepared me well for this course's requirements" (4.5 ± 0.55), reflecting the gains in self-efficacy we saw in the survey data. The last year of the seminar occurred during the first wave of the COVID-19 pandemic. This resulted in a response rate to the self-efficacy and career path survey that was too low to report.

Over the three years of the project, scholars experienced significant increases in self-efficacy towards computing from the beginning of Seminar I to the end of summer programming (Fig 4). The most significant gains (p < 0.05) occurred during Seminar I and the coding workshops. The impact of the DIVAS program on scholars' intended career paths was more subtle. Although scholars did not show significant career path gains from the initial pre-test before Seminar I to the end of summer research (p = 0.12), Seminar I resulted in significant gains in career interest for all years combined, as did Seminar II for Years 1 and 2 (Tables 3 and 5), both of which include explicit career exploration. Scholars were also observed to become 'warmer' or 'colder' to a career utilizing computing as they moved through the program.
This effect is apparent in the increased standard deviation in post-intervention career interest scores, which started at ± 2.3 after Seminar I, grew to ± 3.02 after the coding workshop, and increased to ± 3.46 after pair programming/summer research. We see this as an encouraging progression, especially because scholar self-efficacy grew steadily throughout the program.

planning. One scholar majoring in biology declared a minor in software development. A second biology major switched to a bioinformatics major, and two scholars have taken non-required electives that emphasize computational skills. One scholar participated in an external REU program in computational and systems biology, and eight have continued research projects that incorporate coding or computational thinking. Three DIVAS scholars have worked as peer tutors for Doane's Center for Computing in the Liberal Arts (CCLA). One former scholar is pursuing a Ph.D. in chemical biology with a significant computational component to their research, and another student who participated in both the coding workshop and paired programming is pursuing a Ph.D. in complex biosystems.

Overall, even given the small sample represented in this study, we see great potential in the DIVAS approach of introducing novice students to computing through media computing within a community of practice. Thirteen of the 17 DIVAS scholars from the three years of the project (76%) were women and 14% of scholars were members of an underrepresented minority (URM) group, a significantly higher percentage than the total percentage of women and URM group members in the majors represented in the project or in the STEM workforce [50]. The large majority (82%) of scholars entered the program as freshmen. We retained 89% of DIVAS scholars for the duration of the program and retained 100% within a STEM major one year after completing the program.
Our findings suggest that the DIVAS approach to computational skills development is a positive experience for students that warrants additional study through the implementation of DIVAS program elements in a broader array of educational contexts. To this end, we hope to form new DIVAS partnerships that will enable an expanded study on the efficacy of the DIVAS approach.

University participating in the project have also been supported by the National Science

References

Mathematical Sciences and Their Applications, Committee on the Mathematical Sciences in 2025. The Mathematical Sciences in 2025
Progress in computational thinking, and expanding the HPC community
Vision and Change in Biology Undergraduate A Call for Action-Initial Responses. CBE-Life Sciences Education
Why Scientists and Engineers Must Learn Programming
A New Biology for the 21st Century. Frontiers in Ecology and the Environment
Developing Scientific Software
The state of the art in end-user software engineering
Attitudes towards computer science-computing experiences as a starting point and barrier to computer science
Remote Sensing and Human Health: New Sensors and New Opportunities. Emerging Infectious Diseases
The pixel: A snare and a delusion
Information extraction from remotely sensed data
Visualization of image data from cells to organisms
Experience report: CS1 in MATLAB for non-majors, with media computation and peer instruction
Computational solutions to large-scale data management and analysis
A Crash Course for Preparing Students for a First Course in Computing: Did it Work? Journal of Engineering Education
Success in introductory programming
Imagineering inauthentic legitimate peripheral participation: an instructional design approach for motivating computing education
Professional Learning Communities and Communities of Practice: A Comparison of Models, Literature Review.
Online Submission
Success with EASE: Who benefits from a STEM learning community?
Increasing Retention and Graduation Rates Through a STEM Learning Community
Vogt CM. Faculty as a Critical Juncture in Student Retention and Performance in Engineering Programs
What is a Community of Practice and How Can We Support It
Theoretical foundations of learning environments
Engaging Students: An Examination of the Effects of Teaching Strategies on Self-Efficacy and Course Climate in a Nonmajors Physics Course
Cooperative learning instructional methods for CS1: Design, implementation, and evaluation
Developing Self-Regulation and Self-Efficacy: A Cognitive Mechanism behind the Success of Education Analysis of the Long-Term Feedback Survey
Practical Python and OpenCV + Case Studies: An Introductory, Example Driven Guide to Image Processing and Computer Vision
Image Processing with Python
Pair programming improves student retention, confidence, and program quality
In support of student pair-programming
In Support of Pair Programming in the Introductory Computer Science Course.
Improving learning of computational thinking using computational creativity exercises in a college CSI computer science course for engineers Proceedings Aspirations in CS1 Courses: Change and Relationships with Achievement. Proceedings of the 2016 ACM Conference on International Computing Education Research
Google for Education: Computational Thinking
Managing the Development of Large Software Systems
Programming Unplugged: My Robotic Friends
Campus Labs
Challenge based learner user guide
elegans infection live/dead image set version 1 from the Broad Bioimage Benchmark Collection
The "burning ship" and its quasi-Julia sets
Center for Computing in the Liberal Arts