title: How does online teamwork change student communication patterns in programming courses?
authors: Kozhevnikova, Natalya
date: 2022-04-08

Online teaching has become a new reality due to the COVID-19 pandemic, raising a lot of questions about its learning outcomes. Recent studies have shown that peer communication positively affects the learning outcomes of online teaching. However, it is not clear how collaborative programming tasks change peer communication patterns in the learning process. In this study, we compare communication patterns in MOOCs, where peer communication is limited, with those of a blended course in which students are involved in online peer instruction. We used a mixed-method approach comprising automated text analysis and community extraction with further qualitative analysis. The results show that students prefer to seek help in programming from peers and not the teacher. Team assignment helped to support this habit. Students communicated more positively and intensively with each other, while only team leaders communicated with the instructor, reducing teacher overload. This shift could explain how peer communication improves learning outcomes, as has been shown in previous studies on MOOCs.

1 Introduction

that could be hard to achieve online, especially in self-paced courses. Cognitive presence is a construction of meaning which is built by challenges, questions and exploration of problems. Teacher presence includes course organisation and management, mediation of social relationships, building understanding and student motivation. Moore et al. (2019) showed that the lack of a cohort does not significantly affect cognitive processing. It is the instructional design of the course that has a stronger effect. Therefore, student-student interaction should be supported by the instructional design of MOOCs.
Blended learning is an effort to combine the advantages of online learning while avoiding the challenges that MOOCs face (Anthony et al., 2020; Ashraf et al., 2021; Rasheed et al., 2020). There is no established definition of blended learning, but the term is usually used to refer to some combination of online learning with offline classes and activities. One popular approach to blended learning is the flipped classroom. In a flipped classroom, students study theory in an online environment, followed by an offline class where they can deepen their understanding with the help of the teacher (Alammary et al., 2014; Anthony et al., 2020; Tullis & Goldstone, 2020). The flipped classroom fits well with the active learning model, since it frees in-class time for intensive practice. Such face-to-face lessons are often conducted using peer-learning activities in which students help each other to solve case studies or practical tasks, thus teaching each other. Freeman et al. (2014) showed that active learning improves student results in STEM disciplines. Although beneficial, blended learning creates challenges for teachers and students. Students need to have self-regulation skills; teachers are often required to do more work on content creation and course management, and both have to be proficient with the technology (Brown, 2016; Rasheed et al., 2020). Another approach to blended learning is practice in a virtual environment, which is common in teaching medical students. Due to the nature of the tasks, it is also extensively used to teach computer science. However, students often work on their online assignments independently, receiving little benefit from peer learning. Current research shows that students have better results when working on programming assignments collaboratively. Pair programming is one of the approaches to collaborative software development.
There are several studies on pair programming in education showing that this technique improves learning outcomes, especially for women (Hanks et al., 2011; Salleh et al., 2011; Umapathy & Ritzhaupt, 2017). A peer-led approach with small teams in STEM disciplines also shows a positive impact on learning outcomes (Herman & Azad, 2020; Wilson & Varma-Nelson, 2016). Another collaborative programming approach is introducing peer code review in programming courses. Code review in educational practice has been reported to improve understanding of computational concepts and to increase learning engagement and satisfaction. Since industrial software development includes collaborative programming as well as peer code review and peer leadership, we supposed that programming disciplines could use IT-industry teamwork practices to improve the results of online courses. The IT industry has well-established approaches to remote software development. Existing online services help to manage teamwork and software development. Therefore, we designed a blended learning course that utilises those practices in an educational environment. To manage student teamwork, we used services that are common in the industry: Trello, GitHub and Discord. Current experience of using GitHub in education shows that students find it convenient for collaborative work and feel more prepared for the future (Fiksel et al., 2019; Hsing & Gennarelli, 2019). The use of Trello and Discord helps to decrease the amount of managerial work for the teacher while teaching students state-of-the-art industry tools. Student feedback on the collaborative software development course was positive; therefore, we wondered whether a similar design could be used to overcome the problems of instructor-paced MOOCs discussed above. In order to answer this question, we explored how online teamwork development affected peer communication patterns in both instructional designs.
Therefore, in this study we investigate how communication patterns differ between traditional xMOOCs and the aforementioned blended course with collaborative project work.

Studies of non-English educational data are quite rare due to the smaller number of open online courses in other languages and, therefore, the relatively limited amount of data available. However, cultural and organisational differences could affect the learning process and student communication. Therefore, we tried to investigate whether our educational data support the state-of-the-art research on English courses. In terms of methodology, this study revealed one particular limitation in state-of-the-art automated text analyses. Students of programming disciplines use a mix of their native language, English and programming languages in their discussions. We were unable to find research on code mixing of this type. There is also still little progress being made in the field of extracting knowledge from mixed natural and programming languages. This limitation raises problems with the automated analysis of student questions in programming courses, which is important for question-answering educational bots. In this work, we decided to skip information that is expressed in different languages and conducted analyses for the primary (Russian) language only. Currently, steps are being made towards automating peer learning in an online environment. We showed that collaborative work similar to IT industry practices changes communication in a positive direction. The next step is to try to implement instructional designs which involve active peer communication and peer instruction in MOOCs. However, teamwork inside a MOOC can raise a number of new questions and requires further investigation. This paper is structured into four main sections.
In the first part, we discuss the data and methods that were used to conduct the analyses. The second part describes our results for MOOCs and the blended collaborative development course. In the third part of the paper, we discuss the main findings and limitations of the current work. We conclude by outlining possible directions for further research. Data for this study were collected from two "Open Education" MOOCs. "Open Education" is a branded instance of Open edX hosted by local Open Education association. The "Web development" and "Advanced web development" courses delivered via the platform are provided by ITMO University. These are instructor-paced courses. Students enrol twice a year and study in a cohort. The courses teach Python and its application to web development. They have a structure traditional for xMOOCs when video lectures are followed by tests and programming projects. The courses also offer exams for those who choose to pay for a certificate. For the "Web development" course, the data cover the period from autumn 2017 to spring 2020. As the "Advanced web development" course was released later, the respective data cover a shorter period: from autumn 2019 to spring 2020. The third dataset comes from the "Team software development methodology" course that we designed with a Community of Inquiry framework in mind. It aims to support collaborative learning while students study in a blended environment. Students engaged with learning materials in Moodle, discussed cases in offline lessons and worked on their own project with the help of real-world team development tools. Students were divided into several teams of 4-6 people, developed their application idea and created functioning software. Students used Discord for communication with peers and the tutor, Github for collaborative code development and Trello for agile management. Real-world services used in this course provide an opportunity to download activity data using their API. 
Following ethical guidelines, we received student consent to use the data for research. These data were downloaded, combined, cleaned and anonymised for further analysis. Dialogue data were collected from Discord team chats, comments from agile boards on Trello and from GitHub discussions and pull requests. Although the MOOCs and the collaborative blended course teach different subjects, we believe that they have enough in common to make such a comparison valid. They both require programming skills. Though the "Team software development methodology" course does not strictly limit the choice of a development technology, students often select web or mobile-based projects. We would like to note that students rarely have basic knowledge of web development by the start of the "Team software development methodology" course, and many of them decide to develop two sets of skills in one project. Therefore, we believe that students of the selected MOOCs and the collaborative blended course have similar prior knowledge and learning needs, which justifies the comparison of communication patterns in these courses. In order to reveal how teamwork on a real-world project changes communication patterns in an online environment, we used mixed-methods analysis. We combined natural language processing (NLP) analysis with a learning analytics approach.

Figure 1: Visualization of a topic and top words: Programming.

The best coherence metric values for xMOOCs topic modeling were achieved with 5 or 8 topics. However, manual review showed that this number of topics leads to an overlap and the results are hard to interpret. Only three of the topics were distinguishable enough to be assigned a label. Topics extracted from the MOOC forum texts were reviewed and labelled manually as course management issues, programming and learning material comprehension.
Programming is very code-specific and its keywords include such terms as 'function', 'method', 'class', 'list', 'print' (see Figure 1). In these messages, students share specific details about the difficulties in programming they encounter. Learning material comprehension, similar to Programming as it may seem, is less about programming and more about understanding lectures and analysing mistakes in course assignments. Course management issues are about certificates, deadlines and various technical problems. The topic keywords were added to forum data for learning analytics. Communities extracted from the data show that students tended to communicate only with one tutor (see Figure 2). Figure 3 shows extracted communities in MOOC forums. As can be seen, within one community students interacted more actively, while inter-community communication was very low. It is important to note that student-teacher communication is very intensive and communities are built around the teachers. We also discovered that students discussed different topics with teachers and with each other. Thus, in our data only 3650 messages were labelled Programming in teacher-student communication, while 6231 messages were labelled Programming in student-student communication. Learning material comprehension and course management issues, however, were discussed to a similar extent, as shown in Table 1. The analysis revealed that there were several very active communities of students who created most of the messages, while most of the students communicated much less or used passive communication patterns like upvoting messages. The tonality of the messages was mostly neutral; however, messages in student-teacher interactions were more likely to have a negative sentiment. Positive messages were also quite rare. The proportion of positive messages to negative messages in MOOCs was 0.4. Topics extracted from the collaborative course data were diverse.
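As a rough illustration of the bookkeeping behind the MOOC figures reported above (topic counts per interaction type and the positive-to-negative message ratio), the following Python sketch works on a handful of made-up records. The record layout, the label strings and the function names are assumptions introduced for this example only; they are not the study's data schema or pipeline.

```python
from collections import Counter

# Hypothetical labelled messages: (interaction type, topic, sentiment).
# These toy records are invented for illustration, not taken from the study.
messages = [
    ("student-teacher", "course management", "negative"),
    ("student-teacher", "programming", "neutral"),
    ("student-student", "programming", "neutral"),
    ("student-student", "programming", "positive"),
    ("student-student", "learning material comprehension", "negative"),
    ("student-teacher", "course management", "negative"),
    ("student-student", "programming", "positive"),
    ("student-teacher", "learning material comprehension", "negative"),
    ("student-student", "course management", "negative"),
]


def topic_counts(records):
    """Count messages per (interaction type, topic) pair."""
    return Counter((interaction, topic) for interaction, topic, _ in records)


def positive_to_negative_ratio(records):
    """Return the positive/negative message ratio, or None without negatives."""
    sentiments = Counter(sentiment for _, _, sentiment in records)
    if sentiments["negative"] == 0:
        return None
    return sentiments["positive"] / sentiments["negative"]


counts = topic_counts(messages)
ratio = positive_to_negative_ratio(messages)

print(counts[("student-student", "programming")])
print(counts[("student-teacher", "programming")])
print(ratio)
```

With real data, the same two summaries would be computed per course, which makes the ratios of the MOOCs and the blended course directly comparable.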
Students tended to communicate about their collaboration, technical problems and application ideas. They also asked the tutor about assignment details. Due to the small number of messages, the automated algorithms showed less agreement; therefore, many messages were labelled manually.

Figure 4: The course management topic in the collaborative blended course.

We selected four main topics in the data: course management, product management, meetups and programming. Unlike in the MOOC data, course management questions were less about problems and more about assignment requirements (Figure 4). Students asked about the final presentation of the project and how their decisions would affect the final mark. The product management topic is about generating application ideas and evaluating these ideas against the available time and resources. Since this course had an offline part, students also discussed their online and offline meetings. Those discussions were labelled Meetups. Finally, all the situations that were connected to programming received the label Programming. It is no surprise that communities extracted from discussions were nearly the same as student teams. However, the structure of communication with the tutor was different from that in MOOCs. Students communicated hierarchically: one or two leaders talked to the teacher and then discussed the information inside their team (Figure 5). With the teacher, students discussed such topics as course management and meetups (Table 2). The content of discussion varied according to the interlocutor: for example, within the topic of meetups, students asked the teacher about offline lessons, while online meetups were discussed in student-student interactions. The tonality of the communication was mostly neutral. However, students communicated in a more positive manner.
The proportion of positive messages to negative messages in the collaborative blended course was 4.

The results show that differences in instructional design significantly affect communication patterns. In a collaborative environment, students build strong relationships inside the team and communicate with peers more than with a teacher. Peer support affects their attitude, as the change from neutral-or-negative to neutral-or-positive message sentiment shows. The intensity of interaction revealed by our mixed-method approach makes it quite clear that collaboration reduces the amount of work for the teacher. We might assume that students with low self-regulation skills rely on their peers with highly developed self-regulation skills. Thus, they receive all the information from their team leader, reducing the number of messages sent to the teacher inquiring about similar problems. A closer look at the topic of meetups also shows that students pay particular attention to time management, and this could help students who struggle to plan their work without external support. From the results of this research, we can see that students tend to discuss programming tasks with each other and not with a teacher. This is true of the MOOC data as well as of the blended collaborative course data. This could explain how intensive peer communication improves learning outcomes. One reason might be that students consider the teacher to be less available for discussions than their peers. Another reason might be that some students are too shy to ask the teacher, while communication with a peer could seem less stressful. Therefore, we might conclude that peer communication, indeed, has a direct influence on the development of programming skills. However, the above results could be affected by several limitations. The main limitation of this study is the size of the datasets.
Even though the MOOC dataset covered several years, peer communication was not very intensive and student-teacher communication dominated. The limited amount of text data on the collaborative learning course, in turn, can be attributed to the small number of students. Thus, the data were unbalanced, which could have affected the results. Incorporating student-content clickstream data could help to overcome this imbalance and study communication patterns in more detail. Another source of uncertainty arises from the decision to skip programming code that students post in their questions. In programming disciplines, code carries a lot of meaning and natural language just helps to clarify the student inquiry. However, we were unable to find a reliable solution for topic modelling of texts that mix natural and programming languages. Therefore, our topics are very general, combining all the programming problems in one cluster. Despite this limitation, we believe that this did not affect our results, since topic modelling was used only as a part of manual qualitative analysis. Current research on code mixing is mostly devoted to mixing two natural languages (Jose et al., 2020). Some studies aim to support developers in information extraction (Palomba et al., 2018; Rodriguez & Carver, 2019), but most of them rely on tagged data like questions from StackOverflow or compare the code only. There are also some attempts to use topic modeling in bug reports to help find bugs in the code (Lam et al., 2017; Wang et al., 2018; Zhou et al., 2012), but these results cannot be applied in a question-answering context. Mixing programming code with natural languages is common in developer communication in messengers or e-mails. Therefore, future work on topic modeling of source code in the context of question answering is required to better understand how students communicate about programming tasks.
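The decision to analyse only the primary (Russian) language and to skip code and English fragments can be pictured with a minimal Python sketch. It assumes, purely for illustration, that a token belongs to the primary language when it consists of Cyrillic letters only; the pattern, the function name `keep_primary_language` and the sample message are hypothetical and do not reproduce the preprocessing actually used in the study.

```python
import re
from typing import List

# Assumption: a primary-language token is a run of Cyrillic letters,
# optionally joined by hyphens. Latin tokens (English words, identifiers,
# keywords) and punctuation are skipped.
CYRILLIC_WORD = re.compile(r"[А-Яа-яЁё]+(?:-[А-Яа-яЁё]+)*")


def keep_primary_language(message: str) -> List[str]:
    """Return lower-cased Cyrillic words, dropping English and code tokens."""
    return [word.lower() for word in CYRILLIC_WORD.findall(message)]


# Hypothetical student question mixing Russian with Python code.
example = "Почему функция print(my_list) возвращает None?"
tokens = keep_primary_language(example)
print(tokens)
```

A filter of this kind discards exactly the information that the limitation above describes: the identifiers `print`, `my_list` and `None` carry most of the technical meaning of the question, yet none of them reaches the topic model.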
A natural progression of this work is to find out how communication patterns might change if more peer communication is introduced to MOOCs through collaborative assignments. This could raise new research questions. First, students in MOOCs often select this type of learning for the opportunity to study independently; therefore, the need to rely on a peer to complete assignments might cause frustration and even lead to dropout. However, there is evidence that the number of students who can actually work independently and successfully finish the course is quite small (Reich & Ruipérez-Valiente, 2019). Therefore, although some independent students might drop out, there is reason to believe that overall learning outcomes and student engagement might improve. This is especially true for blended MOOCs that are used as an addition to offline academic courses. A shift to collaborative programming assignments could also require more management from the teacher, thus creating overload. Even though our study showed that the number of questions to teachers decreases, it did not measure the amount of teacher management needed to facilitate the conversations within Trello and GitHub. We suppose that some of those tasks could be automated by an educational bot. For example, the bot could answer some common organisational questions, provide students with online tutorials on tools and automate some project management tasks. The development of such educational bots might also partially rely on the results of this study and other research similar in nature. Recent changes in higher education due to the pandemic will affect the way we teach for a long time. Even when there are no strong restrictions, lessons learnt from online teaching will keep changing the educational landscape.
In this study we investigated how different online instructional designs affect the way students communicate with each other and with a teacher. We analysed two approaches to teaching online: instructor-paced MOOCs and a blended course with a collaborative team assignment. We used MOOC forum data and log data from team services that were used in the blended collaborative course. After preprocessing the text, we conducted topic modeling and sentiment analysis. We also extracted communities from communication networks. The results show that incorporation of teamwork helps to restructure communication patterns within a course. While the number of questions to the teacher decreases, the number of peer-to-peer interactions increases. Students start to communicate in a more positive manner and help each other to manage time and tasks. These findings can help teachers to choose an appropriate instructional design for their courses. For example, online project teamwork could become a good solution to the problems of instructor-paced MOOCs and merits further research.

References

Blended learning in higher education: Three different design approaches
Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis
Blended Learning Adoption and Implementation in Higher Education: A Theoretical and Systematic Review. Technology, Knowledge and Learning
A Systematic Review of Systematic Reviews on Blended Learning: Trends, Gaps and Future Directions
Blended instructional practice: A review of the empirical literature on instructors' adoption and use of online tools in face-to-face teaching. The Internet and Higher Education
BTM: Topic Modeling over Short Texts
Emergency Remote Teaching" on the UK Computer Science Education Community. Pages 31-37 of: United Kingdom & Ireland Computing Education Research conference
From massive access to cooperation: lessons learned and proven results of a hybrid xMOOC/cMOOC pedagogical approach to MOOCs
Using GitHub Classroom To Teach Statistics
Active learning increases student performance in science, engineering, and mathematics
The first decade of the community of inquiry framework: A retrospective. The Internet and Higher Education
Pair programming in education: a literature review
A Comparison of Peer Instruction and Collaborative Problem Solving in a Computer Architecture Course. Pages 461-467 of
Students' and instructors' use of massive open online courses (MOOCs): Motivations and challenges
Using GitHub in the Classroom Predicts Student Learning Outcomes and Classroom Experiences: Findings from a Survey of Students and Teachers. Pages 672-678 of
Language identification in code-switching scenario
A Survey of Current Datasets for Code-Switching Research
Self-regulated learning strategies predict learner behavior and goal attainment in Massive Open Online Courses
Bug Localization with Combination of Deep Learning and Information Retrieval
Using peer code review to improve computational thinking in a blended learning environment: A randomized control trial
Setting the pace: examining cognitive processing in MOOC discussion forums with automatic text analysis
Twitter Topic Modeling by Tweet Aggregation
Automatic Test Smell Detection Using Information Retrieval Techniques. Pages 311-322 of
Challenges in the online component of blended learning: A systematic review
Exploring the Space of Topic Coherence Measures. Pages 399-408 of
The MOOC pivot
Comparison of Information Retrieval Techniques for Traceability Link Recovery. Pages 186-193 of
Empirical Studies of Pair Programming for CS/SE Teaching in Higher Education: A Systematic Literature Review
Factors impacting university students' online learning experiences during the COVID-19 epidemic
Full-Text or Abstract? Examining Topic Coherence Scores Using Latent Dirichlet Allocation
From Louvain to Leiden: guaranteeing well-connected communities
Why does peer instruction benefit student learning?
A Meta-Analysis of Pair-Programming in Computer Programming Courses: Implications for Educational Practice
Analyzing instructional design quality and students' reviews of 18 courses out of the Class Central Top 20 MOOCs through systematic and sentiment analyses. The Internet and Higher Education
Bug Localization via Supervised Topic Modeling. Pages 607-616 of
Small Groups, Significant Impact: A Review of Peer-Led Team Learning Research with Implications for STEM Education Researchers and Faculty
A biterm topic model for short texts
The State of MOOCs from 2008 to 2014: A Critical Analysis and Future Visions. Pages 305-327 of: Computer Supported Education
Understanding Student Motivation, Behaviors and Perceptions in MOOCs
Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports

The author would like to humbly thank Prof. Dmitriy Shtennikov from ITMO University for sharing the data from MOOC courses and for his helpful contribution to data collection and analysis.