key: cord-0298634-6okegnrg authors: Grassl, Isabella; Fraser, Gordon title: Scratch as Social Network: Topic Modeling and Sentiment Analysis in Scratch Projects date: 2022-04-12 journal: nan DOI: 10.1145/3510458.3513021 sha: bb65518c60573d747a78f7b9168118f7b34a1455 doc_id: 298634 cord_uid: 6okegnrg

Societal matters like the Black Lives Matter (BLM) movement influence software engineering, as the recent debate on replacing certain discriminatory terms such as whitelist/blacklist has shown. Identifying relevant and trending societal matters is important, and is often done using social network analysis for traditional social media channels such as Twitter. In this paper we explore whether this type of analysis can also be used for introspection of the software world, by looking at the thriving scene of young Scratch programmers. The educational programming language Scratch is not only used for teaching programming concepts, but also offers a platform for young programmers to express and share their creativity on any topic of relevance. By analyzing titles and project comments in a dataset of 106.032 Scratch projects, we explore which topics are common in the Scratch community, whether socially relevant events are reflected, and what the sentiment in the comments is. It turns out that the diversity of topics within the Scratch projects makes the analysis process challenging. Our results nevertheless show that topics from pop and net culture in particular are present, and even recent societal events such as the Covid-19 pandemic or BLM are to some extent reflected in Scratch. The tone in the comments is mostly positive with catchy youth language. Hence, despite the challenges, Scratch projects can be studied in the same way as social networks, which opens up new possibilities to improve our understanding of the behavior and motivation of novice programmers.
Societal issues such as the Black Lives Matter movement (BLM) also have an impact on software engineering (SE), for example with the proposal to replace discriminatory terms such as whitelist/blacklist and master/slave. Analysis of such societal events is therefore relevant for SE, as these might serve as a catalyst to question information technology conditions and to adapt contemporary patterns to a new discourse system [4, 17]. Research investigating societal events typically focuses on social media analyses that use text mining to extract semantic information from posts [1, 7, 13, 24]. Can we also learn about societal events from SE? A thriving subfield in SE is the educational programming environment Scratch [23], with its millions of predominantly young users. Prior research has investigated Scratch programs and their users [2, 12]. However, the Scratch ecosystem offers more than just code: there is also surrounding textual information in terms of project titles and users interacting by commenting on and liking each other's projects [14, 27, 32]. This raises the question of whether the analysis of social networks [9, 13, 33] can be transferred to the analysis of Scratch projects. A central challenge here is that Scratch not only provides predominantly very short textual information, but above all that this text is written mainly by children and teenagers. The aim of this paper is to identify to what extent text mining is possible in Scratch projects and their comments, and whether socially relevant events are reflected. The first research question is:

RQ1: What topics can be extracted from Scratch projects?

We conduct a first-time application of automated text analysis to Scratch projects, using the title of a project in analogy to news headlines and Twitter hashtags [13]. For this purpose we use Top2Vec [3], a state-of-the-art method from the field of machine learning and natural language processing.
The interface structure of the projects resembles that of a social network, with the possibility of commenting, sharing and liking. Increased offensive behavior like hate speech is reported for comments in traditional social networks [8, 34, 35]. To determine whether such behavior also exists in Scratch comments and how semantic fields relate to the sentiment, our second question is:

RQ2: What is the tone of the Scratch comments?

We use sentiment analysis, which also has its origins in the analysis of social networks and recommendation systems. We analyze the project comments with the multi-class sentiment tool VADER [18]. Our results show that Scratch projects contain net and pop cultural references, but societal topics such as the Covid-19 pandemic are also referenced. The majority of the project comments are positive and the users express their interest in the projects, but again we find some indicators that political events like the BLM movement are present. With these insights, this paper lays the foundation for further interdisciplinary studies of Scratch as a social network. It also identifies new challenges and provides new opportunities for the application of machine learning in programming education, where text analysis is challenging due to the short, domain-independent language used by young people.

In this paper, we bring together concepts from analyses of social networks and the community of young programmers in Scratch. Although analyses concerning socio-cultural issues represent a relatively young branch of research in SE, there are already some research reports on the content and structured analysis of social networks like GitHub or Stack Overflow, similar to analysis in traditional social media [19, 22]. In particular, the identification of trending topics, the question of which activities lead to better and faster reputation scores, and the effect of gender have been the focus of prior research [4, 6, 17, 26].
Furthermore, language identification on Stack Overflow revealed that the tags provided by users often do not agree with the classification produced by a language identification tool [10]. Similarly, the impact and benefits of sentiment analysis for SE, especially in OSS such as GitHub repositories, have recently been discussed [15, 21, 28, 29]. Even a cross-linking study between GitHub and Twitter has recently been conducted [11]. Identifying affective and social factors is relevant as they impact product and collaboration quality, productivity, and employee satisfaction.

Scratch is one of the most popular introductory programming environments. Shared projects can be commented on, and include symbols for Love-it (♥), Favourite (★), and Remix, which is similar to sharing in traditional social networks. Looking "inside" a Scratch project reveals the code of the underlying figures (Sprites) and the environment (Stage) of the program. Scripts are created by visually arranging blocks representing programming instructions. The popularity of Scratch has raised attention in research, especially in the fields of computer science education and SE. There have been qualitative studies, e.g., regarding the digital competence of young people and their handling of digital media by means of stories in Scratch [30], with the aim of encouraging them to develop their own projects and not just consume them. Some prior research also exists on the application of machine learning in Scratch, such as using a predictive analysis to determine whether comments in a project are project-related [32], using latent class analysis to investigate the use of programming concepts [12], or using topic modeling to identify gender-dependent topics within projects [14]. This is fundamental to better understand how children program in Scratch in order to improve the learning environment and thus ensure long-term interest and sustainable learning outcomes, especially for underrepresented groups in SE.
We randomly sampled publicly shared Scratch projects using the REST API provided by the Scratch website. We excluded remixes, as well as projects that contain fewer than ten blocks, ten views and ten loves, as such small and unpopular projects contain little information, but add noise. We retrieved 124.160 projects, created in the period from 26.04.2007 to 18.08.2021. After preprocessing and restricting to projects created on or after 01.01.2019 to focus on recent trends, the final dataset consists of 106.032 projects for the topic analysis. For the sentiment analysis, we included only projects which contained more than ten comments; of the original 124.160 projects only 21.786 projects therefore remained, a number which was again reduced by the same preprocessing steps. The final dataset for the sentiment analysis consists of 16.816 projects. Since we apply textual analysis, we excluded images and audio files. However, these could be included in future research, which we support by providing our analysis source code as open source.

3.2.1 Preprocessing. As with any text analysis, several preprocessing steps are required on the data. We parsed the text into tokens (here single words), normalized them to lower case and removed punctuation as well as the stop words defined in the NLTK library, as these generally carry little meaning of their own but distort the results. We also removed any characters which are not in the ASCII range from 0 to 122, but did not filter for English words, because the accuracy of such filters is very poor and they generate many false positives. However, the pre-trained model of Top2Vec is multilingual. Since the inflectional form of a word does not provide relevant additional semantic information, lemmatisation with the popular WordNet lemmatiser is used to reduce terms to their lemma, where the frequency of the original terms is preserved.
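The preprocessing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the small `STOP_WORDS` set is a hypothetical stand-in for the NLTK stop word list, punctuation is assumed to be deleted rather than replaced by a space, and lemmatisation is left out to keep the example free of external dependencies.

```python
import string

# Hypothetical stand-in for the NLTK English stop word list (assumption).
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "my", "i"}

# Upper bound of the ASCII range named in the text (0 to 122, i.e. up to 'z').
MAX_CODE_POINT = 122

# Translation table that deletes every ASCII punctuation character.
_DELETE_PUNCTUATION = str.maketrans("", "", string.punctuation)


def preprocess(text, stop_words=STOP_WORDS):
    """Tokenize a title or comment and apply the cleaning steps in order."""
    cleaned = []
    for token in text.split():  # tokens are single whitespace-separated words
        token = token.lower()  # normalize to lower case
        token = token.translate(_DELETE_PUNCTUATION)  # remove punctuation
        # Drop characters outside the range 0..122.
        token = "".join(ch for ch in token if ord(ch) <= MAX_CODE_POINT)
        if token and token not in stop_words:  # skip empties and stop words
            cleaned.append(token)
    return cleaned
```

For example, `preprocess("The Cat's NEW Pokémon game!")` keeps `cats`, `new`, `pokmon` and `game`: the stop word is dropped, the apostrophe and exclamation mark are deleted, and the non-ASCII `é` is removed by the range filter.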
Furthermore, we removed customized stop words relating to the project type such as "game", "animation", "platformer" as well as domain-specific terms such as "sprite" or "remix". After the preprocessing, each sample (Scratch project) contains several features (the terms of the title for the topic analysis and the terms of the comments for the sentiment analysis), which serve as input for the models.

Topic Analysis. The title as the basic feature represents the input parameter, while the topic represents the target parameter of the model. One advantage of Top2Vec is that it determines the number of topics automatically and, unlike alternative approaches, does not require choosing the number beforehand. Each topic from the model is associated with a number of generated keywords that add more contextual understanding of the projects' semantics.

To identify the tone of the projects, we use multi-class sentiment analysis. To find the most accurate tool, two independent researchers manually classified two comments from each of 500 randomly selected projects from the dataset, hence a set of 1000 randomly selected comments. The comments were labelled with three basic moods (positive, neutral, negative) and compared to the classifications of SentiStrength, VADER and Stanford CoreNLP. SentiStrength's F1 score (0.743) is comparable to that of VADER (0.740), with Stanford CoreNLP (0.687) scoring lower. Since VADER is open source and has the advantage of a compound score, it was selected over SentiStrength for further analysis. VADER is specifically adapted to sentiments expressed in social media and provides the percentage by which a text is rated positive, negative, or neutral. In addition, a compound score is calculated by adding the valence scores, adjusting them according to the rules, and then normalizing them between -1 (most extreme negative) and +1 (most extreme positive).
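The scoring scheme can be illustrated with a small sketch. It makes three assumptions that should be read as such: the normalization `x / sqrt(x^2 + alpha)` with `alpha = 15` follows the reference VADER implementation as we understand it and is not stated in this paper; the cut-offs of ±0.05 are the ones this paper adopts from prior work [18], applied here as strict inequalities; and a project's sentiment is taken to be the label of the mean compound score over its comments. The function names are hypothetical.

```python
import math
from statistics import mean

POSITIVE_THRESHOLD = 0.05   # compound score above this counts as positive
NEGATIVE_THRESHOLD = -0.05  # compound score below this counts as negative


def normalize(valence_sum, alpha=15.0):
    """Squash a rule-adjusted sum of valence scores into the interval (-1, 1).

    alpha = 15 is assumed from the reference VADER implementation.
    """
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


def label_compound(score):
    """Map a compound score to one of the three basic moods."""
    if score > POSITIVE_THRESHOLD:
        return "positive"
    if score < NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def label_project(comment_scores):
    """Label a project by the mean compound score of its comments.

    Expects at least one score; averaging is an assumed reading of
    'the average per comment'.
    """
    return label_compound(mean(comment_scores))
```

With this reading, a project whose comments score 0.6, -0.1 and 0.2 averages to roughly 0.23 and is labelled positive, even though one of its comments is negative on its own.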
Based on prior work [18], the threshold for positive sentiment is > 0.05 and for negative sentiment < -0.05. To assess the overall sentiment of a project, we consider the average score per comment. The most common terms over all projects are visualized by word clouds to determine semantic fields.

The analysis with Top2Vec looks at the words with the highest co-occurrence. If there are important words in the proximity, a word becomes more important, but this does not have to be the case in children's projects, and one should therefore also look at individual important terms with keyword analysis, tf-idf or more advanced neural networks [5]. Besides the classification into three basic moods with VADER, there are also several shades in between that enhance the understanding, e.g., anger, fun or sadness. SentiStrength-SE or Senti4SD, which are adapted specifically for SE, might provide another perspective on the content; however, we explicitly decided against these tools as our input comes from children and makes no reference to SE terminology. Concerning external validity, the results of the topic analysis may not generalize to other scenarios or networks in SE such as GitHub. We considered only projects with a minimum of ten blocks and, for the sentiment analysis, a minimum of ten project comments, because smaller projects introduced too much noise into the data in an initial experiment. Nevertheless, even from these small projects interesting insights could be gained, e.g., showing self-painted backgrounds. Further filtering steps might improve the data, e.g., by excluding buggy projects.

To determine the topics of the Scratch projects, a topic analysis was performed using Top2Vec. The model generated 1,441 topics with associated keywords. Due to the large number of generated topics, and because an automatic evaluation of such unsupervised machine learning approaches is not possible, we manually reviewed only the 100 most frequent topics, i.e., the topics to which the most projects belong.
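The selection criterion for the manual review, ranking topics by the number of projects assigned to them, can be sketched as below. The input format (one topic ID per project) and the function name are assumptions made for illustration; a topic modeling library may report topic sizes directly, in which case this step is unnecessary.

```python
from collections import Counter


def most_frequent_topics(topic_per_project, n=100):
    """Return the n topics with the most projects as (topic_id, count) pairs.

    topic_per_project is a hypothetical list holding one topic ID per project.
    Topics are ordered by descending project count.
    """
    return Counter(topic_per_project).most_common(n)
```

Calling `most_frequent_topics(assignments, n=100)` on the full list of assignments would yield the candidates for manual inspection, with rank 1 corresponding to the topic that covers the most projects.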
We manually classified these 100 topics into 10 types of topics, which are illustrated in Table 1 with examples, and will be elaborated in the following⁷. The ID references the rank of the topic in terms of how common this topic is. There were also many topics that contained a combination of different topics, but did not form a coherent topic themselves; these were therefore not discussed further, but might be attractive for further studies.

The first topic in Table 1, labelled inconsistency, covers by far the most projects and is captured by keywords that are unrelated filler words like one, bit or ok. There are also abbreviations like idk [i don't know] or youth language like dude. The model seems to assign to this topic all projects that cannot be associated with any other topic. This might be due to a common use case for Scratch, in which students just experiment with some programming concepts in class and therefore do not devote much attention to the titles. The second type of topic also contains some incoherent terms (interestingly also the term covid), which are all to some extent proper names. However, this topic seems to consist of independent concepts that have no common context: Sharkyshar is a popular YouTube channel as well as a popular Scratch user. PFP is the abbreviation for profile pic, i.e., an icon used as a profile picture, while stickmini is a video game and tvokids is a Canadian children's television network. Therefore, this topic is an example of the model combining independent terms without context. Nevertheless, this indicates that the extraordinary event of the Covid-19 pandemic is being processed in some form in the users' projects. So, similar to social networks [9, 13, 33], children in a programming environment are also responding to societal events. To what extent this really influences programming and how it is represented should be the subject of future work. Some topics are related to the children's lifeworld.
One semantic field consists of different animals (Topic 5), which are often used as characters in Scratch, since the mascot of Scratch is also a cat. In Scratch there are many different animals as characters, and children often use their pets as models for their avatars in the program. Topic 28, music, deals with songs, lyrics, playlists and musicals as well as different music genres such as rap and rock. Here, too, the Scratch environment lends itself to this topic, since there are various stages, dancing figures and a dedicated block category sound, where a lot of sounds are available and users can also upload their own sounds. In addition, music is a specific project type on the Scratch website. Common everyday life scenarios such as holidays and celebrations are also popular among the topics, represented by the terms Christmas, Halloween or birthday parties (Topic 90). Such events seem to appeal to children, as other studies confirmed the use of Scratch projects as "gifts" for Mother's Day [31]. In particular, Scratch reinforces all possible ways of being creative with its starter projects⁸ on the website, where celebrations and parties are an appealing topic for newbies. Many topics deal with references to net and pop culture, reflected by terms such as Pokemon, Kirby, Minecraft, Sonic, which all refer

⁷ The complete list of generated topics as well as the dataset and the analysis are made available for replications at https://github.com/se2p/semantic-analysis.
⁸ https://scratch.mit.edu/starter-projects

Similarly, rather shiny fantasy worlds (Topic 99), characterized by rainbows, balloons and unicorns, exist. These worlds show that the users' interests are diverse, and it would therefore be intriguing to determine whether patterns can be derived between preferred topics and other factors such as gender, age or origin.
In particular, the goal of further work should be to generalize this topic analysis and to link the topics with the project statistics and users in order to gain further insights into the interests of the Scratch users.

RQ1 Summary. Topics from net and pop culture can be found in Scratch projects. In addition, the exceptional societal event of the Covid-19 pandemic was also reflected in the projects.

Overall, the comments from the projects consist of 49.362 nouns, 3.751 verbs, 3.064 adjectives and 1.005 adverbs. Figure 1 shows the most common terms in Scratch project comments according to their sentiment. In total, there are 14.434 positive projects, 1.933 neutral projects and 449 negative projects. Positive comments thus clearly outweigh negative ones. In contrast to other social networks such as Instagram [16], Twitter [34] or YouTube [25], in which cyber-bullying, hate speech and radicalization have already established themselves [8], Scratch still seems to be a friendly platform. Similar to the recent social platform TikTok [35], this positive communication may be due to the fact that (1) the age group of users is rather low and (2) there are fewer controversial or political debates than, e.g., on Twitter. The positive comments are dominated by the term thank. The expression of thanks is a sign of appreciation, either from users to the project creator or from project creators in response to comments on their projects. In addition, there are strong positive words such as love, best, great, awesome or pretty, cute or nice. These terms in combination with other terms like character, work, job or done result in phrases that praise good performance or a pleasant implementation of the program. In particular, acronyms like lol [laughing out loud] or xd [smiley face] also dominate the positive project comments. Here, the specific speech situation is unclear, i.e., whether they are used in a euphoric or an ironic sense.
In either case these acronyms serve identification with one's own group as well as differentiation from other groups. A prerequisite for this is to understand the content of the term and to put it into context: lol, for example, is a sign of the colloquial language of younger persons, i.e., youth slang [20]. The word field of negatively connotated comments is dominated by the word people, whose reference cannot be inferred from the context. However, among the negative comments, it is striking that the acronym BLM [black lives matter] as well as the terms George, Floyd, police, racist and black can be found, referring to the event which triggered the movement in the USA. This is a strong indicator that societally and politically relevant events were reflected in the projects or at least mentioned in the comments. As a consequence, Scratch might not be just a programming platform, but might have the potential of being a social media channel, which should be an aim for further research [7]. In addition, there are also strong negative words like kill(ed) or die, which could be read in the context of the BLM movement, but might also be related to the project type: especially in games, it is often mentioned in the comments that the game players died very quickly, such as I already died at level 2. The use of the terms death or kill usually has no pejorative connotation in this context. The neutral comments contain expressions of astonishment like uh and oh, which have to be interpreted depending on the context, which this analysis does not provide. In addition, the terms gaming and engineer stand out, which on the one hand refer to the popular project type game and on the other hand to the type of producer of a project. Overall, neutral words appear to be of little significance. We observe in Fig. 1 that words such as make, one, dont, or know overlap between sentiments, implying that they are not uniquely assignable or simply context-independent.
Since these words carry so little specific information, they might have to be ignored in further studies. Terms can often also be ambiguous, but this seems to be less evident in the projects. Overall, it seems that positive comments tend to focus on the project in general and its implementation (good job), while negative comments tend to focus on the content level of the project. This, however, would need to be verified in further studies.

RQ2 Summary. The tone in the comments is mostly positive, and mostly specific terminology from youth language is used. Also, political events like the BLM movement are referenced.

We identified topics from pop culture as well as from society in Scratch projects. The tone of the discussions around these projects was mostly positive. The intersection of those two aspects is not only relevant for research purposes, but also for educational use. Since we discovered a rather unexpected variety of topics (RQ1), and also controversial topics like the BLM movement in the comments (RQ2), we hypothesize that socially and politically controversial topics might be discussed more controversially in the comments in Scratch. In this context, likes, loves and remixes might be helpful to determine which project topics are particularly popular and the extent to which the community approves or disapproves of these topics. Our findings in detecting cultural and political topics provide the foundation for investigating in detail whether Scratch also represents a political network where different political positions are present and debated. In this respect, since we know that gender-specific preferred themes exist [14] and since we have identified such topics with shiny and gloomy fantasy worlds (RQ1) in this paper as well, a social network analysis in combination with our approach would help to determine whether certain user groups concentrate on certain topics.
In the current discourse on socialization in SE, identifying influencers of the Scratch community, their range of topics, and their commenting behavior would provide insights into the relationship between social and programming behavior [27]. Similar to social media, it is necessary to understand in Scratch with what topics and attitudes children and adolescents are engaged or confronted. Since we identified a broad range of diverse topics in RQ1, we can learn what motivations there actually are for children to program in Scratch [2, 14], in order to ensure that children, especially underrepresented groups such as girls, are attracted to introductory courses in the first place, and to follow up on children's long-term interest after they have been introduced to Scratch [12]. As the results of RQ2 imply, there is the potential to incorporate Scratch into the classroom not only for CS but also for cross-curricular education, e.g., when discussing politically relevant topics in the projects and also highlighting different perspectives in the commentary narratives. With regard to digital literacy, which is becoming increasingly important, the children can be made aware of how social media operate using the Scratch platform and be familiarized with real social networks. On the one hand, Scratch can be used to playfully demonstrate the strengths of social media, e.g., in the exchange on controversial topics; on the other hand, it can show where it is important to be careful, e.g., with negative comments. Since we observed some phrases like good job (RQ2), it may be possible to determine from the comments whether a project receives a lot of positive or negative feedback and loves and likes primarily because of its topic, or because of its implementation; this would be of relevance from a computer science education point of view.
In this context, it is also relevant whether the comments are purely affective, i.e., informative in terms of mood but silent about the nature and manner of the project, or whether they also contain constructive criticism and suggestions for improving the implementation.

In this paper we explored whether social network analysis can be applied to Scratch. We particularly found topics from net and pop culture, but there are also references to societal topics, such as the Covid-19 pandemic. This supports the interpretation of Scratch not just as a programming language, but also as a segment of society, which provides many opportunities for further interdisciplinary studies. For example, likes, loves and remixes could be analyzed with the aim of determining which topics are particularly liked or shared. In an educational context, the correlations between project themes and the degree of experience of the user might be relevant for introductory courses in computer science didactics. User-group-specific preferences might be identified in order to better support children, especially girls, on an educational level with regard to their enthusiasm towards programming. Since the topic modeling was challenging, a supervised machine learning approach might support a more concrete topic classification and the prediction of thematic trends on the platform in future research.

This work is supported by the Federal Ministry of Education and Research through project "primary::programming" (01JA2021) as part of the "Qualitätsoffensive Lehrerbildung", a joint initiative of the Federal Government and the Länder. The authors are responsible for the content of this publication.

[1] Top concerns of tweeters during the COVID-19 pandemic: infoveillance study
[2] How Kids Code and How We Know: An Exploratory Study on the Scratch Repository
[3] Top2vec: Distributed Representations of Topics
[4] What are developers talking about? An analysis of topics and trends in Stack Overflow
[5] Deep LDA: A New Way to Topic Model
[6] Building reputation in StackOverflow: An empirical investigation
[7] All lives matter, but so does race: Black lives matter and the evolving role of social media
[8] Hate speech review in the context of online social networks
[9] An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit
[10] Man vs Machine - A Study into Language Identification of Stack Overflow Code Snippets
[11] Need for Tweet: How Open Source Developers Talk About Their GitHub Work on Twitter
[12] Youth Computational Participation in the Wild: Understanding Experience and Equity in Participating and Programming in the
[13] Investigating COVID-19 News Across Four Nations: A Topic Modeling and Sentiment Analysis Approach
[14] Data-driven Analysis of Gender Differences and Similarities in Scratch Programs
[15] Sentiment analysis of commit comments in GitHub: An empirical study
[16] Detection of Cyberbullying Incidents on the Instagram Social Network
[17] Influence analysis of Github repositories
[18] Vader: A parsimonious rule-based model for sentiment analysis of social media text
[19] BitCoin Meets Google Trends and Wikipedia: Quantifying the Relationship between Phenomena of the Internet Era
[20] Langenscheidt. Hä?? Jugendsprache unplugged
[21] Sentiment analysis for software engineering: how far can we go
[22] "My Invisalign Experience": Content, Metrics and Comment Sentiment Analysis of the Most Popular Patient Testimonials on YouTube
[23] The Scratch Programming Language and Environment
[24] Twitter for sparking a movement, reddit for sharing the moment: #metoo through the lens of social media
[25] Thou shalt not hate: Countering online hate speech
[26] Gender differences in participation and reward on Stack Overflow
[27] Examining the Relationship between Socialization and Improved Software Development Skills in the Scratch Code Learning Environment
[28] A benchmark study on sentiment analysis for software engineering research
[29] Sentiment and Emotion in Software Engineering
[30] Mother's Day, Warrior Cats, and Digital Fluency: Stories from the Scratch Online Community
[31] Blind Spots in Youth DIY Programming: Examining Diversity in Creators, Content, and Comments within the Scratch Online Community
[32] Novice Programmers Talking about Projects: What Automated Text Analysis Reveals about Online Scratch Users' Comments
[33] Public Opinions towards COVID-19 in California and New York on Twitter
[34] Hate speech on twitter: A pragmatic approach to collect hateful and offensive expressions and perform hate speech detection
[35] Research Note: Spreading Hate on TikTok