key: cord-0062072-lyk9seq5
authors: Liu, Caixia; Zou, Di; Chen, Xieling; Xie, Haoran; Chan, Wai Hong
title: A bibliometric review on latent topics and trends of the empirical MOOC literature (2008–2019)
date: 2021-04-17
journal: Asia Pacific Educ
DOI: 10.1007/s12564-021-09692-y
sha: fe5587bc012233edc05ed95ab12280e2bfa1e1ca
doc_id: 62072
cord_uid: lyk9seq5

Massive Open Online Courses (MOOCs) have become a popular learning mode in recent years, especially since the outbreak of COVID-19 in late 2019, which has resulted in a significant increase in associated research. This paper presents a bibliometric review of 1078 peer-reviewed MOOC studies published between 2008 and 2019. These papers were extracted from three influential databases: the Web of Science (WOS), Scopus, and the Education Resources Information Center (ERIC). The bibliometric analysis of the MOOC literature identified the research trends; the journals, countries/regions, and institutions with high H-index; scientific collaborations; research topics; topic distributions of the prolific countries/regions and institutions; and annual topic distributions, after which the representative research and research implications were discussed. This review gives researchers a deep and comprehensive understanding of current MOOC research and identifies potential research topics and collaborative partners, which supports future MOOC-related research.

The COVID-19 outbreak in late 2019 put online learning back in the spotlight and made online education one of the hottest topics in education. In the past few years, the rapid development of information and communication technology has resulted in major changes in education delivery, with online learning developing rapidly. Compared with traditional learning, online learning has fewer time and space constraints, making learning more flexible for both teachers and learners.
As a typical online education form and a powerful substitute for the classroom, MOOC, an acronym for Massive Open Online Course, is an online course for the public and the latest development of distance education (Deng & Benckendorff, 2021) . MOOCs originated in Canada in 2008 when the 12-week "Connectivism and Connective Knowledge" course was facilitated by Stephen Downes and George Siemens at the University of Manitoba (Boyatt et al., 2014; De Waard et al., 2012) . The "Massive" in the MOOC acronym indicates that there are no enrollment limitations and the "Open" indicates that learners are free from geographical constraints, course sizes, temporal boundaries, entry requirements, or financial restraints (Dodson et al., 2015) . "Online," of course, refers to learning through the internet (Thompson, 2011) . Downes (2008) categorized two main types of MOOCs: networks of distributed online resources (cMOOCs) and structured learning pathways centralized on digital platforms (xMOOCs). cMOOCs are based on connectivism learning theory (Siemens, 2004) , which emphasizes creation, creativity, autonomy, social networking, and connected and collaborative learning (Saadatdoost et al., 2015) , whereas xMOOCs have more traditional classroom settings, the instructor and learner roles are differentiated, and the courses are similar to formal university courses, with a combination of pre-recorded video lectures with quizzes, tests, and other assessments (Rabin et al., 2019) . In sum, xMOOCs are centered on professors rather than a community of students (Online Education Blog of Touro College, 2013) and focus on knowledge duplication (Dodson et al., 2015; Siemens, 2012) , and cMOOCs focus on knowledge creation and generation. In the past few years, there has been increased research interest in MOOCs. 
This study took a bibliometrics approach to review academic MOOC research with the aim of providing a deeper understanding of the research status, trends, and priority topics, and of providing guidance for future research. The study was driven by the following research questions. (1) What was the annual trend of MOOC research? (2) Which journals, countries/regions, and institutions were the major MOOC research contributors? (3) What were the scientific collaborations among major countries/regions and institutions? (4) What were the main research topics of empirical MOOC studies? (5) What have the topic distributions and the annual topic distributions been in the prolific countries/regions and institutions?

After the Literature review section reviews MOOC-related research, the Methods section introduces the bibliometrics review method. The Results section then presents the analysis of the descriptive and qualitative statistics, such as the article and citation counts, the most prolific countries/regions and institutions, the scientific collaborations, the main topics, trends, and correlations, and the annual topic distributions in the most prolific countries/regions and institutions. The Discussion section provides an in-depth discussion, the limitations of this study, and the possible areas for future research, and the Conclusion briefly reviews the main points of this paper.

MOOC research was analyzed from macro- and micro-perspectives to identify the macro-development trends and the specific (micro) research directions or issues, respectively. The macro-perspective of the MOOC review focused on the issues of MOOCs themselves, such as the number of related publications, MOOC classification, research methods, topics, annual trend, and social ethics. Liyanagunawardena et al.
(2013) conducted the first review of 45 MOOC research articles published from 2008 to 2012 in academic journals, for which a quantitative analysis was conducted on article classification, contributor distribution, annual research trends, MOOC classifications, and possible future research directions. Similarly, Veletsianos and Shepherdson (2016) analyzed 183 articles published from 2013 to 2015 using both qualitative and quantitative methods and came to three main conclusions: (1) most articles were by American and European researchers; (2) only a few papers were widely cited, with nearly half not cited at all; and (3) quantitative methods were favored, with the data mainly collected using surveys and automated methods. However, the research was based on a very small portion of the available data, which restricted the understanding of MOOCs. Different from these two reviews, Saadatdoost et al. (2015) explored and analyzed 32 MOOC research studies from education and information system perspectives, from which a holistic MOOC definition was derived and relevant theories and issues extracted, which significantly contributed to the creation of a MOOC research domain structure; however, this study lacked any broader, deeper analysis of MOOC research institutions, collaborations, and other factors. Ebben and Murphy (2014) analyzed 25 empirical studies from 2009 to 2013 and chronologically conceptualized MOOC scholarship themes under (1) connectivist MOOCs, engagement, and creativity from 2009 to 2011/2012; and (2) xMOOCs, learning analytics, assessment, and critical discourses on MOOCs from 2012 to 2013. However, the research only had a MOOC scholarship perspective and only a limited number of papers were reviewed. With a focus on MOOC research methods and topics, Zhu et al.
(2018a, b) conducted a systematic review of 146 empirical MOOC studies in five key journals from 2014 to 2016, for which they divided the research methods into quantitative, qualitative, and mixed methods to reveal the relationships between research topics and research methods, and then comprehensively analyzed the trends, research methods, author locations, MOOC delivery countries, and primary journals; however, only a limited number of articles were extracted from Scopus and only a three-year time period was examined, which limited the research findings. Deng et al. (2019) conducted a narrative review of 102 MOOC research articles published between 2014 and 2016 using a Perceive, Process, Perform (3P) Model focused on learner factors, teaching contexts, learner engagement, and learning outcomes, and found that there was little evidence-based research on non-mainstream MOOC consumers, that the role of learner factors was oversimplified in evidence-based MOOC research, and that research between teaching and learning helped progress the understanding of MOOC research. However, the focus of the analysis was on the findings rather than on the methodological approaches, and the use of the 3P model introduced some ontological constraints. Rolfe (2015) conducted a systematic review of 68 pre-2014 MOOC-focused articles from a socio-ethical perspective and developed a socio-ethical dimensional MOOC framework that encompassed MOOC pedagogy and quality, the social inclusion afforded by MOOCs, learner diversity and equality, and the digital and social media literacy of open learners. However, as there have been many more MOOC articles since 2015, further reviews and research are needed. Detailed information on the review articles (research keywords, databases, article types, time ranges, methodologies, and article numbers) is shown in Table 1.
The micro-perspective of the MOOC review studies mainly focused on particular aspects/topics related to MOOC users, such as student participation, active learning strategies, engagement and retention, academic engagement, and self-regulated learning, with the greater number of these studies being conducted since 2018. For example, based on 38 articles from 2012 to 2015, Joksimović et al. (2018) conducted a systematic review of the approaches to model learning in MOOCs that specifically examined the approaches to defining and measuring learning outcomes, learning contexts, student engagement, and the association between the identified metrics and measured outcomes, after which a framework was suggested to study the associations between contextual factors (such as demographics and classrooms), individual needs, student engagement, and learning outcomes. Paton et al. (2018) analyzed 38 articles from 2013 to 2017 focused on learner engagement and retention in vocational MOOC education and training, from which six functional approaches were identified to improve learner retention and promote engagement: (1) good quality instructional course design; (2) well-developed assessment tasks aligned with course objectives; (3) learner collaboration opportunities; (4) instructor commitment to timely contextualized communication; (5) course achievement certifications; and (6) further study pathways. As both of the above reviews analyzed 38 articles within a narrow time frame of about 3-4 years, there was a need to extend current empirical knowledge to explore more findings. Davis et al. (2018) investigated 126 MOOC studies published between 2009 and 2017 from an active learning perspective and found that the three most effective active learning strategies were cooperative learning, simulations and gaming, and interactive multimedia, and Guajardo Leal et al.
(2019) focused on MOOC learning engagement and reviewed 176 articles published from 2015 to 2018, finding that most related articles were from the United States, Australia, and the United Kingdom and most had employed qualitative exploratory methods. Since 2019, there has been a greater MOOC research focus on self-regulated learning (SRL). For example, presented a systematic review of empirical research on SRL in MOOCs focused on the effects of SRL on learning, SRL strategies, and SRL interventions, and suggested some MOOC designs to promote SRL. Wong et al. (2019) conducted a systematic review on SRL that paid greater attention to the human factors in SRL, such as prompt feedback, integrated support systems, and other human factors, finding that human factors (e.g., gender, cognitive abilities, prior knowledge) played an important role in effective SRL, which suggested that to provide the support that best fits each individual learner, learning analytics could be used. However, these SRL-related reviews only examined between 21 and 42 articles; therefore, as there has been an increase in SRL-related MOOC literature in recent years, this topic needs further exploration. Table 2 lists the related review article information: topic, scope, methodology, number, and journals. Therefore, while there have been significant MOOC review research studies, there have been some limitations. First, most studies employed systematic rather than bibliometrics reviews, and although Zheng and Yang (2017) claimed that their study was a bibliometrics MOOC review, the research mainly focused on the development trends and popular topics in a four-year span, and while they tracked the evolution of MOOC studies using statistics and identified the popular subjects using a co-word network atlas based on keywords, the research lacked a larger scope or a deeper exploration of MOOCs. 
Second, past reviews have tended to examine fewer than 100 papers and only three-to-four-year time periods, which could have hindered the effectiveness of any statistical analyses. As shown in Table 1, almost all reviews were published before 2017, with very few studies from macro-perspectives having been conducted in the past three years. Third, few past reviews have systematically analyzed topics, topic distributions, and research collaborations. In contrast to these earlier studies, this bibliometrics study examined 1078 studies over 10 years and, therefore, provides a more detailed picture of MOOC topics, development trends, cooperative partners, collaborative organizations, topic distributions of the prolific countries/regions and institutions, and annual topic distributions, and further discusses the representative research work and research implications.

The most used method in MOOC-related review research has been a systematic review approach that describes and justifies the paper identification methods in such a way that it can be replicated (Fink, 2010; Liyanagunawardena et al., 2013). The bibliometrics approach focuses on "the application of mathematics and statistical methods to books and other media of communication" (Pritchard, 1969, p. 349). Howkins (1981) claimed that bibliometrics implied the quantitative analysis of the bibliographical features of a body of literature. That is to say, bibliometrics utilizes quantitative analysis and statistics to describe patterns of publication within a given field or body of literature and has been considered an effective statistical method for evaluating scientific publications. This paper, therefore, adopted a bibliometrics approach to conduct a quantitative analysis of related MOOC research.
The data were collected from a search of journal articles published from 2008 to 2019 in three electronic databases: the Web of Science (WOS), Scopus, and the Education Resources Information Center (ERIC). The search strings "MOOC(s)," "Massively Online Open Course(s)," and "Massive Online Open Course(s)" were used to screen the titles, abstracts, and keywords, and specific criteria were applied to ensure relevance. For example, the selected studies had to be English-language empirical MOOC studies in peer-reviewed journals. The specific inclusion and exclusion criteria are listed in Table 3 and the flowchart for the dataset acquisition is shown in Fig. 1.

The data search from WOS, Scopus, and ERIC identified 1078 articles, and the analysis focused on the five research questions. To answer question #1, the annual number of MOOC empirical articles published between 2008 and 2019 was calculated and the curves of 11 annual numbers fitted to determine the MOOC research rules and trends. To answer question #2, the major contributors to MOOC research were identified programmatically. To answer question #3, a social network analysis approach (Bastian et al., 2009) was taken to analyze and visualize the collaborative scientific research relationships among the prolific countries/regions and institutions. To answer question #4, structural topic modeling (STM) based on the R package was employed (Chen et al., 2020a, b, c; Roberts et al., 2014a, b) to identify the topics of the 1078 articles from the abstracts. To answer question #5, a graphing tool named Cluster Purity Visualizer (Swamy, 2016) was first used to obtain a basic graph of the topic distributions for prolific countries/regions and institutions. Then, the JavaScript packages d3.v3.js and clusterpurityChart.js were used to adjust the layout and coloring of the basic graph.
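The curve fitting mentioned for question #1 can be illustrated with a minimal sketch of a least-squares quadratic fit. The function names are hypothetical, the counts are synthetic values generated from a known quadratic rather than the paper's data, and shifting the years so that they start at zero is a choice made here to keep the arithmetic well-conditioned, not something the source states.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]            # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def r_squared(xs, ys, coeffs):
    """Coefficient of determination for the fitted quadratic."""
    a, b, c = coeffs
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    years = list(range(2008, 2020))
    shifted = [year - 2008 for year in years]                  # 0, 1, ..., 11
    # Synthetic annual counts (NOT the paper's data), from y = 2x^2 + 3x + 1.
    counts = [2 * x * x + 3 * x + 1 for x in shifted]
    coeffs = fit_quadratic(shifted, counts)
    print(coeffs, r_squared(shifted, counts, coeffs))
```

Fitting on raw year values instead of shifted ones yields very large, strongly cancelling coefficients, so a reported equation of that kind is sensitive to how many digits are kept.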
The analysis results were displayed by article and citation counts, the prolific countries/regions and institutions, topic identification, trends, and correlations, prolific country/regional and institutional distributions, scientific collaborations, and the annual topic distributions.

Figure 2 shows the annual empirical MOOC research counts from 2008 to 2019, from which it can be seen that before 2013, there were significantly fewer MOOC-related articles as MOOC theory was developing at this time; however, from 2013, there was increasing academic interest as MOOC theory was evolving. By 2019, the number of published papers was around three times greater than in 2014. When the annual number of published articles was fitted ($y = 3.305128x^2 - 12{,}879.64x + 12{,}939{,}030$ with $R^2 = 0.974$, $p = 4.605 \times 10^{-7}$), the results showed a parabolic function with the right part of the curve exhibiting a rapidly increasing trend.

Table 4 lists the top journals ranked by the H-index. Four bibliometric indicators were employed to evaluate the most prolific countries/regions and institutions: H for Hirsch index (Hirsch, 2005); A for article count; C for citation count; and ACP for average citations per article. H (32), which indicated that these two journals have had a significant influence on MOOC research. The International Review of Research in Open and Distance Learning had the highest ACP, and although this journal had only ten MOOC articles, the citation counts were 2184, indicating that these MOOC articles were of high quality and had significant influence. The Internet and Higher Education ranked

The top 11 countries/regions ranked by the published article numbers are listed in Table 5. The 11 most prolific countries/regions contributed 876 articles or 81.26% of the total (1078). The USA (51), UK (28), and Spain (27) were the most prolific, and China, Australia, and Canada each contributed 22 papers.
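The H and ACP indicators defined above are simple to compute from a list of per-article citation counts. The following is a minimal sketch under that assumption; the function names and the citation list are hypothetical and are not taken from the study's data.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that at least h articles have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def average_citations_per_article(citations: Sequence[int]) -> float:
    """ACP: total citation count (C) divided by article count (A)."""
    return sum(citations) / len(citations) if citations else 0.0


if __name__ == "__main__":
    # Hypothetical citation counts for one journal, country/region, or institution.
    example = [25, 8, 5, 3, 3, 1, 0]
    print("A   =", len(example))
    print("C   =", sum(example))
    print("H   =", h_index(example))
    print("ACP =", average_citations_per_article(example))
```

Because ACP is a mean, a few heavily cited articles can dominate it, which is why a unit with a small article count can still rank first on ACP while ranking lower on H.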
Canada (75.12) ranked first for the ACP, followed by the UK (49.76), Australia (48.90), and the USA (40.35). Table 6 shows the top 11 institutions ranked by H-index, which together contributed 15.58% of the total articles. Of these 11 institutions, five were from the USA and two were from the UK. Purdue University (PU), the Massachusetts Institute of Technology (MIT), Pennsylvania State University (PSU), and Harvard University (HU) were the most prolific institutions, further indicating the USA's dominant MOOC research position. MIT, PSU, and HU had the top three H-indices. MIT ranked first for the citation count (2393), followed by HU (2058) and the Open University (OU) (1720), and MIT also ranked first for the ACP (about 132.9), followed by the OU (about 122.9) and HU (about 114.3).

Social network analysis (Bastian et al., 2009) was used to visualize the collaborative scientific research relationships between the most prolific countries/regions and institutions. The collaboration networks were built using Gephi, open-source software for graph and network analysis. The analysis was conducted in three steps. First, the input data covering a node sheet and an edge sheet were prepared. The node sheet had four columns: id number; label for countries/regions and institutions; group, indicating the continent of countries/regions or the countries/regions of institutions and authors; and article count size. The edge sheet had three columns: the source and the target for the corresponding co-authorship pairs, and the weight, given by the number of collaborative articles. Second, the node and edge sheets were used to visualize the co-authorship network using the Fruchterman-Reingold algorithm. Finally, the node size and node color were configured based on the article count and group data.
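The first step above, preparing the node and edge sheets, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: the input format (one list of countries/regions per article), the function names, the sample records, and the capitalized column names are all hypothetical choices made here.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def build_sheets(articles, groups):
    """Build node and edge rows from per-article lists of countries/regions."""
    article_counts = Counter()
    pair_counts = Counter()
    for countries in articles:
        unique = sorted(set(countries))          # count each unit once per article
        article_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    labels = sorted(article_counts)
    ids = {label: i for i, label in enumerate(labels, start=1)}
    nodes = [
        {"Id": ids[label], "Label": label,
         "Group": groups.get(label, "unknown"), "Size": article_counts[label]}
        for label in labels
    ]
    edges = [
        {"Source": ids[a], "Target": ids[b], "Weight": weight}
        for (a, b), weight in sorted(pair_counts.items())
    ]
    return nodes, edges


def to_csv(rows):
    """Serialize a non-empty list of row dicts as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical co-authorship records, one list per article.
    sample = [["USA", "China"], ["USA", "China", "Canada"], ["Spain"], ["USA"]]
    continents = {"USA": "North America", "Canada": "North America",
                  "China": "Asia", "Spain": "Europe"}
    node_rows, edge_rows = build_sheets(sample, continents)
    print(to_csv(node_rows))
    print(to_csv(edge_rows))
```

An article with a single country/region adds to that node's size but creates no edge, which matches the case of a prolific unit with no collaborative links.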
The countries/regions and institutions were represented using different node sizes and colors, with the node size denoting the corresponding article number, and the node color indicating the continent to which the corresponding country/region belonged. Figure 3 shows the collaborative network for the 30 most prolific countries/regions, each of which had more than ten published articles. The collaborative network had 30 nodes and 112 links. It can be seen from the node size that the USA had the largest number of articles (266) and had collaborated with 23 countries/regions, followed by China (172), Spain (116), and the UK (103), which had collaborated with 11, 18, and 15 countries/regions, respectively. The USA collaborated with the most countries/regions, followed by Spain, the UK, the Netherlands, China, Australia, and Germany, with the number of collaborated articles being, respectively, 87, 46, 48, 43, 43, 31, and 22. Therefore, the USA collaborated on the most articles, with China and Canada being the most important partners with 15 and 13 articles. China collaborated mostly with the USA, followed by Hong Kong and Canada. Spain collaborated with 18 countries/regions, including nine articles with the Netherlands and five with Chile. In addition to the countries already mentioned, Canada, Turkey, France, Sweden, and Belgium also closely collaborated with other countries/regions.

The collaborative scientific research relationships between the 37 most prolific institutions are illustrated in Fig. 4. The most prolific institution was the University of Technology Malaysia (UTM) with 26 articles, followed by PU (21), PSU (20), MIT (18), University Carlos III of Madrid (UCM) (18), and HU (18), which had respective collaborations of 0, 4, 1, 10, 10, and 6 articles. Therefore, although the UTM ranked first for the number of published articles, it had no collaborative relationships with other institutions.
Of the 37 most prolific institutions, the University of Edinburgh (UE) collaborated with the most institutions (6), with the University of South Australia (UniSA) being its main partner for five of these six articles. Semantic coherence is based on the frequency of individual words and the co-occurrence of the frequency of different word pairs and is maximized when the most probable words in a given topic frequently co-occur (Silge, 2018) . If words have a high probability of appearing in a topic and a low probability of appearing in other topics, the corresponding topic is considered exclusive (Kuhn, 2018) . Figure 5 shows the semantic coherence and exclusivity scores for 26 topics with the topic numbers ranging from five to 30. In the figure, each point represents a model with its name and indicates how many topics were considered. For example, the point labeled "15-topic model" represents a model fitted with 15 topics. It can be seen that 14 and 15 topics achieved higher semantic coherence and exclusivity values, which indicated that more potential terms within the topic occurred in the same document and more terms were exclusively affiliated with the single topic. Two domain experts independently compared models with different numbers of topics by inspecting the representative terms and articles (Jiang et al., 2018) , and finally a 15-topic model was identified for the qualitative evaluation as this number of topics was found to have the greatest semantic consistency within the topics and exclusivity between the topics. Based on the estimated article-topic and topic-term distributions, the probability of an article or term belonging to a topic was determined, with the most representative articles and terms in a single topic receiving the highest assignment probabilities. Table 7 shows the 15-topic STM analysis results with the representative terms, the topic proportions within the whole corpus, the suggested topic labels, and the topical trends. 
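The semantic coherence criterion described above can be made concrete with a small sketch. It assumes the document co-occurrence formulation commonly attributed to Mimno et al., in which each ordered pair of top words contributes log((D(w_i, w_j) + 1) / D(w_j)); the source does not state the exact formula, so this choice, the function name, and the toy corpus are all assumptions for illustration.

```python
import math


def semantic_coherence(top_words, documents):
    """Sum of log((co-document frequency + 1) / document frequency) over word pairs.

    top_words: the most probable words of one topic, most probable first.
    documents: an iterable of tokenized documents (lists of words).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            denominator = doc_freq(top_words[j])
            if denominator == 0:
                continue                          # word absent from the corpus
            joint = doc_freq(top_words[i], top_words[j])
            score += math.log((joint + 1) / denominator)
    return score


if __name__ == "__main__":
    # Toy corpus (hypothetical), already tokenized.
    corpus = [["mooc", "learner", "video"], ["mooc", "learner"], ["video", "quiz"]]
    print(semantic_coherence(["mooc", "learner"], corpus))   # words that co-occur
    print(semantic_coherence(["mooc", "quiz"], corpus))      # words that never co-occur
```

Higher (less negative) scores indicate that a topic's top words tend to appear in the same documents. Exclusivity is scored separately and is not covered by this sketch.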
The six most-discussed topics were educational data mining and visualization (10.51%); cMOOCs and healthcare MOOCs (8.37%); MOOCs for languages (7.64%); demographic features of MOOC learners (7.62%); peer and formative assessment (7.30%); and flipped learning for MOOCs (7.03%). In Table 7 ↑(↓) indicates not significant (p > 0.05) increasing (decreasing) trends, and ↑↑(↓↓), ↑↑↑(↓↓↓), and ↑↑↑↑(↓↓↓↓) indicate significant increasing (decreasing) trends at, respectively, p < 0.05, p < 0.01, and p < 0.001. Therefore, educational data mining and visualization, learner perceptions and satisfaction, business and entrepreneurship for MOOCs, and SRL had significantly increasing trends while the remainders were not significantly increasing and some were decreasing. MOOCs for languages, regional and local MOOC practices and research, flipped learning for MOOCs, teacher education, course gamification and recommendations, peer and formative assessments, and xMOOCs were found to have increasing trends, but MOOCs for institutions, demographic features of MOOC learners, semantic data and finance MOOCs, and cMOOCs and healthcare MOOCs were found to have decreasing trends. In Table 7 , educational data mining and visualization had the most significant increasing trend. To mine specific, deeper content on this topic, the representative terms in the dataset were further analyzed, from which it was statistically found that the educational data mining and visualization topic was focused on three main factors: analytics (analysis), behavior, and prediction. The related analytics (analysis) factors could be divided into two: the technique, methodology or tools, such as big data analyses and qualitative and quantitative analyses; and research content, such as study pattern analysis and video learning analytics. In terms of behavior, the studies showed a particular interest in learner community behaviors, learning behaviors, behavior modeling, and video-watching behaviors. 
As to the prediction, the related research included student dropout and performance predictions, learning behavior predictions, predictive analytics, retention rate predictions, grade predictions, and student retention predictions. Increasingly, more research was focused on data

The annual proportions within the whole corpus of identified topics are visualized in Fig. 6. The first focus was on the topic evolution, which was identified using a topic model. The main significantly increasing trends were learner perceptions and satisfaction, educational data mining and visualization, and SRL. The evolution curve for peer and formative assessment reached a peak in 2013, which indicated that this MOOC topic had attracted the most research attention in 2013 but had fallen out of favor by 2015. Research interest in MOOCs for institutions, the demographic features of MOOC learners, and business and entrepreneurship for MOOCs first fell, then rose, and then fell again. There were two distinct peaks for flipped learning for MOOCs, MOOCs for languages, and xMOOCs: in 2013 and 2018, 2013 and 2015, and 2013 and 2017, respectively.

Figure 7 shows the topic distributions for the top nine countries/regions and institutions ranked by the H-index and the annual topic distributions. Figure 7a shows the particular research topics for each prolific country/region or institution. Educational data mining and visualization was the most active topic in the USA, at UCM, and at PSU. The main research interest in the UK, Canada, and Turkey was cMOOCs and healthcare MOOCs; in China, flipped learning for MOOCs; in Taiwan, learner perceptions and satisfaction; in the Netherlands and at PU, teacher education; and in Spain, at the OU of the Netherlands, and at Anadolu University (AU), the demographic features of MOOC learners. The annual topic distributions are shown in Fig.
7c: in 2009, the demographic features of MOOC learners had the greatest research focus; in 2011, cMOOCs and healthcare MOOCs were the most popular topics; and in 2012, the demographic features of MOOC learners and semantic data and finance MOOCs were the most popular.

This review examined 1078 studies to reveal the interesting trends and hidden relationships in MOOC research up to 2019. Research methods and data collection methods were examined using descriptive and quantitative statistics, which included analyses of article counts, citation counts, prolific countries/regions and institutions, scientific collaborations, topic identification, trends and correlations, prolific country/regional and institutional topic distributions, and annual topic distributions. These findings provide important

The representative empirical MOOC research articles revealed the MOOC research trends; therefore, to better understand each topic, the most representative research work in each topic is further analyzed in this section.

The research on educational data mining and visualization was mainly focused on using educational data mining techniques to predict, analyze, or explore issues related to MOOCs, such as academic performance or behaviors. For example, An et al. (2019) explored learning resource mention identification in MOOC forums using an LSTM-CRF model and evaluated the strategies using a dataset from the Coursera online forum. This paper provided solutions for identifying mentions of real learning resources and demonstrated a classic educational data mining research mode.

The research focus for MOOCs for languages was on the language learning MOOC users or the courses.
For example, Mustikasari (2017) used a descriptive qualitative approach to investigate MOOC English teaching materials and the professional teaching development provided by joining a MOOC, concluding that developing a MOOC for Madrasah English teachers was challenging and that providing suitable teaching materials was vital; therefore, this paper was useful in highlighting the importance of MOOC materials and the willingness of English teachers to develop MOOCs, and was beneficial to MOOC language research.

Peer and formative assessment focused on MOOC peer reviews, assessments, and assessment tool development. Meek et al. (2017) discussed MOOC peer reviews by investigating student participation, performance, and opinions in a MOOC peer-review task. They evaluated student topic summary data using a qualitative peer-review process that compared the summaries to student demographic data and performance, and found that the student opinions regarding the usefulness of the peer-review tasks were mixed, concluding that instructional design strategies were needed to improve

Zhu et al. (2018a, b) developed a small private online course-based flipped classroom teaching model that was driven by curriculum ontology, which they applied to a teaching plan and verified in the Electronic Commerce MOOC, and which is a valuable reference for hybrid teaching.

Learner perceptions and satisfaction research has mainly tended to examine the perceived behaviors, satisfaction, and intentions associated with MOOC use; for example, Wu and Chen (2016) used a framework that integrated a technology acceptance model and a task-technology fit model to examine the factors influencing MOOC adoption and investigate MOOC continuance intentions. This focus therefore helps researchers gain a better understanding of learner perceptions and satisfaction.

SRL research was focused on how learners guide their learning in terms of effectiveness, strategies, etc., in the MOOC learning environment.
For example, Onah and Sinclair (2017) investigated and assessed SRL using a MOOC platform (eLDa) to compare self-directed learning and instructor-led learning, and concluded that self-directed learning was able to provide learners with better SRL skills. Business and entrepreneurship for MOOCs research has tended to focus on entrepreneurship and business courses, with most articles using empirical cases to design entrepreneurship MOOCs or verify suitable MOOC platforms to teach or develop entrepreneurship. To understand how the inclusion of issues related to entrepreneurship in MOOCs could positively impact participants, Beltrán Hernández de Galindo et al. (2019) analyzed the incorporation of entrepreneurial competencies in MOOCs to develop educational innovation and collaborative project attributes and investigated whether MOOC discussion forum interactions had resulted in entrepreneurial opportunities. Teacher education research has looked at various elements associated with teacher development. For example, Kennedy and Laurillard (2019) examined the use of co-design models in MOOC projects to deliver teacher professional development (TPD) and developed a ToC model that could be applied to TPD for mass displacement, which could assist in the professional development needs of MOOC teachers with MOOC in mass displacement. Course gamification and recommendation research has mainly examined the elements associated with MOOC course improvements, such as gamification and recommendations. For example, in a classic research case focused on using information technologies to generate content recommendations, Pang et al. (2018) proposed an adaptive recommendation for MOOC that had scoring and learning durations as features and combined collaborative filtering techniques and time series to improve recommendation accuracy, which better satisfied the learners and reduced dropouts. xMOOC research has focused on the development of evaluation criteria. 
For example, Nkuyubwatsi (2016) examined the learning materials, activities, assessments, and scalability in ten xMOOCs, the findings from which could inform open education policies and practices. Research on regional and local MOOC practices has generally focused on empirical or case studies in a specific region. For example, Aljaraideh (2019) conducted a case study on respondents from universities in Jordan to explore the challenges and benefits of using MOOCs in higher education, identified the possible barriers to MOOCs at Jerash University, and found general acceptance among faculty that MOOCs would be an advantage for users. cMOOCs and healthcare MOOCs research is specifically focused on cMOOCs and healthcare education. For example, Li et al. (2016) analyzed the content of messages posted by learners and instructors in online course learning spaces for a case study, the findings from which provided valuable information on student difficulties and the support strategies needed for cMOOC learning. Research on the demographic features of MOOC learners has examined the specific characteristics of MOOC learners. For example, Lee and Chung (2019) analyzed K-MOOC learner data provided by the National Lifelong Learning Agency (number of participants, average completion rate, and participant backgrounds) and compared Korea's K-MOOC with the United States' edX. Research on MOOC learners' demographic features can reveal the current state of MOOC programs and address possible issues. Research on MOOCs for institutions has focused on the development or application of MOOCs in particular institutions and on institutional cooperation. For example, Glencross and St Denny (2017) examined a MOOC on the UK referendum on EU membership, which contributed to public understanding of and engagement with EU-related politics and policy issues. With a focus on semantic data and finance MOOCs, Siddike et al. 
(2017) explored current microfinance MOOC education using a semi-structured interview research strategy, identified the current advantages and possible drawbacks of adopting MOOCs for microfinance education, and presented a MOOC framework to offer financial literacy to the poor; the main findings could be extended to other courses.

The STM analysis provided future MOOC topic directions. Because of the growth in big data, the most promising new topic is educational data mining (EDM) and visualization. EDM is the analysis of various types of educational data using statistical, machine learning, and deep learning algorithms (Chen et al., 2020a, b, c; Romero & Ventura, 2010). EDM and analysis have brought new ways to solve long-standing research problems in the field of traditional educational technology. MOOCs can continuously record all static and dynamic data throughout the entire teaching activity, such as the number of logins, interactive responses, and the time spent on each video, without affecting the activities of either the teachers or the students. Therefore, MOOCs provide effective big data for EDM. Because EDM harnesses the power of emerging artificial intelligence technologies (e.g., machine learning and neural networks) to mine MOOC big data (e.g., logs) (Chen et al., 2020a, b, c) and conduct practical educational assessments, predictions, and interventions, research employing MOOC data is expected to remain a research hotspot. In addition, as learners work directly with the MOOC platforms, their satisfaction is a significant factor affecting the continuous use of such platforms (Lu et al., 2019); therefore, to improve service quality, evaluation systems, and teaching quality, research into learner perceptions and satisfaction is expected to remain important. 
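The kind of log data described above (logins, interactive responses, and time spent on each video) can be turned into per-learner features before any mining step. The following is a minimal sketch under stated assumptions: the event schema, the function names `aggregate_features` and `flag_low_activity`, and the 300-second threshold are all hypothetical illustrations, not the format of any real MOOC platform or a method used in the reviewed studies.

```python
from collections import defaultdict


def aggregate_features(events):
    """Summarize raw log events into per-learner features.

    Assumed (hypothetical) schema: each event is a dict with a "learner" id,
    a "type" ("login", "response", or "video"), and, for video events,
    a "video" id and the "seconds" watched.
    """
    features = defaultdict(
        lambda: {"logins": 0, "responses": 0, "video_seconds": defaultdict(float)}
    )
    for event in events:
        row = features[event["learner"]]
        if event["type"] == "login":
            row["logins"] += 1
        elif event["type"] == "response":
            row["responses"] += 1
        elif event["type"] == "video":
            row["video_seconds"][event["video"]] += event["seconds"]
    return features


def flag_low_activity(features, min_video_seconds=300.0):
    """Return learners whose total viewing time is below an illustrative threshold."""
    return sorted(
        learner
        for learner, row in features.items()
        if sum(row["video_seconds"].values()) < min_video_seconds
    )


# Toy events, invented for illustration only.
events = [
    {"learner": "a", "type": "login"},
    {"learner": "a", "type": "video", "video": "v1", "seconds": 420.0},
    {"learner": "a", "type": "response"},
    {"learner": "b", "type": "login"},
    {"learner": "b", "type": "video", "video": "v1", "seconds": 60.0},
]
features = aggregate_features(events)
print(flag_low_activity(features))
```

In an actual EDM study, the threshold rule would typically be replaced by a statistical or machine learning model trained on such features; the simple rule is used here only to keep the sketch self-contained.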
Based on the statistics shown in Table 7, another potential hot topic in the future is SRL, which concerns how learners can become masters of their own learning (Zimmerman & Schunk, 2012). It is an internal mechanism composed of learners' attitudes, abilities, and learning strategies. Self-regulated students have been found to select and use self-regulated learning strategies to achieve their desired academic outcomes on the basis of feedback about learning effectiveness and skills (Zimmerman, 1990). SRL also has a profound influence on the way teachers interact with students and on how learning content is organized (Yang, 2020). Because of MOOCs' time, space, supervision, and management constraints, it is particularly important to ensure that students have self-regulated learning abilities. Therefore, exploring the issues surrounding these skills when approaching MOOC learning and content development strategies (e.g., video production, classroom management, and organizational forms) could improve these SRL abilities.

The global changes in research trends inferred from the annual topic distributions and article counts can help researchers assess how government policies, technological developments, and major life changes impact and drive change in research topics. In the first two or three years after MOOCs were first proposed in 2008, they were in a stage of exploration and development. The emergence of cMOOCs gave rise to innovative pedagogical and technical approaches (Ebben & Murphy, 2014), which then attracted focused research. As shown in this study, cMOOCs and healthcare MOOCs ranked first in the 2009-2012 research stage, followed by the demographic features of MOOC learners and semantic data and finance MOOCs. 
The increasing globalization of health care has highlighted the inadequacy of many health care services around the world; therefore, to meet this need, health care education needs to have an international perspective (Hovenga, 2004). Online courses such as MOOCs provide a useful platform for delivering this type of healthcare education, which is why medical MOOC development has been increasing rapidly, especially in developed countries such as Australia, Canada, and the United Kingdom (Maxwell et al., 2018). For example, MEDU (www.med-u.org) has hundreds of medical educators and offers virtual patient content. Accordingly, MOOC research on medical education course development, implementation, engagement outcomes, and other practical considerations has been a popular research area since 2009, and the number of MOOC studies has increased significantly since 2013. Besides the increase in cMOOC and healthcare MOOC research, from 2013 to 2016, teacher education and EDM and visualization were the second and third most researched areas. In October 2012, the US Department of Education published "Enhancing Teaching and Learning through Educational Data Mining and Learning Analytics," which pointed out that the mining and analysis of educational big data could promote teaching system reform at US colleges, universities, and K-12 schools. Since then, EDM has received wide attention, and research into EDM and visualization ranked first from 2017 to 2019. Research into learner perceptions, satisfaction, and SRL also received significant attention during this period, indicating that the needs, feelings, and experiences of learners and MOOC learning methods were being seen as more important, which was consistent with the growth in person-centered or student-centered education (Zucconi, 2016) and the use of smart technology to facilitate interactive teaching and learning (Leverage Edu, 2020). 
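The stage-by-stage rankings discussed above can be derived from per-article topic proportions once a topic model has been fitted. The sketch below is only an illustration of that aggregation step, assuming each article comes with a publication year and a dictionary of topic proportions. The topic labels `T1` to `T3`, the numeric values, and the function name `rank_topics_by_period` are hypothetical and do not reproduce the data or results of this review; only the three period boundaries follow the stages named in the text.

```python
from collections import defaultdict

# Research stages as named in the text.
PERIODS = [(2009, 2012), (2013, 2016), (2017, 2019)]

# Toy input, invented for illustration: (publication year, topic proportions).
# In practice the proportions would come from a fitted topic model.
docs = [
    (2010, {"T1": 0.6, "T2": 0.3, "T3": 0.1}),
    (2011, {"T1": 0.5, "T2": 0.2, "T3": 0.3}),
    (2014, {"T1": 0.2, "T2": 0.5, "T3": 0.3}),
    (2018, {"T1": 0.1, "T2": 0.2, "T3": 0.7}),
]


def rank_topics_by_period(docs, periods):
    """Rank topics within each period by their mean proportion across articles."""
    totals = {period: defaultdict(float) for period in periods}
    counts = {period: 0 for period in periods}
    for year, proportions in docs:
        for period in periods:
            start, end = period
            if start <= year <= end:
                counts[period] += 1
                for topic, share in proportions.items():
                    totals[period][topic] += share
    ranking = {}
    for period in periods:
        if counts[period] == 0:
            ranking[period] = []  # no articles fall in this period
            continue
        means = {t: s / counts[period] for t, s in totals[period].items()}
        ranking[period] = sorted(means, key=means.get, reverse=True)
    return ranking


ranking = rank_topics_by_period(docs, PERIODS)
for period in PERIODS:
    print(period, ranking[period])
```

Averaging proportions rather than counting each article's dominant topic keeps the contribution of articles that mix several topics; either choice is a design decision and this sketch does not claim to match the one made in the review.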
The identification of scientific collaborations and topic distributions could also help MOOC researchers find research partners and funding. For example, researchers at PU have primarily focused on teacher education. The university launched a teacher education program (TEP) that includes online Master's degree programs and online certificate programs for teachers and potential teachers, so these researchers may bring more practical teacher education experience to scientific collaborations. For example, Watson from PU studied attitudinal learning in MOOCs by examining the instructors' attitudinal dissonance (Watson et al., 2017) and the learners' attitudinal change in a MOOC (Watson & Kim, 2016). PSU and UCM have had a greater research focus on EDM and visualization. For example, Wong et al. (2015) from PSU used a keyword taxonomy approach to analyze large quantities of MOOC forum data and identify the types of learning interactions taking place in forum conversations, and researchers from UCM analyzed the predictive power for anticipating assignment grades in a MOOC. MOOC research from MIT, HU, the OU of the Netherlands, and AU has mainly focused on the demographic features of MOOC learners. For example, Hansen from HU and Reich from MIT collaborated to analyze course participant features, such as economic background, education, and age, using data from 68 MOOCs offered by HU and MIT between 2012 and 2014 (Hansen & Reich, 2015). DU and MIT collaborated to research MOOC assessment; for example, Comer and White (2016) from DU designed and deployed an English MOOC writing assessment course, concluding that writing assessment could be effectively adapted to the MOOC environment. The UE has published research on MOOCs for institutions, learner perceptions and satisfaction, and cMOOCs and healthcare MOOCs. 
For example, Skrypnyk (2015) analyzed the roles of course facilitators, learners, and technology in the flow of information in a cMOOC, and Murray (2014) [...]

Since their emergence in 2008, MOOCs have become a popular education-focused research topic. In particular, the online education demand generated by the global COVID-19 pandemic has elevated MOOCs to the forefront of education delivery. This study examined 1078 empirical MOOC articles published between 2008 and 2019 with the aim of helping MOOC researchers gain a deeper and more diverse understanding of the current MOOC research foci, trends, and hidden relationships by analyzing the annual article and citation counts, the most prolific countries, regions, and institutions, scientific collaborations, etc. This review provides researchers and educators with a detailed and comprehensive picture of MOOC research trends and topics up to 2019, which could help them build upon MOOC studies, address novel and popular topic areas, and find collaborative research partners. However, this review covered only empirical articles published up to 2019, and many more MOOC-focused papers were published in 2020; therefore, further systematic analyses of MOOC research methods and research topics will be conducted in future research.

Funding This research is supported by the FLASS Internationalization and Exchange Scheme (FLASS/IE_A03/18-19) at The Education University of Hong Kong, and the Teaching Development Grant (102489) at Lingnan University, Hong Kong.

Data availability Requests for the data can be addressed to the corresponding author.

The authors declare that they have no conflict of interest.

Ethical approval Approval for conducting this research was received from the anonymous organization.
References

Massive open online learning (MOOC) benefits and challenges: A case study in Jordanian context
Systematic review of discussion forums in massive open online courses (MOOCs)
Self-regulated learning in MOOCs: Lessons learned from a literature review
Resource mention extraction for MOOC discussion forums
Gephi: An open source software for exploring and manipulating networks
Trends and patterns in massive open online courses: Review and content analysis of research on MOOCs
A Bibliometric Analysis of the Research Status of the Technology Enhanced Language Learning
A multi-perspective study on artificial intelligence in education: Grants, conferences, journals, software tools, institutions, and researchers. Computers and Education: Artificial Intelligence
Application and theory gaps during the rise of artificial intelligence in education. Computers and Education: Artificial Intelligence
Detecting latent topics and trends in educational technologies over four decades using structural topic modeling: A retrospective of all volumes of Computers & Education
Adventuring into MOOC writing assessment: Challenges, results, and possibilities. College Composition and Communication
Big course small talk: Twitter and MOOCs-A systematic review of research designs 2011-2017
Activating learning at scale: A review of innovations in online learning strategies
Merging MOOC and mLearning for increased learner interactions
A contemporary review of research methods adopted to understand students' and instructors' use of massive open online courses (MOOCs)
What are the key themes associated with the positive learning experience in MOOCs? An empirical investigation of learners' ratings and reviews
Progress and new directions for teaching and learning in MOOCs
Possibilities for MOOCs in corporate training and development
Places to go: Connectivism and connective knowledge
The pedagogical quality of MOOCs based on a systematic review of JCR and Scopus publications
Unpacking MOOC scholarly discourse: A review of nascent MOOC scholarship
Modern education: A significant leap forward
Advancement and the foci of investigation of MOOCs and open online courses for language learning: A review of journal publications from
Conducting research literature reviews: From internet to paper
Remain or leave? Reflections on the pedagogical and informative value of a Massive Open Online Course on the 2016 UK referendum on EU membership
Systematic mapping study of academic engagement in MOOC. International Review of Research in Open and Distributed Learning
Democratizing education? Examining access and usage patterns in massive open online courses
Entrepreneurship competencies in energy sustainability MOOCs
An index to quantify an individual's scientific research output
Globalisation of health and medical informatics education-what are the issues?
Unconventional uses of online information retrieval systems: Online bibliometric studies
Scientific research driven by large-scale infrastructure projects: A case study of the Three Gorges Project in China
How do we model learning at scale? A systematic review of research on MOOCs
The potential of MOOCs for large-scale teacher professional development in contexts of mass displacement
Using structural topic modeling to identify latent topics and trends in aviation incident reports
Systematic literature review on self-regulated learning in massive open online courses
Lessons learned from two years of K-MOOC experience
A case study on learning difficulties and corresponding supports for learning in cMOOCs
MOOCs: A systematic study of the published literature
Understanding key drivers of MOOC satisfaction and continuance intention to use
Massive open online courses in US healthcare education: Practical considerations and lessons learned from implementation
Analysing the predictive power for anticipating assignment grades in a massive open online course
Is peer review an appropriate form of assessment in a MOOC? Student participation and performance in formative peer review. Assessment & Evaluation in Higher Education
Prediction in MOOCs: A review and future research directions
Participants' perceptions of a MOOC
Developing massive open online course (MOOC): Need analysis of teaching materials for madrasah English teachers
Positioning extension massive open online courses (xMOOCs) within the open access and the life long learning agendas in a developing setting
Assessing self-regulation of learning dimensions in a stand-alone MOOC platform
What is the difference between xMOOCs and cMOOCs
Adaptive recommendation for MOOC with collaborative filtering and time series
Engagement and retention in VET MOOCs and online courses: A systematic review of literature from
Statistical bibliography or bibliometrics
An empirical investigation of the antecedents of learner-centered outcome measures in MOOCs
Methodological approaches in MOOC research: Retracing the myth of Proteus
stm: R package for structural topic models
Structural topic models for open-ended survey responses
A systematic review of the socio-ethical aspects of Massive Online Open Courses
Educational data mining: A review of the state of the art
Exploring MOOC from education and Information Systems perspectives: A short literature review
Application of MOOCs for borrowers' financial education in microfinance
Connectivism: A learning theory for the digital age
MOOCs are really a platform
Training, evaluating, and interpreting topic models
Roles of course facilitators, learners, and technology in the flow of information of a cMOOC. The International Review of Research in Open and Distributed Learning
Cluster purity visualizer
7 things you should know about MOOCs
Who studies MOOCs? Interdisciplinarity in MOOC research and its changes over time
A systematic analysis and synthesis of the empirical MOOC literature published in 2013-2015
Enrolment purposes, instructional activities, and perceptions of attitudinal learning in a human trafficking MOOC. Open Learning
A team of instructors' use of social presence, teaching presence, and attitudinal dissonance strategies: An animal behaviour and welfare MOOC
Analyzing MOOC discussion forum messages to identify cognitive learning information exchanges
Supporting self-regulated learning in online learning environments and MOOCs: A systematic review
Continuance intention to use MOOCs: Integrating the technology acceptance model (TAM) and task technology fit (TTF) model
Cultivation of college English self-regulated learning ability and adjustment of teachers' roles
The rise of MOOCs: The literature review of research progress and hot spots of MOOCs education in mainland China
A systematic review of research methods and topics of the empirical MOOC literature
Design and implementation of curriculum knowledge ontology-driven SPOC flipped classroom teaching model
Self-regulated learning and academic achievement: An overview
Self-regulated learning and academic achievement: Theory, research, and practice
The need for person-centered education