key: cord-0975984-38hrwzlb authors: Ye, Yingxin Estella; Na, Jin-Cheon; Oh, Poong title: Are automated accounts driving scholarly communication on Twitter? a case study of dissemination of COVID-19 publications date: 2022-03-25 journal: Scientometrics DOI: 10.1007/s11192-022-04343-4 sha: 40bc2c9a0ca54c8ea38898602a1600efe6d98c28 doc_id: 975984 cord_uid: 38hrwzlb From a network perspective, this study analyzes 659 users mentioning sampled COVID-19 articles 10 or more times on Twitter with a focus on their roles in facilitating the process of scholarly communication. Different from existing studies, we consider both the user types and the automation of accounts to profile influential users in the network of research dissemination. Our study found that similar to academic users, non-academic users can also be active players in communicating scientific publications. The results highlight the intensive interactions between human users and automated accounts, including bots and cyborgs, which accounted for 45% of connections among the top users. This study also demonstrates the important role of automated accounts in initiating and facilitating research dissemination. Specifically, (1) bot-assisted academic publishers showed the highest amplifier scores, which measures a user’s tendency of being the first to share information and reach out to others within their trusted networks, (2) 5.28% of the selected articles was first tweeted by automated research feeds, ranking the fourth among the 22 classified user groups, and (3) bot-assisted publishers and automated feeds of generic topics and news alerts were highly ranked in authority, a network measure to quantify the degree to which a user consumes important resources of relevant topics. In the conclusion section, we discuss future directions to improve the validity of Twitter metrics in assessing research impacts. 
The rapid development of technology has expanded the concept of scholarly communication beyond academic publishing to include informal research dissemination and scientific discussion on the social web (Sugimoto et al., 2017) . Researchers and scientists have been actively using social media platforms for academic work, for instance, tracking the latest research trends, connecting or collaborating with fellow researchers, and educating and communicating with the general public (Alshahrani & Rasmussen Pennington, 2018; Hambrock, 2017; Puschmann & Mahrt, 2012) . Serving a variety of audiences, the social web has opened up the boundaries between academia and the general public in scholarly communication (Na & Ye, 2017; Vainio & Holmberg, 2017) . The uptake of the social web in scholarly communication has led to the rise of altmetrics, metrics that track and quantify the attention scholarly works receive on online platforms (Robinson-Garcia et al., 2017) . As a major source of altmetrics, Twitter metrics have caught the attention of many researchers. On one hand, Twitter is believed to have the potential of tracking fast-paced conversations about academic literature and capturing the broader impact of research (Hassan et al., 2017) . For instance, in a qualitative study conducted by Holmberg and Vainio (2018) , respondents, including both academic and the general public, related Twitter mentions of scientific publications to "emotionally engaging topic", "respected publication channel", "timeliness of the topic", "novelty of the topic" and "popularize topic". On the other hand, Twitter metrics have been heavily questioned by academic communities. Robinson-Garcia et al. 
(2017) expressed their disappointment towards Twitter counts in assessing research impact, as they observed: "obsessive single issue tweeting, duplicate tweeting from many accounts presumably under centralized professional management, bots, and much presumably human tweeting duplicative, almost entirely mechanical and devoid of original thought". To further discuss the validity of Twitter metrics as research impact indicators, it is critical to trace back to the Twitter conversations about academic literature and to learn more about users who are mentioning related documents on Twitter. Taking publications related to COVID-19 as a case study, this study aims to examine the role of various users, specifically those who have generated a large volume of tweets, in the process of research dissemination and scholarly communication. The research questions of the current study are as follows: (1) Who are the most active tweeters citing COVID-19 publications in terms of the number of tweets? (2) What are the patterns of connections among users who mention academic literature on Twitter? (3) What role do different types of users play in the process of research dissemination? Analyzing users by both the type of users and the level of automation in content generation, this study will contribute to the growing literature of Twitter metrics studies. Treating bots and cyborgs as active players in the network of research dissemination, this study aims to draw new insights into their implications on Twitter mentions of academic articles in the context of altmetrics. On a practical level, it aims to provide useful information for altmetrics aggregators and relevant parties, for instance, universities, and funders, who would like to weave Twitter metrics into the fabric of research impact assessment. Additionally, we suggest potential solutions to enhance the validity of Twitter metrics in assessing research impact. 
This paper is organized as follows: the next section reviews related works, followed by the research method and design. Next, we present our research findings. Finally, the paper will conclude with discussions on the implications of automated accounts on scholarly communication, along with future research directions. As demonstrated in existing studies, a variety of users from different backgrounds communicate scientific publications on Twitter. For example, to characterize profiles of Twitter users citing academic literature, Díaz-Faes et al. (2019) identified four predominant types of users: (1) users who relate their profiles to their personal and private lives, (2) users who use Twitter to express their own opinions and views, (3) users who are members of academic and scientific communities, and (4) users whose profiles reflect their professional roles. Similarly, through a review of related literature, Sugimoto et al. (2017) also recognized the variety of users, i.e. researchers, science communicators, and practitioners, who use Twitter for academic purposes. Another focus of previous research is on locating influential users in the context of Twitter metrics. Adopting a network approach, Said et al. (2019) found that the vast majority of the top 20 users are academic publishers who occupy the central positions in Twitter networks displaying high eigenvector centrality and PageRank centrality. Didegah et al. (2018) shed light on the important role of individual citizens and researchers in the landscape of scholarly communication. Differences were also observed across disciplines. For instance, individual professionals were actively tweeting articles in biomedical and health sciences, while civil society organizations had preferred in life and earth sciences. Previous research has also emphasized the prevalence of bots in tweeting academic literature. Didegah et al. 
(2018) found that almost two-thirds of Twitter users who tweeted life and earth sciences articles were bots. Similarly, in the study of Robinson-Garcia et al. (2017), half of the top 25 Twitter users mentioning microbiology articles were detected as bots, contributing 4% of tweet mentions to the pool of their sample data. Analyzing Twitter accounts with the handle "arxiv", Haustein et al. (2016) identified that 47 out of the 51 sampled accounts were automated platforms and topic feeds, producing 87,389 (87%) and 10,040 (10%) tweets respectively. Yu (2017), assessing the Twitter altmetric data, attributed the high discrepancy between the number of posts and the number of unique users, as high as 30,000, to excessive bot activity. In summary, to provide a comprehensive picture of scholarly communication on Twitter, it is important to recognize the heterogeneity of participants, including the existence of automated accounts. To enrich the current literature on Twitter mentions by characterizing the role of users in the research dissemination process, this study considers both the users' social background and the automation of their accounts. In addition, we pay special attention to the implications of automated accounts for scholarly communication. Because automated accounts can be viewed as active players in social networks, network analysis has been fruitfully employed in studies of automated algorithms such as bots. Existing studies have provided evidence of the hyper-social nature of bots through Twitter network analysis. For example, Kušen and Strembeck (2020), extracting network motifs from user interactions on Twitter during riot events, observed emotional exchanges between bots and humans. By analyzing Twitter conversations during three global political events in 2016, Schuchard et al. (2019) found that bots attempt to initiate interactions with humans, and they assessed the influence of bots on the dynamics of social networks among users with various centrality measures. 
These studies commonly suggest the possibility that automated Twitter accounts can affect and facilitate the process of scholarly communication on Twitter. However, most previous studies focus exclusively on the quantitative aspects of bots, such as the number of bots or the volume of tweets that they generate, neglecting the role automated accounts play in scholarly communication or research dissemination. To fill this research gap, our study treats automated accounts as equally important as human users and examines their involvement and influence in the process of research dissemination. In addition to the volume of tweets that a user has generated, node centrality measures are commonly used to assess the importance of a user in the network of Twitter metrics. For example, Lee et al. (2017) adopted various centrality measures, including degree, betweenness, and eigenvector centralities and PageRank, to identify influential users in the Twitter networks tagged with the official AoIR (Association of Internet Researchers) conference hashtags. Similarly, eigenvector centrality and PageRank were used by Said et al. (2019) to examine users who have the highest influential power. A similar approach was utilized in the study of van Schalkwyk et al. (2020), where the researchers mapped degree centrality to the popularity of a user and betweenness centrality to the extent to which a user bridges different communities in the network. The current study adopts a similar approach. Drawing upon the above-mentioned studies, and the analysis of centrality measures across different networks by Oldham et al. (2019) and Newman (2018), we selected five node centrality measures to examine the influence of users who have mentioned sampled academic works on Twitter (Table 1). In addition, node-removal methods were used to understand the role of various users in maintaining the connectivity of networks. 
Table 1 Selected node centrality measures

Indegree and outdegree centrality: The fraction of incoming/outgoing edges that a node is connected to, considered as popularity. In the context of Twitter communication, nodes with high outdegree centrality are users who serve as idea starters or sources of information, whereas nodes with high indegree centrality can be considered as information consumers or curators.

Betweenness centrality: A measure of how often a given node lies on the shortest path between two other nodes; hence it can be considered as a contribution to bridging subgroups or communities in the network. In the context of information dissemination, nodes with high betweenness centrality can facilitate information flow across communities in an efficient way.

Hubs & Authorities: Authorities are nodes that contain useful information on particular topics. Hubs refer to nodes that provide references to the best authorities. In the context of information dissemination, nodes with the highest authorities are the major group of users who are consuming or curating important resources within the dissemination network, while hubs represent influential users who serve as influential sources of information.

First, we retrieved the publications related to COVID-19 through Scopus using the query string constructed by Kousha and Thelwall (2020). To examine Twitter users' reactions to the most recent publications, we refined the search results to English-written journal articles published in May 2020. Twitter mentions of selected articles were collected through Altmetric.com, a popular altmetrics service provider, through DOIs, permanent identifiers of articles. 765 articles out of 1252 articles from Scopus were matched, among which 86% of articles received at least one Twitter mention. We further narrowed down our sample to include articles with over 10 Twitter mentions, which accounted for 58% of the matched articles. 
As we did not have full access to all tweets of articles with over 10,000 Twitter counts in Altmetric Explorer while collecting the data, we excluded articles with over 10,000 Twitter counts (n = 4) from the dataset. Inconsistencies in publication date between the information from Scopus and Altmetric.com were observed during the process of data collection. Hence, we cross-checked against the Crossref API to exclude articles that were not first made available in May 2020. Finally, our data includes 417 articles from various research areas, including health sciences (69.96%), life sciences (17.08%), social sciences & humanities (5.14%), physical sciences (4.94%), and multidisciplinary (2.88%). According to the statistics provided by Altmetric.com, these selected articles had been mentioned in 153,098 unique tweets and by 100,620 unique users as of the date of data collection, June 22, 2020. As some tweets and user accounts were not active at the time of data collection, we extracted information about 151,480 valid tweets from the Twitter API. Lastly, utilizing the Twitter API, we collected information about related tweets and Twitter users using their Twitter IDs. Figure 1 describes the steps of data collection. To analyze the impacts of different user groups on the process of research dissemination, we profiled top users who generated a relatively large volume of tweets. From the pool of 151,480 tweets, we identified 697 (0.7%) out of 99,619 users who posted more than 10 unique tweets about the sampled COVID-19 publications. In this study, we first classified users by the level of automation of accounts and then divided them into different groups based on information presented in their Twitter profiles, for instance, their occupations, affiliations, and types of organizations. First, according to the automation of accounts in content generation, users were classified as humans, cyborgs, and bots based on the definitions used in the study of Chu et al. (2012) and S. 
Haustein (2016) : (1) Human: A Twitter account with evidence of intelligent and original content. Human accounts usually demonstrate the facile use of language and vivid interactions with other users. It is common among them to share their real-life experience and express views and feeling on Twitter. (2) Bot: A Twitter account that generates Twitter activities via Twitter API or other social media management tools in a repetitive, excessive, and disordered manner. Bot accounts may repeatedly spread or react to content for the same sources, generate an extremely high volume of tweets in a short period or at some time intervals, etc. (3) Cyborg: A Twitter account that shows evidence of both human and bot involvement. Cyborg accounts can be either bot-assisted humans or human-assisted bots. Two coders, one of the authors and an Engineering graduate who has prior usage experience with Twitter, worked independently to classify account types after a training session. The classification of accounts was conducted according to the procedure described below: To tag a Twitter account, the coders first opened the Twitter page of each user, and scanned clues of human or bot behaviors as exemplified above. The coders also considered characteristics such as the use of language, the media content, and the topics covered. To assist with the classification, basic information of each selected user was provided: (1) Twitter user name and screen name, (2) description in the user profile and the URL of the personal page, (3) age of the account (as of on June 1, 2020), (4) counts of followers and friends, (5) statuses count, (6) account verification, (7) location of the user, (8) the number of tweets that the user had created in our dataset, (9) the number of unique articles that the user had tweeted, (10) the number of tweets per article. 
Additionally, utilizing the Botometer API (Yang et al., 2019), a bot classification tool, we also supplied a set of automation scores for the coders' reference. Botometer displays a score (ranging from 0 to 5) that estimates the likelihood that the account is a bot; this score and the Botometer Complete Automation Probability (CAP) scores (both universal and English) were presented to the coders. Taking all these into account, users were tagged as humans, bots, and cyborgs. Each user was assigned to the single category that best described their behavior. After removing 17 accounts that appeared to be non-existent, suspended or private during our tagging, 659 valid user accounts were classified. Cohen's kappa coefficient between the coders' classification results was 0.75, which indicates an acceptable level of intercoder reliability. The author of this paper was responsible for resolving the conflicts. Second, based on the information presented in users' profiles, such as descriptions and profile URLs, we divided selected top users into eight groups (see Table 2). In this study, a network approach is adopted to examine the roles of different users in the process of information diffusion, specifically the dissemination of scientific publications. Two types of networks were constructed. First, networks were constructed at the article level. For each network G = (V, E), V represents the users, including both top users and unclassified users (N = 99,619), who have mentioned the article on Twitter, while E corresponds to the direction of information flow between users. To determine the direction of flow, we first extracted all interactions among users that tweeted the article through (1) retweets, (2) replies, (3) @mentions, and (4) quote statuses. From the 151,480 tweets collected, 200,029 user interactions were extracted. One tweet may involve various types of user interactions. 
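The construction of article-level networks and their merger into a corpus-level network can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy users, and the tie-breaking rule are hypothetical, and it assumes that an edge runs from the user who tweeted the article earlier to the user who tweeted it later, and that the weight of a merged edge is the number of article-level networks in which that directed edge appears. Only the Python standard library is used.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def article_edges(first_tweet_time: Dict[str, float],
                  interactions: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Return the unweighted directed edges of one article-level network.

    first_tweet_time maps each user to the time of their earliest tweet
    about the article; interactions are user pairs linked by a retweet,
    reply, @mention, or quote status (order within the pair is ignored).
    """
    edges: Set[Edge] = set()
    for a, b in interactions:
        if a == b or a not in first_tweet_time or b not in first_tweet_time:
            continue  # skip self-loops and users without a timestamp
        # Assumed rule: information flows from the earlier tweeter to the
        # later one. Ties are broken by user name (an arbitrary choice).
        src, dst = sorted((a, b), key=lambda u: (first_tweet_time[u], u))
        edges.add((src, dst))
    return edges


def merge_networks(per_article: Iterable[Set[Edge]]) -> Counter:
    """Merge article-level edge sets into weighted corpus-level edges.

    Assumed weighting: one unit per article in which the edge occurs.
    """
    weights: Counter = Counter()
    for edges in per_article:
        weights.update(edges)
    return weights


# Toy example with hypothetical users and timestamps.
art1 = article_edges({"pub": 1, "alice": 2, "bob": 3},
                     [("alice", "pub"), ("bob", "alice")])
art2 = article_edges({"pub": 5, "alice": 7},
                     [("pub", "alice"), ("alice", "pub")])
corpus = merge_networks([art1, art2])
```

In this toy case the pair (pub, alice) is connected in both articles, so the merged edge carries weight 2, while (alice, bob) appears once. Within one article a pair of users contributes at most one directed edge, which mirrors the single-directed, unweighted article-level networks; opposite directions can still coexist in the merged network when the order of tweeting differs between articles.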
Among these interactions, the percentages of retweets, replies, @mentions, and quote statuses were 55.70%, 6.05%, 25.57%, and 12.68%, respectively. A limitation is that we cannot trace the full paths of retweets, as the retweeted status of each tweet always points to the original tweet. Intermediary retweets can only be represented if they were quote tweets containing a retweet with additional content or changes added. Next, based on the time sequence in which these tweets were created, we drew the directed edges using the earliest tweets. In other words, between two users, their connection can only be single-directed based on the time they reacted to the publication (see Fig. 2). The article-level networks are unweighted. A total of 417 article-level networks were generated. Table 3 summarizes the descriptive statistics of the networks. The sizes of the networks range from 6 to 8447 nodes and 2 to 8206 edges. On average, the largest connected components constitute 57.86% of the networks, and 11.22% of the nodes are among the top users (N = 659) in our sample. Second, article-level networks were merged to construct a corpus-level network. This provides a more comprehensive picture of the roles of different users in the process of research dissemination on Twitter. Figure 3 shows an example of how two article-level networks are merged. Both directions and weights of edges are considered in the merged dissemination network. The resulting network contained 99,619 nodes and 125,085 edges. The largest connected component accounted for 79.68% of the nodes in the network.

First, we examined the contribution of individual users to maintaining the connectivity of the network, and thereby to information dissemination, by applying node removal methods. For each of the 659 classified users, we removed the corresponding node and its edges from a given network and measured the change in the connectivity of that network. 
The connectivity of a network is quantified by the size of the largest connected component (LCC) of the network (i.e., the maximum number of reachable nodes, which indicates how well a network is connected) (Newman, 2018). The change in network connectivity is measured by the relative size of the LCC reduced after the node removal. The decrease in network connectivity due to the removal of a node indicates the degree to which that node contributes to maintaining the connectivity of the network, and thereby to information dissemination. The node removal method was applied to both the corpus-level network and all article-level networks. To quantify a user's contribution, the mean decrease in network connectivity was taken. Kruskal-Wallis H tests were conducted to assess the differences in the contribution to network connectivity across humans, cyborgs, and bots. Second, to draw a picture of the directions of information flows, we summed up the number of connections in all article-level networks based on the type of connections, for instance, from bots to bots, from humans to cyborgs, etc. In addition to the proportion of edges, we also identified the initiators, which are the first users in their chains of information dissemination, by analyzing the mentions of each article. Figure 4 shows how initiators are identified. To better illustrate the timeliness of users' reactions to academic literature and whether the reaction has facilitated the dissemination of articles, the amplifier score is calculated. An amplifier refers to a user who enjoys being the first to share information and intensively reaches out to users within their trusted networks (Tinati et al., 2012). The amplifier score in this study was adapted from the formula proposed by Wang and Zheng (2014). For each article art among art_u, the n articles that were mentioned by a user u, we identify the group of users U_art who reacted to the same article. 
Through this, we evaluate the rank of the user u among U_art based on the time they tweeted the article, in ascending order. U_u^first (0 or 1) indicates whether the user u initiated the first connection in his or her information chain. The mean score per article is further computed to reflect the tendency of the user u to be an amplifier in the process of research dissemination. Finally, we examined the roles of classified users in the corpus-level network through the node centralities selected in Table 1. We also investigated the proportion of different connections, for instance, bots to cyborgs, bots to humans, etc. The data analysis was performed with Python 3.7.4 (NetworkX 2.5 for network analysis and SciPy 1.5.4 for statistical analysis).

Among the top users (N = 659) who tweeted more than ten times, 35.96% of the users were classified as automated accounts (Bots: 13.20% and Cyborgs: 22.76%) and the rest of them as humans. The selected top users were composed of academic researchers and institutions (22.15%), health science practitioners (21.55%), non-health science practitioners (5.77%), and academic publishers (3.79%). Only 3% of the top users were topic feeds and news alerts, whereas 2.12% of them were research feeds that target scientific publications. Mass media (1.67%) were involved in the discussions of COVID-19 publications as well. Figure 5 shows a breakdown of users by automation levels and user groups. Account automation appeared to be prevalent across different types of users, especially research feeds and topic feeds and news alerts. Academic publishers and non-health science practitioners may also employ automation tools for tweeting activities to a different extent. The selected top users contributed 13,179 tweets (8.7%) to the total number of tweets extracted (N = 151,480). Among the tweets generated by top users, around 55.9% (n = 7,368) were tweeted by humans. 
The tweet volumes originating from bots and cyborgs were 17.6% (n = 2,320) and 26.49% (n = 3,491), respectively. Contributing 16.83% and 16.62% of the tweets generated by selected top users, health science practitioners and academic communities can be considered the most active users. In terms of the average number of tweets per user, bot-assisted academic publishers (M = 41.5, sd = 51.95) and automated accounts, such as automated feeds of research (M = 33.09, sd = 20.62) and automated feeds of generic topics and news alerts (M = 31.38, sd = 31.7), were tweeting more aggressively (Fig. 6). 82.71% of nodes in the corpus-level network were connected with at least one neighbor. This implies an active flow of information in the process of scholarly communication on Twitter. 97.63% of humans and 96.67% of cyborgs were connected nodes. Even among bots, the percentage of connected nodes was surprisingly high, at 89.66%. Our analysis of the interactions among the top users revealed cross-cutting interactions across different groups of users. A majority of interactions happened within and between humans and cyborgs, e.g., human-to-human (40.23%), human-to-cyborg (20.23%), and cyborg-to-human (17.12%). However, it was evident that information may also flow from humans to bots (7.39%) and from bots to humans (0.34%). As depicted in Fig. 7, intensive interactions were found between academic communities and practitioners, both those from health science and non-health science domains. This suggests that academic publishers are an important information source in research dissemination. Articles tweeted by academic publishers reach various user groups including researchers and practitioners as well as mass media. Automated feeds of research or generic topics may also receive information from academic publishers. 
It is worth highlighting that automated accounts (e.g., automated research feeds) served a role to deliver research publications to researchers and practitioners. The results showed that cyborgs reacted very fast to the newly published articles. Among the sampled articles, 19.66% was first mentioned by cyborg accounts. As shown in Table 4 , 6% of these articles were first tweeted by bot-assisted academic publishers. Academic communities and health science practitioners, both manually-managed and semi-automated, were among those who reacted timely to COVID-19 publications. Automated research feeds were very active as well, with 5.28% of articles first mentioned by them. Through the amplifier scores (see Fig. 7) , it is not difficult to tell that, regardless of the level of automation, academic publishers, members from academic communities, and health science practitioners were among the first to initiate the dissemination and discussion of scholarly works on Twitter. Bot-assisted academic publishers (M = 367.66, We identified 100,040 chains of article dissemination. Among them, 90.8% involved only two users, 8.14% involved three users, and around 1% of the chains contained four users. The longest chain detected involved six users, and only one such case was found. This suggests that the dissemination of COVID-19 publications on Twitter may only occur within small circles of users. This should also be attributed to the limitation of Twitter API, through which we can only extract the originated tweet rather than complete paths of retweets. Figure 8a shows the article-level network with the largest number of nodes among which 15% are classified users. In this example, 83.47% of dissemination chains involved only two users. Among these one-step disseminations, a majority of them began from a bot-assisted academic publisher (33.23%) or a practitioner (human) in the domain of health science (29.71%) and ended with another user. 
For example, the flow of dissemination can originate from a bot-assisted academic publisher and reach all types of academic researchers and institutions (1.28%) or an account of manually-managed or semi-automated research feeds (0.64%). 16.53% of dissemination chains had three or more users involved, with 95% of them starting with bot-assisted publishers. The longest chains of dissemination had a length of four. In these two cases, the pattern of the information flow is academic publishers (cyborgs) - unclassified users - academic researchers and institutions (humans) - health science practitioners (humans). To be more specific, the article was first tweeted by the publisher (bot-assisted), and quoted, with additional content added, by an associate professor of medicine who is the author of the article. An assistant professor of medicine further reacted to the author's tweet and left positive comments. The comment was further retweeted by a resident physician and a family doctor respectively. Another influential unclassified user describes himself as an editor of an academic medical journal in his Twitter profile. To examine the importance of users in the process of research dissemination, we measured the changes in network connectivity by removing nodes corresponding to the users. The results suggest that humans, bots, and cyborgs exercise different levels of influence on the network. The removal of a human account from the corpus-level network (N = 99,619) reduced the size of the largest connected component (LCC) by 0.03% on average (Mdn = 0.003, sd = 0.16). The LCC size was reduced by 0.04% on average when a cyborg account was removed (Mdn = 0.001, sd = 0.23). In general, bots seemed to have lower impacts on the network, as the LCC dropped by only 0.004% on average when the node removal was applied (Mdn = 0.001, sd = 0.01). 
This difference across groups was statistically significant by the Kruskal-Wallis H test, H = 34.21, p < 0.01, implying that human accounts play a more important role in maintaining the connectivity of the network. We conducted another experiment to remove all humans, bots, and cyborgs, respectively. The removal of all humans (N = 422, 0.42%) from the corpus-level network led to a decrease in network connectivity of 15.00% (Fig. 9). When all cyborgs (N = 150, 0.15%) were removed, the network connectivity decreased by 5.66%. As shown in Fig. 9c, in contrast to Fig. 9a, a few major clusters driven by cyborgs disappeared after cyborgs were removed. This can hinder the spread of research publications. Excluding bots (N = 87, 0.09%) from the network, we lost 0.3% of the nodes in the LCC. Removing all bots may not lead to a significant loss of major clusters in the network; however, the dissemination of articles can still be affected as the number of nodes decreases.

Fig. 9 The largest connected component before and after node removal: the corpus-level network. a The LCC before nodes were removed. b-d The LCC after all nodes of humans, cyborgs and bots were removed, respectively. For clearer visualization, edges that appear only once in the corpus-level network and edges between unclassified users are hidden from the graphs. The size of nodes refers to the outdegree of the node within the graph.

Similar tests were also employed in the article-level networks. When a human account was removed from an article-level network, on average, the LCC proportion dropped 2.15% (Mdn = 0.21%, sd = 8.46%), followed by a cyborg account (M = 2.10%, Mdn = 0.26%, sd = 9.56%) and a bot (M = 0.5%, Mdn = 0.15%, sd = 0.88%). This difference was not significant in the Kruskal-Wallis H test. 
By contrast, when all nodes of one type were removed from the article-level networks, the change in LCC size differed significantly across humans, cyborgs, and bots (H = 72.64, p = 0.01). On average, removing all humans from the process of article dissemination reduced the LCC of the article-level network by 13% (Mdn = 0.74%, sd = 23.88%), while removing all cyborgs led to a shrinkage of 14.27% (Mdn = 0.00%, sd = 27.77%). Only a minor change was observed when bots were removed: 1.6% of the LCC was lost (Mdn = 0.00%, sd = 4.78%).

Fig. 10 Network before and after node removal: an example of article-level networks. a An example of an article-level network (number of nodes: 472) before nodes were removed. b-d The network after all nodes of humans, cyborgs, and bots were removed, respectively. The size of a node reflects its outdegree within the graph. The color of a node indicates the type of account automation.

Figure 10 shows a sample article-level network before and after humans, cyborgs, and bots are removed, respectively. The selected article is the one with the largest number of nodes, of which 15% are classified users (see also Fig. 8). As shown in Fig. 10b and c, removing humans and cyborgs had a destructive impact on the dissemination of articles. For example, in Fig. 10c a large cluster dominated by a human account (a health science practitioner) disappeared after the cyborgs were omitted, simply because this user's source of information is a bot-assisted academic publisher. Similarly, removing all humans from the network obstructed the diffusion of information through the loss of major clusters. Compared with humans and cyborgs, bots seemed to have less influence on the process. As shown in Fig. 10d, the general skeleton of the network remained even after all bots were removed.
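The group-level experiment, in which every account of one automation type is removed at once, can be sketched in the same spirit. The sketch below is an assumption-laden illustration rather than the study's code: it treats the network as undirected, counts the removed accounts themselves as part of the LCC loss, and uses a hypothetical edge list and a hypothetical `ACCOUNT_TYPE` mapping (unclassified users are simply absent from the mapping).

```python
from collections import deque

# Hypothetical toy network; names and labels are illustrative only.
EDGES = [
    ("publisher", "u1"), ("publisher", "u2"), ("publisher", "u3"),
    ("publisher", "professor"),
    ("professor", "u4"), ("professor", "u5"),
    ("feed", "u1"), ("feed", "u6"),
]
ACCOUNT_TYPE = {"publisher": "cyborg", "professor": "human", "feed": "bot"}


def lcc_size(edges, excluded=frozenset()):
    """Largest connected component size after dropping `excluded` nodes."""
    nodes = {n for edge in edges for n in edge} - set(excluded)
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in adj and v in adj:  # keep only edges between surviving nodes
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first traversal of one component
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def group_removal_impact(edges, account_type):
    """Map each account type to the % of the LCC lost when it is removed."""
    before = lcc_size(edges)
    impact = {}
    for label in sorted(set(account_type.values())):
        group = {user for user, kind in account_type.items() if kind == label}
        after = lcc_size(edges, excluded=group)
        impact[label] = 100.0 * (before - after) / before if before else 0.0
    return impact


if __name__ == "__main__":
    for label, pct in group_removal_impact(EDGES, ACCOUNT_TYPE).items():
        print(label, round(pct, 2))
```

In this toy network the cluster around `publisher` falls apart when the cyborg is removed, mirroring the situation in which a human-dominated cluster disappears because its source of information is a bot-assisted publisher.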
Figure 11 shows the average proportion by which the largest connected component shrank when user nodes were removed, broken down by user type. Our results underscore the importance of academic publishers, whether assisted by bots or manually managed by humans, in the dissemination of articles. For instance, removing a bot-assisted publisher from the corpus-level network shrank the LCC by 0.35% on average (Mdn = 0.0002), while in the article-level networks the LCC shrank by 27.68% on average (Mdn = 0.0009). In general, humans appeared to have more influence in the article dissemination network. For instance, removing a researcher from the corpus-level network shrank the largest connected component by 0.06% on average (Mdn = 0.0000). With an average LCC drop of 0.04% (Mdn = 0.0001), health science practitioners were also among the most influential groups of users. The active role of automated accounts should also be recognized. For example, when we removed an automated topic feeds and news alerts account (bot) from the article-level networks, the LCC shrank by 1.97% on average (Mdn = 0.0000).

Fig. 11 Average % of LCC reduced after node removals, by type of user. The size change of the LCC when a top user is removed, by type of user. The top 5 user groups with the highest mean/median values are highlighted. The rankings of user groups were calculated before values were rounded to four decimal places.

We characterized users by their centralities in the network (Fig. 12). Humans showed the highest median outdegree centrality and authority scores, whereas cyborgs showed a higher median indegree centrality. In terms of mean values, cyborgs ranked the highest across all centrality measures except betweenness centrality and outdegree centrality. In general, humans and cyborgs showed a similar level of node centrality.
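To make the degree, hub, and authority measures concrete, the sketch below computes indegree, outdegree, and HITS scores with plain power iteration. It is a simplified illustration, not the authors' pipeline: it assumes each edge points from an information source to the user who picked the information up, so that hubs correspond to idea starters and authorities to active consumers. The edge list and function names are hypothetical, and betweenness centrality is omitted for brevity.

```python
from collections import Counter

# Hypothetical toy edges (information source, consumer); not study data.
EDGES = [
    ("publisher", "author"),
    ("publisher", "reader"),
    ("author", "professor"),
    ("professor", "resident"),
    ("feed", "reader"),
]


def degrees(edges):
    """Return (indegree, outdegree) counters; absent nodes count as 0."""
    indegree = Counter(target for _, target in edges)
    outdegree = Counter(source for source, _ in edges)
    return indegree, outdegree


def hits(edges, iterations=50):
    """Hub and authority scores via power iteration, each normalised to sum to 1."""
    if not edges:
        raise ValueError("need at least one edge")
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing at this node.
        auth = {n: 0.0 for n in nodes}
        for source, target in edges:
            auth[target] += hub[source]
        # Hub: sum of authority scores of the nodes this node points to.
        hub = {n: 0.0 for n in nodes}
        for source, target in edges:
            hub[source] += auth[target]
        auth_total = sum(auth.values())
        hub_total = sum(hub.values())
        auth = {n: score / auth_total for n, score in auth.items()}
        hub = {n: score / hub_total for n, score in hub.items()}
    return hub, auth


if __name__ == "__main__":
    indegree, outdegree = degrees(EDGES)
    hub, auth = hits(EDGES)
    for user in sorted(hub):
        print(user, indegree[user], outdegree[user],
              round(hub[user], 3), round(auth[user], 3))
```

Comparing the medians and means of such scores within each automation type would reproduce the kind of group-level summary discussed above.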
Taken together, these centrality patterns suggest that cyborgs have the potential to initiate and facilitate scholarly communication on Twitter. With relatively high outdegree centrality and high hub scores, both humans and cyborgs can be idea starters in the process. Meanwhile, they were actively consuming or curating relevant information, as indicated by indegree centrality and authority scores. Like humans, cyborgs may also serve as bridges within the networks of research dissemination. Additionally, outliers with extremely high influence were observed in all three groups.

First, consistent with previous studies (Díaz-Faes et al., 2019; Sugimoto et al., 2017), our findings confirmed the heterogeneity of users who mention or communicate scientific publications on Twitter. For instance, both academic users (i.e., researchers and publishers) and non-academic users, such as practitioners from a variety of domains and mass media, were actively disseminating COVID-19 publications on Twitter. In terms of the number of users and the volume of tweets, academic researchers and institutions, as well as health science practitioners, were the most active tweeters citing COVID-19 publications. Similar to previous studies (Didegah et al., 2018; Haustein et al., 2016; Robinson-Garcia et al., 2017), we observed the prevalence of automated accounts among users who generated a large volume of tweets. It was evident that automated accounts, especially bot-assisted academic publishers and automated feeds of both research and generic topics & news alerts, were tweeting more actively than humans. Regarding the patterns of connections among users, it is worth noting the intensive interactions across humans, bots, and cyborgs. The interactions were mainly dominated by academic publishers, academic researchers and institutions, and health science practitioners.
Their high amplifier scores and outdegree centrality suggest that they served as important information sources in the process of research dissemination. Another finding worth attention is that automated accounts played an active role in scholarly communication. This is reflected in their efforts to facilitate the dissemination of articles and in their potential to be influential and efficient disseminators. A considerable portion of the connections sourced from bots and cyborgs initiated their chains of information diffusion. For instance, bot-assisted academic publishers had the highest amplifier scores. Further evidence is that automated research feeds may extract publications from research databases such as PubMed and bioRxiv and spread them to researchers, practitioners, and the general public. Additionally, the node-removal experiments and the characterization of users by node metrics confirmed the advantages of cyborgs. Without the participation of automated accounts, the flow of information may become less effective and efficient.

The highly skewed network metrics within different groups indicated the heterogeneity of Twitter users in disseminating or discussing scientific publications. On the one hand, there were extreme outliers in the volume of tweets generated and in the level of influence. On the other hand, users classified into the same group may not share similar behavioral patterns. For example, a bot-assisted academic publisher may tweet differently from a bot-assisted academic researcher, and an account of automated research feeds may have a different motivation for tweeting than a politics-related bot. Therefore, it may not be ideal in Twitter metrics studies to generalize the behavioral patterns or roles of users based solely on either their level of automation in content generation or their user type.
A major limitation of this study is that not all COVID-19 publications and Twitter mentions were captured, as only DOIs were used to retrieve the sampled articles. As we did not have a well-performing classification algorithm to categorize sample users, a relatively small number of users were studied. Another issue is the data quality of publication dates. Even though three sources were used to cross-check the dates and ensure that each article was first made available in May 2020, tweets posted earlier, e.g., in December 2019, were observed. Lastly, the findings of this paper may not apply to other topics or subject disciplines. Text analysis could also be performed to further interpret the motivations of different users for disseminating or communicating scientific publications on Twitter.

Our study enriches the understanding of Twitter altmetrics by demonstrating the active role of automated accounts in the process of research dissemination. Like human accounts, bots and cyborgs can initiate and facilitate the process of communication. In addition, our analysis revealed the flows of information in research dissemination. This will enrich the understanding of scholarly communication on Twitter and of Twitter mentions in the context of altmetrics. To examine the validity of Twitter mentions in assessing the impact of research, it is critical to understand the meaning of the tweets. Hence, future studies should pay extra attention to the motivations of different users, including automated accounts, for disseminating or communicating academic works on Twitter. Both the social backgrounds of the users and the automation of accounts should be taken into account. It would also be valuable to compare the role of users in different types of social networks, e.g., the network of research dissemination versus the network of user interaction.
Additionally, as different network characteristics were observed across different user groups, our study shows the potential for future studies to develop automatic user classifiers based on network topology and node metrics. This would greatly benefit large-scale studies of Twitter metrics.

Sources of self-efficacy in researchers' use of social media for knowledge sharing
Detecting automation of Twitter accounts: Are you a human, bot, or cyborg?
Towards a second generation of 'social media metrics': Characterizing Twitter communities of attention around science
Investigating the quality of interactions and public engagement around scientific papers on Twitter
Digging deeper and finding the gems of a social media platform for a community of academic researchers
Measuring social media activity of scientific literature: An exhaustive comparison of scopus and novel altmetrics big data
Scholarly Twitter metrics
Tweets as impact indicators: Examining the implications of automated "bot" accounts on Twitter
Why do some research articles receive more online attention and higher altmetrics? Reasons for online success according to the authors
COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls
You talkin' to me? Exploring human/bot communication patterns during riot events
Mapping a Twitter scholarly communication network: A case of the association of internet researchers' conference
Content analysis of scholarly discussions of psychological academic articles on Facebook
Networks
Consistency and differences between centrality measures across distinct classes of networks
Scholarly blogging: A new form of publishing or science journalism 2.0?
Science and the Internet
The unbearable emptiness of tweeting-About journal articles
Mining network-level properties of Twitter altmetrics data
Bot stamina: Examining the influence and staying power of bots in online social networks
Scholarly use of social media and altmetrics: A review of the literature
Identifying Communicator Roles in Twitter
Highly tweeted science articles: Who tweets them? An analysis of Twitter user profile descriptions
Communities of shared interests and cognitive bridges: The case of the anti-vaccination movement on Twitter
On macro and micro exploration of hashtag diffusion in Twitter
Arming the public with artificial intelligence to counter social bots
Context of altmetrics data matters: An investigation of count type and user category