key: cord-0602186-8wuemfep
authors: Chen, Ninghan; Chen, Xihui; Zhong, Zhiqiang; Pang, Jun
title: The Burden of Being a Bridge: Understanding the Role of Multilingual Users during the COVID-19 Pandemic
date: 2021-04-09
journal: nan
DOI: nan
sha: d39744bc42cd75d363bd650729e4696983b72b18
doc_id: 602186
cord_uid: 8wuemfep

The outbreak of the COVID-19 pandemic has triggered an infodemic over online social networks. It is thus important for governments to ensure that their official messages outpace misinformation and efficiently reach the public. Some countries and regions that are currently worst affected by the virus, including Europe, South America and India, encounter an additional difficulty: multilingualism. Understanding the specific role of multilingual users in the process of information diffusion is critical for the governments of such countries and regions to adjust their publishing strategies. In this paper, we investigate the role of multilingual users in diffusing information during the COVID-19 pandemic on popular social networks. We collect a large-scale Twitter dataset from a populous multilingual region, starting from the beginning of the pandemic. With this dataset, we show that multilingual users act as bridges in diffusing COVID-19 related information. We further study the mental health of multilingual users and show that, as bridges, multilingual users tend to be more negative. This is in line with a recent psychological study stating that excessive exposure to social media may result in a negative mood.

The current COVID-19 pandemic is a global health crisis of our time. The outbreak of the pandemic has led to an outbreak of information on major online social networks (OSNs), including Twitter, Facebook, Instagram, and YouTube (Cinelli et al. 2020).
In this massive COVID-19 outbreak and constantly changing situation, OSNs, thanks to their globally available services, have become essential for people all over the world to seek up-to-the-minute and local information. Moreover, due to curfew measures and social distancing, people have been spending much more time on OSNs. As a result, social networks have joined conventional media, e.g., TV and radio, to become an indispensable channel for governments and news agents to publish COVID-19 related information. A recent study demonstrated the existence of an infodemic on social media during the COVID-19 pandemic and its negative impact on the control of the virus (Cinelli et al. 2020). The term infodemic outlines the danger of misinformation during an epidemic disease outbreak. It is therefore very important for governments (and news agents) to design effective strategies to ensure that their official messages outpace misinformation and efficiently reach the public.

Some regions that are worst affected by the virus, including the European Union, America and India, have one additional difficulty: their multilingual nature. Cross-language channels should be created so that people receive trustworthy information no matter which language they speak. As studied in the literature, multilingual users on social media can work as bridges between language communities (Eleta and Golbeck 2012; Hale 2014). Understanding the role of multilingual users in information diffusion is thus crucial in practice for governmental departments to design their information release strategy. As far as we know, in spite of the extensive studies explaining and predicting information diffusion (Wang et al. 2017; Chen et al. 2019), it is still unclear whether multilingual users also play a bridging role in information diffusion on social media as they do in social network connections, i.e., whether they act as bridges connecting message originators and message receivers.
The information on OSNs can reflect reality, but the infodemic it triggers can also affect people. Frequent exposure to social media is likely associated with an increase in mental health problems (Gao et al. 2020), including vicarious traumatisation, depression (Zhong, Huang, and Liu 2020) and anxiety (Zhong, Huang, and Liu 2020; Amsalem, Dixon, and Neria 2020). Based on these psychological findings, if multilingual users indeed play a critical role in information diffusion, then the resulting excessive exposure to COVID-19 related information may make multilingual users more stressed or anxious than monolingual users. If this is the case, additional attention should be devoted to multilingual users' mental health. To sum up, in this paper, we aim to answer the following two research questions:

RQ1: What role do multilingual users play in the diffusion of COVID-19 related information?

RQ2: Are multilingual users more negative than monolingual users during the COVID-19 pandemic? If yes, is the negativity related to their role in information diffusion?

The information outbreak on social media incurred by the COVID-19 pandemic provides us with valuable data to answer these two questions. We crawled Twitter data covering almost 7 months from the beginning of the COVID-19 pandemic, originating from a multilingual region severely hit by the virus: the Greater Region of Luxembourg (GR). GR is a cross-border region centred on Luxembourg and composed of its adjacent regions of Belgium, Germany and France. Luxembourg is famous for its multilingualism, as most Luxembourgish people can speak four languages and about 50% of its population are foreigners. Luxembourg is severely affected by the virus and ranked in the top three countries in Europe in terms of the number of new infections per 100,000 inhabitants.
Based on the dataset we collected from Twitter, we examine the bridging role of users in information diffusion by quantifying their influence within information cascades (Chen et al. 2019; Wang et al. 2017). More specifically, we propose two new cascade bridging measures, which allow us to comprehensively understand and quantify how much an OSN user plays the role of a bridge in COVID-19 related information diffusion. We then analyse users' mental health status, i.e., being negative or positive, based on the sentiment of the tweets they posted during the collection period. In the end, we investigate the correlation between users' role in information diffusion and their mental health. The contributions of this paper can be listed as follows:

• We empirically and quantitatively demonstrate the bridging role of multilingual users in the diffusion of COVID-19 related information.
• We analyse users' mental status from the sentiment of their posted messages during the pandemic and discover that multilingual users are more likely to be negative.
• We validate the result of a recent psychological study on social media, revealing the positive correlation of a user's exposure to social media with her/his anxiety. Indeed, we find that users' mental health is strongly correlated with their bridging performance in information diffusion, but has weak or no correlation with their topological properties in the social network.

In this section, we describe how we build our GR-ego Twitter dataset, referred to as the GR-ego dataset in the rest of the paper. In addition to its popularity and large number of active users, we have three other reasons to select Twitter. First, the language of tweets is provided and we can use it to identify multilingual users. If a user only retweets tweets in one single language, we consider her/him as monolingual. Otherwise, this user is multilingual. Second, the user-input locations of the posters can be used to find users located in GR.
Last, it allows us to track the diffusion process of a Twitter message. Specifically, the ID number of the original message is attached if a message is retweeted. We can use it to find other users retweeting the same message and, with the retweeting time stamps, we can approximately reconstruct the diffusion cascade of the message. We give an example of a Twitter message in Table 1.

Our GR-ego dataset consists of two components: (i) the social network of GR users, which records GR users and the following relations between them; (ii) the tweets posted or retweeted by GR users during the pandemic. As we will show in the following sections, these two types of information allow us to capture the diffusion process of COVID-19 related tweets as well as to analyse users' mental sentiment. We perform three steps to collect our GR-ego dataset.

Step 1. Meta data collection. At this step, we aim to collect a set of seed users in GR who are actively involved in COVID-19 discussions. Because crawling tweets by keywords is inefficient, we make use of a public dataset of COVID-19 related tweets (Chen, Lerman, and Ferrara 2020). Restricted by privacy policies, this dataset only consists of tweet IDs. With these IDs, we subsequently crawl their content through the Twitter API. We extract the messages posted in the period between the early stage of the pandemic (January 22nd, 2020) and the end of the first wave of the pandemic (mid July, 2020), i.e., about 7 months. In Twitter, geographic information, e.g., the locations of the posters and of the original users if messages are retweeted, is provided by Twitter users themselves. As a result, locations are usually ambiguous and do not have a unified format. We leverage the geocoding APIs Geopy and ArcGIS Geocoding to regularise the location format and remove the ambiguity. For instance, the location of the poster of the example message in Table 1, i.e., Moselle, is transformed to a more precise and machine-parsable address: Moselle, Lorraine, France.
With the transformed geo-locations, we collect the Twitter messages from the Greater Region. In the end, we obtain 128,310 tweets in total from 8,872 GR users.

Step 2. Social network construction. At this step, our purpose is to find more GR users starting from the seed users and to construct the GR-ego social network. We use an iterative approach to gradually enrich the social graph until it stops growing. We start with the seed users. For each user, we obtain their followers and only keep the ones that have a mutual following relation with the seed user. The reason is that such users are much more likely to reside in GR, which keeps the construction efficient. The locations of the new users are obtained through their posts and processed in the same way as in the previous step. Only users from GR are added to the social network as new nodes, together with their following relations with other existing users. Specifically, if user $u$ follows $u'$, a directed edge from node $u$ to $u'$ is added. After the first round, we continue going through the newly added users one after another by adding their mutually followed friends that do not exist in the current social network. This process continues until no new users are added into the network. In our case, it takes 5 iterations before termination. We take the largest weakly connected component of the constructed social network as the GR-ego social network.

Table 2: Statistics of the GR-ego social network.

Step 3. Timeline tweets crawling. At this step, we collect tweets posted or retweeted during the pandemic by GR users to construct the dataset of Twitter messages. Recall that these past tweets will be used to analyse the mental status of the posters during the pandemic. According to the Twitter policy, we can only download the last 3,200 tweets in a user's timeline.
Even with this limit, it is still rather time-consuming to download the tweets of all the users in the GR-ego social network constructed in the previous step. Therefore, we select a sufficiently large subset of representative users and crawl their tweets. Specifically, we choose the 8,872 users in the meta dataset together with their followers and followees, which add up to 15,255 users. From these users, we collect 10,994,189 Twitter messages in total. Figure 1 presents the numbers of tweets in different languages. We can see that the collected tweets are posted in a very diverse set of languages. English, as a universal language, is still the dominant language, and the distribution of the other languages is consistent with that of the nationalities of the GR inhabitants.

Figure 1: Distribution of different languages of GR Twitter messages (axis: number of tweets).

We conduct two processing steps to obtain the information required for our analysis.

A cascade records the process of the diffusion of a message. It stores all activated users (in our dataset, users retweeting the message) and when they are activated, e.g., the order of activation. In this paper, we adopt a widely accepted model to represent the cascade of a message: the cascade tree (Wang et al. 2017). The user who first posts the message is the root of a cascade tree. The users who retweet the message but have no followers retweeting the message comprise the leaf nodes. An edge from $u$ to $u'$ is added to the cascade if $u'$ follows $u$ and $u'$ retweeted the message after $u$. If several users whom $u'$ follows retweeted the message, then we select the last one retweeting the message as the parent node of $u'$. Figure 2(b) shows a cascade example for the social network depicted in Figure 2(a). In this example, user $u_4$ can be activated by the messages retweeted by either $u_1$ or $u_3$. Since $u_3$ retweeted after $u_1$, we add the edge from $u_3$ to $u_4$ to indicate that the retweeting of $u_3$ has activated $u_4$.
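The parent-selection rule described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function name `build_cascade_tree`, the input layout, the follow relations and the timestamps are all hypothetical, and retweeters with no previously activated followee are simply left out of the tree, which is an assumption of this sketch.

```python
def build_cascade_tree(root, retweets, follows):
    """Return a {child: parent} map describing one cascade tree.

    root     -- user who posted the original message
    retweets -- iterable of (user, timestamp) pairs, one per retweeter
    follows  -- dict mapping each user to the set of users she/he follows
    """
    # The root counts as activated before every retweeter.
    activated_at = {root: float("-inf")}
    parent = {}
    for user, ts in sorted(retweets, key=lambda pair: pair[1]):
        # Followees of `user` that were activated strictly before `user`.
        candidates = [v for v in follows.get(user, ())
                      if v in activated_at and activated_at[v] < ts]
        if not candidates:
            # Assumption: a retweeter with no activated followee is left out.
            continue
        # The most recently activated followee becomes the parent.
        parent[user] = max(candidates, key=lambda v: activated_at[v])
        activated_at[user] = ts
    return parent


# Hypothetical follow relations and retweet times, chosen so that u4
# follows both u1 and u3, and u3 retweets after u1.
follows = {
    "u2": {"u1"}, "u3": {"u1"}, "u4": {"u1", "u3"},
    "u6": {"u3"}, "u7": {"u2"}, "u8": {"u7"},
}
retweets = [("u2", 1), ("u3", 2), ("u4", 3), ("u6", 4), ("u7", 5), ("u8", 6)]
example_tree = build_cascade_tree("u1", retweets, follows)
print(example_tree["u4"])  # u3
```

In this toy input, $u_4$ has two activated followees, and the later one ($u_3$) is chosen as its parent, matching the example in the text.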
We denote the root node of a cascade $C$ by $r(C)$. We call a path that connects the root and a leaf node a cascade path, which is actually a sequence of nodes ordered by their activation time. For instance, $(u_1, u_3, u_4)$ is a cascade path in our example, indicating that the diffusion of a message started from $u_1$ and reached $u_4$ in the end through $u_3$. In the rest of this paper, we represent a cascade tree as a set of cascade paths for the purpose of simplicity. For instance, the cascade in Figure 2(b) is denoted by the set $\{(u_1, u_2, u_7, u_8), (u_1, u_3, u_4), (u_1, u_3, u_6)\}$.

We follow the method in (Kupavskii et al. 2012; Tsur and Rappoport 2012) to construct cascades. Recall that when a tweet's status is 'Retweeted', the ID number of the original tweet is attached to the retweeted message (see Table 1). We first create a set of original tweets consisting of the tweets in our meta data with the status 'Original'. Second, for each original tweet, we collect the IDs of the users that have retweeted the message. Last, we generate the cascade based on the following relationships in our GR-ego social network and the retweeting time stamps. We eliminate cascades with only two users, where a message is retweeted just once. Table 3 summarises the statistics of the collected cascades. In total, 29,710 cascades are built with 83,904 users involved. An interesting observation is that multilingual users are very active in diffusing COVID-19 related information. Among all the participants, only 5.04% are multilingual users, but they participate in 68.5% of the cascades. On average, each multilingual user diffuses 13.11 messages, which is 2 times more than monolingual users. Another observation is that the cascades with multilingual users have 2 more users on average.

Previous works (Zhou, Jin, and Zafarani 2020) leverage user-provided mood (e.g., angry, excited) or status to analyse users' mental status.
However, such information is not available in most popular social networks (e.g., Twitter). Fortunately, with the recent fast advance of sentiment analysis methods (Balahur and Turchi 2012; Devlin et al. 2018), we can still obtain users' sentiment by analysing their posted Twitter messages. Sentiment analysis for multilingual text faces a serious lack of resources (Balahur and Turchi 2012). Most works choose to translate the original language into a resource-rich language, such as English, and then analyse the text with existing lexicons or annotated datasets (Denecke 2008). Transformer-based representation learning methods (Devlin et al. 2018) allow us to obtain users' sentiment by analysing their posted tweets without translation. In this paper, we make use of methods for polarity sentiment analysis. In other words, we only classify the sentiment of a message as negative or positive.

We build an end-to-end deep learning model to conduct the classification, which is composed of three components. As the first component, we use XLM-RoBERTa (Ou and Li 2020), a pre-trained multilingual language model, to obtain the embedding of the tweet text. The embedding results are fed into our second component, a fully-connected ReLU layer with dropout. Last, we add a linear layer on top of the pre-trained model's outputs for regression with a sigmoid activation function. We use cross-entropy as our loss function and optimise it with the Adam optimizer in our representation learning process. We train our model on the Sentiment140 dataset (Go, Bhayani, and Huang 2009), in which tweets are labelled with a binary sentiment value indicating positive or negative. We split the dataset and take the first 80% of the tweets as the training set and the remaining 20% for testing.

Figure 3: Sentiment distribution of diffused COVID-19 related information (left) and users' timeline tweets (right): negative COVID-19 information is diffused while users tend to be positive.
We assign the other training parameters following the common practice in existing works. Specifically, we run 10 epochs with the maximum string length set to 128 and the dropout ratio set to 0.5. When tested with the Micro-F1 score and accuracy metrics, we get an accuracy of 85.48% and an F1 score of 84.11%. Before applying our sentiment classification model, we clean the tweet contents by removing all URLs, mentioned usernames and the word 'RT'. Figure 3 summarises the results for the COVID-19 related tweets and the user timeline tweets. The results from users' timeline tweets are consistent with the existing work (Pak and Paroubek 2010), i.e., users tend to post positive messages in online social media. It is also reasonable that 62.8% of the COVID-19 related tweets are negative, considering the continuous large-scale infections.

In this section, we answer the first research question (RQ1). Specifically, we quantitatively analyse the role of multilingual users in the diffusion of COVID-19 related information. We check whether multilingual users act as bridges between the originator and activated users. We propose two new measures to quantify multilingual users' importance in information diffusion at two different levels. The first one is defined at the level of users and evaluates an individual user's overall performance in COVID-19 information diffusion. It allows us to compare users in terms of their overall bridging importance. The second one focuses on the level of cascades. It compares the bridging performance of multilingual users and monolingual users, as two separate groups, for each cascade. It can be used to verify whether the multilingual group outperforms the other group.

We evaluate each user's overall performance in the diffusion of all observed Twitter messages.
As a user can participate in the diffusion of many messages, before calculating the user's overall performance, we start with her/his importance in the diffusion of one single message and then combine her/his importance over all messages into one single measurement. Intuitively, we consider a user as more important in the diffusion of a message according to three criteria:

1. the user participates in a larger number of cascade paths;
2. the cascade paths containing the user are longer;
3. more users get activated directly by her/him.

The first two criteria assign more importance to users who can activate more users with their information sharing behaviour. As for the third, it has been pointed out in the literature (Zhou et al. 2010) that cascades tend to be wide but not deep, and wider cascades can facilitate the receipt of the message by more diverse people. In the scenario of multilingualism, more users activated directly by a multilingual user imply a larger chance of diffusion across different language communities.

Given a cascade path $S = (u_1, u_2, \ldots, u_n)$, we use $S^*(u_i)$ ($1 \le i < n$) to denote the subsequence composed of the nodes after $u_i$ (including $u_i$), i.e., $(u_i, u_{i+1}, \ldots, u_n)$. For any $u$ that does not exist in $S$, we have $S^*(u) = \varepsilon$, where $\varepsilon$ represents an empty sequence whose length is $|\varepsilon| = 0$.

Definition 1 (Cascade bridging value) Given a cascade tree $C$ and a user $u$ ($u \neq r(C)$), the cascade bridging value of $u$ in $C$ is calculated as:

$$\alpha_C(u) = \frac{1}{|C|} \sum_{S \in C} \frac{|S^*(u)|}{|S|}$$

where $|C|$ is the number of cascade paths in $C$.

Note that our purpose is to evaluate the importance of users as transmitters or sharers of a message. Therefore, the concept of cascade bridging value is not applicable to the root user, i.e., the message originator.

Example 1 We take user $u_3$ in Figure 2(b) as an example. This user participates in two out of the three cascade paths, i.e., $S_1 = (u_1, u_3, u_4)$ and $S_2 = (u_1, u_3, u_6)$. Thus $S_1^*(u_3) = (u_3, u_4)$ and $S_2^*(u_3) = (u_3, u_6)$. We then have $\alpha_C(u_3) = \frac{2/3 + 2/3}{3} \approx 0.44$.
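The computation in Example 1 can be reproduced with a short Python sketch. It reads Definition 1 in the way that matches the worked example: each cascade path $S$ contributes $|S^*(u)|/|S|$, and the sum is divided by the number of cascade paths. The function names and the tuple-based representation of paths are illustrative choices, not part of the paper.

```python
from fractions import Fraction


def suffix_from(path, user):
    """S*(u): the part of `path` from `user` onwards, or () if `user` is absent."""
    if user not in path:
        return ()
    return path[path.index(user):]


def cascade_bridging_value(paths, user):
    """Cascade bridging value of `user` for a cascade given as root-to-leaf paths."""
    root = paths[0][0]
    if user == root:
        raise ValueError("the bridging value is not defined for the root user")
    # Each path contributes |S*(u)| / |S|; absent users contribute 0.
    total = sum(Fraction(len(suffix_from(p, user)), len(p)) for p in paths)
    return total / len(paths)


# The cascade of Figure 2(b), written as its set of cascade paths.
FIG_2B = [
    ("u1", "u2", "u7", "u8"),
    ("u1", "u3", "u4"),
    ("u1", "u3", "u6"),
]
print(cascade_bridging_value(FIG_2B, "u3"))  # 4/9
print(cascade_bridging_value(FIG_2B, "u2"))  # 1/4
```

Exact fractions are used so that the values 4/9 (about 0.44) for $u_3$ and 1/4 for $u_2$ can be checked without rounding noise.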
Note that we do not simply use the proportion of users that get activated from a user in a cascade to evaluate her/his bridging role. This is because such a proportion cannot distinguish users by the number of followers they activate directly. Taking $u_2$ in Figure 2(b) as an example, according to our definition, $\alpha_C(u_2) = 0.25$ is smaller than $\alpha_C(u_3)$ because $u_3$ directly activated two users. However, if we take the proportion of activated users, the values of these two users will be the same.

With a user's bridging value calculated in each cascade, we define the user bridging magnitude to evaluate her/his overall importance in the diffusion of a given set of observed messages. Intuitively, we first add up the bridging values of a user in all the cascades she/he participates in and then normalise the sum by the maximum number of cascades participated in by any user. The measure captures not only the bridging value of a user in each cascade, but also the number of cascades she/he participated in. This means that a user who is more active in sharing COVID-19 related information is considered more important in information diffusion.

Definition 2 (User bridging magnitude (UBM)) Let $\mathcal{C}$ be a set of cascades on a social network and $U$ be the set of users that participate in at least one cascade in $\mathcal{C}$. A user $u$'s information diffusion importance is calculated as the sum of her/his cascade bridging values over the cascades in $\mathcal{C}$, normalised by the maximum number of cascades any user participates in, i.e.,

$$\mathrm{UBM}(u) = \frac{\sum_{C \in \mathcal{C}} \alpha_C(u)}{\max_{u' \in U} |\{C \in \mathcal{C} \mid u' \text{ participates in } C\}|}$$

With this measure, for any two users, we can compare their UBM values and learn which one plays a more important role in information diffusion.

Empirical verification. We analyse and compare the overall importance of multilingual and monolingual users in information diffusion during the pandemic based on their UBM values. If multilingual users play a more important bridging role, we should observe that i) multilingual users normally have larger UBM values; and ii) a larger proportion of multilingual users than of monolingual users have strong UBM values.
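The user bridging magnitude of Definition 2 can be sketched in Python as follows, following the prose description: sum a user's cascade bridging values and divide by the largest number of cascades any single user participates in. The input layout (a dictionary from user to the list of her/his per-cascade bridging values) and the sample numbers are hypothetical; whether cascades in which a user is the root count towards participation is not settled by the text, so this sketch simply counts the values it is given.

```python
import math


def user_bridging_magnitude(bridging_values):
    """Map each user to her/his UBM.

    bridging_values -- dict: user -> list of cascade bridging values,
                       one entry per cascade the user participates in
    """
    # Normaliser: the largest number of cascades joined by any single user.
    max_participation = max(len(values) for values in bridging_values.values())
    return {user: sum(values) / max_participation
            for user, values in bridging_values.items()}


# Made-up per-cascade bridging values for three hypothetical users.
sample = {
    "a": [0.5, 0.25, 0.25],  # active in three cascades
    "b": [0.4],              # one strong contribution only
    "c": [0.1, 0.2],
}
ubm = user_bridging_magnitude(sample)
print(sorted(ubm, key=ubm.get, reverse=True))  # ['a', 'b', 'c']
```

Because the normaliser is shared by all users, a user who contributes to many cascades (user `a`) can outrank a user with a single high bridging value (user `b`), which is the behaviour the definition aims for.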
We will verify whether these two expectations can be observed in our GR-ego dataset.

Figure 4 shows the UBM distribution of the two groups of users from three perspectives. The first general observation is that over 97% of users' UBM values lie in the range between 0.0 and 0.4. We consider a UBM value smaller than 0.1 as weak and one larger than 0.2 as strong. From the box plots on the left, we can see that the median and mean (labelled by green triangles in the boxes) of multilingual users' UBM values are over 2 times as big as those of monolingual users. This indicates that in general multilingual users have larger UBM values. We show the complementary cumulative distribution function (CCDF) in the middle and the probability density function (PDF) of UBM on the right. We have two observations. First, for a significantly large proportion of monolingual users (approximately 78%), their UBM values are weak, i.e., smaller than 0.1. By contrast, only 46% of multilingual users have weak UBM values. This indicates that a large number of monolingual users cannot effectively activate other users to diffuse messages. Second, it is clear that, compared to monolingual users, more multilingual users have large UBM values. About 23% of multilingual users hold strong UBMs, which is over 3 times more than monolingual users. From the above analysis, we can see that our two expectations are observed in our dataset. Therefore, we conclude that multilingual users can activate more users in diffusing COVID-19 related messages.

Our first measure focuses on the level of users and evaluates their overall performance across all observed information cascades. It does not consider the relative bridging performance of multilingual users and monolingual users within the cascades. We take an example to explain this.

Example 2 We again take the cascade in Figure 2(b) as an example. Suppose $u_2$ is the only multilingual user.
After calculating the cascade bridging values, we learn that $\alpha_C(u_2) = 0.25$ and that, as a multilingual user, $u_2$ plays a more important role in diffusing the message than all monolingual users except for $u_3$ with $\alpha_C(u_3) = 0.44$. In this example, we still cannot say that multilingual users as a whole play a more important bridging role in this cascade.

Therefore, we propose a second measure to compare the performance of multilingual users to that of monolingual users. Given a cascade $C$ with multilingual users, we use $U_C^{mul}$ to denote the set of multilingual users and $U_C^{mon}$ the set of monolingual users in the cascade. Then we calculate an integrated value through a function $\gamma$ from the bridging values of the multilingual users and from those of the monolingual users, represented by $\alpha_C^{mul}$ and $\alpha_C^{mon}$, respectively. The integrated value can be the mean, median or maximum. Formally,

$$\alpha_C^{mul} = \gamma(\{\alpha_C(u) \mid u \in U_C^{mul}\}), \qquad \alpha_C^{mon} = \gamma(\{\alpha_C(u) \mid u \in U_C^{mon}\})$$

Note that the integration function $\gamma$ should be instantiated according to practical requirements. We say multilingual users play a bridging role in a cascade $C$ when $\alpha_C^{mul} > \alpha_C^{mon}$. In the end, we use the notion of cascade bridging magnitude (CBM) to quantify the importance of multilingual users as a whole in information diffusion.

Definition 3 (Cascade bridging magnitude (CBM)) Let $\mathcal{C}^{mul} \subseteq \mathcal{C}$ be the set of cascades involving at least one multilingual user. Then the bridging magnitude is calculated as follows:

$$\mathrm{CBM}(\mathcal{C}) = \frac{1}{|\mathcal{C}^{mul}|} \sum_{C \in \mathcal{C}^{mul}} \mathbb{1}(\alpha_C^{mon} < \alpha_C^{mul})$$

where $\mathbb{1}(x < y)$ is an indicator function which returns 1 when $x < y$ and 0 otherwise.

Experimental evaluation. In Table 4, we list the CBM values calculated with our GR-ego dataset. In our analysis, we instantiate the integration function $\gamma$ with maximum, median and mean. Instead of just showing the cascade bridging magnitude of multilingual users, we also present the statistics in another two cases.
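Definition 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each cascade is given as a pair of lists holding the cascade bridging values of its multilingual and monolingual non-root users, both lists are non-empty for every cascade that is counted, and the sample values are made up. The integration function $\gamma$ is passed in as an ordinary Python callable.

```python
from statistics import mean, median


def cascade_bridging_magnitude(cascades, gamma=max):
    """Share of multilingual cascades in which the multilingual group wins.

    cascades -- list of (multilingual_values, monolingual_values) pairs
    gamma    -- integration function, e.g. max, statistics.mean or
                statistics.median
    """
    # Keep only cascades that involve at least one multilingual user.
    relevant = [(mul, mon) for mul, mon in cascades if mul]
    if not relevant:
        raise ValueError("no cascade involves a multilingual user")
    # Indicator 1(alpha_mon < alpha_mul), summed over the relevant cascades.
    wins = sum(1 for mul, mon in relevant if gamma(mon) < gamma(mul))
    return wins / len(relevant)


# Made-up bridging values for three hypothetical cascades.
sample_cascades = [
    ([0.25], [0.44, 0.02]),  # multilingual group wins on mean, not on max
    ([0.5, 0.3], [0.2]),     # multilingual group wins either way
    ([0.1], [0.1]),          # tie: the strict indicator returns 0
]
cbm_max = cascade_bridging_magnitude(sample_cascades, gamma=max)
cbm_mean = cascade_bridging_magnitude(sample_cascades, gamma=mean)
cbm_median = cascade_bridging_magnitude(sample_cascades, gamma=median)
```

The first sample cascade shows why the choice of $\gamma$ matters: a single strong monolingual user decides the comparison under the maximum but not under the mean or median.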
Specifically, we use the term 'hold' to indicate the case when multilingual users play a more important bridging role in the cascade (i.e., $\alpha_C^{mul} > \alpha_C^{mon}$), while 'not hold' indicates the opposite case. When multilingual users and monolingual users have the same integrated bridging value, it is 'uncertain' which group is more important in the cascade. Before analysing the statistics, we set the following criterion: if multilingual users perform better in more than 50% of the cascades with multilingual participants, we say the bridging role of multilingual users at the level of cascades strongly holds. If the share is between 30% and 50%, considering the comparably small number of multilingual users, we say the bridging role weakly holds. Otherwise, the bridging role does not hold.

The obvious observation from Table 4 is that multilingual users have a cascade bridging magnitude of 63.64% on average over the three integration functions (i.e., maximum, mean and median). This means that multilingual users play a more important bridging role in about 64% of the cascades, which almost doubles that of monolingual users. Considering the small percentage of multilingual users among all the participants (i.e., 5.04%), we can conclude that multilingual users' bridging role in COVID-19 related information diffusion at the level of individual cascades strongly holds.

Summary of the section. From the above discussion, we can see that multilingual users perform considerably better at both the user level and the cascade level. Therefore, we conclude that multilingual users play an important bridging role in COVID-19 related information diffusion.

In the previous section, we demonstrated that multilingual users play a bridging role in diffusing COVID-19 related information. In this section, we proceed to answer our second research question (RQ2), i.e., to check whether multilingual users are more negative and whether this is correlated with their bridging role.
As we mentioned previously, we analyse the mental well-being of users based on the sentiments of their posted messages and verify whether multilingual users are more negative. In Section 3, we have obtained the sentiments of users' timeline messages. We make use of the notion of subjective well-being score (SWB) proposed in (Zhou, Jin, and Zafarani 2020) to quantify the extent of mental positivity of a user based on their past posts.

Definition 4 (Subjective well-being score (SWB)) We use $N_p(u)$ and $N_n(u)$ to denote the number of positive posts and the number of negative posts of a user $u$, respectively. The subjective well-being score of $u$, represented by $swb(u)$, is calculated as:

$$swb(u) = \frac{N_p(u) - N_n(u)}{N_p(u) + N_n(u)}$$

A larger SWB value indicates that the corresponding user is more positive.

Empirical verification. We proceed to compare the mental well-being of the two groups of users. We say a user is consistently positive if her/his SWB value is larger than 0.5 and consistently negative if her/his SWB value is smaller than 0.0. We select the value of 0.5 as only 20% of the users in our dataset have larger SWB values. Similar to the previous analysis, we first give the criteria that will be used to determine whether multilingual users tend to be more negative. Three conditions should be simultaneously satisfied. First, in general, multilingual users are more likely to be negative. Second, a larger proportion of multilingual users have a consistently negative mental status. Third, a smaller proportion of multilingual users are consistently positive.

Figure 5 presents the distributions of the SWB values of multilingual and monolingual users. We now go through the three conditions one after another. From the box plot on the left, we can clearly see that the mean and median SWB values of multilingual users are less than half of those of monolingual users. This implies that multilingual users are generally more negative than monolingual users.
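The SWB score and the two thresholds used above can be illustrated with a small Python sketch. It assumes the commonly used form of the score, $(N_p - N_n)/(N_p + N_n)$, which is consistent with the text (values below 0.0 are possible and larger values mean more positive); the function names and the label for users between the two thresholds are hypothetical.

```python
def swb(n_positive, n_negative):
    """Subjective well-being score, assumed here to be (Np - Nn) / (Np + Nn)."""
    total = n_positive + n_negative
    if total == 0:
        raise ValueError("the score is undefined for a user without classified posts")
    return (n_positive - n_negative) / total


def mental_status(score):
    """Apply the thresholds from the text: > 0.5 positive, < 0.0 negative."""
    if score > 0.5:
        return "consistently positive"
    if score < 0.0:
        return "consistently negative"
    return "neither"  # label chosen for this sketch only


print(mental_status(swb(9, 1)))  # consistently positive
print(mental_status(swb(3, 7)))  # consistently negative
print(mental_status(swb(6, 4)))  # neither
```

Under this form the score ranges from -1 (all posts negative) to 1 (all posts positive), so a user needs at least three positive posts for every negative one to exceed the 0.5 threshold.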
From the other two distributions, we can see that about 20% of multilingual users are consistently negative while only 8% of monolingual users are consistently negative. Furthermore, about 5% of multilingual users are consistently positive, which is only one fourth of the share among monolingual users. Therefore, the last two conditions are satisfied. From the above analysis, we conclude that multilingual users are indeed more negative than monolingual users.

We continue to investigate whether the comparatively larger negativity of multilingual users actually results from their dual bridging role in social network connections and information diffusion. To achieve this goal, we verify whether a user's sentiment is correlated with their bridging role. In terms of global network connections, we take the in-degree and out-degree of nodes, i.e., the numbers of followers and followees, respectively, to represent users' topology features, as commonly used in the literature (Zhou et al. 2010; Agarwal et al. 2020). With respect to the bridging role in information diffusion, we use the user bridging magnitude, as we focus on the correlation of a user's bridging performance with her/his own mental sentiment.

In Table 5, we show the correlation coefficients calculated for the three selected features. It is clear that the out-degrees have no relation with users' SWB values, given the small correlation coefficient of 0.008. The correlation coefficient of −0.69 indicates that users' information diffusion bridging performance, as quantified by our proposed measure, is strongly correlated with their mental well-being. According to the correlation coefficient of 0.08, the in-degrees have only a weak correlation with SWB values. We visualise the correlation of UBM values and in-degrees with SWB values in Figure 6 for the purpose of cross-checking. As we discussed previously, over 97% of users' UBM values lie between 0 and 0.4. Thus, we can concentrate on the part lying in this range.
We can see that UBM decreases almost linearly as SWB increases while the in-degree remains unchanged. This means that in-degrees do not actually correlate with SWB values. In summary, we conclude that a user's sentiment during the COVID-19 pandemic is strongly correlated with her/his user bridging magnitude but is not correlated with her/his topology properties in the social network, e.g., in-degrees and out-degrees.

The role of English as the undisputed universal lingua franca of information diffusion at a global level has been questioned during the COVID-19 outbreak (Piller, Zhang, and Li 2020). It has been found that nearly 49% of posts on Twitter are written in a language other than English (Hong, Convertino, and Chi 2011). With their ability to cross the language barrier on social networks, bilingual and multilingual users have also attracted special attention from researchers (Eleta and Golbeck 2014, 2012; Kim et al. 2014). Hale (2014) studied multilingual users from the view of the topological structure of social networks and found that, without multilingual users, social networks would become disconnected. This implies that multilingual users play a bridging role in the connectivity of the network. Eleta and Golbeck (2012) discovered that multilingual users also act as bridges between communities speaking different languages. With regard to information diffusion, it has been shown that non-native English speakers have higher influence than native English users (Kim et al. 2014). Agarwal et al. (2020) showed that multilingual users play a special role in cross-lingual diffusion, and multilingual users select the language of their tweets according to their audiences (Johnson 2013; Murthy et al. 2015; Nguyen, Trieschnigg, and Cornips 2015).
Different from the existing works in the literature, in this paper we investigate multilingual users' importance in diffusing messages, namely whether messages can reach more users due to the participation of multilingual users.

In this paper, we have answered two research questions. The first question is what role multilingual users play in COVID-19 information diffusion. The second is whether multilingual users' mental health is negatively affected by their role in information diffusion during the pandemic. In the following, we briefly describe our findings and discuss their potential value in practice. For the first question, using the Twitter dataset we collected during the COVID-19 pandemic, we have empirically shown that multilingual users play an important bridging role in diffusing COVID-19 related information. Thanks to their active participation and influence, COVID-19 information can be spread to more users. Our finding is of great practical value for multilingual countries and regions in fighting the infodemic and mitigating its damage. Official messages should be released in a carefully designed manner so that a large number of multilingual users are activated in the early stage of the diffusion process. An interesting direction for future work is thus to study optimal approaches to activate as many multilingual users as possible within a given time window. For the second question, we discovered that multilingual users are at higher risk of being affected by the infodemic than monolingual users due to their repeated exposure to COVID-19-related information (Holmes et al. 2020). At the time of writing, the pandemic is still evolving. We believe that this finding could draw special public attention to the mental health of multilingual users and may prompt studies, in the multilingual countries and regions badly hit by the virus, on new approaches to mitigate this potential mental health crisis.
Our study is based on data collected during the COVID-19 pandemic. One direction for future work is thus to check whether our findings also hold for information diffusion in general.

Characterising user content on a multilingual social network
The coronavirus disease 2019 (COVID-19) outbreak and mental health: current risks and recommended actions
Multilingual sentiment analysis using machine translation?
COVID-19: The first public coronavirus Twitter dataset
Information Cascades Modeling via Deep Multi-Task Learning
Using sentiwordnet for multilingual sentiment analysis
Bert: Pre-training of deep bidirectional transformers for language understanding
Bridging languages in social networks: How multilingual users of Twitter connect language communities?
Multilingual use of Twitter: Social networks at the language frontier
Mental health problems and social media exposure during COVID-19 outbreak
Twitter sentiment classification using distant supervision. CS224N project report
Global connectivity and multilinguals in the Twitter network
Multidisciplinary research priorities for the COVID-19 pandemic: a call for action for mental health science
Language matters in twitter: A large scale study
Audience design and communication accommodation theory: Use of Twitter by Welsh-English biliterates
Sociolinguistic analysis of Twitter in multilingual societies
Prediction of retweet cascade size over time
Media exposure and anxiety during COVID-19: The mediation effect of media vicarious traumatization
Do we tweet differently from our mobile devices? a study of language differences on mobile and web-based twitter platforms
Audience and the use of minority languages on Twitter
YNU OXZ @ HaSpeeDe 2 and AMI : XLM-RoBERTa with Ordered Neurons LSTM for Classification Task at EVALITA 2020
Twitter as a Corpus for Sentiment Analysis and Opinion Mining
Linguistic diversity in a time of crisis: Language challenges of the COVID-19 pandemic
What's in a hashtag? Content based prediction of the spread of ideas in microblogging communities
Cascade Dynamics Modeling with Attention-based Recurrent Neural Network
Mental health toll from the coronavirus: Social media usage reveals Wuhan residents' depression and secondary trauma in the COVID-19 outbreak
Sentiment Paradoxes in Social Networks: Why Your Friends Are More Positive Than You?
Information resonance on Twitter: watching Iran