key: cord-0467293-olh6jxbt authors: Chen, Ninghan; Chen, Xihui; Zhong, Zhiqiang; Pang, Jun title: From #Jobsearch to #Mask: Improving COVID-19 Cascade Prediction with Spillover Effects date: 2020-12-13 journal: nan DOI: nan sha: ff9ea7aaa058aadc25757260944b62db3098ba57 doc_id: 467293 cord_uid: olh6jxbt

An information outbreak has occurred on social media along with the COVID-19 pandemic, leading to an infodemic. Predicting the popularity of online content, known as cascade prediction, allows for not only catching in advance hot information that deserves attention, but also identifying false information that will spread widely and requires a quick response to mitigate its impact. Among the various information diffusion patterns leveraged in previous works, the spillover effect of the information exposed to users on their decision to participate in diffusing certain information has not yet been studied. In this paper, we focus on the diffusion of information related to COVID-19 preventive measures. Through our collected Twitter dataset, we validate the existence of this spillover effect. Building on this finding, we propose extensions to three cascade prediction methods based on Graph Neural Networks (GNNs). Experiments conducted on our dataset demonstrate that the use of the identified spillover effect significantly improves state-of-the-art GNN methods in predicting the popularity of not only preventive measure messages, but also other COVID-19 related messages.

The outbreak of the COVID-19 pandemic has led to an outbreak of information in major online social networks (OSNs), including Twitter, Facebook, Instagram, and YouTube [1], which is called an infodemic. On the one hand, due to physical isolation and social distancing, people spend much more time on OSNs, expressing opinions, following the up-to-the-minute development of the pandemic, and even looking for medical support and knowledge to ease mental depression and seek psychological comfort.
This change in how people perceive information makes OSNs an essential communication channel for healthcare departments and medical staff to disseminate official policies and professional advice about effective measures to prevent the spread of the COVID-19 virus, e.g., wearing masks, vaccination and social distancing. On the other hand, misinformation and false news also take advantage of social media to spread with unprecedented speed and volume. Large-scale dissemination of misinformation significantly misleads people and causes public panic. As a result, this information explosion on social media hinders effective pandemic response and increases public confusion about whom and which preventive measures to trust [2].

One widely accepted solution to combat the infodemic is cascade prediction. Its purpose is to predict the popularity of a message given its early adopters. Accurate prediction can help people catch hot information that deserves attention and help healthcare departments identify, in advance, misinformation that will require a fast response to control its impact. Research on cascade prediction has been sustained, and a large number of prediction models have been developed. Earlier models rely on hand-crafted features extracted from the demographic profiles of early adopters [3], [4], [5] and from the subgraphs composed of early adopters and their relationships [6]. Recent advances in representation learning have led to end-to-end representation-based prediction models [7], [8]. In particular, graph neural networks (GNNs) make it possible to simulate cascading effects over social networks and further improve the performance of cascade prediction [6]. In spite of the various diffusion patterns exploited, the works mentioned so far have not considered the spillover effect of the information a user is exposed to on social media on his/her behaviour of forwarding a message and becoming part of its diffusion, which we call the info-exposure spillover effect for short.
We say a user is exposed to a message if the user posted the message or received it from his/her friends on social media. Here, we adopt the definition of the behaviour spillover effect, which intuitively means "the observable and causal effect that a change in one behaviour has on a different, subsequent behaviour" [9]. For example, tweets about unemployment and job-searching may make a user who reads them perceive the severity of the pandemic and thus become more likely to retweet tweets about preventive measures such as staying at home. We hypothesise the existence of this info-exposure spillover effect based on previous studies related to the COVID-19 pandemic. Park et al. [10] demonstrate that information with a medically oriented thematic framework has a wider spillover effect on COVID-19 issues in a Twitter context. Racist information can have a spillover effect on the mistrust of the medical system [2] and lead to a lack of trust in the information released by that system.

In this paper, we focus on messages related to COVID-19 preventive measures, considering their importance in the fight against the pandemic. We collected a dataset from Twitter and validated the existence of the info-exposure spillover effect of users' exposed messages on their decision to retweet messages related to preventive measures. This allows us to extend existing state-of-the-art cascade prediction models relying on GNNs. According to the evaluation on our dataset, our extended models can increase the cascade prediction performance by up to 23% for COVID-19 messages related to preventive measures. Meanwhile, we observed that the use of the info-exposure spillover effect can also increase the accuracy in predicting the popularity of other COVID-19 related messages.

Cascade prediction has attracted attention since studies shed light on some key properties of information cascades that can be predicted [3], [11].
In general, cascade prediction methods can be divided into two classes: macro-level prediction and micro-level prediction. Micro-level prediction aims to predict the users who will be activated during the information diffusion, while macro-level prediction directly estimates the final size of the targeted cascades. Most micro-level methods are based on the Independent Cascade (IC) model [12], which calculates the probability of influence between every pair of users [13], [14]. These methods rely on a number of assumptions that overly simplify the real situation, such as the complete observation of diffusion processes [15]. Although DeepInf [16] uses an end-to-end deep learning method to overcome such assumptions, micro-level methods generally do not perform well in predicting the future size of a cascade, as they require simulating the entire diffusion process. In this paper, as our target is popularity prediction, we opt for macro-level methods.

Macro-level prediction methods can be divided into three categories that follow the evolution of the underlying technology: statistical prediction models, machine learning-based methods and deep learning-based methods. The development of macro-level prediction started with statistical models such as SEISMIC [17] and Weibull [11]. Then, advances in machine learning led to methods using manually designed features extracted from text content, temporal and demographic information, and network structure [11], [3], [4]. Deep learning-based methods remove the need of machine learning-based methods for manually constructed features and capture effective features automatically. DeepCas [18] and DeepHawkes [19] use Recurrent Neural Networks (RNNs) to capture cascading sequences in place of manually designed features. However, RNNs are limited in capturing structural information. This limitation is addressed by graph neural networks (GNNs) [20].
Intuitively, GNNs update the representation of each node by recursively aggregating the representations of its neighbours. In this way, the iterated node representation summarises both the structural and the representation information in its neighbourhood. CasCN [21] utilises a dynamic Graph Convolutional Network (GCN) to learn the structural information of the cascade. CoupledGNN [22] (CGNN) addresses cascade prediction with two GNNs that capture the cascading effect, i.e., the activation of one user successively triggering its neighbours. Although deep learning-based methods have achieved relatively good results in cascade prediction, little research has been conducted to incorporate textual content into cascade prediction. Textual content, an important part of social media, may contain information that is related to the diffusion of messages. Thus, we narrow the focus of this article to macro-level cascade prediction and extend the existing models to exploit online textual content.

In this section, we give the formal definition of the popularity prediction problem studied in this paper, which takes into account both social networks and online textual content. When a message $m$ is first posted by a user, it is perceived by the user's followers, who may adopt the message and relay it. This cascading process continues on the social network until no further sharing occurs. We denote the observed diffusion cascade of $m$ in the time window $T$ by $C^T_m = \{u_1, u_2, \ldots, u_{n^m_T}\}$, i.e., the set of users who adopted $m$ in time window $T$. Note that $n^m_T$ is the number of adopters of $m$ in time window $T$. We use a graph $G = (V, E)$ to denote the social network, where $V$ is the set of nodes representing users and $E \subset V \times V$ is the set of edges indicating the relationships between users.
Compared to the previous works on cascade prediction, we take into account the online textual messages posted by users. Specifically, for a user $v \in V$ and a given time period, we use $M_v$ to denote the messages posted by user $v$ and $M$ to denote the set of all messages, i.e., $M = \bigcup_{v \in V} M_v$.

Online textual content-aware cascade prediction. Given the cascade of message $m$ in time window $T$ (i.e., $C^T_m$), the social network $G = (V, E)$ and the messages posted by the users in $V$, i.e., $M_v$ for all $v \in V$, the purpose of the problem is to predict the final popularity of $m$ at time $\infty$, i.e., $n^m_\infty$.

As mentioned previously, we focus on the diffusion of messages related to COVID-19 preventive measures in this paper, and the user-generated messages are also related to the COVID-19 pandemic. We make use of the info-exposure spillover effect of users' exposed information on their decision to relay preventive measure messages to solve the cascade prediction problem. The results of this paper may be applicable to other types of information if a similar spillover effect also exists.

The purpose of graph neural networks (GNNs) is to calculate a representation of a graph. Compared to graph embedding methods such as node2vec [23] and DeepWalk [24], one advantage of GNNs is that they can integrate node attributes into the learning process. A GNN is implemented with multiple layers. At each layer, a node's embedding is updated by combining the representations of its neighbours calculated in the previous layer. Intuitively, a $k$-layer GNN calculates a representation for each node by combining the attributes of the nodes within $k$ hops. We adopt the formal definition in [20] and give the general definition of the $\ell$-th layer for a node $v \in V$ as follows:

$$a^{(\ell)}_v = \mathrm{Aggregate}^{(\ell)}\left(\left\{ h^{(\ell-1)}_u : u \in N(v) \right\}\right), \qquad h^{(\ell)}_v = \mathrm{Combine}^{(\ell)}\left(h^{(\ell-1)}_v, a^{(\ell)}_v\right),$$

where $h^{(\ell)}_v$ is the representation vector of node $v$ at the $\ell$-th layer and $N(v)$ denotes the neighbours of node $v$. The functions Aggregate and Combine are instantiated differently in the many variants of GNNs so as to capture different features of nodes' neighbourhoods.
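To make the layer-wise update concrete, the following is a minimal sketch of one generic message-passing layer. It assumes a mean over neighbours for Aggregate and a linear map followed by a ReLU for Combine; these choices, the function name `gnn_layer` and the weight matrices `w_self` and `w_neigh` are illustrative assumptions, not the instantiation used by any specific model in this paper.

```python
import numpy as np

def gnn_layer(h, neighbours, w_self, w_neigh):
    """One message-passing layer (illustrative choice of Aggregate/Combine).

    h          : (num_nodes, d_in) array, representations from the previous layer
    neighbours : dict mapping node index -> list of neighbour indices
    w_self     : (d_in, d_out) weights applied to the node's own representation
    w_neigh    : (d_in, d_out) weights applied to the aggregated neighbours
    """
    h_next = np.zeros((h.shape[0], w_self.shape[1]))
    for v, nbrs in neighbours.items():
        # Aggregate: mean of the neighbours' previous-layer representations
        # (assumption: isolated nodes aggregate to a zero vector).
        a_v = h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
        # Combine: linear mix of own and aggregated representation, then ReLU.
        h_next[v] = np.maximum(0.0, h[v] @ w_self + a_v @ w_neigh)
    return h_next

# A path graph 0 - 1 - 2 with one-dimensional node attributes.
h0 = np.array([[1.0], [2.0], [3.0]])
nbrs = {0: [1], 1: [0, 2], 2: [1]}
w = np.array([[1.0]])

h1 = gnn_layer(h0, nbrs, w, w)  # each node sees its 1-hop neighbourhood
h2 = gnn_layer(h1, nbrs, w, w)  # after two layers, node 0 depends on node 2
```

Stacking the layer twice illustrates the k-hop intuition: the second-layer representation of node 0 already depends on the attribute of node 2, which is two hops away.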
With the representation vector of every node at the $k$-th layer, the representation of the graph $G$ can then be calculated by a function as follows:

$$h_G = \mathrm{Readout}\left(\left\{ h^{(k)}_v : v \in V \right\}\right).$$

The Readout function can be implemented simply as the mean of the nodes' vectors or as a more complex pooling function.

Twitter, one of the most prominent online social media platforms, has been used extensively during the COVID-19 pandemic. We chose the Greater Region (GR), a region whose population is highly mobile, as the targeted area in this paper. This section presents how we build the dataset, construct the cascades and build the social graph for our analysis and experiments. In our dataset, we collect two types of data: i) the COVID-19 related tweets posted or retweeted by GR users; ii) the social network of GR users recording their following relationships. In what follows, we elaborate the three steps we followed to gather these data.

Step 1. Tweet collection. At this step, we collect a set of seed users in GR who actively participate in COVID-19 discussions, together with the tweets they originally posted or retweeted. Instead of searching by keywords, we refer to a publicly available dataset which contains the IDs of COVID-19 related tweets [25]. We extract the IDs of tweets posted between January 22, 2020 and July 18, 2020. This period covers the first wave of the pandemic. Through these IDs, we downloaded the corresponding tweets. Due to the ambiguity of the locations of tweet posters, we use the geocoding APIs Geopy and ArcGIS Geocoding to regularise the locations associated with tweets. For example, the user-input location Moselle is transformed into a more precise and machine-parsable location: Moselle, Lorraine, France. Based on the regularised locations, we filter the downloaded tweets and remove those posted by users outside GR. In total, we obtain 144,961 tweets from 8,872 GR users.

Step 2. Social network construction. We construct the social network of a large number of GR users at this step.
We use an iterative approach to gradually enrich the social network. For each seed user, we obtain his/her followers and only retain those who have a mutual following relation with the seed user, because such users are more likely to reside in GR. We then download the new users' locations from their profile data and only add users from GR to the social network. We also add edges if users already in the network have a following relation with the newly added users. After the first round, we continue with the newly added users by adding their mutually followed friends who do not yet exist in the current social network. This process continues until no new users can be added. In our collection, it takes 5 iterations before termination. We take the largest weakly connected component of the social network. After this step, we had collected a total of 12,256,152 users and 21,203,130 following relationships. Since the majority of users in the network are relatively inactive, we construct a subgraph by removing all users who posted or retweeted fewer than 2 tweets. Note that we keep some of these inactive users when the remaining network would no longer be connected after their removal. In the end, we obtain a social network with 21,339 users and 214,962 edges.

We construct cascades from our tweet dataset and the social network built previously, based on the definition in Section III-A. A total of 60,035 cascades are built, and we remove cascades with fewer than 3 users, following existing works [18], [22]. As a result, 82.38% of the cascades are removed and we end up with 10,579 cascades. The average size of these cascades is 4.78. We use $C$ to denote the set of all selected cascades. From $C$, we construct the set of cascades corresponding to messages related to preventive measures, denoted by $C_{PM}$, based on the keywords listed in Table I.
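The two cascade-selection steps just described (dropping cascades with fewer than 3 users, then picking the preventive-measure cascades by keyword) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the in-memory dictionary layout and the example keywords are hypothetical, and the actual keyword list is the one given in Table I.

```python
from typing import Dict, Iterable, List

MIN_CASCADE_SIZE = 3  # cascades with fewer than 3 users are removed

def filter_cascades(cascades: Dict[str, List[str]],
                    min_size: int = MIN_CASCADE_SIZE) -> Dict[str, List[str]]:
    """Keep only cascades with at least `min_size` distinct users."""
    return {m: users for m, users in cascades.items()
            if len(set(users)) >= min_size}

def select_by_keywords(cascades: Dict[str, List[str]],
                       texts: Dict[str, str],
                       keywords: Iterable[str]) -> Dict[str, List[str]]:
    """Keep cascades whose message text contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return {m: users for m, users in cascades.items()
            if any(k in texts[m].lower() for k in lowered)}

# Toy data: message id -> adopters, and message id -> tweet text.
cascades = {"t1": ["a", "b", "c"], "t2": ["a", "b"], "t3": ["a", "b", "c", "d"]}
texts = {"t1": "Please wear a MASK outside",
         "t2": "Stay home",
         "t3": "Shops are out of pasta"}

# Placeholder keywords for illustration only; the real list is in Table I.
pm_keywords = ["mask", "stay home"]

C = filter_cascades(cascades)                       # all selected cascades
C_PM = select_by_keywords(C, texts, pm_keywords)    # preventive-measure cascades
```

Filtering by size first and by topic second mirrors the order in the text, so `C_PM` is always a subset of `C`.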
In this section, we validate our hypothesis that the information exposed to a user has a spillover effect on his/her behaviour of adopting a message related to COVID-19 preventive measures. We first briefly describe the method we use to measure the hypothesised info-exposure spillover effect and then give the detailed experimental analysis designed to validate its existence in the diffusion of COVID-19 preventive measure messages.

We design our experimental framework based on the experimental investigation method for spillover effect validation. Intuitively, the idea is to investigate whether users exposed to different information behave differently in retweeting a message related to preventive measures. In other words, we check whether a certain type of exposed information changes the likelihood of users retweeting a preventive measure message. We divide the set of users into groups according to the information they are exposed to. Each group is composed of users who are exposed to a certain composition of information. One of these groups is set as the control group. The selection of the control group depends on the purpose of the experiment. Then the proportion of users in each group retweeting preventive measure messages is used to estimate the likelihood of adopting preventive measure messages. By comparing these measurements with that of the control group, we can quantitatively evaluate the magnitude of the spillover effect of the information exposed to each user group on adopting preventive measure messages, which we call spillover elasticity.

Formally, the nodes in the social graph $G$ are divided into $n$ groups, i.e., $D = \{V_1, \ldots, V_n\}$ where $\bigcup_{V_i \in D} V_i = V$. Let $V_c \in D$ be the selected control group. For each user group $V_i$, we find the users who retweet preventive measure messages in $M_{PM}$ and construct the set of users $V^{PM}_i$.
Then the activation likelihood for users in $V_i$ is calculated as the proportion of its users who retweet preventive measure messages:

$$P(V_i) = \frac{|V^{PM}_i|}{|V_i|}.$$

With these notations, we can define spillover elasticity.

Definition 1 (Spillover elasticity): The elasticity of the info-exposure spillover effect of group $V_i$ in a division $D$ of the user set $V$ is calculated as the relative change of the activation likelihood with respect to the control group:

$$\eta(V_i) = \frac{P(V_i) - P(V_c)}{P(V_c)}.$$

Positive elasticity indicates that the exposure to the information of $V_i$ increases the likelihood of adopting a preventive measure message, while negative elasticity indicates the opposite. For example, an elasticity of 1 means that the activation likelihood of the group is twice that of the control group.

We start by verifying that being exposed to certain information affects users' behaviour of adopting and retweeting COVID-19 preventive measure messages. In order to conduct our experimental analysis, we first need to distinguish the types of COVID-19 related information. Previous studies [26], [27] classified COVID-19 related information into several topics. Among these topics, we select three that are widely discussed in our dataset, i.e., unemployment, panic buying and school closures, and extract the corresponding tweets with the keywords listed in Table I. We conduct our analysis from two perspectives. First, we evaluate the influence of messages of a single topic on the behaviour of adopting a preventive measure message. Second, we investigate the influence of different compositions of topics of messages.

Spillover effect of information of a single topic. We build three divisions of the users in order to evaluate the spillover effect of each topic, i.e., $D_U$, $D_P$ and $D_S$ for unemployment, panic buying and school closure, respectively. Each division has only two groups. One group consists of users who have been exposed to messages of the corresponding topic, while the other group is composed of users who have not been exposed. We take the group unexposed to the topic as the control group. In Table II, we summarise the number of users exposed and unexposed in each division, the activation likelihood and the resulting elasticity. We have two observations from this table.
First, the exposure to each type of message increases the likelihood of users retweeting a preventive measure message. On average, the activation likelihood equals 0.56 for the exposed group, while the unexposed group only has an activation likelihood of 0.26. The average elasticity is 1.27, which indicates that, on average, the activation likelihood more than doubles for the users exposed to the topics we selected. Second, the increase of activation likelihood for exposed users differs among the topics of exposed information. For instance, the exposure to information related to panic buying leads to an increase of just 25%, which is much smaller than for the other two topics. From the above analysis, we can conclude that i) exposure to certain topics of information has a positive spillover effect on users' adoption of preventive measure messages; ii) the scale of the spillover effect differs according to the topics of the exposed messages.

Spillover effect of information of compositions of topics. In the previous analysis, we focused on the spillover effect of single topics and ignored the changes that occur when multiple topics of information are exposed to users simultaneously. We now construct a division of the users according to the combination of topics they are exposed to, where U, P and S are short for unemployment, panic buying and school closure, respectively. The user group exposed to none of the topics is selected as the control group. Figure 1 shows the activation likelihood of user groups exposed to one or two selected topics of messages. We can see that exposure to more selected topics increases the likelihood of adopting a preventive measure message. The most significant increase occurs for the panic buying topic. Exposure to an additional topic increases the activation likelihood by more than two times. When users are exposed to all the topics, the activation likelihood increases to 0.81. When users are exposed to none of the topics, the activation likelihood drops below 0.10.

From the analysis, we empirically validated that the information exposed to users indeed has a spillover effect on the behaviour of adopting a preventive measure message. In other words, the likelihood of a user retweeting a preventive measure message differs if he/she is exposed to different information. In the following, we make use of this phenomenon to improve the accuracy of the prediction of message popularity.

In this section, we make use of our findings on the spillover effect of users' exposed information on their decision to retweet preventive measure messages in order to improve the accuracy of cascade prediction. Recall that the information exposed to a user is composed of the messages posted by his/her friends and his/her own posts. We encode users' posted textual messages into representation vectors and then attach them to the attributes of the corresponding nodes in the social network. Then we can make use of the GNN framework to summarise the messages posted by a user's neighbours and even by users who are not adjacent but within a certain number of hops, defined by the number of layers in the GNN. In the following, we start by describing the node attributes of the social graph and then proceed to extend GNN-based models to integrate the info-exposure spillover effect. We also give the objective function used to train the model parameters.

The node attribute of node $v \in V$, i.e., $h^{(0)}_v$, is the concatenation of three components: i) $s_v$, the activation status of the user in the given cascade $C$, ii) $\delta_v$, the representation vector of the messages posted by the user, and iii) $e_v$, the node embedding of the user's corresponding node in the network. Formally,

$$h^{(0)}_v = s_v \,\|\, \delta_v \,\|\, e_v,$$

where $\|$ is the concatenation operator. The user activation status $s_v$ is set to 1 if $v \in C$ and to 0 otherwise. The node embedding captures the structural properties of the user's neighbourhood in the graph.
Following existing studies [18], [22], we use DeepWalk without further fine-tuning to learn the structural embedding of each user. We now describe in detail the method used to abstract the messages posted by a user into a representation vector. RoBERTa [28] is a pre-trained transformer language model for encoding text. In this paper, we use its widely used multilingual pre-trained variant, XLM-RoBERTa [29], which can encode short texts in multiple languages. For each $m \in M$, we calculate its embedding with our trained XLM-RoBERTa model, and let $d_m$ be the corresponding embedding vector. For the messages posted by user $v$, we take the mean of the embedding vectors of all his/her posted messages as the final message embedding. Formally, we have

$$\delta_v = \frac{1}{|M_v|} \sum_{m \in M_v} d_m.$$

We implement three variants of GNNs to integrate the info-exposure spillover effect identified in the previous section, i.e., Graph Convolutional Networks (GCN) [30], Graph Attention Networks (GAT) [31] and CoupledGNN [22]. GCN is a semi-supervised learning algorithm for graph representation, and GAT is a variant of GCN which introduces an attention mechanism to distinguish the significance of neighbours. These two variants are not designed specifically for cascade prediction, but for the general purpose of summarising neighbourhoods of a given depth. The calculated node representations can then be used for downstream tasks such as link prediction and node classification. CoupledGNN [22] is a model developed for cascade prediction and can be regarded as the state of the art. It substantially outperforms earlier models by considering the cascading effect of information diffusion on social networks, i.e., the phenomenon that users are activated due to the influence of their activated neighbours. By extending these models, our purpose is to illustrate the effectiveness of the info-exposure spillover effect in further improving the performance of predicting the popularity of COVID-19 preventive measure messages.
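The construction of the initial node attribute (activation status, mean message embedding and structural embedding, concatenated) can be sketched as follows. The sketch treats the per-message vectors and the DeepWalk vector as given arrays; the function names are hypothetical, and the zero vector used for a user without any message is an assumption of this sketch, since the text does not specify that case.

```python
import numpy as np

def message_embedding(message_vectors, dim):
    """delta_v: mean of the embedding vectors d_m of a user's messages."""
    if len(message_vectors) == 0:
        # Assumption of this sketch: a user with no messages gets a zero vector.
        return np.zeros(dim)
    return np.mean(message_vectors, axis=0)

def node_attribute(activated, message_vectors, structural_embedding, text_dim):
    """Initial node attribute: s_v || delta_v || e_v."""
    s_v = np.array([1.0 if activated else 0.0])   # activation status in the cascade
    delta_v = message_embedding(message_vectors, text_dim)
    e_v = np.asarray(structural_embedding, dtype=float)  # e.g. a DeepWalk vector
    return np.concatenate([s_v, delta_v, e_v])

# Toy example with 2-dimensional text embeddings and a 1-dimensional
# structural embedding (real dimensions would be much larger).
d_msgs = [np.array([1.0, 3.0]), np.array([3.0, 5.0])]
h0_v = node_attribute(True, d_msgs, [0.5], text_dim=2)
```

Because the attribute is a plain concatenation, a model without the spillover component can be obtained by leaving out the `delta_v` block, which is how the structure-only GCN and GAT baselines are described later in the paper.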
In addition, our extension can provide a useful reference for future cascade prediction models that integrate the info-exposure spillover effect. The definitions of the functions Aggregate(*) and Combine(*) of GCN, GAT and CoupledGNN are briefly given in Table III. GAT and GCN share the same combination function. For GCN, we use the mean of the representation vectors of a node and its one-hop neighbours as the aggregated value at each layer, while GAT uses a weighted average.

We describe CoupledGNN in more detail due to its simulation of the cascading effect in information diffusion. For the full description, we refer the reader to the original paper [22]. It deploys two GNNs. One GNN captures the activation statuses of users during the information diffusion at each layer, e.g., the activation status $s^{(\ell)}_v$ of user $v$ at the $\ell$-th layer. The other GNN aims to simulate the influence of users, which changes along with their activation status and the influence of their neighbours, i.e., $r^{(\ell)}_u$. The influence of a neighbour $u$ on user $v$ becoming active in the next layer $\ell + 1$ is calculated by the function InfluGate($r^{(\ell)}_u$, $r^{(\ell)}_v$). The aggregation function is then the weighted average of all the neighbours' activation statuses, with the default activation probability $p_v$ added. The combination function is based on the weighted average of the node's status at the previous layer and the aggregated representation. With the output activation statuses at the last layer (e.g., $k$), the popularity of the message diffused in $C^T_m$ is calculated as

$$\hat{n}^m_\infty = \sum_{v \in V} s^{(k)}_v.$$

In the following, we describe in detail how we extend each selected model to capture the info-exposure spillover effect. We can interpret the output of the $k$-th layer of a $k$-layered GCN or GAT as the summary of the information exposed to every user. We then use an activation function to capture the info-exposure effect.
Specifically, the function takes as input the output of the GCN or GAT and the representation of the message diffused in the given cascade, and outputs the predicted final activation status of the nodes. Let $m$ be the message being diffused and recall that $d_m$ is the representation of $m$ calculated by the RoBERTa model. Let $s^\infty_v$ be the predicted activation status of node $v$. Our activation function computes $s^\infty_v$ by applying a function activate to the last-layer representation of $v$ and to $d_m$, which are transformed by two parameter matrices to be learned, $W_h$ and $W_\delta$, respectively. The function activate is implemented as a 3-layer neural network in this paper. We add this function as an additional layer after the last layer of the GCN and GAT.

Recall that CoupledGNN uses the function InfluGate to simulate the process of a user being activated by his/her neighbours. The influence vector, e.g., $r_u$ of user $u$, contains user $u$'s posted messages and the messages from $u$'s neighbourhood. Therefore, it can be considered as a summary of the information perceived by a user $v$ from $u$ if $v$ follows $u$ on Twitter. Based on this intuition, we extend CoupledGNN by reformulating the function InfluGate(*) to capture the info-exposure spillover effect.

C. Objective function

We use the same objective function as [22], which is the mean relative square error (MRSE), defined as follows:

$$L_{MRSE} = \frac{1}{|C|} \sum_{C^T_m \in C} \left( \frac{\hat{n}^m_\infty - n^m_\infty}{n^m_\infty} \right)^2,$$

where $\hat{n}^m_\infty$ is the predicted popularity of message $m$ and $n^m_\infty$ is the true one. This loss function is regularised to avoid over-fitting and to accelerate convergence, i.e., $L = L_{MRSE} + L_{Reg}$ where $L_{Reg} = \theta \sum_{p \in P} \lVert p \rVert_2 + \lambda L_{user}$. Note that $P$ denotes the set of parameters and $L_{user}$ is the cross-entropy between the predicted activation status of each user $v$ and the final activation status of $v$.

We adopt the metrics used in [22] to evaluate and compare the performance of our extended models and the baselines used in our experiments. Specifically, in addition to the mean relative square error (MRSE) introduced in the previous section, we also use the mean absolute percentage error (MAPE) and the wrong percentage error (WroPerc).
MAPE measures the average deviation between the predicted popularity and the true one, while WroPerc measures the percentage of cascades that are incorrectly predicted under a given error tolerance $\epsilon$. Formally, they are defined as:

$$\mathrm{MAPE} = \frac{1}{|C|} \sum_{C^T_m \in C} \frac{|\hat{n}^m_\infty - n^m_\infty|}{n^m_\infty}, \qquad \mathrm{WroPerc} = \frac{1}{|C|} \sum_{C^T_m \in C} I\left( \frac{|\hat{n}^m_\infty - n^m_\infty|}{n^m_\infty} \geq \epsilon \right).$$

Note that $I(*)$ is an indicator function which outputs 1 when the input proposition is true and 0 otherwise, and the threshold $\epsilon$ is set to 0.5 in our experiments.

In addition to CoupledGNN, we use the following models as baselines.

Feature-based method. This is a linear regression model with L2 regularisation over hand-crafted features. For a fair comparison, we adopt the same features used in past studies [22], [18].

SEISMIC [17]. SEISMIC uses the Hawkes self-exciting point process and approximates the impact of the cascading effect by the average number of followers.

DeepCas [18]. DeepCas is an end-to-end deep learning method for information cascade prediction. It utilises the structure of the cascade graphs and node identities for prediction. An attention mechanism is designed to assemble a cascade graph representation from a set of random walk paths.

GCN and GAT. We construct these two models from our SE-GCN and SE-GAT models by removing the representation vectors of messages. In other words, these two models rely only on the network structure to predict the final size of cascades.

As the output of RoBERTa for a sentence is a high-dimensional and sparse vector, we apply a linear transformation to map its output to a relatively low-dimensional space. The dimension of the final text embedding is set to 128. For all models, including the baselines, we tune the hyper-parameters to ensure good performance on the validation sets. The L2 coefficients are chosen from $\{0.5, 0.1, 0.05, \ldots, 10^{-8}\}$. For all neural network models, the learning rate is chosen from $\{0.1, 0.05, \ldots, 10^{-5}\}$, the coefficient in the loss function is set to 0.5, and the mini-batch size is chosen from $\{15, 10, 5\}$.
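The three evaluation metrics (MRSE, MAPE and WroPerc) all derive from the per-cascade relative error, which the following minimal sketch makes explicit. The function names are hypothetical, and counting a cascade as wrong when its relative error is greater than or equal to the tolerance (rather than strictly greater) is an assumption of this sketch.

```python
import numpy as np

def relative_errors(pred, true):
    """Per-cascade relative error |n_hat - n| / n."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return np.abs(pred - true) / true

def mrse(pred, true):
    """Mean relative square error."""
    return float(np.mean(relative_errors(pred, true) ** 2))

def mape(pred, true):
    """Mean absolute percentage error (as a fraction, not multiplied by 100)."""
    return float(np.mean(relative_errors(pred, true)))

def wroperc(pred, true, eps=0.5):
    """Fraction of cascades whose relative error reaches the tolerance eps."""
    return float(np.mean(relative_errors(pred, true) >= eps))

# Toy example: three cascades with true sizes 4, 10 and 20.
pred = [4, 12, 5]
true = [4, 10, 20]
scores = {"MRSE": mrse(pred, true),
          "MAPE": mape(pred, true),
          "WroPerc": wroperc(pred, true, eps=0.5)}
```

In the toy example the relative errors are 0, 0.2 and 0.75, so only the third cascade counts as wrongly predicted at a tolerance of 0.5. MRSE penalises that large miss more heavily than MAPE does, which is why the three metrics can rank models differently.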
The number of GNN layers $k$ is selected from $\{5, 4, 3, 2\}$. For DeepCas, the number of walk sequences and the walk length are set to 100 and 8, respectively. For SEISMIC, we follow the parameters of the original study, i.e., we set the constant period to 5 minutes and the power-law decay parameter $\theta$ to 0.242.

Considering the diffusion time of the messages in our collected data, we set the observation time window $T$ to 3 hours and construct a set of observed cascades by removing from our cascades the users that were activated after the first 3 hours. In order to comprehensively evaluate the effectiveness of the info-exposure spillover effect in predicting message popularity, in addition to the cascades of COVID-19 preventive measure messages $C_{PM}$, we also apply all the models to two other sets of cascades. One is the set of all COVID-19 related cascades $C$. The other is the set of COVID-19 related cascades that are not related to preventive measures, i.e., $\overline{C}_{PM} = C \setminus C_{PM}$, the complement of $C_{PM}$ in $C$.

We list the performance of all the above-mentioned models in Table IV in terms of the three selected metrics. In general, we can see two clear differences when the info-exposure spillover effect is introduced in cascade prediction. First, compared to the original models, our extended models significantly improve the performance not only for the preventive measure messages, but for all three sets of cascades. The most significant improvement occurs for SE-CGNN: it reaches 23% in the WroPerc measurement for the preventive measure messages and over 10% for the messages unrelated to preventive measures. This is due to the fact that CoupledGNN simulates the cascading effects iteratively, which allows the info-exposure spillover effect to be applied to the activation of individual users at a finer granularity. From the above analysis, we can conclude that the use of the info-exposure spillover effect can effectively improve the performance of existing cascade prediction models.
This effect should therefore be integrated into future models by design. Second, we can observe that the extended models predict the popularity of COVID-19 preventive measure messages more accurately than that of the other messages, which is the opposite of what we see for the baseline models. The baseline models show that it is more difficult to accurately predict the final cascade size of COVID-19 preventive measure messages. Their performance on $C$ and $\overline{C}_{PM}$ is almost the same but becomes worse on $C_{PM}$. The feature-based model has the worst performance, which decreases by over 11% compared to its performance in predicting the size of the other two sets of cascades. However, when the identified info-exposure spillover effect is used in our extended models, the popularity of preventive measure messages can be predicted with much better accuracy. With SE-CGNN, performance on the set $C_{PM}$ of preventive measure messages is about 10% better than on the messages unrelated to preventive measures. This observation validates empirically that exposure to information related to the COVID-19 pandemic has a strong spillover effect on retweeting messages about how to prevent the transmission of the virus.

In this paper, we concentrated on the problem of cascade prediction for COVID-19 information about preventive measures on online social media platforms. Compared to previous works, we took into account the phenomenon that exposure to COVID-19 information influences the behaviour of users in participating in the diffusion of information related to preventive measures, which we call the info-exposure spillover effect. With a dataset we collected from Twitter, we validated its existence. We then applied the identified spillover effect in predicting the popularity of preventive measure messages. Specifically, we built three new models by making use of recent advances in graph representation techniques, i.e., graph neural networks.
With extensive experiments, we showed that our new models outperform the baselines not only for preventive measure messages but for all COVID-19 related messages. This illustrates that the introduction of the info-exposure spillover effect can effectively improve the performance of cascade prediction.

There are still several limitations in our research. First, when representing users' historical textual posts, we took the mean of their representation vectors. This may remove useful information hidden in users' past messages. Moreover, we ignored the variation in significance caused by the posting time of messages. It has been shown that recent messages may have larger influence. This can be addressed by introducing recurrent networks such as LSTM, or the Hawkes process. Second, our cascade prediction models are extended from existing GNN models and focus on preventive measure messages. It will be interesting future work to design a new general end-to-end GNN model that can capture the diffusion patterns shared by COVID-19 related messages.
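To make the first limitation concrete, the sketch below contrasts plain mean pooling of a user's post vectors with a pooling that down-weights older posts. It is only an illustration: the exponential half-life weighting is a simpler stand-in chosen for the example, not the LSTM or Hawkes-process approach named above, and the function names, the 24-hour half-life, and the toy vectors are assumptions.

```python
import math


def mean_pool(vectors):
    """Plain average of post vectors (the pooling described in the text)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def decay_pool(vectors, ages_hours, half_life_hours=24.0):
    """Weighted average in which a post loses half of its weight
    for every `half_life_hours` hours of age (assumed scheme)."""
    weights = [math.exp(-math.log(2) * age / half_life_hours) for age in ages_hours]
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total
            for i in range(dim)]


posts = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-d post representations
ages = [0.0, 24.0]                # the first post is new, the second is one half-life old

uniform = mean_pool(posts)        # both posts count equally
recent = decay_pool(posts, ages)  # the newer post counts twice as much as the older one
```

With mean pooling the two posts are indistinguishable in importance, whereas the decayed version shifts the user representation towards the more recent post.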
Acknowledgements. This work was partially supported by Luxembourg's Fonds National de la Recherche, via grant COVID-19/2020-1/14700602 (PandemicGR) and grant PRIDE17/12252781/DRIVEN.