Simulating User-Level Twitter Activity with XGBoost and Probabilistic Hybrid Models

Fred Mubang and Lawrence Hall
Department of Computer Science, University of South Florida, 4202 E Fowler Ave, Tampa, FL 33620, USA
fmubang@usf.edu, lohall@usf.edu
February 18, 2022

This work is partially supported by DARPA and Air Force Research Laboratory via contract FA8650-18-C-7825.

Abstract: The Volume-Audience-Match simulator, or VAM, was applied to predict future activity on Twitter related to international economic affairs. VAM performs time series forecasting to predict (1) the number of total activities, (2) the number of active old users, and (3) the number of newly active users over the span of 24 hours from the start time of prediction. VAM then uses these volume predictions to perform user link prediction, assigning a user-user edge to each of the activities in the 24 future timesteps. VAM considerably outperformed a set of baseline models in both the time series and user-assignment tasks.

Recent research strongly suggests that social media activity can serve as an indicator of future offline events. For example, the authors of [1] showed that Twitter user data could be used to predict the spatiotemporal spread of COVID-19. The authors of [2] found a strong correlation between the number of tweets mentioning each candidate in a given state and that state's election results. Clearly, more attention should be focused on creating a simulator that can predict future social media activity at user and topic granularity. To that end, in this work we use the Volume-Audience-Match simulator, or VAM, which was first introduced in [3]. VAM is a machine-learning and sampling driven simulator that predicts both overall activity volume and user-level activity in a given social media network.

VAM is comprised of 2 modules. The first is the Volume Prediction Module. This module predicts, over the next 24 hours, the future (1) activity volume time series, (2) active old user volume time series, and (3) active new user volume time series on some social media platform, for some given topic of discussion. The second module is the User-Assignment Module. This module uses the 3 time series predicted by the VP-Module, as well as historical user-interaction information, in order to predict, for a given topic, the user-to-user interactions over the next 24 hours from some start time T.

We tested VAM's predictive power on a dataset of tweets related to the China-Pakistan Economic Corridor (CPEC). For the time series prediction task, we used 5 baselines, and VAM outperformed all of them. These were the Persistence Model Baseline, ARIMA, ARMA, AR, and MA models [4]. For the user-assignment task, we used the Persistence Model Baseline (because the ARIMA-based models can only predict time series and not user-level activities). VAM outperformed this baseline in the user-assignment task as well. Figure 1 contains a pictorial representation of VAM [3].

The contributions of this paper are as follows. Firstly, we show that the Volume-Audience-Match algorithm of [3] can be used to predict user-level activity on a dataset related to international economics (the China-Pakistan Economic Corridor), a dataset different from the Venezuelan Political Crisis dataset used in [3].
Using a different dataset lends more credence to the idea that VAM is a generalizable framework for predicting user-level activity on social media networks. Secondly, we show that VAM outperforms a multitude of baselines in the time series prediction and user-level link prediction tasks. Thirdly, similar to [5], we show that VAM can predict the creation of new users, unlike many previous works that only focus on the prediction of old users. Fourthly, we show that using external social media features from Reddit and YouTube can aid in predicting future Twitter activity.

A social media topic simulator could allow governments or organizations to react to concerns of the masses more effectively and efficiently. In [3], VAM was used to simulate future Twitter activity related to 18 Venezuelan Political Crisis topics. This domain was of interest for several reasons. For example, if there are many events or many users writing tweets about the Venezuelan protests topic, that could mean that there is civil unrest taking place. Or, if many people are discussing the Venezuelan violence topic, that could mean that many people are engaging in violent activities, or are on the receiving end of such violent activities. In this work, we apply the VAM simulation system to another domain, the China-Pakistan Economic Corridor (CPEC), an infrastructure initiative between China and Pakistan. There are 10 topics in this domain. If VAM can, in fact, accurately predict future user activity related to the CPEC initiative, that would allow a government or organization to have a better understanding of public opinion related to CPEC. For example, if VAM predicts that there will be an increase in tweets related to the benefits/development/jobs or benefits/development/roads topics, this lets a government or corporate entity know that people may be focusing on potential benefits of the CPEC initiative, such as more jobs or better roads. Beyond the domain-specific applications, by applying VAM to another dataset besides the Venezuelan Political Crisis dataset of [3], we show that VAM can perhaps serve as a general social media activity simulator, and not simply a domain-specific one.

As noted, there are 2 problems VAM attempts to solve: the Volume Prediction Problem and the User-Assignment Problem. The Volume Prediction Problem is to predict the overall volume of Twitter activities. Note that we do not distinguish whether a particular action is a tweet, retweet, quote, or reply, because the focus of this work is to predict the overall volume of Twitter activities. Let q be some topic of discussion on a social media platform such that q ∈ Q, in which Q consists of all topics. Furthermore, let T be the current timestep of interest. The Volume Prediction task is to predict 3 time series of length S between T + 1 and T + S. These time series, for a topic q, are the future (1) activity volume time series, which is the count of actions per time interval; (2) the active old user volume time series, which is the number of previously seen users performing an action in a time interval; and (3) the active new user volume time series, which is the number of new users that perform an action in a time interval. Note that in this work S = 24 in order to represent 24 hours [3].
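To make these three prediction targets concrete, the sketch below shows one way the hourly target series could be derived from a raw event log for a single topic; the event layout and helper name are hypothetical and not taken from [3].

```python
def hourly_volume_targets(events, history_users, start, horizon=24):
    """Derive the three target series for one topic over `horizon` hours.

    events:        iterable of (timestamp, user_id) pairs falling in the
                   prediction window [start, start + horizon hours)
    history_users: set of user ids seen before `start`
    Returns (activities, old_user_counts, new_user_counts), each a list of
    length `horizon`.
    """
    activities = [0] * horizon
    old_sets = [set() for _ in range(horizon)]
    new_sets = [set() for _ in range(horizon)]

    for ts, user in events:
        hour = int((ts - start).total_seconds() // 3600)
        if not 0 <= hour < horizon:
            continue
        activities[hour] += 1
        # A user counts as "old" if seen before the prediction start time,
        # otherwise as "new"; finer bookkeeping is omitted in this sketch.
        (old_sets if user in history_users else new_sets)[hour].add(user)

    return activities, [len(s) for s in old_sets], [len(s) for s in new_sets]
```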
Before describing the User-Assignment Problem, we must first define several terms. Let G be a sequence of temporal, weighted, and directed graphs such that G = {G_1, G_2, ..., G_T}. Each temporal graph G_t can be represented as a set (V_t, E_t). V_t is the set of all users that are active at time t. E_t is the set of all user-to-user interactions, or links, at time t. Each element of E_t is a tuple of the form (u, v, w(u, v, t)). Here, u is the child user, or the user performing an action (such as a tweet or retweet), and v is the parent user, or the user on the receiving end of the action. The term w(u, v, t) represents the weight of the outdegree between u and v at time t [3].

Now we discuss the User-Assignment Problem. The goal is to assign a user to each activity predicted by the Volume Prediction Module, and to then assign edges between pairs of users. For tweets, an edge between users A and B represents the act of user A retweeting a post by user B. Given this information, let us say that, for topic q, there are 3 volume time series as discussed in the Volume Prediction Problem. The task is now to use these volume predictions, as well as the temporal graph sequence G, to predict the user-to-user interactions for topic q between T + 1 and T + S. This can be viewed as a temporal link prediction problem. These predicted user-user interactions are contained in a predicted temporal graph sequence spanning T + 1 to T + S.
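A minimal sketch of this weighted, directed edge representation follows; the helper and the toy interactions are hypothetical, and the graphs in [3] may carry additional attributes.

```python
from collections import defaultdict

def build_hourly_graph(interactions):
    """Build one temporal graph G_t as a weighted, directed edge set.

    interactions: iterable of (child_user, parent_user) pairs observed during
                  hour t, where the child performs the action (e.g. a retweet)
                  and the parent is on the receiving end.
    Returns (V_t, E_t): the set of active users and a dict mapping
    (child, parent) -> w(u, v, t), the out-degree weight for that hour.
    """
    weights = defaultdict(int)
    nodes = set()
    for child, parent in interactions:
        nodes.add(child)
        nodes.add(parent)
        weights[(child, parent)] += 1   # repeated interactions add weight
    return nodes, dict(weights)

# Example: user "a" retweets "b" twice and replies to "c" once in hour t.
V_t, E_t = build_hourly_graph([("a", "b"), ("a", "b"), ("a", "c")])
# E_t == {("a", "b"): 2, ("a", "c"): 1}
```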
The VAM simulator was first discussed in the technical report [3], which contains all details. In this work, we focus on the performance of VAM on our Twitter CPEC dataset.

Firstly, there are general "popularity prediction" methods, which aim to predict the overall future volume of activities on a given social network, irrespective of "which user does what when". This is done in [6] with neural networks, in [7] with various statistical regression models (polynomial, exponential, etc.), in [8] with Hawkes processes, and in [9] with LSTM neural networks. Next, there are methods that use a "decompositional user-level approach", meaning user-level activity is predicted, but the problem is broken into several steps. The proposed models in [10], [11], and [12] predict user activity in two steps: (1) an initial model predicts the overall "volume" of activities, and then (2) a second model uses the predicted volume of activities as features to predict "which user does what when". In [10], ARIMA is used for both models, while [11] and [12] use LSTMs for both. Note that VAM uses a similar approach to user-level prediction; the two main differences are that (1) VAM performs user-to-user predictions in Twitter, whereas the 3 works previously mentioned perform user-to-repository predictions in GitHub, and (2) [11] and [12] do not model new users at all, whereas VAM does. New users are modeled in [10], but that work does not utilize a user-archetype table to model new user activity, whereas VAM does. The other decompositional prediction methods utilize clustering: an initial model predicts the main user clusters, and a second model then predicts the users in each cluster [13], [14]. The "direct user-level prediction" methods directly predict future user-level activity, without breaking the problem into subtasks. Embedding neural networks have been used, as in [15] and [16]. The works of [17] and [18] use neural networks on sequences of adjacency matrices to predict user activity over time in Twitter. In [19], the authors use a novel method that combines social science theory with a preferential attachment model. The authors of [20] use Bayesian, sampling, and link-prediction-based models to predict user-level activity. Lastly, [21] used a machine-learning-driven approach to predict GitHub user-repo pairs, as well as user cascades in Twitter and Reddit.

There have been several previous works on temporal link prediction algorithms. More recent embedding-based, neural-network-based approaches include dyngraph2vec [22] and tNodeEmbed [23]; however, the issue with embedding methods is that they are computationally expensive in terms of training time and space. There are also matrix factorization methods, used in [24], [25], and [26], and probabilistic methods, such as those of [27] and [28]; the problem with these approaches is that they too are computationally expensive in terms of time and space. Lastly, none of these approaches can predict the growth of new users, which is important for certain social networks in which activity is strongly driven by new users.

Data was collected and anonymized by Leidos. Annotators and subject matter experts (SMEs) worked together to annotate an initial set of 4,997 tweets and YouTube comments. These posts were related to 21 different topics, which are listed in the supplemental materials [29], along with a table showing the Weighted Average Inter-Annotator Agreement on each of these topics. All topics are related to the China-Pakistan Economic Corridor. The time period was from April 2, 2020 to August 31, 2020. A BERT model [30] was trained and tested on this annotated data with a train/test split of 0.85/0.15. The F1 scores per topic are shown in the supplemental materials [29]. There was a wide range of F1 scores, with the highest being 0.97 and the lowest being 0. As a result, in order to avoid having an overly "noisy" dataset, we only chose topics for our final Twitter dataset that had a Weighted Average Inter-Annotator Agreement of 0.8 or higher and a BERT F1 score of 0.7 or higher. By doing this, we ended up with 10 topics. This BERT model was then used to label topics for 3,166,842 Twitter posts (tweets/retweets/quotes/replies) and 5,620 YouTube posts (videos and comments). Table I shows the counts of the Twitter and YouTube posts per topic. BERT was not applied to the Reddit data, so the Reddit data used as additional features in this work is not split by topic.

The supplemental materials [29] contain the node and edge counts of each of the 10 Twitter networks. The largest network in terms of nodes is the controversies/china/border network, with 443,666 nodes. The smallest is the controversies/pakistan/students network, with 10,650 nodes. Lastly, the supplemental materials [29] contain a table showing the average hourly proportion of new to old users in the Twitter dataset. As shown in that table, some topics have a particularly high average frequency of new users per hour. For example, in controversies/china/uighur, on average, 78.72% of the active users in each hour were new and 21.28% were old. Topics such as this are the reason we aim to use VAM to predict both new and old user activity, unlike most previous works that only focus on old/previous user activity prediction.

Our training period was from April 2, 2020 to August 10, 2020 (about 4 months). The validation period was August 11 to August 17, 2020 (1 week). Lastly, the test period was August 18, 2020 to August 31, 2020 (2 weeks). Each sample represents a topic-timestep pair. The input features represent multiple time series leading up to a given timestep of interest T. The different possible time series used for features are shown in Table II.
Also, a one-hot vector of size 10 was used to indicate which topic each sample represented. Table III shows the feature sizes for each model trained. The model column shows the name of the model. The abbreviation represents the platform features used to train the particular model: "T", "Y", and "R" represent Twitter, YouTube, and Reddit, respectively. The numbers represent the hourly length of the time series input to each model. Note, however, that the 3 output time series of each model are each of length 24 in order to maintain consistency in evaluation. For example, the VAM-TR-72 model is a model trained on Twitter and Reddit time series that are all of length 72. Using Table II, these time series correspond to indices 1-3, 7-9, and 13, i.e., 7 different time series. Also recall the 10 static features (the one-hot topic vector). So, in total, this model had 7 × 72 + 10 = 514 features, as shown in the table.

There were 31,210 training samples used for each model, 1,450 validation samples, and 140 test samples. There are 140 test samples because there are 10 topics and 14 days of testing. However, for training and validation, we wanted to generate as many samples as possible so our models had adequate data. So, for those datasets, we generated samples by sliding the prediction start time forward in hourly as well as daily increments. We call this a "sliding window data generation" approach, similar to [3].

We trained 12 different VAM models. Each model was trained on a different combination of platform features drawn from Twitter, Reddit, and YouTube. The time series features used for each platform are shown in Table II. The names of the different models used are shown in Table IV. Furthermore, we also used different volume lookback factors (L_vol). The L_vol parameter determines the length of each time series described in Table II. For example, the VAM-TRY-24 model was the model trained on Twitter, Reddit, and YouTube time series, all of length 24.

VAM's Volume Prediction Module, which we call Φ, is comprised of multiple XGBoost models. It takes an input vector x and produces a matrix Ŷ ∈ R^(3×S); in other words, Φ(x) = Ŷ. Each row of this matrix represents one of the 3 volume time series (actions, new users, old users). Each column represents a timestep between T + 1 and T + S, with S = 24 hours in our experiments. As a result, there are 72 XGBoost models contained within the Volume Prediction Module Φ, each one "specializing" in an hour/output-type pair (e.g., the number of new users in hour 1, or the number of activities in hour 18). For more details see [3].

Similar to [3], we used the XGBoost [31] and scikit-learn [32] libraries to create our XGBoost models. The subsample frequency, gamma, and L1 regularization parameters were set to 1, 0, and 0, respectively. A grid search over a pool of candidate values was done for the other parameters using the validation set. For the column sample frequency, the candidate values were 0.6, 0.8, and 1. For the number of trees, the candidate values were 100 and 200. For the learning rate, the values were 0.1 and 0.2. For L2 regularization, the values were 0.2 and 1. Lastly, for maximum tree depth, the values were 5 and 7. Mean Squared Error was the loss function, and log normalization was used.
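The sketch below illustrates how one of the 72 hour/output-type regressors could be trained and tuned with the grid search described above. The mapping of the stated parameters onto XGBoost argument names (column sample frequency as colsample_bytree, subsample frequency as subsample) and the use of log1p/expm1 for the log normalization are our assumptions, not details confirmed by [3].

```python
import itertools
import numpy as np
from xgboost import XGBRegressor

# Candidate values from the grid search described above; the argument-name
# mapping is an assumption on our part.
GRID = {
    "colsample_bytree": [0.6, 0.8, 1.0],
    "n_estimators": [100, 200],
    "learning_rate": [0.1, 0.2],
    "reg_lambda": [0.2, 1.0],    # L2 regularization
    "max_depth": [5, 7],
}

def fit_one_output_model(X_tr, y_tr, X_val, y_val):
    """Fit one per-(hour, output-type) regressor on log-normalized targets,
    picking hyperparameters by validation-set MSE."""
    best, best_mse = None, np.inf
    for values in itertools.product(*GRID.values()):
        params = dict(zip(GRID.keys(), values))
        model = XGBRegressor(subsample=1.0, gamma=0.0, reg_alpha=0.0,
                             objective="reg:squarederror", **params)
        model.fit(X_tr, np.log1p(y_tr))
        mse = np.mean((model.predict(X_val) - np.log1p(y_val)) ** 2)
        if mse < best_mse:
            best, best_mse = model, mse
    return best

# Prediction for a new feature vector x of shape (1, n_features):
#   y_hat = np.expm1(best.predict(x))   # undo the log normalization
```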
Table II: Time series feature categories used as model inputs.
1. New user volume time series for a given topic in Twitter.
2. Old user volume time series for a given topic in Twitter.
3. Activity volume time series for a given topic in Twitter.
4. New user volume time series for a given topic in YouTube.
5. Old user volume time series for a given topic in YouTube.
6. Activity volume time series for a given topic in YouTube.
7. Activity volume time series across all topics in Twitter.
8. New user volume time series across all topics in Twitter.
9. Old user volume time series across all topics in Twitter.
10. Activity volume time series across all topics in YouTube.
11. New user volume time series across all topics in YouTube.
12. Old user volume time series across all topics in YouTube.
13. Activity volume time series in Reddit.

In order to properly assess VAM's predictive power on the time series prediction task, various metrics were used. We used the RMSE and MAE metrics to assess how well VAM could predict time series in terms of volume and exact timing. Predicting the exact timing of a time series is a difficult task. It is possible for a model to approximate the overall "shape" of a time series while not correctly predicting the number of events or the exact temporal pattern. In order to account for this phenomenon, we also use the Normalized RMSE (NRMSE) metric. It is calculated in the following way. The ground truth time series and simulated time series are both converted into cumulative time series. Each time series is then divided by its respective maximum value. The result is 2 time series whose values range from 0 to 1. Finally, the standard RMSE metric is applied to these normalized time series.

In order to measure VAM's accuracy in terms of the pure volume of events, without regard to temporal pattern, we used the Symmetric Absolute Percentage Error, or S-APE. This measures how accurate the total number of events was for each model, without regard to the temporal pattern. Let F be the forecast time series and A be the actual time series; S-APE compares the total event counts of F and A.

The last 2 metrics measure how well the volatility of a predicted time series matches that of the ground truth: the Volatility Error (VE) and the Skewness Error (SkE). The Volatility Error is the absolute difference between the standard deviations of the actual and predicted time series. The Skewness Error is the absolute difference between the skewness values of the actual and predicted time series. The skewness statistic used in this work is the adjusted Fisher-Pearson standardized moment coefficient [33].

We compared VAM to 5 baseline models: the Persistence Baseline and the ARIMA, ARMA, AR, and MA models [4]. Firstly, we used the Persistence Baseline, which is defined as follows. Let T be the current timestep of interest, and let S be the length of the desired predicted time series. The Persistence Model predicts the time series for the period T + 1 to T + S by moving forward, or "shifting forward", the time series that spans the period T − S to T. The underlying assumption of this model is that the immediate future of the time series will simply replicate its immediate past. Within the realm of social media time series prediction, this baseline has been used in [17], [18], [34].

The Auto-Regressive Integrated Moving Average (ARIMA) model and its variants (ARMA, AR, and MA) are widely used statistical models and are hence used for comparison. Furthermore, these models have even been used as the basis for some recent prediction approaches, as in [10] and [14]. The ARMA, AR, and MA models are variants of ARIMA depending on how the p, d, and q parameters are set. The term p represents the number of autoregressive terms, d is the number of differences performed to make the time series stationary, and q is the number of lagged forecast errors used in the prediction equation [4]. For more details regarding how the parameters of the ARIMA model and its variants were set, refer to the supplemental materials [29].
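To make the evaluation metrics and the Persistence Baseline described above concrete, a minimal sketch follows. The exact S-APE denominator used in [3] is not reproduced in this text, so the form shown below is an assumption.

```python
import numpy as np

def nrmse(actual, predicted):
    """Normalized RMSE: cumulate each series, scale it to [0, 1] by its own
    maximum, then apply the standard RMSE to the normalized curves."""
    a = np.cumsum(np.asarray(actual, dtype=float))
    p = np.cumsum(np.asarray(predicted, dtype=float))
    a = a / a.max() if a.max() > 0 else a
    p = p / p.max() if p.max() > 0 else p
    return float(np.sqrt(np.mean((a - p) ** 2)))

def s_ape(actual, predicted):
    """Symmetric absolute percentage error on total volume; the temporal
    pattern is ignored. The denominator form here is an assumption."""
    A, F = float(np.sum(actual)), float(np.sum(predicted))
    return 0.0 if A + F == 0 else abs(F - A) / (A + F)

def persistence_forecast(history, horizon=24):
    """Persistence Baseline: the last `horizon` hours are shifted forward."""
    return list(history[-horizon:])
```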
Table IV contains the 6 metric results for the 12 VAM models and 5 baselines. Since it is difficult to evaluate which model is best by looking at 6 different raw metric scores, we created one general metric called the Overall Normalized Metric Error (ONME). It is calculated in the following way. For each metric column, the 17 values (for the 17 models) are normalized between 0 and 1 by dividing each value by the sum of the 17 metric scores. This produces 6 different scores for each model, all between 0 and 1, although no score is exactly 0 or 1, as one can see in the table. We used this normalization method to show that the best model still had some error. Finally, these 6 scores for each model are averaged, creating the ONME score. The models in this table are ranked from best to worst in terms of ONME; lower ONME is better. Note that ARMA was the best baseline model because it had the lowest Overall Normalized Metric Error out of all 5 baselines.

We wanted to know how well each model performed in comparison to this baseline. So, we also created the ONME Percent Improvement From Best Baseline metric (PIFBB), in a similar fashion to [3]. It is the percent decrease in a model's ONME relative to the ONME of the best baseline, i.e., PIFBB = 100 × (ONME_best baseline − ONME_model) / ONME_best baseline. The upper bound of PIFBB is 100%, which occurs if a model's ONME is 0; this is clearly the best possible result. The lower bound of PIFBB is negative infinity, because any given model could potentially perform infinitely worse than the best baseline.

According to Table IV, the best model was the VAM-TR-72 model. This was the VAM model trained on both Twitter and Reddit features with a lookback factor of 72. Its ONME Percent Improvement From the Best Baseline (ARMA) was 16.92%. It is noteworthy that the best 5 models all used Reddit and/or YouTube features in addition to Twitter features in order to predict the Twitter time series. This suggests that external platform features from Reddit and YouTube can be helpful in predicting future events on Twitter. Furthermore, the 4 worst VAM models all had lookback factors of 24, suggesting that longer lookback periods of 48 or 72 are more helpful for accurate Twitter time series prediction. Lastly, while ARMA was the best baseline model, it is noteworthy that the seemingly simple Persistence Baseline was the 2nd best baseline, ahead of the well-known ARIMA model. The Persistence Baseline's PIFBB was -0.04%, while ARIMA's was -4.62% (higher is better). This is noteworthy because the Persistence Baseline is created simply by "shifting ahead" the events from the past into the future time frame.

In summary, VAM outperformed ARMA on 49 out of 60 topic-metric pairs, or about 81.6% of the time. It performed particularly well on the "volume-with-exact-timing" metrics (RMSE and MAE), the "approximate temporal-pattern" metric (NRMSE), and the "volatility" metrics (Volatility Error and Skewness Error). It performed decently on the "pure volume" metric (S-APE), though not as well as on the other metrics. Figure 3 shows some time series plots of instances in which VAM-TR-72 performed particularly well against the baseline models.
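As a concrete illustration of the ONME and PIFBB calculations described above, a small sketch follows; the function names and the table layout are hypothetical.

```python
import numpy as np

def overall_normalized_metric_error(scores):
    """scores: dict mapping model name -> list of raw metric values
    (one value per metric, in a fixed order). Each metric column is divided
    by its column sum, and the normalized scores are averaged per model."""
    names = list(scores)
    table = np.array([scores[m] for m in names], dtype=float)   # models x metrics
    normalized = table / table.sum(axis=0, keepdims=True)
    return dict(zip(names, normalized.mean(axis=1)))

def pifbb(onme_model, onme_best_baseline):
    """Percent improvement from the best baseline: 100% when the model's
    ONME is 0, and unbounded below for arbitrarily poor models."""
    return 100.0 * (onme_best_baseline - onme_model) / onme_best_baseline
```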
In Figure 4 we show a bar plot of the temporal feature importances of the XGBoost models for the number-of-actions output category of the VAM-TR-72 model. The feature importances are calculated by adding up the number of times a feature is used to split the data across all trees, as computed by the XGBoost library [31]. In this figure we refer to that output category as Num. Twitter Actions For Topic. Along the Y-axis one can see the name of each feature category. There are 6 Twitter time series feature categories: 3 for the "global count" time series (those labelled with "All Topics") and 3 for the Twitter-topic time series (those labelled with "For Topic"); the Reddit activity series forms an additional category. We normalized all the feature category importance values between 0 and 1, which is what is shown in each bar plot. As one can see, for the VAM-TR-72 model, the Num. Twitter Old Users For Topic input time series is the most helpful for predicting the output time series Num. Twitter Actions For Topic. In other words, according to this plot, if one wished to predict the number of actions for the topic benefits/jobs (for example) at some future timestep, the most useful input time series would be the old user count time series for benefits/jobs. In second place in terms of importance is the feature category Num. Twitter Activity Users For Topic, and in third place is the feature category Num. Twitter Old Users For All Topics. The Num. Reddit Actions category was the 5th most important, ahead of Num. Twitter New Users For Topic.

In the following subsections, we shift our focus to the User-Assignment Module of VAM. Similar to how the Volume Prediction models utilized lookback factors (L_vol), we also utilized a lookback factor parameter for the User-Assignment task, L_user. We set this value to 24 hours. In other words, VAM's user-assignment module only uses the past 24 hours of user-interaction history when making predictions. The assumption here is that recent user-interaction history is all that is needed to make accurate user-to-user predictions. We call this truncated version of the temporal sequence of graphs G, G_recent.

Using this information, we now describe the user-assignment algorithm [3]. A recent history table called H_recent is created from the recent history sequence of graphs, G_recent. This table contains event records, with each record defined as a tuple containing (1) the timestamp, (2) the name of the child user, and (3) the name of the parent user. Using this table and the old user volume count from the Volume Prediction Module, VAM utilizes weighted random sampling to predict the set of active old users at T + 1, Ô_{T+1}. Using the new user volume prediction counts, VAM is also able to create the set of active new users at T + 1, N̂_{T+1}. Multiple data structures for each set of users are used to keep track of 4 main user attributes: (1) the user's probability of activity, (2) the user's probability of influence, (3) the user's list of parents it is most likely to interact with, and (4) the probability that the user would interact with each parent in its respective parent list. It is easy to obtain these 4 attributes for the old users because their history is available in the H_recent table. However, for new users, VAM must infer what their attributes would most likely be. In order to do this, VAM uses a User Archetype Table, which is created with the use of a random sampling algorithm applied to the set of old users in the H_recent table. The assumption is that new users in the future are likely to have the same attributes as old users in the recent past. VAM then uses weighted random sampling to assign edges among the users in the Ô_{T+1} and N̂_{T+1} sets. VAM "knows" how many total actions to assign among all users because the activity volume time series was predicted in the Volume Prediction task. The final set of nodes and edges predicted at T + 1 is known as G_future. The supplemental materials [29] contain a visual representation of the User Assignment algorithm. For more details, see [3].
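The actual procedure in [3] tracks the four user attributes listed above and builds an explicit User Archetype Table; the sketch below is a heavily simplified approximation with hypothetical names, intended only to illustrate the weighted-sampling idea for a single predicted hour.

```python
import random
from collections import Counter, defaultdict

def assign_users_one_hour(history_events, n_actions, n_old, n_new, seed=0):
    """Very simplified user assignment for one predicted hour.

    history_events: list of (child, parent) interactions from the recent
                    history table H_recent (past 24 hours)
    n_actions, n_old, n_new: counts predicted by the Volume Prediction Module
    Returns a list of predicted (child, parent) edges.
    """
    rng = random.Random(seed)
    if not history_events:
        return []

    # Empirical activity weights and per-user parent preferences.
    activity = Counter(child for child, _ in history_events)
    parent_prefs = defaultdict(Counter)
    for child, parent in history_events:
        parent_prefs[child][parent] += 1
    users, weights = list(activity), list(activity.values())

    # Old users: weighted random sampling from the recent history.
    active_old = rng.choices(users, weights=weights, k=min(n_old, len(users)))

    # New users: synthetic ids that borrow the behavior of randomly sampled
    # old-user "archetypes" (a stand-in for VAM's User Archetype Table).
    archetype_of = {f"new_user_{i}": rng.choices(users, weights=weights)[0]
                    for i in range(n_new)}
    children = active_old + list(archetype_of)
    if not children:
        return []

    # Distribute the predicted number of actions: each action gets a child
    # user, which links to a parent drawn from its (or its archetype's)
    # recent parent list.
    edges = []
    for _ in range(n_actions):
        child = rng.choice(children)
        prefs = parent_prefs[archetype_of.get(child, child)]
        parent = rng.choices(list(prefs), weights=list(prefs.values()))[0]
        edges.append((child, parent))
    return edges
```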
In this section we discuss the User Assignment results. Since the user-assignment algorithm is probabilistic, we performed 5 trials and averaged their metric results. The supplemental materials [29] contain tables showing the standard deviations and coefficients of variation of the metric results across the 5 trials.

In order to measure the accuracy of the old user prediction task, the Weighted Jaccard Similarity metric was used, which is also known as the Ruzicka Similarity [35]. It was used to measure how well VAM predicted the old users in each hour, as well as how "influential" they were. In this case, influence is defined quantitatively as the number of retweets, replies, and quotes a user's tweets received. For more details regarding this metric, see the supplemental materials [29].

Since our task involves predicting the creation and activity of new users, in addition to the activity of old users, defining and measuring new user predictive success has complexities. The names of new users are unknown before they appear in the ground truth. Hence, it is impossible to exactly match a predicted new user to a specific user in the ground truth. So, in order to work around this issue, we measure new user prediction success using more macroscopic views of the network, in the same fashion as [3]. We call these types of results Network Structure results. Specifically, we used the PageRank distribution [36] of the weighted indegree of the network and the Complementary Cumulative Degree Histogram (CCDH) [37] of the unweighted indegree of the network. In order to measure the distance between the predicted and actual PageRank distributions, we used the Earth Mover's Distance metric [38]. In order to measure the distance between the CCDHs of the predicted network and the ground truth network, the Relative Hausdorff (RH) Distance [37] was used.

For the user-assignment task, we used the Persistence Baseline. Similar to the Volume Prediction task, it is created by shifting the user-to-user networks spanning T − S to T forward to the period T + 1 to T + S. This same baseline was also used in [10], [11], [16], [39], [40]. Other works in the literature are unsuitable as baselines for various reasons. The approaches in [10]-[14] are all platform-specific to GitHub and do not translate easily to Twitter. The approaches in [21], [41] predict Twitter user activity, but not in a way that is comparable to VAM: those works predict user-to-tweet interactions as a classification task, whereas VAM predicts user-to-user interactions as a regression task. The general link prediction methods can predict Twitter user activity; however, they can only predict the activity of old users and not new users [22]-[28]. The Persistence Baseline, by contrast, can easily predict new users since it uses historical information for its predictions. For example, if it is known that there were 10,000 new users in the past period, the Persistence Baseline trivially predicts that there will be 10,000 new users in the future period. Lastly, in addition to not predicting new users, some works do not scale to the large networks used in this work [15], [17], [18], [22], [27], [28].
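For reference, a minimal sketch of the Weighted Jaccard (Ruzicka) similarity follows, assuming the standard sum-of-minima over sum-of-maxima form; the exact per-hour weighting used in [3] is described in the supplemental materials [29].

```python
def weighted_jaccard(actual_influence, predicted_influence):
    """Ruzicka / weighted Jaccard similarity between two user -> weight maps
    (here, per-user influence counts for one hour). 1.0 is a perfect match."""
    users = set(actual_influence) | set(predicted_influence)
    num = sum(min(actual_influence.get(u, 0), predicted_influence.get(u, 0)) for u in users)
    den = sum(max(actual_influence.get(u, 0), predicted_influence.get(u, 0)) for u in users)
    return num / den if den else 1.0
```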
Table V contains the old user prediction results using the Weighted Jaccard Similarity metric. Since this is a similarity metric, higher scores are better. As one can see in the table, we refer to this model as the VAM-TR-72V-24U model. This is a VAM model that has a volume lookback factor (L_vol) of 72 hours and a user-assignment lookback factor (L_user) of 24 hours. The numbers in bold represent the best results. As one can see in this table, VAM outperformed the Persistence Baseline on 8 out of 10 topics. VAM performed particularly well on topics such as benefits/development/roads.

Table VI shows the results for the Earth Mover's Distance metric (lower is better). As one can see, VAM outperformed the baseline on this metric for 8 out of 10 topics. Note that the two topics for which performance is below the baseline have the least activity. VAM performed particularly well on the controversies/pakistan/baloch, benefits/development/roads, and opposition/propoganda topics. The Percent Improvement From Baseline (PIFB) scores for those topics were 27.29%, 20.89%, and 19.2%, respectively.

Table VII shows VAM's Relative Hausdorff Distance results (lower is better). Similar to the Earth Mover's Distance results, VAM beat the baseline on 8 out of 10 topics. It performed particularly well on the controversies/pakistan/baloch, leadership/sharif, and controversies/pakistan/students topics. The percent improvement scores for those topics were 25.24%, 22.8%, and 18.94%, respectively.

The 5 trials were run in parallel across 5 computers, each with an Intel Xeon E5-260 v4 CPU. Each machine had 2 CPU sockets, with 8 cores and 16 threads per CPU.

X. CONCLUSION

In this work, we discussed the VAM simulator [3], an end-to-end approach for time series prediction and temporal link prediction, and applied it to the CPEC Twitter dataset. We showed that VAM could outperform the Persistence Baseline, ARIMA, ARMA, AR, and MA models on the Volume Prediction task. We then showed that VAM could outperform the Persistence Baseline on the User Assignment task. On the Volume Prediction task, VAM outperformed the best baseline model (ARMA) on 49 out of 60 (or about 81.6%) of all topic-metric pairs. Furthermore, we showed that external Reddit and YouTube features aid VAM with the Volume Prediction task. For the User-Assignment task, VAM outperformed the Persistence Baseline on 24 out of 30 (or 80%) of all topic-metric pairs. Also, we showed that VAM can predict the creation of new users, unlike many previous link prediction approaches that only focus on the prediction of old user-to-user interactions. Furthermore, VAM's user assignment is quite fast, taking only 27 minutes to simulate the activity of millions of user-to-user edges. By showing VAM's strong performance on the CPEC dataset, we lend more credence to the notion that VAM can serve as a general social media simulator, and not one that is specific only to the Venezuelan Political Crisis dataset [3]. Future work involves utilizing a machine-learning model for the User-Assignment module, as well as trying LSTM neural networks for both the Volume Prediction and User-Assignment modules.
REFERENCES

[1] Use of twitter social media activity as a proxy for human mobility to predict the spatiotemporal spread of covid-19 at global scale
[2] Spatial analysis of social media content (tweets) during the 2012 us republican presidential primaries
[3] VAM: An End-to-End Simulator for Time Series Regression and Temporal Link Prediction in Social Media Networks
[4] Time series analysis: forecasting and control
[5] VAM: An end-to-end simulator for times series regression and temporal link prediction in social media networks
[6] Social media popularity prediction of planned events using deep learning
[7] Time series predictive models for social networking media usage data: The pragmatics and projections
[8] Evently: Modeling and analyzing reshare cascades with hawkes processes
[9] An lstm model for predicting cross-platform bursts of social media activity
[10] A predictive self-configuring simulator for online media
[11] Predicting longitudinal user activity at fine time granularity in online collaborative platforms
[12] Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
[13] Initializing agent-based models with clustering archetypes
[14] Modeling social coding dynamics with sampled historical data
[15] Npp: A neural popularity prediction model for social media content
[16] Simulating temporal user activity on social networks with sequence to sequence neural models
[17] Using Deep Learning for Temporal Forecasting of User Activity on Social Media: Challenges and Limitations
[18] Learning from dynamic user interaction graphs to forecast diverse social behavior
[19] Deep agent: Studying the dynamics of information spread and evolution in social networks
[20] The darpa socialsim challenge: Massive multi-agent simulations of the github ecosystem
[21] Massive cross-platform simulations of online social networks
[22] dyngraph2vec: Capturing network dynamics using dynamic graph representation learning
[23] Node embedding over temporal graphs
[24] Temporal link prediction using matrix and tensor factorizations
[25] Nonnegative matrix factorization algorithms for link prediction in temporal networks using graph communicability
[26] Temporal link prediction by integrating content and structure information
[27] Nonparametric link prediction in large scale dynamic networks
[28] An efficient algorithm for link prediction in temporal uncertain social networks
[29] Simulating user-level twitter activity with xgboost and probabilistic hybrid models - supplemental materials
[30] Bert: Pre-training of deep bidirectional transformers for language understanding
[31] Xgboost: A scalable tree boosting system
[32] Scikit-learn: Machine learning in Python
[33] Measuring skewness: A forgotten statistic
[34] Multiscale online media simulation with socialcube
[35] Comprehensive survey on distance/similarity measures between probability density functions
[36] The anatomy of a large-scale hypertextual web search engine
[37] Catching the head, tail, and everything in between: A streaming algorithm for the degree distribution
[38] A metric for distributions with applications to image databases
[39] Using deep learning for temporal forecasting of user activity on social media: Challenges and limitations
[40] Massive Multi-agent Data-Driven Simulations of the GitHub Ecosystem
[41] Farm: Architecture for distributed agent-based social simulations

ACKNOWLEDGMENT

The authors thank Leidos for providing the Twitter, YouTube, and Reddit data. This work is partially supported by DARPA and Air Force Research Laboratory via contract FA8650-18-C-7825.