key: cord-0474158-l6mcuiak
authors: Voukelatou, Vasiliki; Miliou, Ioanna; Giannotti, Fosca; Pappalardo, Luca
title: Understanding peacefulness through the world news
date: 2021-06-01
journal: nan
DOI: nan
sha: 2c197e80bffb3ee15b7c2e8cae2c6dc5a7c26133
doc_id: 474158
cord_uid: l6mcuiak

Peacefulness is a principal dimension of well-being and is the way out of inequity and violence. Thus, its measurement has drawn the attention of researchers, policymakers, and peacekeepers. In recent years, novel digital data streams have drastically changed research in this field. The current study exploits information extracted from a new digital database called Global Data on Events, Location, and Tone (GDELT) to capture peacefulness through the Global Peace Index (GPI). Applying predictive machine learning models, we demonstrate that news media attention from GDELT can be used as a proxy for measuring GPI at a monthly level. Additionally, we use explainable AI techniques to obtain the most important variables that drive the predictions. This analysis highlights each country's profile and provides explanations for the predictions, particularly for the errors and the events that drive them. We believe that digital data exploited by researchers, policymakers, and peacekeepers with powerful data science tools such as machine learning could help maximize the societal benefits and minimize the risks to peacefulness.

The global challenges to people's well-being that society faces today are manifold. In a major attempt to address them, the Sustainable Development Goals (SDGs) were introduced by the United Nations Conference on Sustainable Development in Rio de Janeiro in 2012. The objective was to set universal and measurable dimensions to ensure high levels of well-being. Since well-being is a vague and multi-dimensional concept, it cannot be captured as a whole, only through a set of health, socio-economic, safety, environmental, and political dimensions [1, 2]. The United Nations Development Programme (UNDP) embodies these dimensions in the 17 SDGs [3, 4, 5], such as "Good Health and Well-Being", "No Poverty", and "Reduced Inequalities".

A crucial development is the inclusion of SDG 16, i.e., "Peace, Justice, and Strong Institutions", given that armed violence is on the rise and is difficult to prevent [6]. Since 2011, at least 100,000 people have been killed in deadly conflicts, the majority of them in Afghanistan, Iraq, and Syria. Although the rate of major wars has declined over the past decades, the number of civil conflicts and terrorist attacks has increased in recent years, even in developed countries [7].

Governments and the international community often have little warning of abrupt changes in peace and safety, while war expenses weaken the economies of war-torn countries. For example, since 1996, the Democratic Republic of Congo has spent almost one-third of its gross domestic product on war [8]. It is hence not surprising that the Expert Panel on Technology and Innovation in UN Peacekeeping recognizes the importance of harnessing the data revolution for the benefit of the international community and peace [9]. In line with this, scientific evidence confirms the critical role of Artificial Intelligence (AI) in accomplishing the SDGs, including the objective of peacefulness [10].

Unfortunately, the use of big data and AI to foster research in the peace and safety field is still in its infancy [11, 7]. The world's leading measurement of national peacefulness, the Global Peace Index (GPI), produced by the Institute for Economics and Peace [12], is computed from institutional surveys and governmental data, which are usually expensive and time-consuming to collect [2]. Moreover, since it is an annual index, it cannot give early warning of socio-economic, political, or military events and misses short-term fluctuations in peacefulness.

The objective of this study is to demonstrate that a powerful peacefulness index such as GPI [13] can be estimated with AI at a higher temporal frequency than that of the official GPI score. To tackle this task, we exploit machine learning and information extracted from a digital data source called Global Data on Events, Location, and Tone (GDELT) [14]. We use news media attention from GDELT as a proxy for estimating GPI, to complement the knowledge obtained from traditional data sources and overcome their limitations. News media records generally describe a variety of subject domains (e.g., economic events, political events) and represent a wide range of targets (e.g., opposing politicians) [15].

Because GDELT is a freely accessible database updated daily, it can support monthly estimates of GPI, whereas the official GPI is annual. Moreover, estimating GPI through GDELT is cheaper and faster than the traditional methodology. This paper extends our previous study [16]. In particular, we now produce GPI estimates from 1-month-ahead up to 6-months-ahead. Additionally, we conduct the analysis with more machine learning models, six in total. We also apply explainability techniques to analyze in depth the behavior of high performance models, and we provide explanations for medium and low performance models. Finally, we include 12 more recent data points in our analysis, i.e., from April 2019 to March 2020.

Our results demonstrate that GDELT variables are a good proxy for measuring GPI at a monthly level. In particular, our models exploit the information from GDELT to provide GPI predictions from 1-month-ahead up to 6-months-ahead. We perform our analysis for all countries around the world. Some country models show high performance, such as the United Kingdom and Yemen, others show medium performance, such as Chile and Libya, and others show low performance, such as Estonia and Cyprus. Low performance may have various causes, such as the under-representation or over-representation of some countries in the GDELT news [17].

To better understand the drivers of the predictions, we use explainability techniques [18, 19, 20] to identify the relationships between the GDELT variables and peacefulness and to explain the models' behavior.

This analysis allows us to unveil each country's profile. For example, the most important variables for the United States, such as "Express intent to settle dispute" and "Employ aerial weapons", indicate a country that is powerful in military, socio-economic, and political terms. In contrast, the most important variables for Iceland, such as "Praise or endorse" and "Accede to requests or demands for political reform", denote a peaceful country.

Frequent updates of the GPI estimate through the GDELT database could flag conflict or war hotspots months in advance by revealing considerable month-to-month fluctuations in peacefulness and significant events that would otherwise be neglected. Consequently, our research could help peacekeeping organizations, such as the United Nations and its agencies, organize early interventions. It could also help policymakers apply adequate policies to prevent detrimental societal effects and contribute effectively to lasting peace.

Although peace is a central concept for the global community and peacekeepers strive to maintain it, it still lacks a clear definition. Thus, researchers have little guidance in measuring peace and creating relevant indicators. Peacefulness is traditionally measured with official data, such as governmental data, surveys, and socio-economic data [21, 22, 23]. Similarly, the Global Peace Index (GPI), the world's leading measurement of national peacefulness [12], is computed from official data, such as surveys and governmental data. Although proven valid, survey data bring biases and limitations: they are costly and time-consuming to collect [24, 2], and they include errors from social desirability bias, i.e., participants' inaccurate answers [25, 16]. In addition, governmental and socio-economic data are hard to collect, are not updated every year, and can lag by up to two or three years. Thus, they might not correctly represent the year of the peacefulness measurement.

In assessing peacefulness, the GPI investigates the extent to which countries are involved in ongoing domestic and international conflicts and seeks to evaluate the level of harmony or discord within a nation. GPI is constructed from 23 indicators that broadly assess what might be described as safety and security in society [12] (detailed list of indicators in Appendix A.1 Indicators of GPI). Given the biases of the official data and the composite nature of the 23-indicator index, frequent updates of peacefulness are difficult to obtain.

As conflicts and violence become increasingly complex, policymakers and peacekeepers search for novel approaches to tackle the growing challenge. Big data and AI are potential tools to measure peace-related indicators, produce early warnings of peacefulness changes, and complement estimations from official data.

Social media data, such as Twitter data, are primarily used to assess public safety, external conflicts, foreign policy, and migration phenomena, as they make individuals' online activities accessible for analysis. Given this enormous potential, researchers use social media data to predict crime rates or detect the fear of crime [26, 27, 28, 29] and to track civil unrest and violent crimes [30, 31, 32, 33, 34]. Similarly, Twitter data are used for the early detection of global terrorist activity [35], to study military conflicts in the Gaza Strip [36, 37], and to analyze foreign policy discussions between Israel and Iran [38]. In addition, social media data are useful for estimating turning points in migration trends [39] and stocks of migrants [40, 41]. Finally, researchers have created a French corpus of tweets annotated for event detection, covering topics such as conflict, war and peace, crime, and justice [42].

Many researchers use mobility data, such as mobile phone records and GPS traces [43, 44, 45, 46, 47] in combination with traditional data, to predict and prevent crime [48, 49, 50, 51, 52] , compare how the different factors correlate with crime in various cities [53] , and estimate deprivation and well-being [54, 55, 56, 57] . Moreover, researchers combine social media data with phone records to infer migration events [58, 59, 60, 61, 62, 63] and use GPS data, combined with subjective and objective data, to study perceived safety [64] .

Additionally, the volume and momentum of web search queries, as measured by Google Trends, provide useful indicators of periods of civil unrest across several countries [65, 66] and help capture a decline in domestic violence calls per capita when immigration enforcement awareness increases [67].

Crowdsourced data are used to map violence against women [68] and police-involved killings [69], to analyze the international crisis between India and Pakistan over Kashmir [70], to prevent crime events and emergencies [71], and to capture the fear of crime [72].

Recently, researchers have started exploring remote sensing data, such as satellite images, to map refugee settlements [73, 74] and to study conflicts, in particular in zones where field observations are sparse or non-existent [74] , ethnic violence [75] , and humanitarian crises [76] .

Finally, researchers combine conflict-related news databases such as ACLED [77] with other official data to capture peace indicators and measure conflict risks [78, 79] , to demonstrate the relatively short-term decline in conflict events during the COVID-19 pandemic [80] , and to create political violence early-warning systems [81] . They also combine the Arabia Inform [82] with official data to extract variables for generating military event forecasts [83] .

The Global Data on Events, Location, and Tone database (GDELT) is a major news data source that describes the worldwide socio-economic and political situation through the eyes of the news media, making it well suited for measuring well-being and peacefulness [14]. GDELT is mainly used to explore social unrest, protests, civil wars and coups, crime, migration, and refugee patterns. Many researchers explain and predict social unrest events in several geographic areas around the world, such as Egypt [84], Southeast Asia [85], the United States [86], and Saudi Arabia [87]. Others recognize social unrest patterns in India, Pakistan, and Bangladesh [88] and reveal the causes and evolution of future social unrest events in Thailand [89]. GDELT is a valuable data source for detecting protest events [90] and violence-related social issues [91], as well as for detecting and forecasting domestic political crises [92]. It is also used to explore severe internal and external conflicts, such as the Sri Lankan civil war, the 2006 Fijian coup [93], and violent events in Afghanistan [94]. Additionally, it helps in understanding the direct cooperative and conflictual interactions among China, Russia, and the US since the end of the Cold War [95]. GDELT is also used to study activities of a political nature that influence or reflect societal-scale behavior and beliefs [96]. Lastly, news data from GDELT are combined with other data sources, such as socio-economic indicators [97], refugee data [98], housing market data [99], and Google Trends and official migration data [100], to analyze migration patterns and produce short- and medium-term forecasts of them.

Our paper differs from previous work in two important aspects. First, our models combine GDELT with machine learning techniques to estimate a composite peace index such as GPI, which covers domestic and international conflicts, safety and security, migration phenomena, etc. The wide variety of GDELT event categories can cover most GPI indicators. Second, we perform our analysis at a global scale, studying peace across all countries in the world.

This section describes the data used in our study, the models used to produce the GPI estimates, the training strategy adopted, and the SHAP methodology to interpret the models' predictions. We provide the data and the code of our study for reproducibility in https://github.com/VickyVouk/GDELT_GPI_SHAP_project.

GPI [13] ranks 163 independent states and territories according to their level of peacefulness, and it was created by the Institute for Economics & Peace (IEP). GPI data are available from 2008 until 2020 at a yearly level (GPI report 2020 [12] ). The score for each country is continuous, normalized on a scale of 1 to 5, where the higher the score, the less peaceful a country is. For example, in 2019, Iceland was the most peaceful country with GPI = 1.072, whereas Somalia was the least peaceful country with GPI = 3.574. The index is constructed from 23 indicators related to Ongoing Domestic and International Conflict, Societal Safety and Security, and Militarisation domains [12] (detailed list of indicators in Appendix A.1 Indicators of GPI). These indicators are weighted and combined into one overall score. The weights for the GPI indicators can be retrieved from the GPI reports [12] . For the GPI construction, data are derived from official sources, such as governmental data, institutional surveys, and military data.
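The step of weighting and combining indicators into one overall score can be illustrated with a minimal sketch, assuming a plain weighted mean of indicator scores already banded on the 1-to-5 scale. The indicator names and weights are hypothetical placeholders, not the published GPI weights, and the IEP's actual aggregation may involve further steps.

```python
def composite_score(indicator_scores, weights):
    """Weighted mean of indicator scores that are already banded on a 1-5 scale.

    Because it is a weighted mean with non-negative weights, the result stays
    on the same 1-5 scale (higher = less peaceful).
    """
    missing = set(weights) - set(indicator_scores)
    if missing:
        raise ValueError(f"missing indicator scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * indicator_scores[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical indicators and weights, for illustration only (not GPI data).
scores = {"violent_demonstrations": 2.0, "weapons_imports": 1.5, "internal_conflict": 3.0}
weights = {"violent_demonstrations": 3.0, "weapons_imports": 2.0, "internal_conflict": 5.0}
print(composite_score(scores, weights))  # 2.4
```

Normalizing by the total weight keeps the composite on the indicators' own scale, so a country scoring 1 on every indicator still scores 1 overall.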

For this study, we increase the frequency of GPI from yearly to monthly using linear interpolation. Every yearly GPI value is assigned to March of the corresponding year, since most of the annual GPI indicators are measured up to this month. Linear upsampling is an assumption (the simplest one), since the generated monthly values do not correspond to an observed monthly GPI. After upsampling, the 13 yearly values yield 145 monthly values (March 2008 to March 2020).
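The March-anchored upsampling can be sketched in a few lines of Python. This is a minimal illustration assuming consecutive yearly values; the function name is hypothetical, and the authors' own code is in the repository linked above.

```python
def upsample_yearly_to_monthly(yearly):
    """Linearly interpolate March-anchored yearly values to monthly values.

    yearly: [(year, gpi), ...] sorted by year, with consecutive years.
    Returns [("YYYY-MM", gpi), ...] from March of the first year to
    March of the last year, inclusive.
    """
    monthly = []
    for (year, start), (_, end) in zip(yearly, yearly[1:]):
        for step in range(12):  # March of `year` up to February of `year + 1`
            month_index = 2 + step  # 0-based month index: 2 == March
            label_year = year + month_index // 12
            label_month = month_index % 12 + 1
            value = start + (end - start) * step / 12
            monthly.append((f"{label_year}-{label_month:02d}", value))
    last_year, last_value = yearly[-1]
    monthly.append((f"{last_year}-03", last_value))  # final March anchor
    return monthly


# Synthetic yearly values for 2008-2020, not real GPI scores.
series = upsample_yearly_to_monthly([(2008 + i, 1.0 + 0.12 * i) for i in range(13)])
print(len(series), series[0][0], series[-1][0])  # 145 2008-03 2020-03
```

With 13 yearly values, the function returns 12 × 12 + 1 = 145 monthly values, matching the count above.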

We increase the frequency from yearly to monthly because a single month might contain important events that the yearly index distorts. Indeed, the yearly GPI data might not reveal abrupt, higher-frequency changes in peacefulness, because these are smoothed out in the yearly GPI value. Monthly GPI estimates could therefore reveal events neglected by the yearly GPI. At the same time, we do not increase the frequency to a weekly or daily level, in order to strike a balance between the noisy GDELT information and the official GPI. Bridging this time gap requires linearly interpolating the yearly GPI value. We believe that monthly interpolation is the best choice, because interpolating the GPI on a daily or weekly basis would make it impossible to fit the daily or weekly fluctuations of the GDELT variables. Besides, daily or weekly estimates could show fluctuations that would not significantly change a country's stability until weeks or even months after they take place. Figures 1 and 2 show the monthly Global Peace Index for Belgium and Yemen, respectively, from 2008 to 2020. In Figure 1, we annotate the terrorist attack that took place in Belgium in March 2016, which brought a deterioration in the country's peacefulness, increasing GPI from 1.47 to 1.536. However, this is reflected in the official yearly GPI only a year later, in 2017. In contrast, with the monthly GPI score, we expect our model to capture the increase in a more timely manner, e.g., one month after the attack.

In Figure 2, we annotate the start of the Civil War in Yemen in September 2014, which brought a deterioration in the country's peacefulness, increasing GPI from 2.735 to 2.84. Because the official GPI is published only once a year, the interpolated series makes the increase appear to start in March 2014, i.e., six months before the actual event. With the monthly GPI score, we expect our model to capture this change one month after the start of the Civil War.

As a consequence, a monthly system that adequately corresponds to the peacefulness fluctuations has the potential to quickly inform the placement of peacekeepers and the deployment of non-governmental organization (NGO) resources, making it potentially easier to save lives and prevent devastation [81] .

GDELT [14] is a Google-supported, publicly available digital news database of socio-political events. It is a collection of international English-language news sources, such as the Associated Press and The New York Times. GDELT data are based on news reports coded with the Tabari system [101], which extracts events from the media and assigns the corresponding code to each event. Events are coded with an expanded version of the dyadic CAMEO format, a conflict and mediation event taxonomy [102]. GDELT compiles a list of 200 event categories, from riots and protests to peace appeals and diplomatic exchanges, and from public statements and consultations to fights and mass violence [102] (detailed list of topics in Appendix A.2 Topics of GDELT). Examples of identified events are "Express intent to cooperate", "Conduct strike or boycott", "Use conventional military force", and "Reduce or break diplomatic relations". The database offers various information for each event, such as the date, the location, and the URL of the news article. We use the GDELT 1.0 database, which is updated daily and contains historical data since 1979 [103].

For GPI prediction, we derive several variables from GDELT, corresponding to the total number of events (No. events) of each GDELT category at the country and monthly level. Some event categories may not be present in the news of a country; on average, the number of variables per country is 87, ranging from 25 to 141. We use the BigQuery [104] data manipulation language in the Google Cloud Platform to extract the GDELT variables (Listing 1). [...] of January. The plot depicts a noticeable rise in these events on 6 January 2021, the day of the "Storming of the United States Capitol", and a peak of news related to the topic on 7 January 2021, showing that GDELT news reflects the worldwide socio-political and conflictual reality with a small lag, i.e., a day. Table 2 presents the 10 GDELT variables with the largest share of No. events for the United States from March 2008 to March 2020. For example, the GDELT variable "Make statement" has the largest share, followed by "Make a visit" and "Host a visit".
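The aggregation of individual events into monthly No. events variables can be sketched in plain Python. The sketch assumes that each event record carries the GDELT 1.0 fields MonthYear, ActionGeo_CountryCode (a FIPS country code), and EventBaseCode (a CAMEO code); which location field and CAMEO code level the authors actually used is an assumption here, since their query (Listing 1) is not reproduced. The function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict


def monthly_event_counts(events, country):
    """Count events per month and CAMEO category for one country.

    `events` is an iterable of dicts with (assumed) GDELT 1.0 keys:
    'MonthYear' (e.g. 201601), 'ActionGeo_CountryCode', 'EventBaseCode'.
    Returns {month: Counter({category: No. events})}.
    """
    counts = defaultdict(Counter)
    for event in events:
        if event["ActionGeo_CountryCode"] == country:
            counts[event["MonthYear"]][event["EventBaseCode"]] += 1
    return dict(counts)


def to_feature_table(counts):
    """One column per category observed for the country; absent months get 0."""
    categories = sorted({code for month in counts.values() for code in month})
    rows = {month: [counts[month][code] for code in categories] for month in sorted(counts)}
    return categories, rows


# Toy records, not real GDELT data. CAMEO "010" = Make statement,
# "145" = Protest violently, riot, "190" = Use conventional military force.
events = [
    {"MonthYear": 201601, "ActionGeo_CountryCode": "US", "EventBaseCode": "010"},
    {"MonthYear": 201601, "ActionGeo_CountryCode": "US", "EventBaseCode": "010"},
    {"MonthYear": 201601, "ActionGeo_CountryCode": "US", "EventBaseCode": "145"},
    {"MonthYear": 201602, "ActionGeo_CountryCode": "US", "EventBaseCode": "010"},
    {"MonthYear": 201602, "ActionGeo_CountryCode": "UK", "EventBaseCode": "190"},
]
categories, rows = to_feature_table(monthly_event_counts(events, "US"))
print(categories, rows)  # ['010', '145'] {201601: [2, 1], 201602: [1, 0]}
```

Because only categories that actually appear in a country's news become columns, the number of variables naturally differs from country to country, as noted above.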

The wide variety of GDELT event categories can cover most of the indicators that compose GPI. For example, the GPI indicator "Number of Internal Security Officers and Police per 100,000 People" can be covered by the GDELT variable "Exhibit military or police power". The GPI indicators "Ease of Access to Small Arms and Light Weapons" and "Volume of Transfers of Major Conventional Weapons, as recipient (imports) per 100,000 people" can be covered by "Fight with small arms and light weapons" and "Use conventional military force" or "Conduct non-military bombing" GDELT variables, respectively. Similarly, the "Nuclear and Heavy Weapons Capabilities" GPI indicator can be covered by the "Employ aerial weapons" GDELT variable. Also, the GPI indicator "Likelihood of violent demonstrations" can be covered by "Engage in political dissent", "Protest violently, riot" or "Demonstrate or rally" GDELT variables. Last, the "Financial Contribution to UN Peacekeeping Missions" GPI indicator can be covered by the GDELT variables "Appeal for aid" or "Provide humanitarian aid".

Time series models are used to predict future values of indices by extracting relevant information from historical data. Traditional time series models are based on various mathematical approaches, such as autoregression. Autoregressive models specify that the output variable depends linearly on its own previous values and on a stochastic term. Because our monthly data are linearly upsampled, autoregressive models are not appropriate: within each year, every interpolated GPI value is an exact linear function of its previous values, so such a model would simply reproduce the interpolation. Besides, our objective is to measure GPI and to understand and explain how the different peacefulness topics captured by GDELT contribute to the GPI measurement. We use Linear Regression, Elastic Net, Decision Tree, Support Vector Regression (SVR), Random Forest, and Extreme Gradient Boosting (XGBoost) to investigate the relationship between the GPI score and the GDELT variables at the country level. Specifically, we aim to develop GPI estimates from 1-month-ahead to 6-months-ahead of the latest ground-truth GPI value and to find the model with the highest overall predictive performance. First, we introduce simple models, i.e., Linear Regression, Elastic Net, and Decision Tree, which are easy to implement and interpret. Next, we apply SVR, Random Forest, and XGBoost, which tend to achieve higher predictive performance but are harder to interpret and require additional methodologies to interpret their results (e.g., SHAP [19, 20]). Appendix A.3 Machine learning models briefly describes the characteristics of the selected models.
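The argument against autoregression can be checked numerically. The sketch below interpolates a few synthetic yearly values (not real GPI scores) and applies the fixed AR(2) rule y[t] = 2 y[t-1] - y[t-2]; inside each yearly segment the rule is exact, so it fails only right after the March anchors where the slope changes.

```python
def interpolate(anchors, months_per_step=12):
    """Linearly interpolate yearly anchor values to a monthly series."""
    series = []
    for start, end in zip(anchors, anchors[1:]):
        series.extend(
            start + (end - start) * k / months_per_step for k in range(months_per_step)
        )
    series.append(anchors[-1])
    return series


def ar2_residuals(series):
    """Residuals of the fixed AR(2) rule y[t] = 2*y[t-1] - y[t-2]."""
    return [series[t] - (2 * series[t - 1] - series[t - 2]) for t in range(2, len(series))]


yearly = [1.8, 1.9, 1.7, 2.1, 2.0]  # synthetic yearly values, not real GPI scores
residuals = ar2_residuals(interpolate(yearly))
exact = sum(abs(r) < 1e-12 for r in residuals)
print(f"{exact} of {len(residuals)} months reproduced exactly")  # 44 of 47 months ...
```

A near-perfect fit here would say nothing about peacefulness; it would only rediscover the interpolation.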

Traditionally, before modeling, researchers start by dividing the data into training data and test data. Training data are used to estimate the models' parameters, and the test data are used to calculate the predictive performance of the models.

Considering that the socio-political situation around the world is not stationary and that more recent events are more relevant for the prediction, we train our models using the rolling methodology [105], widely used in business and finance [106]. The rolling methodology updates the training set through an add/drop process that keeps its length fixed, and it retrains the model before each round of k-months-ahead predictions.

The rolling training set spans 72 months for all models, i.e., half of our data. First, we train the model to predict the 1-month-ahead to 6-months-ahead GPI values. After this first training, one month is dropped from the beginning of the training set and another month is added to its end. Then, we train again to predict the next 1-month-ahead to 6-months-ahead GPI values. We continue this process for all subsequent months until we predict the last monthly value. This ensures that the training set always covers the same amount of time and is continuously updated with the most recent information.
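The add/drop scheme can be sketched as an index generator. This is a hypothetical helper, not the authors' code: it only yields training-window and target indices and leaves model fitting out. For a series of n months and a 72-month window, it produces n - 72 - k + 1 targets at horizon k; the exact data range is not given in this section, so n is kept as a parameter. For instance, n = 144 gives 72 one-month-ahead and 71 two-months-ahead targets.

```python
def rolling_windows(n_months, window=72, max_horizon=6):
    """Yield (train_months, {k: target_month}) for each rolling step.

    The training window keeps a fixed length and slides forward one month per
    step (drop the oldest month, add the newest). Horizons whose target would
    fall past the end of the series are omitted.
    """
    for start in range(n_months - window):
        end = start + window  # train on months [start, end)
        targets = {
            k: end + k - 1
            for k in range(1, max_horizon + 1)
            if end + k - 1 < n_months
        }
        yield range(start, end), targets


def count_predictions(n_months, window=72, max_horizon=6):
    """Number of k-months-ahead predictions produced over all rolling steps."""
    counts = {k: 0 for k in range(1, max_horizon + 1)}
    for _, targets in rolling_windows(n_months, window, max_horizon):
        for k in targets:
            counts[k] += 1
    return counts


print(count_predictions(144))  # {1: 72, 2: 71, 3: 70, 4: 69, 5: 68, 6: 67}
```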

In particular, we use data from [...]. At each step, we obtain 1-month-ahead up to 6-months-ahead predicted GPI values: by the end of each rolling training described above, we have k-months-ahead GPI predictions, where k = 1, 2, ..., 6. By the end of the training process, we have 72 1-month-ahead GPI predictions [1], 71 2-months-ahead GPI predictions, and so on. We evaluate the accuracy of the predictions for each k-months-ahead horizon against the corresponding test set that contains the real GPI values. Long-term predictions, such as 6-months-ahead peacefulness estimates, are an important tool for policymakers, since six months is a "policy-relevant lead time" consistent with other forecasting work, that is, a period long enough for a policy response [107].

For each of the models mentioned in Section 3.4, we estimate the best hyperparameters in each training phase through 10-fold cross-validation. Appendix A.4 Hyperparameters details the hyperparameters we tune for each model, with the exception of Linear Regression, which has a closed-form solution and no hyperparameters.
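The per-window hyperparameter search can be illustrated with a small pure-Python sketch of 10-fold cross-validation. It assumes contiguous, unshuffled folds and mean squared error as the selection criterion; the text specifies neither. A one-dimensional ridge regression without intercept stands in for the real models purely for illustration, and all function names are hypothetical.

```python
def kfold_indices(n_samples, n_folds=10):
    """Split range(n_samples) into contiguous, unshuffled folds.

    The first n_samples % n_folds folds receive one extra sample.
    Returns a list of (train_indices, validation_indices) pairs.
    """
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


def select_hyperparameter(xs, ys, grid, fit, predict, n_folds=10):
    """Return the grid value with the lowest mean squared validation error."""

    def cv_error(param):
        squared_errors = []
        for train, validation in kfold_indices(len(xs), n_folds):
            model = fit([xs[i] for i in train], [ys[i] for i in train], param)
            squared_errors.extend((predict(model, xs[i]) - ys[i]) ** 2 for i in validation)
        return sum(squared_errors) / len(squared_errors)

    return min(grid, key=cv_error)


# Stand-in model: 1-D ridge regression without intercept, w = sum(xy) / (sum(x^2) + alpha).
def ridge_fit(xs, ys, alpha):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def ridge_predict(w, x):
    return w * x


xs = [float(i) for i in range(1, 73)]  # 72 synthetic months in one training window
ys = [2.0 * x for x in xs]  # noiseless synthetic target
print(select_hyperparameter(xs, ys, [10.0, 1.0, 0.0], ridge_fit, ridge_predict))  # 0.0
```

On a 72-month training window, 10 folds contain 8, 8, 7, 7, 7, 7, 7, 7, 7, and 7 months.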

Understanding a model's prediction is important for trust, actionability, accountability, debugging, and many other reasons. To understand predictions from tree-based machine learning models, like Random Forest or XGBoost, importance values are typically attributed to each variable. Yet traditional variable attribution for trees is inconsistent, meaning it can lower a variable's assigned importance when the true impact of that variable increases.

Therefore, to interpret the importance of the model variables and understand the drivers of every single GPI estimate, we compute SHAP (SHapley Additive exPlanation) values [19, 20]. SHAP is based on game theory [108] and local explanations [109], and it offers a means to estimate the contribution of each variable. Focusing specifically on tree-based models, the SHAP authors developed an algorithm that computes local explanations based on exact Shapley values in polynomial time. SHAP provides local explanations with theoretical guarantees of local accuracy and consistency. Additionally, the ability to efficiently compute local explanations using Shapley values over a dataset enables a range of tools to interpret and understand a model's global behavior. Specifically, by combining many local explanations, a global structure can be represented while retaining local faithfulness [110] to the original model, which yields detailed and accurate representations of the model's behavior.
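To make the Shapley idea concrete, the sketch below computes exact Shapley values by enumerating every feature coalition for a tiny, hypothetical model with an interaction term. Absent features are replaced by a baseline value, a simplified value function chosen for illustration; it is not the value function used by the SHAP package's tree explainer, and its exponential cost is precisely what the polynomial-time tree algorithm avoids.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one prediction, by brute-force enumeration.

    Features outside a coalition take their baseline value. This needs
    O(2^M) model calls for M features, so it is only usable for toy models.
    """
    m = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return predict(z)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        contribution = 0.0
        for size in range(m):  # coalition sizes 0 .. m-1 drawn from the other features
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                coalition = set(subset)
                contribution += weight * (value(coalition | {i}) - value(coalition))
        phi.append(contribution)
    return phi


def toy_model(z):
    # Hypothetical model: two additive effects plus an interaction of features 0 and 2.
    return 0.5 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[2]


phi = exact_shapley(toy_model, x=[2.0, 1.0, 3.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 6) for v in phi])  # [1.3, 0.2, 0.3]
```

The three values sum to f(x) - f(baseline) = 1.8, which is the local accuracy property, and the interaction term 0.1 x0 x2 is split equally between features 0 and 2.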

Last but not least, SHAP can be applied to interpret the results of the machine learning models, since it identifies the relationship between the independent variables, whether internal or external, and the dependent variable. This relationship need not be causal: SHAP explains what drives the model's predictions and can fail to answer causal questions accurately. In this study, SHAP is a tool to identify which external GDELT variables drive the GPI estimates. This is useful for explaining the models' behavior and diagnosing errors in the predictions.

The predictive models introduced in Section 3.4 are constructed for every country using the GPI values as the dependent variable and the GDELT variables as the independent variables. We use the Pearson Correlation coefficient, Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE) [111, 112, 113] to evaluate the performance of the constructed models (Appendix A.5 Performance Indicators). The analysis is conducted for all 163 countries that have a GPI score, and we generate predictions from 1-month-ahead up to 6-months-ahead. Figure 4 presents the Pearson Correlation and MAPE between the real and the 1-, 3-, and 6-months-ahead predicted GPI values at the country level for all predictive models. [2] Figure 1 in Appendix A.7 RMSE results presents the RMSE performance indicator as well. We find that SVR, Random Forest, and XGBoost have similar performance and outperform Decision Tree and Elastic Net. XGBoost shows the highest performance overall, especially for the 6-months-ahead predictions.
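The three performance indicators can be written in a few lines of plain Python. These are the standard definitions, which we assume match Appendix A.5; MAPE is expressed in percent so that it is on the same scale as the MAPE < 5 threshold used to classify the country models. Function names and example values are hypothetical.

```python
from math import sqrt


def pearson(real, pred):
    """Pearson correlation coefficient between real and predicted values."""
    n = len(real)
    mean_r, mean_p = sum(real) / n, sum(pred) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(real, pred))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / sqrt(var_r * var_p)


def rmse(real, pred):
    """Root mean square error, in GPI units."""
    return sqrt(sum((p - r) ** 2 for r, p in zip(real, pred)) / len(real))


def mape(real, pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((r - p) / r) for r, p in zip(real, pred)) / len(real)


real = [2.0, 2.5, 3.0]  # synthetic GPI values
pred = [2.2, 2.5, 2.7]
print(round(pearson(real, pred), 3), round(rmse(real, pred), 3), round(mape(real, pred), 3))
```

RMSE stays in GPI units, while MAPE is scale-free, which is why the two can disagree on which country model looks better.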

To estimate the GPI, the models use the historical data of the No. events for each GDELT category related to the military, social, and political events of the corresponding country. With each additional month ahead, we move further away from the last training data while the country's reality changes, so we expect lower model performance. Indeed, comparing Figures 4a-b with Figures 4c-d and Figures 4e-f shows that the performance of the models decreases with every additional month ahead. For example, the median MAPE increases by 13.43% for the 3-months-ahead predictions and by 25.61% for the 6-months-ahead predictions, compared to the 1-month-ahead predictions.

Since XGBoost demonstrates the highest overall performance, we focus on it when presenting the subsequent results. Figure 5 compares the real and estimated GPI values, showing a strong linear relationship between the two. In particular, Figure 5a presents the scatter plot of the real and predicted GPI values for all countries, while Figures 5b-d focus on the corresponding values for Iceland, Saudi Arabia, and Pakistan. These countries show that the models perform well for low, medium, and high GPI values.

We then divide the countries into three categories based on their models' performance. We consider high performance models those with Pearson Correlation >= 0.7 and MAPE < 5 [114, 115], low performance models those with Pearson Correlation <= 0.2 [115], and the rest medium performance models. Figure 6 presents the countries with high, medium, and low performance for the 1-month-ahead predictions. For example, Uganda (UGA), Pakistan (PAK), Turkey (TUR), the United Kingdom (GBR), and Sweden (SWE) show high performance, with a strong Pearson Correlation higher than 0.8. We also observe medium performance countries, such as Libya (LBY), with high Pearson Correlation but high MAPE, and India (IND), with low Pearson Correlation but low MAPE. Finally, some countries, such as Cyprus (CYP), Estonia (EST), Moldova (MDA), Mongolia (MNG), and Romania (ROU), show a negative Pearson Correlation.
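The classification rule can be stated as a small function. It follows the thresholds given above and assumes MAPE is measured in percent; the function name is hypothetical.

```python
def performance_category(pearson_r, mape_percent):
    """Classify a country model with the thresholds stated in the text:
    high if r >= 0.7 and MAPE < 5; low if r <= 0.2; otherwise medium."""
    if pearson_r >= 0.7 and mape_percent < 5:
        return "high"
    if pearson_r <= 0.2:
        return "low"
    return "medium"


print(performance_category(0.85, 3.0))  # high
print(performance_category(0.90, 8.0))  # medium
print(performance_category(-0.3, 1.0))  # low
```

A Libya-like case with a high correlation but a MAPE above 5 therefore lands in the medium category, as does an India-like case with a correlation between 0.2 and 0.7 and a low MAPE.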

Our study aims to demonstrate that GDELT is a valuable digital news data source for estimating the GPI at a monthly level. For this reason, we present the performance indicators and analyze in depth the models that confirm this hypothesis, i.e., the country models with high performance. Since conflicts and violence are present in every country, whether or not it is at war, we present countries with different military, socio-economic, and political histories and current situations to cover a variety of scenarios. In particular, we present three of the most powerful countries (United States, United Kingdom, and Saudi Arabia), since they shape global economic patterns and influence policymaking [116]. Additionally, we use various sources, such as the official GPI ranking [12], to choose three of the most peaceful countries (Portugal, Iceland, and New Zealand) and three of the most war-torn countries (DR Congo, Pakistan, and Yemen). Table 3 reports the models' performance for the 1-month-ahead up to 6-months-ahead GPI estimates for these nine countries. Overall, 1-month-ahead GPI estimates are more accurate than the other estimates, especially the 6-months-ahead estimates. For some countries, such as Portugal, the performance remains stable across all six prediction horizons, while for others, like Yemen, it falls with each additional month ahead.

Figure 6: Countries' performance. High, medium, and low performance country models for the 1-month-ahead predictions. There are country models with high performance, such as the United Kingdom (GBR), models with medium performance, such as Libya (LBY), and models with low performance, such as Mongolia (MNG).

[2] Since the Linear model has very low performance (Appendix A.6 Linear models results), we present the results for all models but the Linear regression.

An explanation for these different behaviors could be, in the case of Portugal, that the military, socio-economic, and political situation remains stable over time. Therefore, the most important variables support accurate predictions even further into the future. In contrast, in war-torn countries like Yemen, the country's situation changes constantly, and the variables quickly lose their relevance. For this reason, for Yemen, we also train the model on only the 36 most recent monthly values (Yemen* in Table 3), as opposed to the 72 values used for the rest of the countries. The performance improves considerably: the mean Pearson Correlation increases from 0.737 to 0.892, the mean MAPE drops from 6.832 to 4.287, and the mean RMSE decreases from 0.268 to 0.180. However, we do not observe the same improvement when reducing the training set for the other war-torn countries, such as DR Congo.
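A rolling-window training scheme with a configurable window length (72 months for most countries, 36 for Yemen*) can be sketched as follows. The helper `rolling_splits` is hypothetical, and we assume that an h-months-ahead model trained on a window of months predicts the month lying h months after the window's last month; this alignment is an assumption for illustration, not a detail confirmed by the text.

```python
def rolling_splits(n_months, window, horizon):
    """Yield (train_indices, target_index) pairs for rolling-window forecasting.

    Hypothetical helper: each model is trained on `window` consecutive months
    and predicts the month `horizon` months after the end of that window.
    Older months drop out of the training set as the window rolls forward.
    """
    for start in range(0, n_months - window - horizon + 1):
        train = list(range(start, start + window))
        target = start + window + horizon - 1
        yield train, target


# With 84 months of (hypothetical) data, a shorter window yields more
# evaluation points and forgets older regimes (e.g., pre-war months) sooner.
for window in (72, 36):
    n_models = len(list(rolling_splits(84, window, horizon=6)))
    print(window, n_models)
```

The design trade-off is the one described above: a longer window gives each model more examples, while a shorter one discards data from a situation that no longer applies.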

Furthermore, we select four countries to study their peacefulness, and the factors that drive it, in depth. We aim to capture various scenarios regarding the models' accuracy and the models' explanations. In particular, we choose Saudi Arabia and Yemen to better understand and interpret the results and errors of the predictive models on historical data. Additionally, we choose the United Kingdom and the United States to estimate their future GPI values and gain initial insights into these countries' peacefulness before the official GPI score becomes available.

Based on the G20 list of countries [116], Saudi Arabia is considered one of the most powerful countries in the world in terms of military and international alliances, political and economic influence, and leadership. Figure 7 presents Saudi Arabia's percentage error for the 6-months-ahead GPI estimations. We observe that the performance is high, and the percentage error varies from 4.05% to 11.38%. A positive percentage error indicates that the estimated GPI is higher than the real GPI, and therefore the model overestimates the monthly value. Conversely, a negative percentage error indicates that the estimated GPI is lower than the real GPI, and thus the model underestimates the monthly value. We obtain the largest negative percentage error for the GPI estimation for October 2018. The analysis of variable importance through SHAP reveals the country's profile and helps us understand the larger errors of the model. Figure 8 shows the most important variables for the estimation of the GPI score. Each importance is calculated by combining many local explanations, and the model is trained from May 2012 to April 2018. The profile of a powerful country in military, socio-economic, and political terms is evident, since the important variables are related to embargoes, boycotts, or sanctions, diplomatic relations, mediation, economic cooperation, appeals for aid, fights with military arms, military engagement, assaults, and endorsements. In Figure 8, we also observe that "Fight with artillery and tanks" and "Appeal for aid" are among the most important variables for Saudi Arabia. As discussed in Section 3.3, these GDELT variables could cover the "Volume of Transfers of Major Conventional Weapons, as recipient (imports) per 100,000 people" and the "Financial Contribution to UN Peacekeeping Missions" GPI indicators, respectively.

Figure 9: Individual SHAP value plot for Saudi Arabia. It presents the model output value, i.e., the estimation of the GPI for October 2018, and the base value, which is the value that would be predicted if the variables for the current output were unavailable. The plot also displays the most important variables that the model uses for the estimation, such as "Cooperate economically" and "Appeal for aid". The red arrows are the variables that push the GPI estimation higher, and the blue ones push the estimation lower.
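The sign convention for the percentage error described above can be made explicit with a small sketch. We assume the usual definition, 100 * (estimated - real) / real, which is positive for overestimates and negative for underestimates; the function names and the monthly values in the example are hypothetical.

```python
def percentage_error(estimated, real):
    """Signed percentage error.

    > 0: the model overestimates the GPI (estimated GPI above the real GPI).
    < 0: the model underestimates the GPI (estimated GPI below the real GPI).
    """
    return 100.0 * (estimated - real) / real


def largest_underestimate(errors_by_month):
    """Return the (month, error) pair with the most negative percentage error."""
    return min(errors_by_month.items(), key=lambda item: item[1])


# Hypothetical estimated/real GPI pairs, for illustration only.
errors = {
    "2018-09": percentage_error(2.20, 2.10),
    "2018-10": percentage_error(1.90, 2.10),
    "2018-11": percentage_error(2.05, 2.10),
}
print(largest_underestimate(errors))
```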

To better explain why the model performs worst in October 2018, we perform a SHAP analysis at a local level to highlight the most important variables that the model uses for this specific estimation. Figure 9 displays the most important variables that Saudi Arabia's model uses for the GPI estimation of October 2018. The model output value is 2.12, and it corresponds to the 6-months-ahead prediction. The base value, i.e., the value that would be predicted if the variables for the current output were unavailable, is smaller than the estimated GPI. The red arrows are the variables that push the GPI estimation higher (to the right), and the blue ones push the estimation lower (to the left). Since the model underestimates the GPI value in this month (Figure 7), we focus on the variables that push the GPI estimation lower.
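The additive structure behind an individual SHAP value plot (base value plus per-variable contributions equals the model output) can be illustrated with a linear model. For a linear model with variables assumed independent, SHAP values have a closed form: the contribution of variable i is w_i * (x_i - E[x_i]), and the base value is the model output at the mean of the background data. This is a simplified stand-in, not the models used in the study; the function name, weights, and data are hypothetical.

```python
import numpy as np


def linear_shap(weights, intercept, X_background, x):
    """Exact SHAP values of a linear model, assuming independent variables.

    Returns (base_value, shap_values) such that
    base_value + shap_values.sum() equals the model output for x.
    """
    weights = np.asarray(weights, dtype=float)
    X_background = np.asarray(X_background, dtype=float)
    x = np.asarray(x, dtype=float)
    mean_x = X_background.mean(axis=0)
    base_value = intercept + weights @ mean_x   # expected output over the background
    shap_values = weights * (x - mean_x)        # contribution of each variable
    return base_value, shap_values


# Hypothetical example: two news-count variables with negative weights.
# A spike in both pushes the estimate below the base value ("blue" arrows).
weights = np.array([-0.0005, -0.001])
intercept = 2.3
X_background = np.array([[100, 40], [120, 50], [110, 45]])  # past months
x_month = np.array([300, 90])                               # month with a news spike
base, phi = linear_shap(weights, intercept, X_background, x_month)
prediction = intercept + weights @ x_month
print(base, phi, prediction)
```

Negative SHAP values correspond to the blue arrows (pushing the estimate lower) and positive ones to the red arrows.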

The most important variables for the prediction of October 2018 are "Cooperate economically" and "Appeal for aid", although they rank 10th and 8th, respectively, in the model's overall importance ranking (Figure 8). In October 2018, the journalist Jamal Khashoggi was assassinated at the Saudi consulate in Istanbul, Turkey. This event triggered a wave of news on these topics. Figure 10 presents Saudi Arabia's model predictions with respect to the real GPI score and the variable "Cooperate economically". This variable increases abruptly in October 2018 and pushes the GPI prediction lower, indicating a more peaceful month. Similarly, Figure 11 shows an abrupt increase of the variable "Appeal for aid" in October 2018, which also drives the prediction lower, indicating a more peaceful month. Given that the assassination of the journalist is a negative event, one would expect a less peaceful month. However, the news articles discuss possible spillovers into oil markets and economic cooperation between Saudi Arabia and other countries, such as the United States, to overcome the dispute over Khashoggi. The news also concentrates on the investigation of the Khashoggi case, such as Amnesty International's call for a United Nations inquiry. Therefore, since the variables "Cooperate economically" and "Appeal for aid" have a negative relationship with the GPI (Figures 10 and 11, respectively), the model underestimates the monthly value. Consequently, as seen through the world news, peacefulness is not always portrayed as one would expect.

Figure 11: Saudi Arabia predictions, with respect to the real GPI score, and the variable "Appeal for aid". Saudi Arabia predictions (orange curve), with respect to the real GPI score (blue curve), and the variable "Appeal for aid" (green curve). This variable pushes the model to underestimate the monthly value in October 2018 (vertical dashed black line). The reason for this error is the assassination of Jamal Khashoggi in this specific month.

Based on the official GPI ranking [13], Yemen is one of the most war-torn countries in the world. It is therefore interesting to understand in depth the model's behavior for a country with this profile.

The situation in Yemen changes constantly due to the Civil War that broke out in September 2014. The change in the country's peacefulness is reflected in the real GPI value, which increases abruptly in 2015 [13]. Therefore, six years of training data covering the pre-war period would not help the model predict peace after the start of the war, since the number of events related to the country's military, economic, and political situation changes. Thus, we reduce the training set to three years. We use the rolling methodology to discard the pre-war historical data more quickly and learn from the most recent and relevant data, covering the war period. Therefore, for Yemen, we use data from March 2015 to March 2020 to understand the model's behavior during the Civil War period. Figure 12 presents the percentage error for the 1-month-ahead GPI estimations from March 2018 to March 2020, with a training period of 36 months. The model shows high performance, with a low percentage error that varies from 0.07% to 3.18% and a median value of 1.66%. We obtain the largest negative percentage error (underestimation of the GPI) for June 2018. Figure 13 displays the most important variables for the estimation of the GPI score. Each variable importance is calculated through SHAP, with a training period from June 2015 to May 2018.
Overall, the most important variables reveal a war-torn country profile, since they are related to military aid, territory occupation, bombing, negotiations, discussions, yields, visits, international involvements, and consults. In Figure 13, "Conduct non-military bombing" is among the most important variables for Yemen. Similarly to Saudi Arabia, we perform a local-level analysis to understand why the model produces the highest percentage error in June 2018. Figure 14 displays the variables that drive the prediction for June 2018. The model output value is 3.23, and it corresponds to the 1-month-ahead prediction. The red arrows represent the variables that push the GPI estimation higher, such as "Conduct non-military bombing". The blue arrows represent the variables that push the GPI estimation lower, such as "Discuss by telephone" and "Provide military aid". Since the model underestimates the monthly value in June 2018 (Figure 12), we focus on the latter variables.

In June 2018, the number of events on "Discuss by telephone" is 55, higher than the median value (14) of the previous three years. Similarly, the number of events on "Provide military aid" is 121, higher than the median value (72) of the previous three years. In June 2018, the United Arab Emirates (UAE) Armed Forces announced a pause in military operations on the 23rd of June 2018 because of UN-brokered talks. This is reflected in the increase of the news on "Discuss by telephone". In addition, owing to UN efforts, the United States turned down UAE requests for aid in the offensive against a rebel-held Yemeni port. This denial was widely discussed in the news, which explains the increase of the news on "Provide military aid". Figures 15 and 16 show that the variables' higher monthly values and their mostly negative relationship with the GPI drive the model to underestimate the monthly value in June 2018. In other words, the model predicts a lower monthly GPI value, and consequently, June 2018 appears more peaceful than it actually was. On the one hand, the model makes a wrong prediction, resulting in the largest percentage error. On the other hand, the model might give an interesting signal. Although Yemen is involved in constant conflicts, June 2018 appears more peaceful, since the UN-brokered ceasefire agreement managed the withdrawal of the warring parties from Al Hudaydah in Yemen. Although we notice additional abrupt increases of the two variables' values, e.g., in November 2020 (Figures 15 and 16), the model does not reproduce an abrupt decrease of the GPI. This suggests that the model learns from its mistakes.

Figure 14: Individual SHAP value plot for Yemen. It presents the model output value, i.e., the estimation of the GPI for June 2018, and the base value, which is the value that would be predicted if the variables for the current output were unavailable. The plot also displays the most important variables that the model uses for the estimation, such as "Discuss by telephone" and "Provide military aid". The red arrows are the variables that push the GPI estimation higher, and the blue ones push the estimation lower.

The United States is considered the most powerful country in the world [116]. It is therefore interesting to study its peacefulness after March 2020, the date of the last ground-truth value. The United States model shows high performance (Table 3) and can provide policymakers and peacekeepers with useful initial insights into the country's peacefulness before the real GPI score becomes available.

To start with, Figure 17 shows the most important variables for the training period between April 2014 and March 2020. Overall, these variables indicate the profile of a strong player in the military, socio-economic, and political arena. The most important variable is related to aerial weapons, and it mainly concerns events that take place overseas. The remaining variables are mostly related to fights with small arms, military de-escalations, embargoes, threats, protests, cooperation, and relations. We also observe in Figure 17 that "Employ aerial weapons", "Fight with small arms and light weapons", and "Protest violently, riot" are among the most important variables for the United States. As discussed in Section 3.3, these GDELT variables could correspond to the GPI indicators "Nuclear and Heavy Weapons Capabilities", "Ease of Access to Small Arms and Light Weapons", and "Likelihood of violent demonstrations", respectively. Last, we compare the variables in Figure 17 with the ten variables that have the largest share of overall news (Table 2 in Section 3.2). None of the variables with the largest share of overall news is among the most important variables for the United States. This indicates that the model is not biased toward learning only from the variables with the largest share; rather, it selects the variables that are useful for predicting peacefulness.

Figure 18: Individual SHAP value plot for the United States. It presents the model output value, i.e., the estimation of the GPI for June 2020, and the base value, which is the value that would be predicted if the variables for the current output were unavailable. The plot also displays the most important variables that the model uses for the estimation, such as "Protest violently, riot". The red arrows are the variables that push the GPI estimation higher, and the blue ones push the estimation lower.

We now focus on the murder of George Floyd, which took place on the 25th of May 2020. Several protests followed this event at the end of May and throughout June 2020, generating news concentrated on the topic. Figure 18 shows the local SHAP explanation for the prediction of June 2020. The estimated GPI (3-months-ahead prediction) is 2.30, indicating that the GPI value remains stable in June 2020 compared with the last ground-truth value in March 2020 (2.31) and the median GPI value of the previous three years (2.34). In particular, "Protest violently, riot" is the variable that pushes the GPI estimation lower the most. Indeed, in June 2020, the news concentrated on a series of protests against police brutality and racism that followed the murder of George Floyd. This variable pushes toward a more peaceful month, since it has a negative relationship with the GPI. It seems that, as reflected in the news, protesting in the United States contributes to the improvement of various socio-political situations and to peacekeeping.

The rest of the variables displayed in Figure 18 have lower values than their corresponding median values over the training period, confirming that the news of the month concentrated on the racial unrest in the United States and the Black Lives Matter movement. We point out that, in this particular prediction, the most important variable for the overall training period, i.e., "Employ aerial weapons" (Figure 17), contributes less to the model output than the variable "Protest violently, riot". This illustrates the ability of SHAP to identify the role of each variable in every single prediction.

Like the United States and Saudi Arabia, the United Kingdom is considered one of the most powerful countries in the world, based on the G20 list [116]. It is therefore of interest for European social policymaking to anticipate the level of peacefulness after the last ground-truth data, i.e., after March 2020.

Here we focus on the GPI prediction for July 2020, when various restrictions related to COVID-19 and the protection of civilians were announced. Figure 19 presents the global variable importance plot for a training period from April 2014 to March 2020. The figure highlights a country where various socio-political events occur, since the important variables are mostly related to strikes or boycotts, appeals, negotiations, yields, relationships, and sanctions. "Engage in political dissent" is among the most important variables for the United Kingdom (Figure 19). As discussed in Section 3.3, this variable could correspond to the GPI indicator "Likelihood of violent demonstrations".

To study peacefulness in July 2020, we need to delve into the analysis at a local level. Figure 20 presents the individual SHAP value plot for the United Kingdom: the estimated GPI value is 1.8, the model output value for the 4-months-ahead prediction. The GPI value in July 2020 is slightly higher than the last ground-truth value (1.77), and it is stable compared to the median GPI value of the previous three years (1.8).

The most important variables that push the GPI value higher are "Express intent to meet or negotiate" and "Conduct strike or boycott". The former variable's value is 9,447, lower than the median value of the previous six years (12,026), and the latter variable's value is 120, slightly lower than the median value of the previous six years (126). These results suggest that lower values of these event categories decrease internal peace in the United Kingdom. The decrease in the events of these categories could be due to COVID-19 restrictions, or to the news being concentrated on the COVID-19 pandemic. Additionally, "Impose administrative sanctions" and "Employ aerial weapons" are the variables that drive the GPI prediction lower. The former's value in July 2020 is 3,451, higher than the variable's median value of the previous six years (2,590). The news related to "Impose administrative sanctions" concerns discussions on restrictions due to the pandemic, despite the easing of the lockdown. Furthermore, many articles discuss the ban on Huawei from the 5G network due to security risks and the ban on in-store junk food advertising and promotion. Consequently, the model appears to have learned that although "Impose administrative sanctions" events restrict people, the deeper aim of the restrictions is to protect them and promote their well-being. Last, the value of the variable "Employ aerial weapons" is much lower than its median value of the previous six years (167), pushing the GPI value lower. This variable refers to overseas events in which the United Kingdom is involved. The decrease in its value might indicate that the news does not discuss it, either due to previous de-escalations or because the news concentrates on other topics.

Figure 20: Individual SHAP value plot for the United Kingdom. It presents the model output value, i.e., the estimation of the GPI for July 2020, and the base value, which is the value that would be predicted if the variables for the current output were unavailable. The plot also displays the most important variables that the model uses for the estimation, such as "Express intent to meet or negotiate" and "Conduct strike or boycott". The red arrows are the variables that push the GPI estimation higher, and the blue ones push the estimation lower.
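The comparisons of a month's event count against the median of the preceding years (six years, i.e., 72 months, for the United Kingdom; three years, i.e., 36 months, for Yemen) can be reproduced with a short helper. The function name and the example counts are hypothetical; we assume monthly counts are stored in chronological order in a plain list.

```python
from statistics import median


def compare_to_trailing_median(counts, month_index, window=72):
    """Compare the count at month_index with the median of the `window` months before it.

    Hypothetical helper. window=72 corresponds to "the previous six years"
    and window=36 to "the previous three years".
    Returns (count, trailing_median, ratio).
    """
    if month_index < window:
        raise ValueError("not enough history before month_index")
    history = counts[month_index - window:month_index]
    trailing_median = median(history)
    return counts[month_index], trailing_median, counts[month_index] / trailing_median


# Hypothetical monthly counts: 36 ordinary months followed by a news spike.
counts = [12, 15, 14, 16, 13, 14] * 6 + [55]
print(compare_to_trailing_median(counts, month_index=36, window=36))
```

A ratio well above 1 flags a month in which an event category is unusually present in the news, which is the kind of spike that the local SHAP explanations above tie to specific events.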

There are country models that show medium performance (Section 4.1 and Figure 6), such as Colombia and Chile (Pearson Correlation = 0.63 and MAPE = 0.96, and Pearson Correlation = 0.28 and MAPE = 1.83, respectively, for the 1-month-ahead predictions). To understand the reasons behind the medium performance, we study these countries' models further. Colombia ranks 11th out of 163 countries on the list of the economic cost of violence as a percentage of GDP; its economic cost of violence is 169,517 (in million 2019 PPP U.S. dollars) [12]. Thus, in line with the study's purposes, it is important to understand and explain why the model shows medium performance. Figure 21 presents Colombia's model predictions with respect to the real GPI score. Colombia has been pursuing peace since 1964. We therefore focus on a selected sample of important events to show how well our model captures fluctuations in peacefulness and why predictions may differ from the real GPI score. In January 2015, President Santos said the government was ready for a bilateral ceasefire with Farc after welcoming Farc's unilateral ceasefire of December. The estimated GPI captures the decrease of the GPI, as opposed to the real GPI, which continues increasing. In March 2016, the government and Farc delayed the signing of a final agreement. In this case, the estimated GPI adequately captures the GPI increase, whereas the real GPI decreases. Similarly, in September 2016, the government and Farc signed a historic peace accord. Accordingly, the estimated GPI correctly decreases in this month, whereas the real GPI continues increasing. Last, in August 2019, a Farc rebel group commander defied the 2016 peace agreement and called on supporters to take up arms again. Consequently, the GPI score should increase, and Colombia's model adequately captures this fluctuation, whereas the real GPI continues decreasing.
The real GPI score does not reflect these changes in peacefulness because it is a monthly index upsampled from a yearly index. Therefore, small changes are smoothed out in the real index, and important ones may appear only later, in the following year (Section 3.1 includes further details on the upsampled GPI).

In addition to Colombia, we analyze Chile further to better understand its medium performance. Based on the 2020 GPI report [12], Chile has its lowest levels of peacefulness since the inception of the GPI. Figure 22 depicts Chile's model predictions with respect to the real GPI score. The plot shows that the predictions curve does not follow the real GPI curve until March 2019. From March 2019, the real GPI score increases abruptly until March 2020, while the predictions curve does not follow it until October 2019. In October 2019, Chile was rocked by mass protests over economic inequality, prompted by a subsequently reversed rise in Santiago metro fares. Unlike the real GPI score, the estimated GPI score captures this increase in a timely manner. The real GPI score may anticipate this increase because it is a yearly score upsampled to a monthly index; it therefore reflects the abrupt turbulence in peacefulness already from March 2019.

Figure 22: Chile predictions, with respect to the real GPI score (y-axis: GPI score; annotation: "Oct 2019 Chilean protests begin"). Chile 1-month-ahead predictions (blue curve), with respect to the real GPI score (orange curve). The estimated GPI score adequately captures the disturbance in peace in October 2019, when the Chilean protests began, as compared to the real GPI score.

We also deepen the analysis to find out why some country models show low performance. To check the extent to which these countries are covered by GDELT news, we investigate whether there is any correlation between each country's mean number of overall news and the model's performance. We also check for any correlation between each country's mean number of monthly news and the model's monthly performance. However, we find no correlation. Another possible explanation for some countries' low performance, which could be explored further, is that some countries might be under-represented, or even over-represented, in GDELT news [17]. For example, GDELT tracks many United States news media, and the United States is the strongest player in the media industry. United States news in English might not sufficiently cover events happening in foreign or non-English-speaking countries.

Moreover, news media could introduce additional biases into the study. First, they sometimes misrepresent reality; for example, they give a distorted picture of crime within a city, with a significant bias towards violence [117]. Second, news media datasets are subject to gatekeeping bias, i.e., journalists decide which events to publish; coverage bias, related to the over-coverage or under-coverage of an event; and statement bias, i.e., when the content of an article is favorable or unfavorable towards certain events [118].

New technologies have been increasingly acknowledged as critical tools to foster peacefulness [119, 120]. In particular, new digital data streams harnessed with AI techniques allow for predictive analytics that enhance early warning about emerging conflicts and operational risks, cost- and time-effectively.

This study exploits GDELT, a database containing digital news related to sociopolitical events, to estimate monthly peacefulness values through the GPI. Measuring the GPI score at a monthly level can reveal trends at a much finer scale than is possible with the yearly official measurements, capturing month-to-month fluctuations and significant events that would otherwise be neglected. We use machine learning techniques to estimate the GPI values from 1-month-ahead up to 6-months-ahead for 163 countries worldwide, with different socio-economic, political, and military profiles. For some countries the model performance is high, while for others it is medium or low. We conduct an in-depth analysis of country models with high performance, such as Saudi Arabia, Yemen, the United States, and the United Kingdom. We apply explainability techniques to provide explanations for their models' predictions and reveal the profile of each country.

For example, the most important variables for Yemen are related to military aid, territory occupation, bombing, negotiations, discussions, yields, visits, international involvements, and consults, revealing a war-torn country profile. Additionally, an analysis of SHAP local explanations of the selected countries' models allows us to explain the errors in the predictions and identify the events that drive these errors.

One aspect of our study should be taken into consideration. Since the GPI is a yearly index, we upsampled its yearly values linearly to monthly values. Linear upsampling is an assumption, since the generated monthly data do not correspond to a real monthly GPI. Alternatively, one could increase the frequency of the GPI through stochastic differential equation (SDE) methods [121], a more complex methodology than simple linear interpolation. Considering that both solutions rely on assumptions and that our main goal is to demonstrate that monthly peacefulness can be captured through news data, we choose the simpler one.
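A minimal sketch of the linear upsampling described above, using NumPy's `np.interp`. We assume each yearly GPI value is anchored to a single month (12 months apart) and that the months in between are filled by straight-line interpolation; the anchoring month and the function name are assumptions, not details taken from the study.

```python
import numpy as np


def upsample_linear(yearly_values, points_per_year=12):
    """Linearly interpolate yearly index values onto a monthly grid.

    Hypothetical helper: the yearly values are anchored at month positions
    0, 12, 24, ...; intermediate months lie on the straight line between
    consecutive anchors, so within-year fluctuations are not represented.
    """
    yearly_values = np.asarray(yearly_values, dtype=float)
    anchors = np.arange(len(yearly_values)) * points_per_year
    months = np.arange(anchors[-1] + 1)
    return np.interp(months, anchors, yearly_values)


# Hypothetical yearly scores for two consecutive years.
monthly = upsample_linear([2.00, 2.12])
print(len(monthly), monthly[:3])
```

This also shows why the real monthly index smooths out short-lived events: every month between two anchors is a fixed fraction of the way from one yearly value to the next.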

Future studies could deepen the analysis by trying different upsampling methodologies. An alternative solution could be to replace the GPI with a monthly index, which would not require upsampling.

Another line of future research lies in the analysis of the results per country. As discussed in Section 4.2, the models show low performance in predicting the GPI value for certain countries. One approach to improving the models' performance is to adapt the length of the training data to the history of the country, usually reflected in the GPI. For example, as we show for Yemen, the performance improves when the training data are reduced from the most recent 72 months to the most recent 36 months. Additionally, news media might introduce biases that lead to low model performance in predicting the GPI value. Therefore, it would be beneficial to study in depth the representativeness of GDELT news, as some countries might be under-represented or over-represented, to help explain why some models fail to demonstrate high or at least medium performance.

Last but not least, we highlight that machine learning models are a powerful tool for solving prediction problems. However, they are not inherently causal, and interpreting them with SHAP cannot accurately answer causal questions. Therefore, we indicate two additional points that can improve early-warning conflict systems: more information about the causes of conflicts and war, and theoretical models representing the complexity of social interactions and human decision-making. In particular, future AI-based conflict models should offer explanations for conflicts and war and plans for preventing them. This is a difficult task because conflict and war dynamics are multi-dimensional, and the data collected today are too narrow, sparse, and disparate [7].

Overall, the analysis of our results shows great promise for the estimation of the GPI through GDELT and, in general, for the measurement of peacefulness using big data and AI. We believe that this study is valuable to policymakers, peacekeepers, the scientific community, and especially to researchers interested in "Data Science for Social Good". Indeed, GDELT could be used not only for peacefulness but for any other well-being dimension and socio-economic index related to societal progress.

of violent crime", "Likelihood of violent demonstrations", "Number of jailed population per 100,000 people", "Number of internal security officers, and police per 100,000 people".
• Militarization contains: "Military expenditure as a percentage of GDP", "Number of armed services personnel per 100,000 people", "Volume of transfers of major conventional weapons as recipient (imports) per 100,000 people", "Volume of transfers of major conventional weapons as supplier (exports) per 100,000 people", "Financial contribution to UN peacekeeping missions", "Nuclear and heavy weapons capabilities", and "Ease of access to small arms and light weapons".

The GDELT event categories we use are related to 20 topics, as described below. For each topic, we provide a short description and a few examples of event categories:

• Make Public Statement refers to public statements expressed verbally or in action, such as "Make statement", "Make pessimistic comment", and "Decline comment".
• Appeal refers to requests, proposals, suggestions, and appeals, such as "Appeal for material cooperation", "Appeal for economic cooperation", and "Appeal to others to settle dispute".
• Express Intent To Cooperate refers to offering, promising, agreeing to, or otherwise indicating willingness or commitment to cooperate, such as "Express intent to engage in material cooperation" and "Express intent to provide material aid".
• Consult refers to consultations and meetings, such as "Discuss by telephone" and "Host a visit".
• Engage In Diplomatic Cooperation refers to initiating, resuming, improving, or expanding diplomatic, non-material cooperation or exchange, such as "Sign formal agreement" and "Praise or endorse".
• Engage In Material Cooperation refers to initiating, resuming, improving, or expanding material cooperation or exchange, such as "Cooperate economically" and "Share intelligence or information".
• Provide Aid refers to the provision and extension of material aid, such as "Provide economic aid" and "Provide humanitarian aid".
• Yield refers to yieldings and concessions, such as "Accede to requests or demands for political reform", "De-escalate military engagement", and "Return, release".
• Investigate refers to non-covert investigations, such as "Investigate crime, corruption" and "Investigate human rights abuses".
• Demand refers to demands and orders, such as "Demand political reform" and "Demand settling of dispute".
• Disapprove refers to the expression of disapproval, objections, and complaints, such as "Criticize or denounce" and "Complain officially".
• Reject refers to rejections and refusals, such as "Reject request or demand for material aid" and "Reject mediation".
• Threaten refers to threats and coercive or forceful warnings with serious potential repercussions, such as "Threaten with military force" and "Threaten with administrative sanctions".
• Protest refers to civilian demonstrations and other collective actions carried out as protests, such as "Demonstrate or rally" and "Conduct strike or boycott".
• Exhibit Force Posture refers to military or police moves that fall short of the actual use of force, such as "Exhibit military or police power" and "Increase military alert status".
• Reduce Relations refers to reductions in normal, routine, or cooperative relations, such as "Reduce or break diplomatic relations" and "Halt negotiations".
• Coerce refers to repression of, or violence against, civilians, their rights, or their property, such as "Arrest, detain" and "Seize or damage property".
• Assault refers to the use of different forms of violence, such as "Conduct non-military bombing" and "Abduct, hijack, take hostage".
• Fight refers to uses of conventional force and acts of war, such as "Use conventional military force" and "Fight with small arms and light weapons".
• Engage In Unconventional Mass Violence refers to uses of unconventional force that are meant to cause mass destruction, casualties, and suffering, such as "Engage in ethnic cleansing" and "Detonate nuclear weapons".

Linear regression

Linear regression, one of the simplest and most widely used regression techniques, estimates the regression coefficients (the weights used for prediction) by minimizing the sum of squared residuals [111]. One of its main advantages is that its results are easy to interpret.
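A minimal sketch of ordinary least squares with a single predictor, using the closed-form solution; regression with several variables generalizes this through the normal equations. The function names are hypothetical, and the example only illustrates the least-squares criterion, not the study's pipeline.

```python
def ols_fit(x, y):
    """Least-squares line y = b0 + b1*x: the (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """The quantity that ordinary least squares minimizes."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
```

For example, `ols_fit([0, 1, 2, 3], [1, 3, 5, 7])` returns `(1.0, 2.0)`. The slope reads directly as the change in y per unit change in x, which is the interpretability advantage mentioned above.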

Elastic Net is a regularized regression method that also performs variable selection. One of its essential advantages is that it combines the penalization techniques of the Lasso and Ridge regression methods into a single algorithm [122]. Lasso regression penalizes the sum of the absolute values of the coefficients (L1 penalty), Ridge regression penalizes the sum of the squared coefficients (L2 penalty), and Elastic Net imposes both L1 and L2 penalties. This means that Elastic Net can completely remove weak variables, as Lasso does, or shrink their coefficients toward zero, as Ridge does. It can therefore retain useful information while still imposing penalties that lessen the impact of certain variables.
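The sketch below implements coordinate descent for one common Elastic Net parameterization (the one used by glmnet), in which λ sets the overall amount of regularization and α the balance between the L1 and L2 penalties; in scikit-learn's `ElasticNet`, these correspond to `alpha` and `l1_ratio`, respectively. It assumes centered predictors and response (so no intercept) and is a didactic sketch, not the solver used in the study. The soft-thresholding step is what sets weak coefficients to exactly zero, while the L2 term in the denominator shrinks the remaining ones.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for the objective
    (1/2n)*||y - X b||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    Assumes the columns of X and y are already centered (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # L1 part (soft-thresholding) can zero the coefficient; L2 part shrinks it
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return beta
```

With two orthogonal predictors, only the first of which drives y, `elastic_net_cd([[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]], [-6, -3, 0, 3, 6], lam=0.1, alpha=0.5)` sets the second coefficient to exactly zero and shrinks the first below its least-squares value of 3.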

Decision trees represent decisions visually and explicitly in the form of a tree structure. A decision tree is called a regression tree when the dependent variable takes continuous values [122]. The goal of a regression tree is to build a model that predicts the value of the dependent variable by learning simple decision rules inferred from the training data. The regression tree induction algorithm recursively divides the dataset into smaller subsets while an associated decision tree is incrementally developed. The final tree consists of decision nodes and leaf nodes. A decision node has two or more branches, each representing values of the variable tested. A leaf node represents a decision on the value of the dependent variable. The topmost decision node, called the root node, corresponds to the most important variable. The main difference between a regression tree and a classification tree is that, for regression problems, the splitting criterion minimizes the variance within each partition.
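To make the variance-minimizing criterion concrete, this sketch searches one numeric variable for the threshold that minimizes the summed squared error (equivalently, the size-weighted variance) of the two resulting partitions. A full regression tree applies this search recursively over all variables until a stopping rule such as a maximum depth is met; the `min_samples_leaf` argument mirrors the scikit-learn hyperparameter of the same name. Function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (len(values) times the variance)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y, min_samples_leaf=1):
    """Return (threshold, total_sse) of the binary split on x that minimizes the summed
    within-partition squared error; (None, sse(y)) if no valid split exists.
    Assumes min_samples_leaf >= 1."""
    pairs = sorted(zip(x, y))
    best_threshold, best_score = None, sse(y)
    # Each leaf must keep at least min_samples_leaf observations
    for i in range(min_samples_leaf, len(pairs) - min_samples_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [yi for _, yi in pairs[:i]]
        right = [yi for _, yi in pairs[i:]]
        score = sse(left) + sse(right)
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold, best_score
```

For x = [1, 2, 3, 10, 11, 12] and y = [1, 1, 1, 5, 5, 5], `best_split` returns `(6.5, 0.0)`: the threshold 6.5 separates the two groups perfectly, leaving no variance in either partition.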

SVR [123] is a regression learning approach that, unlike regression algorithms that simply minimize the error between the real and predicted values, uses a symmetric loss function that equally penalizes high and low misestimates. In particular, it forms a tube symmetrically around the estimated function (hyperplane), such that absolute errors larger than a certain threshold ε are penalized both above and below the estimate, while errors within the threshold receive no penalty. The most commonly used kernel for finding the hyperplane is the Radial Basis Function (RBF) kernel, which we also use in our analysis. One of the main advantages of SVR is that its computational complexity does not depend on the dimensionality of the input space. Moreover, it generalizes well and typically provides high prediction accuracy.
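The symmetric ε-insensitive loss and the RBF kernel described above can be written in a few lines. The sketch assumes illustrative values for ε and γ (scikit-learn's `SVR` exposes them as `epsilon` and `gamma`, alongside `C`), and it shows only the loss and the kernel, not the optimization that fits the SVR.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-tube; grows linearly with the distance outside it.
    Over- and under-estimates of the same size are penalized equally."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel exp(-gamma * ||a - b||^2); a larger gamma makes each
    training example's influence more local."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)
```

An error of 0.05 falls inside a tube of width ε = 0.1 and costs nothing, while errors of +0.3 and -0.3 both cost 0.2.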

Random Forest limits the risk that a single Decision Tree overfits the training data [122]. As the names "Tree" and "Forest" imply, a Random Regression Forest is essentially a collection of individual Regression Trees that operate as an ensemble. A single Regression Tree is built on the entire dataset, using all the variables of interest. In contrast, Random Forest builds multiple Regression Trees, each from a random selection of observations and variables, and then averages the trees' predictions. The predictions of individual Regression Trees may not be accurate, but their combined prediction is, on average, closer to the true value.
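A simplified sketch of the two sources of randomness in a Random Forest: each tree is fitted on a bootstrap sample of the observations (drawn with replacement), considers only a random subset of `max_features` variables, and the forest averages the trees' outputs. To keep it short, every tree here is a depth-1 stump, so per-tree and per-split feature sampling coincide; real Random Forests grow much deeper trees. All names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, feature_ids):
    """Depth-1 regression tree: the (feature, threshold) among the candidate
    features that minimizes the summed squared error of the two partitions."""
    best = (float("inf"), None, None, mean(y), mean(y))  # (sse, feature, threshold, left, right)
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, f, t, lm, rm)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    if feature is None:  # no valid split was found: predict the node mean
        return left_mean
    return left_mean if row[feature] <= threshold else right_mean


def fit_random_forest(X, y, n_estimators=50, max_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]         # bootstrap sample
        features = rng.sample(range(p), max_features)      # random variable subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    # The forest's prediction is the average of the individual trees' predictions
    return mean(predict_stump(stump, row) for stump in forest)
```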

Extreme Gradient Boosting (XGBoost)

XGBoost [124] is a scalable machine learning system for tree boosting. It uses a gradient descent algorithm and incorporates a regularized model to prevent overfitting. Whereas Random Forest builds each tree independently and combines them in parallel, XGBoost uses boosting: it combines weak learners (shallow decision trees, in the simplest case trees with only one split, called decision stumps) sequentially, so that each new tree corrects the errors of the previous ones. In this way, each step learns from the mistakes made so far and improves performance, until no further improvement is achieved. Its main advantages are fast execution and high accuracy.
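The sketch below shows plain gradient boosting with squared loss, in which each new stump is fitted to the residuals of the current ensemble and its contribution is shrunk by a learning rate. It omits XGBoost's regularized objective, second-order updates, and column subsampling, so it illustrates the boosting idea rather than XGBoost itself; names and default values are assumptions.

```python
from statistics import mean


def fit_stump(x, r):
    """Depth-1 regression tree on a single feature, minimizing squared error."""
    best = (float("inf"), None, mean(r), mean(r))  # (sse, threshold, left_mean, right_mean)
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]


def stump_predict(stump, xi):
    t, lm, rm = stump
    if t is None:
        return lm
    return lm if xi <= t else rm


def fit_boosting(x, y, n_estimators=100, learning_rate=0.1):
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For squared loss, the negative gradient is simply the residual
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Each new tree corrects the current errors, shrunk by the learning rate
        pred = [pi + learning_rate * stump_predict(stump, xi) for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict_boosting(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, xi) for s in stumps)
```

On step-shaped data, such as x = [1, 2, 3, 10, 11, 12] with y = [1, 1, 1, 5, 5, 5], the residuals shrink by a factor of (1 - learning_rate) each round, so after 100 rounds with learning_rate = 0.1 the fitted values are within 10^-3 of the targets.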

The hyperparameters we tune for Elastic Net are α, which is the relative importance of the L1 (LASSO) and L2 (Ridge) penalties, and λ, which is the amount of regularization used in the model. For Decision Tree, we tune the complexity parameters max_depth, which is the maximum depth of the tree, min_samples_split, which is the minimum number of samples required to split an internal node, and min_samples_leaf, which is the minimum number of samples required to be at a leaf node. For Random Forest, as for Decision Tree, we tune max_depth, min_samples_split, and min_samples_leaf. We also tune n_estimators, which is the number of trees in the model, and max_features, which is the number of variables to consider when looking for the best split. For XGBoost, we tune n_estimators, as for Random Forest, and max_depth, as for Decision Tree. We also tune learning_rate, which shrinks the weights added at each boosting step, making the process more conservative and helping to prevent overfitting, and colsample_bytree, the fraction of columns subsampled when building each tree, which speeds up the algorithm and helps prevent overfitting.

Last, for the SVR model with the RBF kernel, we tune the regularization parameter C, which controls the penalty imposed on the model for making an error, and the gamma parameter, which defines how far the influence of a single training example reaches.
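The tuned hyperparameters can be organized as search grids; the sketch below enumerates every combination of each grid, as an exhaustive grid search would. The parameter names follow scikit-learn and XGBoost conventions (with scikit-learn's `alpha` and `l1_ratio` standing in for λ and α), but the candidate values are invented for illustration and do not reflect the ranges used in the study.

```python
from itertools import product

# Hypothetical search spaces: names follow scikit-learn / XGBoost conventions,
# candidate values are illustrative only.
PARAM_GRIDS = {
    "elastic_net": {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.2, 0.5, 0.8]},
    "decision_tree": {"max_depth": [3, 5, None], "min_samples_split": [2, 5],
                      "min_samples_leaf": [1, 3]},
    "random_forest": {"n_estimators": [100, 300], "max_depth": [5, None],
                      "min_samples_split": [2, 5], "min_samples_leaf": [1, 3],
                      "max_features": [0.5, 1.0]},
    "xgboost": {"n_estimators": [100, 300], "max_depth": [3, 6],
                "learning_rate": [0.05, 0.1], "colsample_bytree": [0.5, 1.0]},
    "svr_rbf": {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
}


def expand_grid(grid):
    """Yield every combination of a parameter grid as a dict (a full grid search)."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))
```

The number of configurations is the product of the grid sizes (for instance 2^5 = 32 for the Random Forest grid above), which is why grids are usually kept small or replaced by random search.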

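Pearson Correlation, a measure of the linear dependence between two variables $x$ and $y$ during a time period $[t_1, t_n]$, is defined as:

$$r = \frac{\sum_{t=t_1}^{t_n} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=t_1}^{t_n} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=t_1}^{t_n} (y_t - \bar{y})^2}} \qquad (1)$$

where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$ over the period.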

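Root Mean Square Error (RMSE), a measure of prediction accuracy that represents the square root of the second sample moment of the differences between predicted values and actual values, is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=t_1}^{t_n} (\hat{y}_t - y_t)^2}$$

where $y_t$ and $\hat{y}_t$ are the actual and predicted values at time $t$, and $n$ is the number of time steps.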

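Mean Absolute Percentage Error (MAPE), a measure of prediction accuracy between predicted and true values, is defined as:

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=t_1}^{t_n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|$$

where $y_t$ and $\hat{y}_t$ are the true and predicted values at time $t$, and $n$ is the number of time steps.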
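The three performance measures can be computed directly from their definitions. This stdlib-only sketch is illustrative and assumes equal-length sequences of actual and predicted values. MAPE divides by the actual values, so it is only defined when none of them is zero, which holds for GPI values since they are strictly positive.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if any actual value is zero."""
    n = len(y_true)
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
```

For example, `mape([1, 2, 4], [1, 1, 5])` averages the relative errors 0, 0.5, and 0.25 and returns 25.0.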

The median Pearson Correlation of the Linear models for the 1-month-ahead predictions is 0.069, and the median MAPE is 39.273. These results show that the Linear models perform worse not only than the XGBoost models (0.521 and 1.593, respectively) but also than the Elastic Net models (0.327 and 1.997, respectively), already for the 1-month-ahead predictions.

Figure 23: RMSE for all country models. RMSE between the real and the predicted 1-, 3-, and 6-months-ahead GPI values at a country level, for all prediction models. The boxplots represent the distribution of the RMSE values over all country models; each data point corresponds to one country model. Overall, XGBoost models outperform the remaining four models.

Organisation for Economic Co-operation and Development: How's Life?: Measuring Well-being

Measuring objective and subjective well-being: dimensions and data sources

Sustainable development goals (sdgs): Are we successful in turning trade-offs into synergies?

Towards integration at last? the sustainable development goals as a network of targets

Pathways for peace: Inclusive approaches to preventing violent conflict

Retool AI to forecast and limit wars

Africa's missing billions: International arms flows and the cost of conflict

To boldly know: Knowledge, peacekeeping and remote data gathering in conflict-affected states

The role of artificial intelligence in achieving the sustainable development goals

Big data, new technologies, and sustainable peace: Challenges and opportunities for the UN

The Institute for Economics and Peace: Global Peace Index

The Institute for Economics and Peace: Vision of Humanity

The GDELT Project

Sentiment analysis in the news

Estimating countries' peace index through the lens of the world news as monitored by GDELT

A first look at global news coverage of disasters by using the GDELT dataset

A survey of methods for explaining black box models

Consistent individualized feature attribution for tree ensembles

A unified approach to interpreting model predictions

International commodity prices, growth and the outbreak of civil war in Sub-Saharan Africa

A new measure of the 'democratic peace': what country feeling thermometer data can teach us about the drivers of American and Western European foreign policy

The Institute for Economics and Peace: Structures of peace: identifying what leads to peaceful societies

Conducting web-based surveys

Lies, damned lies, and survey self-reports? identity as a cause of measurement bias

Crime prediction using Twitter sentiment and weather

Predicting crime with routine activity patterns inferred from social media

Measuring ambient population from location-based social networks to describe urban crime

Crime and its fear in social media

Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs

Predicting and Preventing Emerging Outbreaks of Crime

Detecting and preventing emerging epidemics of crime

Who 'tweets' where and when, and how does it help understand crime rates at places? measuring the presence of tourists and commuters in ambient populations

Let them tweet cake: Estimating public dissent using Twitter

Sentiment analysis combination in terrorist detection on Twitter: A brief survey of approaches and techniques

Using social media to measure conflict dynamics: An application to the 2008-2009 Gaza conflict

#GazaUnderAttack: Twitter, Palestine and diffused war

Using social media to measure foreign policy dynamics: An empirical analysis of the Iranian-Israeli confrontation (2012-13)

Inferring international and internal migration patterns from Twitter data

Leveraging Facebook's advertising platform to monitor stocks of migrants

Combining social media and survey data to nowcast migrant stocks in the United States

A French corpus for event detection on Twitter

Analyzing large-scale human mobility data: a survey of machine learning methods and applications

Scikit-mobility: a Python library for the analysis, generation and risk assessment of mobility data

A survey of results on mobile phone datasets analysis

so) big data and the transformation of the city

A Survey on Deep Learning for Human Mobility

Once upon a crime: towards crime prediction from demographics and mobile data

Predictable policing: Measuring the crime control benefits of hotspots policing at bus stops

Detecting criminal organizations in mobile phone networks

Spatial analysis of crime incidence and adolescent physical activity

Addressing under-reporting to enhance fairness and accuracy in mobility-based crime prediction

Socio-economic, built environment, and mobility conditions associated with crime: a study of multiple cities

An analytical framework to nowcast well-being using mobile phone data

Using big data to study the link between human mobility and socio-economic development

Small Area Model-Based Estimators Using Big Data Sources

Network diversity and economic development

A general approach to detecting migration events in digital trace data

Human migration: the big data perspective

Inferring and modeling migration flows using mobile phone network data

Exploring the use of mobile phone data for national migration statistics

Evaluation of home detection algorithms on mobile phone data using individual-level ground truth

Dynamic population mapping using mobile phone data

Safe spaces embedded in dangerous contexts: How Chicago youth navigate daily life and demonstrate resilience in high-crime neighborhoods

Open source data reveals connection between online and on-street protest activity

Association between volume and momentum of online searches and real-world collective unrest

Immigration enforcement awareness and community engagement with police: Evidence from domestic violence calls in Los Angeles

Women's strategies addressing sexual harassment and assault on public buses: an analysis of crowdsourced data

Validating media-driven and crowdsourced police shooting data: a research note

Hope Speech Detection: A Computational Analysis of the Voice of Peace

Realtime predictive patrolling and routing with mobility and emergency calls data

Towards a place-based measure of fear of crime: A systematic review of app-based and crowdsourcing approaches

Humanitarian applications of machine learning with remote-sensing data: review and case study in refugee settlement mapping

Remote sensing of violent conflict: eyes from above

Landsat-based early warning system to detect the destruction of villages in Darfur, Sudan. Remote Sensing of Environment

Can night-time light images play a role in evaluating the Syrian crisis?

Introducing ACLED-Armed Conflict Location and Event Data

Conflict and Peace Economics: Retrospective and Prospective Reflections on Concepts, Theories, and Data

Measuring peace: Comparability, commensurability, and complementarity using bottom-up indicators

Covid-19 and armed conflict

Views: a political violence early-warning system

Arabia Inform

Forecasting violent events in the Middle East and North Africa using the hidden Markov model and regularized autoregressive models

Forecasting civil unrest using social media and protest participation theory

Predicting social unrest events with hidden Markov models using GDELT. Discrete Dynamics in Nature and Society

Predicting social unrest using GDELT

Using Machine Learning for Prediction of Factors Affecting Crimes in Saudi Arabia

SURGE: Social Unrest Reconnaissance GazEteer

An online framework for temporal social unrest event prediction using news stream

Graph-based method for detecting Occupy protest events using GDELT dataset

Application of data science to discover violence-related issues in Iraq

Detecting and forecasting domestic political crises: A graph-based approach

Multi-level analysis of peace and conflict data in GDELT

Predicting future levels of violence in Afghanistan districts using GDELT

The cooperative and conflictual interactions between the United States, Russia, and China: A quantitative analysis of event data

Event prediction with learning algorithms: A study of events surrounding the Egyptian revolution of 2011 on the basis of micro blog data

A Multi-Scale Approach to Data-Driven Mass Migration Analysis

Refugee Mobility: Evidence from Phone Data in Turkey

Integration of Syrian refugees: insights from D4R, media events and housing market data

Forecasting asylum applications in the European Union with machine learning and data at scale

An analysis of the TABARI coding system

CAMEO: Conflict and mediation event observations event and actor codebook

GDELT: Global data on events, location, and tone

What is BigQuery? In: Proceedings of the 19th International Database Engineering & Applications Symposium. IDEAS '15

Forecasting: Principles and Practice

Good bye traditional budgeting, hello rolling forecast: has the time come?

Forecasting political conflict in Asia using latent Dirichlet allocation models

Explaining prediction models and individual predictions with feature contributions

"Why should I trust you?": Explaining the predictions of any classifier

Anchors: High-precision model-agnostic explanations

An Introduction to Statistical Learning

Machine Learning Essentials: Practical Guide in R. STHDA

Mean absolute percentage error for regression models

On the relationship among values of the same summary measure of error when used across multiple characteristics at the same point in time: An examination of MALPE and MAPE

User's guide to correlation coefficients

The Group of Twenty (G20). Routledge

The relationship between media portrayals and crime: perceptions of fear of crime among citizens

Do the robot: Lessons from machine learning to improve conflict forecasting

Big data and peacebuilding

Simulation and inference for stochastic processes with YUIMA. A comprehensive R framework for SDEs and other stochastic processes

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Support vector regression

This work is partially supported by the European Community programme under the funding schemes: Research Infrastructure G.A. 871042 SoBigData++ and ERC-2018-ADG G.A. 834756 "XAI: Science and technology for the eXplanation of AI decision making". We thank Stefano-Maria Iacus, Stan Matwin, Francesca Chiaromonte, and Donato Farina for their feedback and inspiration. We also thank Daniele Fadda for support on data visualization. 

The GPI comprises 23 indicators of the absence of violence or fear of violence, which are weighted and combined into one overall score. The indicators are aggregated into three major categories, Ongoing Domestic & International Conflict, Societal Safety & Security, and Militarization:
• Ongoing Domestic & International Conflict includes: "Number and duration of internal conflicts", "Number of deaths from external organized conflict", "Number of deaths from internal organized conflict", "Number, duration and role in external conflicts", "Intensity of organized internal conflict", and "Relations with neighbouring countries".
• Societal Safety & Security encompasses: "Level of perceived criminality in society", "Number of refugees and internally displaced people as a percentage of the population", "Political instability", "Political Terror Scale", "Impact of terrorism", "Number of homicides per 100,000 people", "Level

The code to reproduce the study is available at https://github.com/VickyVouk/GDELT_GPI_SHAP_project.

The authors declare that they have no competing interests.

This work has been partially funded by EU project H2020 SoBigData++ #871042.

Author's contributions

VV: study conceptualization, data preprocessing and analysis, experiment running, code implementation, interpretation of results, writing, plots; IM: study conceptualization, data preprocessing and analysis, experiment running, code implementation, interpretation of results, writing; FG: interpretation of results and study direction; LP: study conceptualization, experiment design, interpretation of results, writing, study direction.