

International Journal of Advanced Network, Monitoring and Controls          Volume 04, No.02, 2019 


Hazard Grading Model of Terrorist Attack Based on Machine Learning
 

Yu Jun 

School of Computer Science and Technology 

Xi'an Technological University 

Xi'an 710021, Shaanxi, China 

e-mail: yujun@xatu.edu.cn 

Xian Tong 

School of Computer Science and Technology 

Xi'an Technological University 

Xi'an, 710021, Shaanxi, China 

Hu Zhiyi  

Institute of Engineering Design  

Army Academy of PLA 

Beijing, 100042, China 

 
Liu Yutong 

Engineering Design Institute  

Army Academy of PLA 

Beijing, 100042, China 
 

Abstract-There is currently no unified grading standard for the harm caused by terrorist attacks. This paper proposes a hazard grading model for terrorist incidents based on machine learning. First, the hazard-related data in the Global Terrorism Database (GTD) are extracted and preprocessed. Second, features are extracted by principal component analysis, and all events are grouped into five clusters by K-means clustering. Third, the entropy method is used to calculate the weighting coefficient of each indicator, and a comprehensive hazard score is calculated for each cluster of terrorist attacks. Finally, the clusters are assigned hazard levels 1 to 5 in descending order of score. The results show that the hazard grading model can quantify the hazard of terrorist attacks objectively.

Keywords-Terrorist Attacks; Hazard; Hierarchical Model; Principal Component Analysis; K-Means Clustering; Entropy Method

I. INTRODUCTION  

A terrorist attack is an act of aggression committed by an extremist individual or organization in violation of international norms, directed against, but not limited to, civilians and civilian installations. Such attacks are highly destructive and directly cause heavy casualties and property losses. They also place tremendous psychological pressure on people, cause social turmoil, and greatly hinder economic development. Global terrorism is a matter of public concern that affects everyone, so counter-terrorism work is urgent. Big data is now the main source of counter-terrorism intelligence. The Global Terrorism Database (GTD) is the world's most comprehensive unclassified database of terrorist attacks, containing more than 180,000 attacks, each described by at least 45 variables. An in-depth analysis of these data helps deepen the understanding of terrorism and provides valuable information support for combating and preventing it. Collecting and preprocessing intelligence data is the lifeblood of counter-terrorism work. Obtaining reliable information in a timely manner can play an active role in combating terrorism and effectively curb its spread [2].

Grading catastrophic events (such as earthquakes, traffic accidents, and meteorological disasters) is an important task of social management. Grading is usually subjective, with an authority stipulating the grading standard. The harmfulness of a terrorist attack, however, depends not only on casualties and economic losses but also on the timing, location, and targets of the attack, so a subjective standard has difficulty reflecting all of these factors. A hazard grading of terrorist incidents makes it possible to classify future attacks clearly, with different levels of events corresponding to different responses. This not only helps the management of social security but also avoids unnecessary waste of manpower and property.

Combined with big data processing technology, this paper establishes a grading model based on the PCA algorithm, the K-means clustering algorithm, and the entropy method. First, 14 evaluation indicators related to the hazard of an event are selected and the existing data are preprocessed. Second, the PCA method is used to reduce the indicators from 14 dimensions to 4 dimensions, and the reduced vectors are grouped into 5 clusters by the clustering algorithm, which gives the cluster of each event. Finally, the entropy method is used to score the hazard of each event, and the clusters are ranked by their average hazard score and assigned levels 1 to 5 from the highest degree of harm to the lowest. The result is a hazard grading model of terrorism events with five hazard levels.

DOI: 10.21307/ijanmc-2019-051

II. DATA PREPROCESSING 

In this paper, the data for the hazard grading model of terrorism events are built from a set of important fields of the original GTD database. The selected data require missing-value processing, conversion of character fields to numeric values, and numerical processing.

A. Important field selection 

The fields that matter for grading are those identified in research on worldwide terrorist incidents. The data table of the terrorism hazard grading model uses the following 14 fields selected from the GTD, as shown in Table 1.

TABLE I.  THE SELECTED FIELD TABLE

Field        Description
extended     Whether it is a continuous event
latitude     Latitude
longitude    Longitude
success      Successful attack
suicide      Suicide attack
nkill        Total number of deaths
propextent   Degree of property damage
nwound       Total number of injuries
country      Country
region       Region
city         City
attacktype   Attack type
targtype     Target/victim type
weapontype   Weapon type

 
B. Missing value processing 

Within the selected fields, the pandas method DataFrame.dropna can delete the rows or columns that contain null values and retain all data that are not empty. The character fields then need to be converted to numeric fields.
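The missing-value step can be sketched as follows. This is a minimal illustration, not the authors' script: the toy values are invented, only four of the 14 fields are shown, and dropping whole rows (rather than columns) is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy extract with a few of the GTD fields listed in Table 1.
events = pd.DataFrame({
    "nkill": [3, np.nan, 0, 12],
    "nwound": [5, 2, np.nan, 40],
    "latitude": [33.3, 34.5, 31.2, np.nan],
    "longitude": [44.4, 69.2, 35.0, 36.3],
})

# Drop every row that has a missing value in any selected field
# (assumption: rows, not columns, are removed).
complete = events.dropna(axis=0, how="any").reset_index(drop=True)
print(len(events), "->", len(complete))
```

Passing `axis=1` instead would drop incomplete columns, which would discard whole indicators and is therefore usually less desirable here.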

C. Converting character fields to numeric fields 

The character fields that need to be converted are as follows:

1) Eventid: Events in the GTD are numbered with 12 digits. The first 8 digits record the date in the format "yyyymmdd". The last 4 digits give the sequence number of the event on that day, e.g. 0001.

2) Country: According to the developed-economy assessment standards recognized by the United Nations, 168 countries are divided into developed and underdeveloped countries. Since terrorist attacks are considered more harmful to developed countries, the values are assigned as shown in Table 2.

3) Region: Count the frequency of terrorist incidents in each region and use this frequency as the region indicator value.

4) City: The cities of the world are divided into three levels: the capital, provincial capitals, and other cities. Since terrorist attacks are considered more harmful to political and economic centers, the values are assigned as shown in Table 2.

5) Attack type: Count the frequency of occurrence of the 9 attack types and use this frequency as the attack type indicator value.

6) Weapon type: Count the frequency of occurrence of the 13 weapon types and use this frequency as the weapon type indicator value.

7) Targtype: Count the frequency of occurrence of the 22 target types and use this frequency as the target type indicator value.

TABLE II.  THE STATE AND CITY ASSIGNMENT

Index                      Assignment
developed countries        2
underdeveloped countries   1
the capital                3
the provincial capital     2
other cities               1
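The two kinds of conversion described above (frequency encoding and the fixed assignments of Table 2) can be sketched as follows. The toy records, the helper columns `developed` and `city_level`, and the choice of relative frequency (rather than raw counts) are assumptions made for illustration; the text does not specify them.

```python
import pandas as pd

# Hypothetical toy records; real rows would come from the GTD extract.
events = pd.DataFrame({
    "attacktype": ["Bombing", "Armed Assault", "Bombing", "Bombing"],
    "developed": [True, False, False, True],
    "city_level": ["capital", "other", "provincial capital", "other"],
})

# Frequency encoding: replace each category by its relative frequency
# (assumption: relative frequency rather than a raw count).
freq = events["attacktype"].value_counts(normalize=True)
events["attacktype_value"] = events["attacktype"].map(freq)

# Fixed assignments taken from Table 2.
events["country_value"] = events["developed"].map({True: 2, False: 1})
events["city_value"] = events["city_level"].map(
    {"capital": 3, "provincial capital": 2, "other": 1}
)
print(events[["attacktype_value", "country_value", "city_value"]])
```

The same `value_counts` and `map` pattern would apply to the region, weapon type, and target type fields.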



D. Numerical processing 

In the original GTD database, the nkill field counts all deaths directly caused by a terrorist incident, including both victims and terrorists. The model requires only the number of victims and not the death toll of terrorists. Therefore, the number of victims is obtained by subtracting the number of terrorist deaths (nkiller) from the total number of deaths.
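A minimal sketch of this subtraction is shown below. The column name `nkiller` follows the text above, the toy values are invented, and clipping at zero is an added safeguard against inconsistent records rather than something the paper states.

```python
import pandas as pd

# Hypothetical toy values; "nkiller" is the terrorist death count as named in the text.
events = pd.DataFrame({"nkill": [10, 3, 0], "nkiller": [2, 0, 0]})

# Victim deaths = total deaths minus terrorist deaths, never below zero (assumption).
events["victim_deaths"] = (events["nkill"] - events["nkiller"]).clip(lower=0)
print(events)
```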

III. TERRORIST ATTACK HAZARD CLASSIFICATION MODEL

In this paper, the PCA algorithm, the K-means clustering algorithm, and the entropy method are used to grade terrorist attacks. The process of building the grading model is divided into four steps:

1) The 14 most influential indicators are standardized and assembled into a 14-dimensional matrix, which is then reduced from 14 dimensions to 4 dimensions by the PCA algorithm.

2) The K-means algorithm is used to cluster all the terrorist events in the matrix into five major categories, i.e. five hazard levels.

3) The entropy weight method is used to find the weight of each of the 14 indicators, and the 14 indicators of each event are then weighted and summed to obtain the score of the event. For each hazard level, the average score of all events at that level is calculated.

4) The five hazard levels are sorted by average score and assigned grades one to five from high to low. A higher score means greater harm.

A. Using the PCA algorithm for dimensionality reduction

Principal Component Analysis (PCA) extracts an M-dimensional feature matrix from an N-dimensional data matrix. First, the eigenvalues and eigenvectors of the N×N covariance matrix of the data are calculated. The eigenvalues are then sorted from large to small, and the eigenvectors corresponding to the first M eigenvalues are selected to form an N×M feature transformation matrix T. Multiplying the data by T completes the dimensionality reduction [6]. In this paper, N=14 and M=4.

The PCA eigenvalues generated from the 14 indicators are shown in Table 3.

 
TABLE III.  CHARACTERISTIC VALUES CORRESPONDING TO THE INDICATORS

Indicators   Characteristic values
nkill        9.82022087e-01
nwound       8.06184462e-02
targtype     7.91122120e-03
country      5.20872985e-02
attacktype   4.84991077e-03
region       4.01240379e-02
suicide      2.66626688e+00
city         2.60031933e-02
longitude    1.84972981e+02
extended     1.63936354e+03
latitude     1.36725606e+03
propextent   1.06560032e-01
success      1.04574700e+02
weapontype   0.00000001e+00

 
In this paper, the 98686 event records are reduced by the PCA algorithm, i.e. the original 14-dimensional vector $x = (x_1, x_2, \ldots, x_{14})$ of each event is reduced to a 4-dimensional vector $Y = (y_1, y_2, y_3, y_4)$. The contribution rates of the four principal components are 0.49, 0.42, 0.06, and 0.03, and their sum is greater than 0.99. Therefore, the dimension-reduced matrix preserves most of the information in the original data and can be used directly for clustering.
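The reduction described in this subsection can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the function name `pca_reduce` is hypothetical, and the indicators are only centered (whether they were also scaled to unit variance is not stated in the text).

```python
import numpy as np

def pca_reduce(X, m):
    """Project the rows of X (n events, N indicators) onto the top-m principal axes."""
    Xc = X - X.mean(axis=0)                    # center each indicator
    cov = np.cov(Xc, rowvar=False)             # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # re-sort from large to small
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    T = eigvecs[:, :m]                         # N x m feature transformation matrix
    ratio = eigvals[:m] / eigvals.sum()        # contribution rate of each kept component
    return Xc @ T, T, ratio

# Synthetic stand-in for the 14 indicators (assumption: not GTD data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14)) * np.linspace(10.0, 0.1, 14)
Y, T, ratio = pca_reduce(X, m=4)
print(Y.shape, ratio.round(2))
```

The sum of `ratio` plays the role of the cumulative contribution reported above; M is typically chosen as the smallest value for which this sum exceeds a chosen threshold.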

B. Using K-Means algorithm for Hazard classification 

The main idea of the K-means clustering algorithm is to partition a set of discrete data points into k clusters around k centroids, so that similar data points fall into the same cluster and dissimilar points are separated. The sum of squared errors (SSE) is the objective function of the clustering. The method repeatedly updates the cluster assignment of the data points and the positions of the centroids until it converges, in general to a local optimum of the SSE [1]. The algorithm proceeds as follows:

1) We select 5 event objects as the initial cluster centers.

2) We calculate the Euclidean distance from each event to each cluster center and assign the event to the nearest cluster.

3) After all events have been assigned, the five cluster centers are recalculated and compared with the cluster centers obtained in the previous iteration. If any cluster center has changed, the Euclidean distances and the assigned categories are recalculated.



4) When the cluster centers no longer change, the clustering result is output.
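The four steps above can be sketched in NumPy as follows. This is a minimal illustration under assumptions: the initial centers are drawn at random from the data (the text only says that 5 event objects are selected), the function name `kmeans` and the `max_iter` safeguard are hypothetical, and the demonstration uses synthetic data with k=2 instead of the paper's k=5.

```python
import numpy as np

def kmeans(Y, k, max_iter=100, seed=0):
    """Plain Lloyd iteration following steps 1) to 4)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k events as initial centers (assumption: chosen at random).
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: Euclidean distance from every event to every center.
        dists = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        # Step 4: stop when the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic 4-dimensional data with two well-separated groups (not GTD data).
rng = np.random.default_rng(1)
Y = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
               rng.normal(20.0, 1.0, size=(50, 4))])
labels, centers = kmeans(Y, k=2)
print(np.bincount(labels), centers.round(1))
```

Because the result depends on the initial centers, running the procedure from several initializations and keeping the partition with the lowest SSE is a common precaution.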

The cluster centers obtained for each event category are shown in Table 4.

TABLE IV.  CLUSTERING CENTER FOR EVENT CLASSIFICATION

Type   X1        X2         X3        X4        Number of events
0      2.4843    -16.3826   -1.3464   0.3081    63122
1      -3.3968   22.8297    -3.8782   0.0615    37848
2      825.778   873.697    28.9316   -104.59   2
3      13.8411   -127.794   19.7789   -2.7281   3500
4      -9.5985   63.3898    16.7324   -1.2382   9711

The category of each event is calculated as shown in Equations (1) to (6).

$$D_1 = \sqrt{(y_1 - 8256.783)^2 + (y_2 - 873.658)^2 + (y_3 - 28.915)^2 + (y_4 + 104.608)^2} \tag{1}$$

$$D_2 = \sqrt{(y_1 - 13.840)^2 + (y_2 + 127.794)^2 + (y_3 - 19.779)^2 + (y_4 + 2.728)^2} \tag{2}$$

$$D_3 = \sqrt{(y_1 - 2.484)^2 + (y_2 + 16.382)^2 + (y_3 + 1.346)^2 + (y_4 - 0.308)^2} \tag{3}$$

$$D_4 = \sqrt{(y_1 + 3.396)^2 + (y_2 - 22.829)^2 + (y_3 + 3.878)^2 + (y_4 - 0.061)^2} \tag{4}$$

$$D_5 = \sqrt{(y_1 + 9.598)^2 + (y_2 - 63.289)^2 + (y_3 - 16.731)^2 + (y_4 + 1.238)^2} \tag{5}$$

$$\min_i = \min\{D_1, D_2, D_3, D_4, D_5\} \tag{6}$$

Here $Y = (y_1, y_2, y_3, y_4)$ is the feature vector of an event after dimension reduction by the PCA algorithm, $D_i$ is the Euclidean distance between this vector and the $i$-th of the five cluster centers, $\min_i$ is the minimum Euclidean distance, and the index $i$ that attains it is the final event category.
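The nearest-center rule of Equations (1) to (6) can be sketched as follows. The centers are copied from Table 4 and indexed by the cluster type used there (0 to 4); the function name `assign_cluster` and the query point are hypothetical.

```python
import numpy as np

# Cluster centers copied from Table 4 (rows are cluster types 0 to 4).
CENTERS = np.array([
    [2.4843, -16.3826, -1.3464, 0.3081],
    [-3.3968, 22.8297, -3.8782, 0.0615],
    [825.778, 873.697, 28.9316, -104.59],
    [13.8411, -127.794, 19.7789, -2.7281],
    [-9.5985, 63.3898, 16.7324, -1.2382],
])

def assign_cluster(y):
    """Return the type of the nearest center and the distances to all centers."""
    y = np.asarray(y, dtype=float)
    dists = np.linalg.norm(CENTERS - y, axis=1)   # Euclidean distance to each center
    return int(dists.argmin()), dists

# Hypothetical PCA-reduced event vector (y1, y2, y3, y4).
cluster, dists = assign_cluster([0.0, -20.0, 0.0, 0.0])
print(cluster, dists.round(1))
```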

C. Using entropy method for calculating weight coefficient

The entropy method is a mathematical method for measuring the degree of dispersion of an indicator. The greater the dispersion of an indicator, the greater its influence on the comprehensive evaluation, and the entropy value is used to quantify this dispersion. The steps for calculating the weight coefficients by the entropy method are as follows:

1) We select 14 indicators of 98686 events and use $x_{ij}$ to denote the value of the $j$-th indicator in the $i$-th terrorist attack ($i = 1, \ldots, 98686$; $j = 1, \ldots, 14$; $n = 98686$, $m = 14$).

2) The 14 indicators are normalized, converting their absolute values into relative values. Positive and negative indicators have different meanings (for a positive indicator, the higher the value the better; for a negative indicator, the lower the value the better), so they are normalized by Equation (7) and Equation (8), respectively.

$$x'_{ij} = \frac{x_{ij} - \min\{x_{1j}, \ldots, x_{nj}\}}{\max\{x_{1j}, \ldots, x_{nj}\} - \min\{x_{1j}, \ldots, x_{nj}\}} \tag{7}$$

$$x'_{ij} = \frac{\max\{x_{1j}, \ldots, x_{nj}\} - x_{ij}}{\max\{x_{1j}, \ldots, x_{nj}\} - \min\{x_{1j}, \ldots, x_{nj}\}} \tag{8}$$
 

3) The proportion of the $i$-th event in the $j$-th indicator is calculated as shown in Equation (9).

$$p_{ij} = \frac{x_{ij}}{\sum_{i=1}^{n} x_{ij}} \tag{9}$$

 

4) The entropy value of the $j$-th indicator is calculated as shown in Equation (10).

$$e_j = -k \sum_{i=1}^{n} p_{ij} \ln(p_{ij}), \quad k = 1/\ln(n), \quad e_j \ge 0 \tag{10}$$

 

5) The information entropy redundancy is calculated as shown in Equation (11).

$$d_j = 1 - e_j \tag{11}$$
 

6) The weight of each indicator is calculated as shown in Equation (12).

$$w_j = \frac{d_j}{\sum_{j=1}^{m} d_j} \tag{12}$$

 

7) The weighted hazard score of each event is calculated as shown in Equation (13).

$$s_i = \sum_{j=1}^{m} w_j x_{ij} \tag{13}$$

 

The weighting factors for each indicator are shown 
in Table 5. 

TABLE V.  WEIGHT COEFFICIENTS OF EACH INDICATOR

Indicator  x1    x2    x3    x4    x5    x6    x7
Weight     0.25  0.01  0.26  0.15  0.17  0.08  0.01

Indicator  x8    x9    x10   x11   x12   x13   x14
Weight     0.01  0.01  0.01  0.01  0.01  0.01  0.01
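The entropy-weight steps of this subsection can be sketched in NumPy as follows. The sketch rests on several assumptions that the text does not settle: the proportions of Equation (9) are computed from the normalized values of Equations (7) and (8), as is usual for the entropy weight method; a term with p = 0 contributes 0 to the entropy; and a constant indicator is given a uniform proportion so that its weight becomes 0. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def entropy_weights(X, positive=None):
    """Entropy weights for an (n events, m indicators) matrix X."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    if positive is None:
        positive = np.ones(m, dtype=bool)       # assumption: all indicators positive
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # guard against constant indicators
    Xn = np.where(positive, (X - lo) / span, (hi - X) / span)  # Eq. (7) / Eq. (8)
    col_sum = Xn.sum(axis=0)
    safe_sum = np.where(col_sum > 0, col_sum, 1.0)
    # Eq. (9) on normalized values (assumption); uniform if the column is all zero.
    P = np.where(col_sum > 0, Xn / safe_sum, 1.0 / n)
    k = 1.0 / np.log(n)
    # Eq. (10), with 0 * ln(0) taken as 0.
    e = -k * (P * np.log(np.where(P > 0, P, 1.0))).sum(axis=0)
    d = 1.0 - e                                  # Eq. (11)
    return d / d.sum()                           # Eq. (12)

def hazard_scores(X, w):
    """Eq. (13): weighted sum of the indicator values of each event."""
    return np.asarray(X, dtype=float) @ w

# Toy matrix: 4 events, 2 indicators (invented values).
X = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 3.0]])
w = entropy_weights(X)
scores = hazard_scores(X, w)
print(w.round(3), scores.round(3))
```

In this toy example the first indicator is concentrated in a single event, so its entropy is lower and its weight is larger than that of the evenly spread second indicator.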

D. Hazard grading result 

All events are divided into five hazard categories by PCA and K-means clustering. The hazard score of each event is obtained by the entropy method, and the average hazard score of each category is calculated. After sorting the averages, the five hazard levels are as shown in Table 6.

TABLE VI.  HAZARD GRADING RESULT

Hazard level   Cluster category   Average hazard score
1              2                  1766.7104
2              3                  3.2596
3              0                  0.6239
4              4                  -2.6904
5              1                  -0.8788
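The final grading step, ranking the clusters by their average hazard score, can be sketched as follows. The function name `grade_clusters` and the toy labels and scores are hypothetical and do not reproduce the values of Table 6.

```python
import numpy as np

def grade_clusters(labels, scores):
    """Map each cluster to a hazard level, level 1 having the highest mean score."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    clusters = np.unique(labels)
    means = np.array([scores[labels == c].mean() for c in clusters])
    order = np.argsort(-means)                   # highest average score first
    level_of = {int(clusters[idx]): rank + 1 for rank, idx in enumerate(order)}
    mean_of = {int(c): float(mu) for c, mu in zip(clusters, means)}
    return level_of, mean_of

# Toy cluster labels and event scores (invented values).
labels = [0, 0, 1, 1, 2]
scores = [0.5, 0.7, -1.0, -0.8, 100.0]
level_of, mean_of = grade_clusters(labels, scores)
print(level_of, mean_of)
```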

IV. CONCLUSION 

In this paper, 14 fields related to hazard are selected from the Global Terrorism Database (GTD) for the hazard grading of terrorist attacks. After the data are preprocessed, principal component analysis (PCA) is used for feature extraction, and the K-means clustering method groups all events into five categories. The entropy method calculates the weight coefficient of each indicator, which yields a comprehensive hazard score for each category of attack. According to the comprehensive scores of the five categories, a five-level grading model is obtained. The model quantifies the data of past terrorist attacks and is therefore objective. It remains necessary to establish more detailed grading standards.

 
REFERENCE 

[1] Sanjun Nie. Research on Counter-terrorism based on Big Data[A]. 
IEEE Beijing Section. Proceedings of 2016 IEEE International 
Conference on Big Data Analysis (ICBDA) [C]. IEEE Beijing 
Section: IEEE BEIJING SECTION Institute of Electrical Engineers 
Beijing Branch), 2016: 5. 

[2] Strang, Kenneth David & Sun, Zhaohao. (2017). Analyzing Relationships in Terrorism Big Data Using Hadoop and Statistics. Journal of Computer Information Systems. 57. 67-75. 10.1080/08874417.2016.1181497.

[3]  Li Wei. Characteristics and Trends of Current International Terror 
and Anti-Terrorism Struggle [J]. Modern International Relations, 
2007 (02): 22-27. 

[4] Yu Yihan, Fu Wei, Wu Xiaoping. Privacy data metric and 
hierarchical model based on Shannon information entropy and BP 
neural network[J].Journal of Communications,2018,39(12) 

[5] He Jing. Research and Analysis of Future Anti-terrorism Situation 
Based on Big Data[J]. Economic Research Guide, 2019(05): 186-187. 

[6] Wang Qi, Li Xiaopei, Dong Xinyan. Classification model of wine 
grape based on principal component analysis[J]. China High-tech 
Zone, 2018(05): 218. 

[7] Wang Chao, Yao Min, Fu Zhanzhan. Research on Emergency 
Classification Based on Fuzzy Comprehensive 
Evaluation[J].Software Guide, 2019(04): 149-15 

[8] Lu Ronghui. Terrorism and Counter-Terrorism in the Context of 
Globalization [D]. Suzhou University, 2005. 

[9] Wang Chao, Yao Min, Fu Zhanzhan. Research on Emergency 
Classification Based on Fuzzy Comprehensive 
Evaluation[J].Software Guide, 2019,18(04):149-152. 

[10] Hou Wenjing, Jiang Xinxin, Wen Hong, Lei Wenxin, Xu Aidong. 
Terminal Security Level Grading Model of BP Neural Network 
Based on Edge Side[J].Communication Technology, 2018, 51(10): 
2455-2458. 

[11] Shi Ya, Wang Xiuhua, Yang Wei, Liu Li, Tan Zhezhen, Ouyang Wei. 
Study on the grading strategy of comprehensive evaluation model for 
long-term care of the elderly[J]. Chinese Journal of Nursing, 2018, 
53(10): 1237-1243