key: cord-0788047-ycoqpfyy
authors: Antonio, Yeftanus; Indratno, Sapto Wahyu; Saputro, Suhadi Wido
title: Pricing of cyber insurance premiums using a Markov-based dynamic model with clustering structure
date: 2021-10-26
journal: PLoS One
DOI: 10.1371/journal.pone.0258867
sha: 3f8d1b9f124f11ee750c48de69b60a10fa354167
doc_id: 788047
cord_uid: ycoqpfyy

Cyber insurance is a risk management option to cover financial losses caused by cyberattacks. Researchers have focused their attention on cyber insurance during the last decade. One of the primary issues related to cyber insurance is estimating the premium. The effect of network topology has been heavily explored in the previous three years in cyber risk modeling. However, none of the approaches has assessed the influence of clustering structures. Numerous earlier investigations have indicated that internal links within a cluster reduce transmission speed or efficacy. As a result, the clustering coefficient metric becomes crucial in understanding the effectiveness of viral transmission. We provide a modified Markov-based dynamic model in this paper that incorporates the influence of the clustering structure on calculating cyber insurance premiums. The objective is to create less expensive and less homogenous premiums by combining criteria other than degrees. This research proposes a novel method for calculating premiums that gives a competitive market price. We integrated the epidemic inhibition function into the Markov-based model by considering three functions: quadratic, linear, and exponential. Theoretical and numerical evaluations of regular networks suggested that premiums were more realistic than premiums without clustering. Validation on a real network showed a significant improvement in premiums compared to premiums without the clustering structure component despite some variations. Furthermore, the three functions demonstrated very high correlations between the premium, the total inhibition function of neighbors, and the speed of the inhibition function. Thus, the proposed method can provide application flexibility by adapting to specific company requirements and network configurations.

Currently, cyber risk management using cyber insurance is increasingly needed. Cyber risk is a type of operational risk that arises from the execution of cyberspace activities, posing a threat [...] data from infection and recovery dynamics based on certain assumptions is shown as the solution to current cyber incident data limitations. Thus, network characteristics and metrics are critical considerations in modeling the dynamics of virus spread. However, experimental results by Xu and Hua [19] only showed the strong influence of the degree of a node in a network on cyber losses and premiums. To confirm this, we conducted a study on the regular graph using the Markov model and obtained similar results [26]. The degree of a node explains only the number of neighbors and does not describe the relationship between neighbors. Two or more nodes with the same degree can have different neighboring connection structures. The structure between neighbors of a node can be described by a network metric called the clustering coefficient, which measures how closely nodes in a graph cluster together [27]. In other words, the clustering coefficient can explain the clustering structure of a network. Several experiments have shown the influence of the clustering coefficient on disease transmission [28] [29] [30] [31]. Assuming that social networks have a high community structure and clustering coefficient, Wu and Liu [32] proposed a new model to study their influence on epidemics. According to their findings, the degree of the community determines the spread of epidemics in community networks. In contrast, an increase in the clustering coefficient reduces the epidemic spread efficiency for a community with a fixed degree. Using the SIS process, Bo Song et al. [33] reached the same conclusion: in a homogeneous network (same degree for each node), clustering could inhibit epidemics. Conversely, there is no inhibiting effect during infection in heterogeneous networks. However, no one has created a model at the individual level that can explain the dynamic process of infection in terms of the status of an individual [34].

This study proposes a Markov-based model with the network structure effect, namely, the ε-SIS model with a clustering coefficient factor for cyber insurance pricing. We incorporate the clustering coefficient function [32, 33] into the transition probability of the Markov model or ε-SIS process [25]. Cyber insurance rates are calculated using the cost function based on two types of losses by Xu and Hua [19], while the simulation process is run using a modified Markov-based simulation with different infection rates. In previous work, we used the average degree factor as a metric of the network in a compartment SIS process [35]. In this study, we propose a modified Markov-based algorithm with different rates at the individual-level ε-SIS model to generate synthetic cyber-attack data. This algorithm is a modification of the individual-level SIS process algorithm with homogeneous rates. The procedure was implemented through a case study on a regular (homogeneous) network using random regular graph sampling [36, 37]. Furthermore, the theoretical background of regular graphs and its relationship to the local clustering effect are also presented in this paper. Moreover, the findings are validated by implementation on a real (large) network.

The remainder of this paper is organized as follows. Materials and methods discusses the concepts and methods used for rate-making using a Markov-based model with a clustering structure. The main results and findings presented in Results and discussion include regular graph theory and clustering coefficients. Results and discussion also offers a discussion of the findings on a regular network and an email communication network. Conclusions and future work are presented in Conclusion.

This section discusses the theories and simulation methods used for cyber insurance pricing with a clustering structure factor. These are related to the definition of clustering coefficients and how this metric defines the Markov-based model's infection rate, random regular graphs, and simulations using the modified Markov-based simulation.

Our model is an individual-level model where a node's tendency to have a clustering structure depends on a metric known as the local clustering coefficient. Let an undirected graph G = (V, E) be a representation of a network where V is a set of vertices (nodes) and E is a set of edges (links). A link (u, v) ∈ E connects node u ∈ V and node v ∈ V. The set of neighbors of node v is denoted by

$$N(v) = \{u \in V : (u, v) \in E\}.$$

Hence, the cardinality of N(v), also known as the degree of node v, expresses the number of neighbors of node v and can be written as |N(v)| = k_v. A set of three nodes, together with the three links that connect all three nodes, is a triangle in a network G [38]. Let T(v) = |{(u, w) : u, w ∈ N(v), (u, w) ∈ E}| be the number of triangles formed with the center at node v. The local clustering coefficient for node v is defined as

$$C_v = \frac{T(v)}{\binom{k_v}{2}} = \frac{2\,T(v)}{k_v (k_v - 1)}.$$

In terms of the relative density of connections in its neighborhood, it determines how close the neighborhood of node v is to a complete network. Thus, this metric measures the proportion of the number of triangles with the center at node v compared to the number of triangles between the neighbors of node v if all the neighbors are connected (complete network), $\binom{k_v}{2}$. For example, Fig 1 illustrates the difference in the local clustering coefficient values at node 1 (C_1). Node 1 has the same degree k_v = 4 for each structure. However, the relationship between its neighbors is different, which causes the local clustering coefficient value of node 1 to be different. In this case, the set of possible clustering coefficients for node 1 is {0, 1/6, 2/6, 3/6, 4/6, 5/6, 1}. By adding the clustering coefficient factor to the epidemic model, we can characterize the dynamics of the virus spread based on the structure between neighbors.
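The definition of the local clustering coefficient can be illustrated with a minimal Python sketch. The adjacency dictionary below is a hypothetical five-node example (it is not one of the structures in Fig 1), and returning 0 for nodes with fewer than two neighbors is a convention assumed here.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Return C_v = T(v) / C(k_v, 2) for node v of an undirected graph.

    adj maps each node to the set of its neighbors.
    """
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pair, so no triangle
    # T(v): number of neighbor pairs (u, w) that are themselves linked
    triangles = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return triangles / (k * (k - 1) / 2)

# Hypothetical example: node 1 has four neighbors (2, 3, 4, 5);
# only the pairs (2, 3) and (4, 5) are linked.
adj = {
    1: {2, 3, 4, 5},
    2: {1, 3},
    3: {1, 2},
    4: {1, 5},
    5: {1, 4},
}
c1 = local_clustering(adj, 1)  # 2 linked pairs out of 6 possible pairs
```

With degree 4 there are six neighbor pairs, so the coefficient can only take multiples of 1/6, which matches the set of values listed above.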

A regular graph with degree k, denoted a k-regular graph, is a graph G = (V, E) where the degree of each node is the same, namely, k_v = k for every v ∈ V. In other words, each node in graph G has the same number of neighbors. Several results from graph theory are needed to determine the existence of a k-regular graph.

Lemma 1 (The handshaking lemma [39]) [...]

Lemma 3 (The existence of a regular graph). The necessary and sufficient conditions for the existence of a k-regular graph of order n are n ≥ k + 1 and nk even.

Proof. The maximum number of edges (links) of a graph of order n is attained by the complete graph K_n, in which the degree of every node is n − 1. Thus, k = n − 1 or n = k + 1. This is the minimum n for a given k. Additionally, note that if a k-regular graph is of order n, then the number of edges is nk/2; thus, nk must be even.

Therefore, for odd n, the regular graph is defined only for even k. Theoretical foundations for regular graphs are essential for the results and discussion sections to adequately describe the influence of clustering coefficients on regular graphs.
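The existence conditions of Lemma 3 can be checked programmatically. The sketch below also builds one k-regular graph with a circulant construction; this construction is chosen here only for illustration and is not the random regular graph sampling of [36, 37]. The function names are hypothetical.

```python
def regular_graph_exists(n, k):
    """Check the two conditions of Lemma 3: n >= k + 1 and n * k even."""
    return n >= k + 1 and (n * k) % 2 == 0

def circulant_regular_graph(n, k):
    """Build one k-regular graph on nodes 0..n-1 (illustrative construction)."""
    if not regular_graph_exists(n, k):
        raise ValueError("no k-regular graph of order n exists")
    adj = {v: set() for v in range(n)}
    for v in range(n):
        # Link v to its k // 2 nearest nodes on each side of a ring.
        for step in range(1, k // 2 + 1):
            adj[v].add((v + step) % n)
            adj[(v + step) % n].add(v)
        # For odd k (n is then even), also link v to the opposite node.
        if k % 2 == 1:
            adj[v].add((v + n // 2) % n)
            adj[(v + n // 2) % n].add(v)
    return adj

graph = circulant_regular_graph(20, 4)  # same order and degree as the case study
```

For odd n and odd k the product nk is odd, so `regular_graph_exists` returns False, in line with the remark that for odd n a regular graph is defined only for even k.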

This study considers the cyber risk model by Xu and Hua (2019) [19] . This risk model uses two types of threats faced by each node: (1) threats from outside the network (for example, infection because node v was attacked or the user visited a malicious site) and (2) threats from within the network (e.g., infected node v attacking its neighbors). Assume that if a node is infected, it can be repaired and returned to a safe status but is still vulnerable to reinfection.

Suppose a cyberattack occurs on a network represented by an undirected graph G = (V, E) where V is a set of nodes and E is a set of edges (links). Transmission on this network occurs via link (u, v) ∈ E so that node u and node v can attack each other. The number of nodes on the network is denoted by N = |V|. The degree of a node is the number of links associated with a node. The degree of node v is denoted by deg(v). An undirected graph G = (V, E) can be written as the adjacency matrix A = (a_uv) where

$$a_{uv} = \begin{cases} 1, & (u, v) \in E, \\ 0, & \text{otherwise.} \end{cases}$$
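As a small illustration of the adjacency matrix representation, the following sketch builds the matrix from an edge list and recovers the degrees as row sums. The four-node edge list and the 0-based node labels are assumptions made for this example only.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix (list of lists) of an undirected graph.

    Nodes are labeled 0..n-1 in this sketch.
    """
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1  # link (u, v) ...
        a[v][u] = 1  # ... is undirected, so the matrix is symmetric
    return a

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # hypothetical 4-node network
A = adjacency_matrix(4, edges)
degrees = [sum(row) for row in A]          # deg(v) is the row sum of A
```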

Let there be N computers or devices such that v ∈ {1, 2, …, N}. The status of the network at time t can be written as the vector I^T(t) = (I_1(t), I_2(t), …, I_N(t)), where I_v(t) = 1 when node v is infected at time t and I_v(t) = 0 if node v is secure (but vulnerable to attack) at time t, for v = 1, 2, …, N. The infection probability vector is denoted by p^T(t) = (p_1(t), p_2(t), …, p_N(t)), where p_v(t) = P(I_v(t) = 1) for v = 1, 2, …, N. At the first infection time t_1, the loss caused by data corruption or damage at node v is L_v^1 and the loss due to system downtime is R_v^1. The losses for the second infection are L_v^2 and R_v^2, respectively, and the losses for the third infection are L_v^3 and R_v^3, respectively. Thus, the total loss of node v up to time t can be written as

$$\sum_{i=1}^{M_v(t)} \left[ \mu_v\!\left(L_v^i\right) + \delta_v\!\left(R_v^i\right) \right],$$

where M_v(t) is the number of infections of node v up to time t, μ_v(·) is the cost function due to infection, and δ_v(·) is the cost function corresponding to the length of time-to-repair. The total loss faced by the firm until t is

$$\sum_{v=1}^{N} \sum_{i=1}^{M_v(t)} \left[ \mu_v\!\left(L_v^i\right) + \delta_v\!\left(R_v^i\right) \right].$$

Thus, the key quantity is M_v(t), which depends on the vector of network status up to time t, that is, I^T(t). Network status vectors are obtained using a modified Markov-based model (in-homogeneous SIS) process with an inhibition function of the clustering coefficient.

Modified Markov-based model

Wu and Liu (2008) [32] proposed a new model to study the effect of clustering coefficients on epidemics. According to their findings, the community level determined the spread of the virus in community networks. Conversely, an increase in clustering coefficients reduced the efficiency of epidemic spreading for a fixed community level. Using the SIS process, Bo Song et al. (2017) [33] reached the same conclusion: in a homogeneous network (same degree for each node), clustering could inhibit epidemics. In contrast, there was no inhibitory effect during infection in the heterogeneous network. However, no one has yet created a model at the individual level that can explain a more specific dynamic process [34].

The clustering coefficient influences the infection rate for each node. Let the function f(C_v) describe the effect of high clustering on the epidemic spread speed at node v. With the same assumptions, the necessary conditions for f(C_v) are

$$\frac{df(C_v)}{dC_v} < 0 \quad \text{and} \quad 0 < f(C_v) < 1.$$

By supposing β_j = βf(C_j), this process becomes an in-homogeneous SIS model, which was introduced by Van Mieghem and Omic (2013) [40]. An in-homogeneous SIS model accommodates different infection rates for each node. The model accounts for the characteristics of different nodes in carrying out attacks, for example, the speed of the data transfer signal. If node j is infected at a particular time, it will attack its neighbors at the rate of β_j.

Suppose that in an in-homogeneous SIS model, β_j is the infection rate for node j. If node j is infected, the time-to-infection of node v due to an attack from node j is an exponential random variable with a mean equal to β_j^{−1}. The time it takes for node v to be repaired is an exponential random variable with a mean equal to δ_v^{−1}. Likewise, the time-to-infection of node v due to factors external to the network is an exponential random variable with a mean of ε_v^{−1}. The following equation gives the transition probability.

$$P\big(I_v(t+h) = 1 \mid I_v(t) = 0, I(t)\big) = \Big(\varepsilon_v + \sum_{j=1}^{N} a_{vj}\,\beta_j\,I_j(t)\Big) h + o(h), \qquad P\big(I_v(t+h) = 0 \mid I_v(t) = 1\big) = \delta_v h + o(h),$$

where I_j(t) is the status of node j at time t and β_j is the attack rate of the infected neighbor of node v, i.e., node j. This model will be used to obtain the upper bound of infection probabilities and for Monte Carlo simulations.
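To make the role of the neighbor rates concrete, the sketch below computes the total rate at which a secure node becomes infected, assuming that the rates add up as described above (the self-infection rate plus the attack rates β_j of the infected neighbors). The function name and the three-node example are hypothetical.

```python
def infection_rate(v, status, adj, beta, eps):
    """Total rate at which secure node v becomes infected.

    Assumed form: eps[v] plus the sum of beta[j] over infected neighbors j,
    where beta[j] is the attack rate of neighbor j, e.g. beta * f(C_j).
    """
    return eps[v] + sum(beta[j] for j in adj[v] if status[j] == 1)

# Hypothetical example: node 0 has neighbors 1 and 2; only node 1 is infected.
adj = {0: [1, 2], 1: [0], 2: [0]}
status = {0: 0, 1: 1, 2: 0}
beta = {0: 0.2, 1: 0.1, 2: 0.2}   # per-node attack rates (assumed values)
eps = {0: 0.2, 1: 0.2, 2: 0.2}    # self-infection rates (assumed values)
rate_node_0 = infection_rate(0, status, adj, beta, eps)  # 0.2 + 0.1
```

Because only the attacker's rate β_j enters the sum, a neighbor with a high clustering coefficient contributes less to the infection rate of node v than a neighbor with no clustering.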

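The dynamic equation for the infection probability from the in-homogeneous SIS model can be obtained with N-intertwined mean-field approximation (NIMFA) [41] as follows:

$$\frac{dp_v(t)}{dt} = \big(1 - p_v(t)\big)\Big(\varepsilon_v + \sum_{j=1}^{N} a_{vj}\,\beta_j\,p_j(t)\Big) - \delta_v\,p_v(t).$$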

Another approximation uses the upper bound for the infection probabilities. Cator and Van Mieghem proved that

$$E\big[I_v(t)\,I_j(t)\big] \ge E\big[I_v(t)\big]\,E\big[I_j(t)\big].$$

In other words, I_v(t) and I_j(t) are nonnegatively correlated for all finite graphs. These results lead to the upper bound for the infection probabilities, previously introduced for the ε-SIS model [19].

Upper bounds for infection probabilities are conservative estimates of the premium [19] . These upper bounds are obtained by solving the dynamic equation for the infection probabilities.
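For intuition about the dynamics of the infection probabilities, the mean-field equations can be integrated numerically. The sketch below assumes the standard NIMFA form dp_v/dt = (1 − p_v)(ε + Σ_j a_vj β_j p_j) − δ p_v with per-node attack rates β_j and uses a plain forward Euler scheme; it illustrates the mean-field approximation only and is not the upper bound of Theorem 1. The function names, step size, and the 4-node ring are assumptions.

```python
def nimfa_step(p, adj, beta, delta, eps, dt):
    """One forward Euler step of the assumed mean-field equations."""
    new_p = []
    for v in range(len(p)):
        # Infection pressure on v: self-infection plus infected neighbors.
        pressure = eps + sum(beta[j] * p[j] for j in adj[v])
        dp = (1.0 - p[v]) * pressure - delta * p[v]
        new_p.append(p[v] + dt * dp)
    return new_p

def nimfa_stationary(adj, beta, delta, eps, dt=0.01, steps=20000):
    """Iterate from an all-secure start until (approximately) stationary."""
    p = [0.0] * len(adj)
    for _ in range(steps):
        p = nimfa_step(p, adj, beta, delta, eps, dt)
    return p

# Hypothetical 4-node ring; delta = 1 and eps = 0.2 as in the case study.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
p_no_links = nimfa_stationary(ring, [0.0] * 4, delta=1.0, eps=0.2)
p_with_links = nimfa_stationary(ring, [0.2] * 4, delta=1.0, eps=0.2)
```

When every β_j is zero, the stationary value reduces to ε/(δ + ε), which is the baseline infection probability of a node that can only be infected from outside the network.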

Theorem 1. For the in-homogeneous SIS model with infection rate β_j for j = 1, 2, …, N, recovery rate δ_v = δ and self-infection rate ε_v = ε, the upper bounds of the infection probabilities are given by

where using the Markov condition with two states β_j = 0, ∀ j ∈ {1, 2, …, N} for every t ≥ 0, δ_v = δ, and ε_j = ε, then we can obtain

In other words, ε/(δ + ε) is the lower bound for the infection probability when there is no infection rate for every link. Thus, the equation for the upper bound of the infection probabilities is

Let (12) can be written as

This equation becomes a nonhomogeneous differential equation that can be solved in the same way as Xu and Hua (2019) [19] , and the result is

Proposition 1. The upper bound for the stationary infection probability of node v is given by

Proof. The dynamics of the upper bound enter a stationary state if lim t→∞ p

We used the simulation procedure provided by Xu and Hua (2019) by modifying the rate of the interarrival time distribution. Let the set of infected neighbors of node v be {j_1, j_2, …, j_{D_v}} ⊆ {1, 2, …, N}, where D_v is the number of infected neighbors of node v. The time-to-infection of node v due to attacks from neighbors is given by random variables, one for each infected neighbor. In the Markov-based model, the random variables have exponential distributions. However, the rate of the distribution may differ according to the inhibitory effect of infection at each node. Survival functions with different rates are $\bar{F}_j(x) = e^{-\beta_j x}$, where j ∈ {1, 2, …, N} is the index of the node. The time-to-infection due to malicious site access is given by the random variable Z_v with survival function $\bar{G}_v(x) = e^{-\varepsilon x}$, and the time-to-recovery is an exponential random variable R_v with rate δ. Using the theory of alternating renewal processes and the assumption of positive lower orthant dependence [19], the stationary upper bound of infection probability of node v is

Consider that

Using Jensen's inequality, Eq (18) can be written as

The result in Eq (19) is a stationary upper bound, which is the same as the result of the IH-SIS model in Proposition 1. Thus, the simulation can be carried out using the procedure given by Algorithm 1.

Algorithm 1: Simulation of cybersecurity risk with clustering coefficient factor.

Input: Local clustering coefficient of node C_v, basic infection rate β, initial status, the number of simulations n_sim, contract period T, set of susceptible nodes.
Calculate the infection rate with inhibiting factor β_v = βf(C_v).

Calculate the number of infected nodes M.
Generate random time-to-recovery r_1, r_2, …, r_M from exp(δ).

for v in secure nodes do
    Determine the infected neighbors of node v,

    if infection occurs then
        Change status from 0 to 1 and calculate the loss.
    else
        Change status from 1 to 0 and calculate the loss.
    end
end
return t, network status, the loss for every node
end
Calculate insurance premium until T.

Output: network status, total loss, premiums.
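The event-driven logic behind such a simulation can be sketched in Python. The code below is a Gillespie-style simulation of a single path of the ε-SIS process with per-node attack rates; it is a simplified stand-in written under stated assumptions (all nodes start secure, a common δ and ε, no loss calculation) and not a transcription of Algorithm 1. The function name and arguments are hypothetical.

```python
import random

def simulate_infections(adj, beta, delta, eps, horizon, rng):
    """Simulate one path of the epsilon-SIS process up to time `horizon`.

    adj[v]  : iterable of neighbors of node v (nodes are 0..n-1)
    beta[j] : attack rate of node j when infected, e.g. beta * f(C_j)
    Returns the number of infections M_v(horizon) for every node.
    """
    n = len(adj)
    status = [0] * n       # assumption: every node starts secure
    infections = [0] * n
    t = 0.0
    while True:
        # Event rate per node: recovery if infected, otherwise
        # self-infection plus attacks from infected neighbors.
        rates = [
            delta if status[v] == 1
            else eps + sum(beta[j] for j in adj[v] if status[j] == 1)
            for v in range(n)
        ]
        total = sum(rates)
        if total == 0.0:
            break          # no further event can occur
        t += rng.expovariate(total)   # waiting time to the next event
        if t > horizon:
            break
        # The node whose event fires is chosen proportionally to its rate.
        v = rng.choices(range(n), weights=rates)[0]
        if status[v] == 0:
            status[v] = 1             # infection: 0 -> 1
            infections[v] += 1
        else:
            status[v] = 0             # recovery: 1 -> 0
    return infections

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # hypothetical network
counts = simulate_infections(ring, [0.2] * 4, delta=1.0, eps=0.2,
                             horizon=100.0, rng=random.Random(0))
```

Repeating this for n_sim independent paths and attaching a loss to each infection and each repair period would give the loss samples from which a premium can be estimated.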

In this section, we discuss the results of the theory and simulations that have been carried out. The simulation was carried out for the contract time T = 100 days. The selected input parameters were β = 0.2, δ = 1, and ε = 0.2. To analyze the inhibitory effect, other parameters were set the same, including the degree of the node. Therefore, the study was carried out on the regular network and its properties. A regular graph was generated with order n = 20 and degree k = 4. For the loss function, L_v followed the Beta distribution with density function

where w_v is the scale parameter used to describe the wealth of node or device v, a, b, c > 0 are shape parameters, and B is the beta function. We chose a = 3, b = 8, c = 1, and w_v = 1500 for this case. The cost function for infection-related loss and system downtime-related loss is described as

where ψ, ψ_1, ψ_2 are rates related to infection, initial wealth, and recovery process. The cost function parameters were chosen so that (ψ, ψ_1, ψ_2) = (1 × 10⁻³, 5 × 10⁻⁶, 2 × 10⁻⁵). The premium until time t is calculated using the standard deviation principle [42] as follows:

where the loading factor ξ = 0.15. A discussion of these results, including the theory and simulation of premiums, is obtained on a k-regular graph. Numerical studies were conducted on the 4-regular graph provided by 
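The standard deviation principle sets the premium to the expected loss plus a loading proportional to the standard deviation of the loss. A minimal sketch of that calculation from simulated losses follows; the loss values are made up for illustration, and using the sample standard deviation is an assumption of this sketch.

```python
import statistics

def std_dev_premium(losses, loading):
    """Premium = mean loss + loading * standard deviation of the loss."""
    return statistics.mean(losses) + loading * statistics.stdev(losses)

simulated_losses = [10.0, 20.0, 30.0]   # hypothetical simulated total losses
premium = std_dev_premium(simulated_losses, loading=0.15)
```

In a Monte Carlo setting, `losses` would hold one total loss per simulation run for a given node, so the estimate stabilizes as the number of runs grows.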

The relationship between the clustering coefficient and the order of the regular graph is given by Fig 5. The average of the local clustering coefficients grows as the degree of a node k increases for each n. This result also shows that the average clustering coefficient approaches 0 as the order n becomes larger. Thus, if n is very large and k is very small, it can be concluded that the clustering coefficient has a minimal effect on the pricing procedure on the k-regular graph.

Some of the theoretical results obtained concerning the clustering coefficient and premium calculation are as follows.

Lemma 4 (Minimum effect). For a 2-connected regular graph G = (V, E) with n > 3, the clustering coefficient for each node is zero. In this case, there are minimum effects of the clustering coefficient on cyber insurance premiums ∀ v ∈ V.

Proof. All 2-connected regular graphs for n > 3 are cycle graphs (ring networks). Thus, for all {u, v, w} ⊆ V, no triangles are formed, so (u, v), (u, w) ∈ E but (v, w) ∉ E. The implication is T(v) = 0 and C_v = 0, ∀ v ∈ V. Consider the conditions for the cluster function f(C_v), namely, df(C_v)/dC_v < 0 and 0 < f(C_v) < 1. Additionally, consider the effect of the clustering coefficient on the spread of the epidemic as β_v = βf(C_v). Because the f(C_v) function decreases, when C_v is at its minimum value, f(C_v) is at its maximum value; in other words, βf(C_v) → β as f(C_v) → 1, and there is a minimum decreasing effect of the clustering coefficient on the spread of the epidemic and the pricing of cyber insurance premiums.

Lemma 5 (Maximum effect). For an (n − 1)-connected regular graph G = (V, E) with n ≥ 3, the clustering coefficient for each node is one. In this case, there are maximum effects of the clustering coefficient on the pricing of cyber insurance premiums ∀ v ∈ V.

Proof. All (n − 1)-regular graphs for n ≥ 3 are complete graphs (K_n). Thus, for all {u, v, w} ⊆ V, triangles are always formed so that (u, v), (u, w), (v, w) ∈ E, ∀ u, v, w ∈ V. The implications are T(v) = (n − 1)(n − 2)/2 and C_v = 1, ∀ v ∈ V. Consider the conditions for the f(C_v) clustering function, namely, df(C_v)/dC_v < 0 and 0 < f(C_v) < 1. Additionally, consider the effect of the clustering coefficient on the spread of the epidemic as β_v = βf(C_v). Because the f(C_v) function decreases, when C_v is at its maximum value, f(C_v) is at its minimum value.

Thus, there are maximum decreasing effects of the clustering coefficient on the spread of the epidemic and the pricing of cyber insurance premiums.

The last two lemmas bring us to the following consequences:

There is at least one structure of a k-connected regular graph for k = 3, …, n − 2 in which at least one node has a clustering coefficient that is neither zero nor one. Thus, there is an effect on a node in cyber insurance rate making with

Proof. Based on the results of Lemma 4 and Lemma 5, there is always a structure of a k-regular graph for k = 3, …, n − 2 with the specified order n that satisfies the existence condition of a regular graph, namely, that nk is even. This is because the formation process of the k-regular graph for k = 3, …, n − 2 involves repeatedly adding one link to the 2-connected regular graph or removing one link from the (n − 1)-connected regular graph. As a consequence, at least one node in that structure has $0 < T(v) < \binom{k_v}{2}$, which indicates that 0 < C_v < 1. Thus applies

The three functions explaining the inhibitory effect of the clustering coefficient are defined as follows:

• The linear function is f(C_v) = −C_v + 1.

• The quadratic function is f(C_v) = −0.65 C_v² + 1 [33].

• The exponential function is f(C_v) = −√C_v + 1.

Each function provides a different inhibitory effect. The choice of the function depends on how much the community can reduce the effectiveness of the infection rate. The quadratic function represents low inhibition, the linear function represents moderate inhibition, and the exponential function represents high inhibition. The numerical studies in the following subsection consider these three functions.
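The three inhibition functions and the total inhibition function of a node's neighbors (TN) can be written down directly. In the sketch below the function names are hypothetical, and the neighbor coefficients in the example are illustrative values rather than entries of Table 1.

```python
import math

def f_linear(c):
    return 1.0 - c

def f_quadratic(c):
    return 1.0 - 0.65 * c ** 2

def f_exponential(c):
    # Called "exponential" in the text; its form is 1 - sqrt(C_v).
    return 1.0 - math.sqrt(c)

def total_inhibition(neighbor_cc, f):
    """TN of a node: sum of f(C_j) over the clustering coefficients of its neighbors."""
    return sum(f(c) for c in neighbor_cc)

neighbor_cc = [0.0, 1 / 6, 2 / 6, 3 / 6]   # hypothetical neighbors of one node
tn = {
    "linear": total_inhibition(neighbor_cc, f_linear),
    "quadratic": total_inhibition(neighbor_cc, f_quadratic),
    "exponential": total_inhibition(neighbor_cc, f_exponential),
}
```

For a node of degree 4 whose neighbors all have zero clustering, every function gives TN = 4, so the infection rates are not reduced at all; for coefficients strictly between 0 and 1 the ordering is quadratic > linear > exponential, which is why the quadratic function corresponds to the weakest inhibition.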

The upper bound of infection probabilities was obtained from Eq (12) in Theorem 1. The three functions in Fig 6 demonstrate the influence of the magnitude of the inhibition on the upper bound. We compared the upper bound with and without inhibitory effects. Fig 7 shows the upper bound of infection probabilities for four nodes, namely, node 5, node 10, node 15, and node 20. Each node represents a different clustering coefficient. The clustering coefficients of nodes 5, 10, 15, and 20 are 0, 1/6, 2/6, and 3/6, respectively.

The effect of structural characterization with local clustering coefficients is visible. The upper bound obtained by the model without the clustering coefficient is the same as that obtained by Xu and Hua (2019) [19]: the upper bounds of all nodes coincide with each other. By studying regular graphs, Antonio and Indratno (2021) [26] support the substantial effect of degrees on the model. Other factors that impact the rate of infection have not been explored in that model. However, the clustering structure can affect the speed or effectiveness of propagation, where nodes can have different infection rates [32, 33]. Through local clustering coefficients, each node undergoes an infection rate adjustment that depends on the inhibitory function. The three inhibition functions considered earlier had different impacts on the upper bound of infection probabilities. The upper bound with the quadratic function changes only slightly compared to the upper bound without the clustering coefficient effect. The linear function has a moderate impact, and the exponential function has a reasonably strong influence. Therefore, these functions represent the level of impact of the clustering structure on the speed and effectiveness of the spread of the virus. Based on the model in Eq (7), the transition probability of a node depends on the sum of the clustering coefficient functions of its neighbors. The upper bounds of the three functions have the same pattern. Node 5 always produces the highest upper bound, followed by nodes 20, 10, and 15. Table 1 summarizes the clustering coefficients of the four neighbors of each node, the total clustering coefficients, and the totals of the three functions. This fact supports the upper bound result. Node 5 produces the highest total clustering coefficient function for the linear, quadratic, and exponential functions. Nodes 20, 10, and 15 follow in that order, with total clustering coefficient functions below that of node 5 for all three functions. This confirms that the upper bound depends on the clustering coefficient function of the neighbors.


We examined the linear relationship between the total clustering coefficient function (TN) and the upper bound (UB) to support this assertion. The outcome is depicted in Fig 8. For all three functions, the figure depicts a positive linear relationship between TN and UB, which means that as TN grows, UB grows as well. The linear relationship is given by UB = α_0 + α_1 TN, where α_0 represents the intercept, α_1 represents the slope of the linear model, and R² represents the coefficient of determination. The coefficient of determination measures how well the independent variable can predict the fluctuation of the dependent variable. The linear relationship is strong when R² is close to one. Let R²_L, R²_Q, and R²_E be R² for the linear, quadratic, and exponential functions. With R²_L, R²_Q, R²_E > 0.9, the three functions have a strong relationship. As a result, TN can account for more than 90% of the variation in UB. For the linear, quadratic, and exponential functions, α_0 is 0.12, 0.09, and 0.14, respectively, and α_1 is 0.06, 0.07, and 0.05, respectively. The upper bound is affected more strongly by the exponential inhibition function. As a result, the risk of transmission is no longer homogeneous (same upper bound when degrees are equal) but instead has a significant correlation with the total inhibitory function of neighbors.
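A least-squares fit of UB on TN, together with its coefficient of determination, can be computed as in the following sketch. The data points are synthetic and lie exactly on a line; they are not the values behind Fig 8, and the function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a0 + a1 * x; returns (a0, a1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx                 # slope
    a0 = mean_y - a1 * mean_x      # intercept
    ss_res = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a0, a1, 1.0 - ss_res / ss_tot

# Synthetic illustration only: points generated from UB = 0.12 + 0.06 * TN.
tn_values = [1.0, 2.0, 3.0, 4.0]
ub_values = [0.18, 0.24, 0.30, 0.36]
a0, a1, r2 = fit_line(tn_values, ub_values)
```

With real upper bounds the points scatter around the line, and R² then reports the share of the variation in UB that the linear model in TN explains.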

We performed simulations using Algorithm 1 to produce premiums. Cyber incident data are generated based on transmission parameters. Determining the number of simulations (n_sim) is one of the challenges of this method. We considered ten values, n_sim ∈ {10, 25, 50, 100, 250, 500, 1000, 1500, 2000, 2500}, to find the point of convergence. We ran simulations with β = 0.2 and no inhibition from local clustering coefficients to demonstrate convergence. Fig 9 reveals the convergence of the Monte Carlo simulation for mean infection, mean loss, and premiums of 20 nodes. For each variable, the average of 20 nodes is also displayed. In addition, the difference (Δ) for each n_sim is taken into consideration. When n_sim is increased, all three variables converge to the same value. At n_sim = 500, all figures are convergent on average. However, divergence is still apparent for each node at n_sim = 500. At n_sim = 2000, each node has begun to converge. A difference (Δ) between successive numbers of simulations that is close to or equal to zero implies convergence. As seen in Fig 9 for the variables ΔInfection Mean, ΔLoss Mean, and ΔPremiums, all nodes and their averages approach zero as n_sim is increased, and the simulation is considered to be convergent at n_sim = 2000. Finally, for the premium set, we choose n_sim = 2000 as the number of simulations. On the 4-regular graph, premiums have been modified to account for the clustering structure. The linear relationship (correlation) between the total linear, quadratic, and exponential inhibitory functions (TN) and the premium is visualized in Fig 10. For twenty nodes with linear, quadratic, and exponential functions, the correlation between TN and the premium P is more than 0.6, suggesting a moderately strong to strong linear relationship. The correlations for the linear, quadratic, and exponential functions are 0.77, 0.66, and 0.82, respectively.

Table 1. Characteristics of clustering coefficients for nodes in a 4-regular graph topology (Fig 4). Local clustering coefficient of neighbors, total local clustering coefficient, and total inhibition function with linear (f(C_v) = −C_v + 1), quadratic (f(C_v) = −0.65 C_v² + 1), and exponential (f(C_v) = −√C_v + 1) inhibition functions.

TN is a representation of two network entities: the degree and the local clustering coefficient. As a result, these findings incorporate the influence of the clustering structure on the premium. If the premium is based just on degrees, it is often homogeneous. Indeed, the clustering structure influences the efficacy of epidemic propagation. This fact shows that when the effect of the inhibitory function increases or the speed of epidemic spread decreases due to the clustering coefficient, the premium corresponds to the total inhibitory function of the neighbors. Additionally, these findings suggest the existence of a significant linear connection between UB and TN. UB has been verified as the initial premium estimate. The premiums for the twenty nodes on the 4-regular graph are shown in Table 2. Nodes 5, 10, 15, and 20 are shown in bold to illustrate the presence of various local clustering coefficients. The premiums are adjusted in line with the total inhibitory function (TN). As with the upper bound (UB), this outcome is impacted significantly by the TN in Table 1. Premiums without clustering coefficients (without CC) are compared with premiums under the linear, quadratic, and exponential functions. Each node has degree four. As previously stated, TN accounts for two network properties: degree and the local clustering coefficient. When TN is close to 4, the premium is approximately 12.3 units. Conversely, a TN below the degree corrects the premium according to the difference between TN and the degree. Node 5, with the largest TN for linear, quadratic, and exponential functions, has the largest premium in comparison to other nodes. TN decreases as the inhibitory function decreases, and the premium decreases accordingly. Premiums with an exponential inhibition function are the least expensive option. The premiums are more realistic than when only degrees are included. Additionally, the premium is not uniform but is adapted according to the cluster structure of its neighbors. Premiums that use this strategy might be cheaper, making them more competitive in the market.

Table 2. Premiums in a 4-regular graph topology (Fig 4). Premium without clustering coefficient effect, and with clustering coefficient effect using linear (f(C_v) = −C_v + 1), quadratic (f(C_v) = −0.65 C_v² + 1), and exponential (f(C_v) = −√C_v + 1) inhibition functions.

To validate the results, we used a real communication network (see Fig 11). The real network is an email communication network. Rossi and Ahmed (2015) [43] provided the communication data, which may be viewed online (https://networkrepository.com/email-enron-only.php). Table 3 [...] link (|T|_avg), the maximum number of triangles formed by a link (|T|_max), the average clustering coefficient (C_avg), and the global clustering coefficient (C). According to these parameters, this network has 143 nodes and 623 links. With a density D of 0.0613, this network has extremely low density. Out of the 142 potential communications, the maximum communication (d_max) occurs only between 42 accounts (neighbors). C_avg and C can be used to characterize the clustering structure of this network. C_avg = 0.4339 shows that some nodes have a high local clustering coefficient, while others have a low coefficient. The clustering coefficient at the global level is C = 0.3590. This measure suggests that |T| = 2700 accounts for approximately 35.9% of all triangles constructed in this network. If we focus just on the degree component, we see that nodes with a high degree have an increased risk and premium. However, the more neighbors a node has, the less successful it is in spreading the disease. By including the clustering structure of neighbors in the model, the premium for nodes of the same degree may be rendered in-homogeneous.
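The reported density can be reproduced from the node and link counts, assuming the usual definition for a simple undirected graph, D = 2|E| / (N(N − 1)). The function name is hypothetical.

```python
def density(n_nodes, n_links):
    """Density of a simple undirected graph: links present / links possible."""
    return 2 * n_links / (n_nodes * (n_nodes - 1))

d_email = density(143, 623)   # node and link counts reported for the email network
```

The result agrees with the reported value of 0.0613 up to rounding, meaning that only about 6% of all possible account pairs are linked.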

On large networks, the simulation complexity increases significantly, and the simulation takes considerable time. For this reason, we modified the transmission parameters in the simulation of the real network. We lowered the infection rate to ε = 0.05 and raised the recovery rate to δ = 10 for each node. This modification implies that the average time to infection of a device due to clicking on malicious emails has grown to 20 days, while the average time to recovery of a device has decreased to 2.4 hours. These parameters were chosen on the assumption that, in a large company, the security system of each device is more robust and recovery is faster. We compare the computations with and without the inhibition function to demonstrate its influence on premiums. We used n sim = 2000 simulations in the calculations to ensure convergence.
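The link between these rates and the quoted average times can be checked directly. The sketch below assumes that rates are expressed per day and that waiting times are exponential, as in a continuous-time Markov model; the function name is hypothetical.

```python
HOURS_PER_DAY = 24

def mean_waiting_time(rate_per_day):
    """Mean of an exponential waiting time with the given rate (events per day)."""
    return 1.0 / rate_per_day

epsilon = 0.05  # infection rate per day, as stated in the text
delta = 10.0    # recovery rate per day, as stated in the text

mean_days_to_infection = mean_waiting_time(epsilon)                # 1 / 0.05 days
mean_hours_to_recovery = mean_waiting_time(delta) * HOURS_PER_DAY  # (1 / 10) day in hours
```

Under these assumptions the two values are 20 days and 2.4 hours, matching the figures in the paragraph above.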

We chose ten of the 143 users to highlight the significance of the findings. The nodes were selected based on their degree, the total clustering coefficient of their neighbors, and their location. Table 4 summarizes the ten selected nodes, along with the parameters that affect the premium. The nodes are listed in order of degree, from highest to lowest. Two nodes with the same degree, nodes 3 and 9, were chosen to demonstrate the influence of the local clustering coefficients of their neighbors. As expected, nodes with a high degree generally also have a high total C, but this does not hold for all nodes. For instance, node 136 with degree 17 and node 17 with degree 30 have the same total C. Nodes 95 and 48, with degrees 23 and 20, have a lower total clustering coefficient than node 136. This measure incorporates both the degree and the local clustering coefficient. Thus, nodes with the same degree do not always have the same premium as those in Table 2 or in the prior studies by Xu and Hua (2019) [19] and Antonio and Indratno (2021) [26]. The inhibition functions reduce the effectiveness of infections at different rates. The quadratic function gives the highest total value, followed by the linear and exponential functions. Policy underwriters can choose among these functions based on cybersecurity indicators or network requirements. For instance, data transmission slows down when the route is long. To obtain a premium that better accommodates network features, we use these functions, which yield a more realistic premium than would be obtained without the clustering structure component.
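To illustrate how the total inhibition function of neighbors combines the degree with the local clustering coefficients, the sketch below sums an inhibition function over a node's neighbors. The functional forms, the decay constant `k`, and the neighbor values are assumptions chosen for illustration (each form satisfies f(0) = 1); they are not taken from this section.

```python
import math

# Illustrative inhibition functions with f(0) = 1. The exact forms and the
# decay constant k are assumptions, not the definitions used in the paper.
def f_quadratic(c):
    return 1.0 - c ** 2

def f_linear(c):
    return 1.0 - c

def f_exponential(c, k=3.0):
    return math.exp(-k * c)

def total_inhibition(neighbor_cc, f):
    """Sum of f(C_j) over the local clustering coefficients of the neighbors."""
    return sum(f(c) for c in neighbor_cc)

# Hypothetical degree-4 node whose neighbors have these clustering coefficients.
neighbor_cc = [0.2, 0.5, 0.4, 0.3]
tn = {
    "quadratic": total_inhibition(neighbor_cc, f_quadratic),
    "linear": total_inhibition(neighbor_cc, f_linear),
    "exponential": total_inhibition(neighbor_cc, f_exponential),
}
```

When every neighbor has a clustering coefficient of zero, the total equals the degree, so the total can only stay at or fall below the degree. For the hypothetical values above, the quadratic total is the largest and the exponential total the smallest.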

Table 5 shows the simulated premiums and 95% confidence intervals for each of the ten selected nodes. High-degree nodes pose a high risk. The most expensive premium belongs to node 105, which has the highest degree, 42. The premiums that account for the clustering coefficient shift toward lower prices. All three functions (quadratic, linear, and exponential) adjust the premium, with the faster-shrinking function resulting in lower premiums. The importance of the results is immediately apparent at nodes 3 and 9. Both nodes have a degree of 12. Without regard for the clustering structure, these two nodes have the identical premium of 2.9 (currency units). However, after adapting to the clustering structure of its neighbors, node 3 receives a lower price. This finding is consistent with the fact that node 3 has a lower total clustering inhibition function than node 9. The approach is successful because it takes this metric into account, so that the premium depends on both the degree and the clustering structure.
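One common way to attach a 95% confidence interval to a simulated mean is the normal approximation shown below. This is a sketch under that assumption; the exact interval procedure is not described in this section, and the synthetic draws merely stand in for simulated premiums.

```python
import math
import random
import statistics

def mean_with_ci95(samples):
    """Sample mean and normal-approximation 95% confidence interval."""
    n = len(samples)
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    return mean, (mean - half_width, mean + half_width)

# Placeholder data: 2000 synthetic draws standing in for simulated premiums.
rng = random.Random(0)
fake_premiums = [rng.gauss(2.9, 0.5) for _ in range(2000)]
estimate, (lower, upper) = mean_with_ci95(fake_premiums)
```

The interval width shrinks with the square root of the number of simulations, which is one reason a large simulation count such as 2000 gives stable premium estimates.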

Table 5. The premium of the ten selected nodes. Premiums and confidence intervals (CI 95%) for selected nodes without clustering functions, and with clustering functions (quadratic, linear, exponential).

The premium with the quadratic function produces the smallest change, whereas the exponential function produces the largest. Additionally, the resulting premium follows the total clustering inhibition function of neighbors (TN), which decreases as the function decays faster. The total clustering inhibition function of neighbors, which combines the degree and the local clustering coefficient, is the crucial network metric for calculating the premium with this approach. Fig 12 illustrates the premiums in a confidence interval plot. For each node, the premium without the clustering coefficient is always the highest, followed by the premium with the quadratic inhibition function. The exponential function changes the premiums the most. The nodes are arranged according to their degree. In general, the degree still has an impact, although it is not the only one. At nodes 9 and 3, which have the same degree, both the premiums and the improvements from the clustering effect differ.

Additionally, Fig 12 shows that the premium differences become larger as risk grows. The gap between premiums with and without clustering coefficients is larger at node 105, which has the highest premium, than at nodes with lower premiums. For very high-risk cases, this effect calls for an adjustment factor to guarantee that the premium remains sufficient to cover future risks.

We provide premiums for all 143 nodes to confirm the overall results. Fig 13 shows boxplots of the premiums of the 143 nodes without and with clustering coefficients (CC). The boxplots corroborate the improvement in the premium price model with the clustering structure. The range of each boxplot decreases when the model without CC is replaced with the model with CC using the quadratic, linear, and exponential functions. Similarly, the outlier in each boxplot, namely the highest premium, decreases with each function. In aggregate, Fig 14 combines a confidence interval plot and a bar plot of the network's total premium (143 nodes). These findings also corroborate the earlier finding that the presence of a clustering structure can lower premiums. With clustering coefficients, the premiums for the quadratic, linear, and exponential functions fell by 2.99%, 8.07%, and 11.78%, respectively. Thus, the overall premium generated with the inhibition function is lower than the premium without the clustering structure.

Fig 15 shows the linear correlations between the degree (Deg), the total clustering coefficient functions of neighbors (F. QUA, F. LIN, F. EXP), the premium without clustering coefficient (P), and the premiums with a linear function (P. LIN), a quadratic function (P. QUA), and an exponential function (P. EXP). Deg, F. QUA, F. LIN, and F. EXP are highly correlated because each of the latter three is a sum, over neighboring nodes, of a function of their local clustering coefficients. The distinction lies in the scale of the entries of the model's adjacency matrix: each value is now between zero and one (in the range of C and f(C)). The correlation between premiums is extremely strong, with values greater than 0.9. The premiums with local clustering coefficients act as corrections to the premium without clustering coefficients. The correlations of the premiums with Deg, F. QUA, F. LIN, and F. EXP decrease in the order P, P. QUA, P. LIN, and P. EXP. The more quickly the clustering function decays, the stronger the connection between the premium and the inhibition function. This result means that the choice of inhibition function affects this relationship.
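The two summary quantities used above, the percentage reduction of the aggregate premium and the linear (Pearson) correlation, can be computed as in the following sketch. All per-node numbers are hypothetical and serve only to show the calculation.

```python
import math

def percent_reduction(baseline, adjusted):
    """Relative decrease of `adjusted` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - adjusted) / baseline

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-node values, for illustration only.
degree = [4, 8, 12, 12, 20]
premium_without = [1.0, 2.0, 2.9, 2.9, 5.0]
premium_with = [0.9, 1.7, 2.4, 2.6, 4.1]

total_drop = percent_reduction(sum(premium_without), sum(premium_with))
r_degree_premium = pearson(degree, premium_with)
```

In this toy data the two nodes of degree 12 receive different adjusted premiums, so the correlation with the degree stays high but is no longer perfect.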

We have introduced a modified Markov-based model that includes a clustering structure factor of the network in premium calculations. To validate the findings, we conducted two types of experiments, on regular networks and on a real network. Additionally, theoretical results on regular networks were established to verify that clustering coefficients influence the premiums. Without the impact of clustering coefficients and with homogeneous rates, each node generates an equal premium. The infection rate was modified by multiplying it by an epidemic inhibition function of the local clustering coefficient. As a result, this approach can provide premiums that vary depending on the inhibition function employed, which can be quadratic, linear, or exponential. The results are also significant in large networks (real networks). The correlation between the total inhibition function and the premiums is stronger than that between the degree and the premiums. Thus, this approach calculates the premium more comprehensively since it considers two network properties, namely, the degree and the local clustering coefficient.
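The mechanism summarized above can be sketched as a small event-driven (Gillespie-type) simulation. This is a simplified illustration, not the authors' algorithm: the neighbor-to-neighbor rate `beta`, the choice to apply the inhibition value of the infected neighbor, the toy graph, and the inhibition values in `toy_f` are all assumptions.

```python
import random

def simulate_sis(adj, f_cc, eps, beta, delta, horizon, seed=0):
    """Gillespie-type simulation; returns the fraction of time each node is infected.

    adj:  dict mapping each node to its neighbors
    f_cc: dict mapping each node to its inhibition value f(C) in [0, 1]
    eps:  infection rate of a healthy node from outside the network
    beta: assumed infection rate per infected neighbor, scaled by f_cc
    delta: recovery rate of an infected node
    """
    rng = random.Random(seed)
    nodes = list(adj)
    infected = {v: False for v in nodes}
    time_infected = {v: 0.0 for v in nodes}
    t = 0.0
    while True:
        # Transition rate of every node given the current state.
        rates = []
        for v in nodes:
            if infected[v]:
                rates.append(delta)
            else:
                pressure = sum(f_cc[u] for u in adj[v] if infected[u])
                rates.append(eps + beta * pressure)
        dt = rng.expovariate(sum(rates))
        step = min(dt, horizon - t)
        for v in nodes:
            if infected[v]:
                time_infected[v] += step
        if dt >= horizon - t:
            break
        t += dt
        # The node that changes state is drawn proportionally to its rate.
        v = rng.choices(nodes, weights=rates)[0]
        infected[v] = not infected[v]
    return {v: time_infected[v] / horizon for v in nodes}

# Hypothetical toy network and inhibition values (illustration only).
toy_adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
toy_f = {0: 0.9, 1: 0.4, 2: 0.4, 3: 1.0}
share = simulate_sis(toy_adj, toy_f, eps=0.05, beta=0.5, delta=10.0, horizon=200.0)
```

A premium would then be derived from such simulated infection histories through a loss model and a premium principle, which this sketch leaves out.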

Our novel technique can reduce the premium depending on the clustering features of the network. These findings corroborate Wu and Liu (2008) [32] and Bo and Song et al. (2017) [33], who found that the clustering coefficient decreases the efficacy of epidemic transmission. This element has been effectively integrated into the premium calculation. By giving a more realistic premium based on the clustering structure, the proposed technique can improve the Markov-based model developed by Xu and Hua (2019) [19] and Antonio and Indratno (2021) [26]. Thus, the flexibility of the proposed approach enables it to provide premiums that are neither homogeneous nor overestimated and are therefore more suitable. One limitation is that the model includes only a single element affecting the efficacy of the epidemic, whereas it could incorporate a wide range of other variables. Another limitation is that every node uses the same inhibition function, although the inhibitory properties of each node may vary.

Future research should explore the usage of diverse functions at each node. The clustering coefficient metric as a function of communication weights may be a critical element to consider in determining how epidemics spread [44] in future studies. Complexity in large-scale simulations encourages the creation of more efficient algorithms, such as a modification of the Gillespie algorithm [35] . From the perspective of mathematical modeling, the theory and application of fractional differential equations [45] to risk modeling [46] or mixed fractional risk processes [47] , particularly cyber risk, might be an attractive research area. Epidemic modeling in combination with fractal theory or sets [48] is also required to give a novel viewpoint on understanding viral transmission dynamics [49] for predicting cyber insurance claims.

Supporting information S1 Data. (XLSX) 

[1] Defining cyber risk

[2] State of Malware Report

[3] COVID-19 pandemic cybersecurity issues

[4] Cybercrime To Cost The World $10.5 Trillion Annually By 2025. Cybersecurity Ventures

[5] Cyber insurance market is expected to grow $28.60 billion by 2026: Says AMR. GlobeNewswire, Allied Market Research

[6] Data hacks and big fines drive cyber insurance growth

[7] Cyber risk research in business and actuarial science

[8] Cyber claim analysis using Generalized Pareto regression trees with applications to insurance

[9] On the limits of cyber-insurance

[10] Copula-based actuarial model for pricing cyber-insurance policies

[11] E-risk management with insurance: A framework using copula aided Bayesian Belief Networks

[12] What do we know about cyber risk and cyber risk insurance?

[13] Pricing of cyber insurance contracts in a network model

[14] Epidemic processes in complex networks

[15] Virus Spread in Networks

[16] The N-intertwined SIS epidemic network model

[17] Dynamic structural percolation model of loss distribution for cyber risk of small and medium-sized enterprises for tree-based LAN topology

[18] Pricing cyber insurance for a large-scale network

[19] Cybersecurity Insurance: Modeling and Pricing

[20] Robustness of the Markov-Chain Model for Cyber-Attack Detection

[21] Fractional-Wavelet Analysis of Positive definite Distributions and Wavelets on D′(C)

[22] Wavelet-analysis of network traffic time-series for detection of attacks on digital production infrastructure. SHS Web of Conferences

[23] Wavelet-based Real Time Detection of Network Traffic Anomalies

[24] The Impact of the Wavelet Propagation Distribution on SEIRS Modeling with Delay

[25] Epidemics in networks with nodal self-infection and the epidemic threshold

[26] Cyber Insurance Rate Making Based on Markov Model for Regular Networks Topology

[27] Clustering Coefficient

[28] Impacts of cluster on network topology structure and epidemic spreading

[29] How Clustering Affects Epidemics in Random Networks

[30] Modelling the spread of diseases in clustered networks

[31] The impact of network clustering and assortativity on epidemic behaviour

[32] How community structure influences epidemic spread in social networks

[33] How clustering affects epidemics in complex networks

[34] A new individual-based model to simulate malware propagation in wireless sensor networks

[35] Algorithm and Upper Bound of Infection Mean on Finite Network

[36] Fast Uniform Generation of Random Graphs with Given Degree Sequences

[37] Uniform generation of random regular graphs

[38] Maximising the clustering coefficient of networks and the effects on habitat network robustness

[39] Graph Theory with Applications

[40] In-homogeneous Virus Spread in Networks

[41] Performance Analysis of Complex Networks and Systems

[42] Loss Models: From Data to Decisions

[43] The Network Data Repository with Interactive Graph Analytics and Visualization

[44] Clustering Coefficients for Correlation Networks

[45] On the Hybrid Fractional Differential Equations with Fractional Proportional Derivatives of a Function with Respect to a Certain Function

[46] An application of fractional differential equations to risk theory

[47] Mixed fractional risk process

[48] Fractality, and Image Analysis

[49] Fractal Control and Synchronization of the Discrete Fractional SIRS Model

We would like to express our gratitude to the academic editor and reviewer for their helpful comments and suggestions that helped us strengthen this article.