key: cord-0045827-6r9bw974 authors: Jankowski, Jaroslaw; Bartkow, Piotr; Pazura, Patryk; Bortko, Kamil title: Evaluation of the Costs of Delayed Campaigns for Limiting the Spread of Negative Content, Panic and Rumours in Complex Networks date: 2020-05-23 journal: Computational Science - ICCS 2020 DOI: 10.1007/978-3-030-50423-6_22 sha: 141be8a03383602a2bfb863a349f6fed026c5e20 doc_id: 45827 cord_uid: 6r9bw974 Increasing the performance of information spreading processes and influence maximisation is important from the perspective of marketing and other activities within social networks. Another direction is suppressing spreading processes for limiting the coverage of misleading information, spreading information helping to avoid epidemics or decreasing the role of competitors on the market. Suppressing action can take a form of spreading competing content and it’s performance is related to timing and campaign intensity. Presented in this paper study showed how the delay in launching suppressing process can be compensated by properly chosen parameters and the action still can be successful. Information spreading processes within the networks are usually analysed from the perspective of the performance with main goal to maximize their coverage. It can be achieved by a proper selection of initial nodes starting propagation among their neighbours. The problem defined as influence maximisation was presented together with greedy solution for finding set of nodes delivering results close to optimum [17] . Apart from greedy approach other possibilities use heuristics based on selection of nodes with high centrality measures [20] . Increasing the spread is central problem for viral marketing, diffusion of products and innovations [12] . The purpose of information spreading can be suppression of other processes [5] in case of the spread of misleading or harmful information, rumours and by marketing companies to compete with other products. One of directions is studying the factors affecting the dominance over other processes [3] . It can be used for spreading competing products, opinion, ideas or digital content in a form of memes, videos [29] or gifts [7] . Information spreading can be also used for stopping real epidemics, with the use of educational information, awareness [10] and related to the rumors and panic [25] . For planning and launching competing processes, the habituation effect should also be taken into account, because of dropping response to multiple viral marketing messages received in a short time [6] . The performance of the suppressing process is related to various factors including proper timing and campaign parameters. Presented in this paper study shows relation between delays of suppressing actions and the costs required to make them successful. It is assumed that delayed action still can be successful, if the strength of the process is increased, when compared to the harmful process. However, it is related to increased costs of the action, for example better, more appealing content, the number of seeds, more effective way of selection seeds or incentives used. The remainder of the paper is organised as follows: Sect. 2 includes the related work, the assumptions are presented within the Sect. 3 and are followed by empirical results within the Sect. 4 . Obtained results are concluded in the last section. Online platforms based on social networks created infrastructures for information spreading processes [2] . Initially activated network members can spread information to their neighbours, in the next step they can spread content to their neighbours and so on. The process continues till saturation point, when new transmissions are not possible. At the end, the process reaches some fraction of nodes or whole network. In most cases the main goal is the increasing the spread of the content to achieve a high number of network nodes influenced [27] . Spread of information can be modelled with various models. Some of approaches use models derived from epidemiology like SIS and SIR with their further extensions [16] while other use branching processes [14] . From the perspective of network structures the most often and well studied models include Independent Cascade Model (ICM) and Linear Threshold Model (LT) [17] . Both approaches were used for various applications and further extensions taking into account time factors or network dynamics [15] . Apart from increasing the process dynamics, quite opposite goals can be taken, to use techniques to block spreading information [26] . To stop epidemics, spreading information about pathogens can be used [10] . For example, several studies were carried on to study spreading information in communication networks to increase awareness [4] . Not only pathogens can be treated as harmful with the need to block spreading. Recently more and more attention is put on spreading the information and digital content which can be potentially harmful or even dangerous for target users. Processes of this type can be based on misleading information and fake news, false medical information, panic and rumours which can negatively influence audiences or promote bad behaviours. It is important to spread alternative information to suppress the negative content and prevent harmful rumours spreading [13] . Similar situation takes place on the market where companies are trying to launch viral campaign or spread information competing with other campaigns [30] . Factors affecting competition were analysed in terms of network structures for multi-layer networks [4] . Another studies take into account immunization strategies based on vaccination of nodes [28] . Another aspects are related to intervals between marketing messages because the ability to process information is limited [19] and habituation effect takes place [24] . As a result high intensity of viral marketing messages received in a short time can be treated as a SPAM [6] . Together with the need of modeling multiple processes within the networks extension of models were proposed. For example Linear Threshold model was extended towards Competitive Linear Threshold Model (CLTM) [11] . The influence blocking maximization problem (IBM) was defined. Authors assumed that positive (+) and negative (−) information spreads within the network. Network nodes can be in three states inactive, +active, and -active. Second main spreading model, the Independent Cascade Model, was extended towards Multi-Campaign Independent Cascade Model (MCICM) [5] . It assumes two campaigns spreading simultaneously within the network with competition mechanism. One of processes is treated as the primary campaign and the secondary, treated as limiting campaign, is decreasing the dynamics and the coverage of the first one. Like in the ICM model activated node had the single chance to activate it's neighbours. The objective of the study was to protect nodes from activation by the first process by activation with the second one. In this paper we analyse the role of timing for suppressing campaign and the relation between the costs of delayed campaigns and their performance. In our research we assume that costs are related to propagation probability and the number of seeds. Propagation probability can be related to incentives, samples quality or others ways to motivate users to propagate the content. Number of seeds is related to fraction of target audience selected as seed and is directly related to campaign costs. The goal is to analyse the costs required to launch successful campaign (treated as a Positive Process) even the delay in the relation to Negative Process takes place. It is assumed that Positive Process is the reaction, that's why the Negative Process starts first. Positive and negative propagation processes considered in this paper are modelled within network N (V, E) based on vertex set V = v 1 , v 2 , ..., v m and edges set E = e 1 , e 2 , ..., e n . According to the used Independent Cascade model [17] node u ∈ V is contacting all neighbours, nodes with relation represented by edge (u, v) ∈ E, within the network N and has only one chance to activate node v ∈ V , in the step t + 1 with propagation probability P P (u, v) under condition that node v was activated at time t. For our case, Independent Cascade Model is adapted to two concurrent cascades. Probability P P NP (u, v) denotes the probability that node u activates node v one step after node u is activated by a Negative Process. Probability P P P P (u, v) denotes the probability that node u activates node v one step after u is activated by a Positive Process. Two separate seed sets are used to initialise Negative Process and Positive Process. Seed set denoted by S NP ⊆ V is used to initialise the Negative Process. The ranking method R NP is used to select number of seeds according to seeding fraction SF NP . Seed set for Positive Process denoted by S P P ⊆ V such as S NP ∪ S P P = ∅ is used to initialise the negative process. The ranking method R P P is used to select a number of seeds according to seeding fraction SF P P . Every seed node s NP ⊆ S P N is activated in time t NP = t1, Every seed node s P P ⊆ S P P is activated in time t P P ⊆ T , T = {t 1 , t 2 , ..., t n }. Lets denote by A NP , t the set of active nodes A NP,t ∈ V possessing the negative information at time t, activated in time point t − 1 by a Negative Process, and by A P P , t the set of active nodes possessing the positive information A P P,t ∈ V at time t, activated by a Positive Process in time t − 1. Let's denote by set of not active Selection of nodes a NP,t newly activated by a Negative Process among all neighbours n i such as (n i , v i ) ∈ E takes place according to the formula: with probability P P NP . Selection of candidates for activations with positive process a P P takes place among all neighbours n i such as (n i , v i ) ∈ E not active or activated by a Negative Process: with probability P P P P . The sequence of steps (1) and (2) is taken randomly to deliver equal chances to positive and negative process. All newly activated nodes are included in active nodes sets for the next time point t + 1, respectively for Negative and Positive Process as A NP,t+1 = a NP and A P P,t+1 = a P P . Process is continued until no more new activations are observed. Final results are represented by coverage C ∅ with nodes not activated by any process, C NP with nodes activated by a Negative Process and C P P with nodes activated by a Positive Process such as V = C ∅ C NP C P P . To clarify the approach, the toy example presents three scenarios. In Fig. 1 a Positive Process (green) is started in the same step like Negative Process (red) and has high chance to suppress it. In the second scenario presented in Fig. 2 Positive Process was started too late and parameters where not enough to dominate Negative Process. Third scenario in Fig. 3 shows that delayed process can survive but parameters like propagation probability should be properly adjusted. It is using a simplified graph with weights assigned to edges. Information flows only if weight on the edge is lower or equal to propagation probability P P 1 for Negative Process and P P 2 for Process. Both P P 1 and P P 2 are the same to all nodes according to Independent Cascade Model. Graph has 10 nodes and 17 edges. Both processes are competing. Also the seeds selected for the spreading are given, for Negative Process it is node number 1 and for Positive Process node number 10. Last parameter is delay Di that is responsible for start suppression process (green). For this example process consists of simulation steps. Each step is divided into two stages: stage for Negative Process, and stage for Positive Process. Figure 1 -Figure 3 show how delay can affect the results of reaction depending om process parameters. (2, 3, 6) . Propagation probability (P P 1 = 0, 3) allows to infect node 2 (0, 22 < 0, 3) and 3 (0, 15 < 0, 3) but prevent to infect node 6 (0, 41 > 0, 3). (A2) node 10 starts suppressing campaign and tries to infect his neighbors (5, 6, 7, 8, 9) . P P 2 = 0, 3 allows to infect nodes 5, 8 and 9. In SIM U LAT ION ST EP = 2 cycle repeats for every node with S1 (A3) and then with S2 (A4). In this case process 2 begins to defeat process 1. In the next step suppressing process (S2) finishes with more nodes activated than process (S1). (Color figure online) Information spreading for both Negative Process (NP ) and Positive Process (P P ) are divided into simulation steps. In each step, each process has a chance to increase coverage with activated nodes contacting their neighbours. Simulation starts with choosing seed nodes according to seeds selection strategy. The spreading process starts with selection of seeds according to their ranking, R NP for Negative Process and R P P for Positive Process. Negative Process can be initialized by choosing random nodes (like in the real world, a disease itself cannot choose node while the carrier does), otherwise marketing strategies are usually based on strictly selected nodes. To try to suppress these two ways of contamination with competing process we choose three seeding strategies based on tree rankings: Random, Degree based and third Effective Degree. We choose selecting nodes by the most common centrality measure, the node degree, treated as a one of main and relatively effective heuristics for seed selection as well as a reference method [12] . Additionally we use effective version of degree, computed before launching Positive Process. It is not based on the total number of the neighbours, but on the number of nodes infected by Negative Process. Fig. 1 (A1) The experiment assumes five different sizes of seed set selected in each used network. Experiments verifies the efficiency of increasing number of seeds. Number of seeds, seeding fraction (SF NP , SF P P ) is equal to 1%, 2%, 3%, 4% or 5% and represents the percentage of nodes selected as seeds. Suppressing process (Positive Process) will start with the given delay Di. It can start at the same step or, later (delay: 0-8), to test consequences of late reaction. And finally for both of the competing strategies we give propagation probability (P P P P , (P P NP )) equal 0.1, 0.2, 0.3, 0.4 or 0.5. Propagation probability is responsible for the chance to infect node and represents propagation probability according to Independent Cascade Model [17] . During the process for each edge possible for transmission random value is dynamically generated. if it takes value lower or equal to propagation probability (different for Positive Process and Negative Process) activation of contacted node takes place. All values used for all parameters are presented in Table 1 . Performance of Positive Process can be measured with the use of several metrics. Performance Factor (P F ) is represented by a total number of nodes activated by a positive process by the number of nodes activated by negative process for the same configuration parameters. Another metrics, Success Rate (SR), represents percentage of spreading processes with winning Positive Process. Simulations were performed on five real networks N 1-N 5 UoCalifornia [23] , Political blogs [1] , Net science [21] , Hamsterster friendships [18] and UC Irvine forum [22] available from public repositories, having from 899 to 1899 nodes and from 2742 to 59835 edges. We obtained total R NP × R P P × P P NP × P P P P × SF NP × SF P P × Di × N with the total number 168,750 of simulation configurations. For each of them ten runs were repeated and averaged. The main goal is to investigate the influence of increasing the efficiency of nodes in contaminating their environment. In order to gather necessary knowledge each combination of parameters to find the most successful way to suppress spreading potentially dangerous process as compared. The loop searches through nodes activated by Positive Process, for each infected node the script is looking through its neighbours and tries to infect every neighbour who is not infected with the same disease. Nodes that are infected in this step of contamination cannot spread the disease in the same step. If Di = 0, seeds for Negative Process are selected from 'healthy' nodes and the loop repeats spreading but searches through the nodes activated by Positive Process and tries to infect with positive content every node activated by negative process or neutral. If Di > 0, Positive Process starts after Di + 1 cycles of Positive Process spreading. In this case the simulation step ends when Positive Process step ends, until the suppressing process is activated. The competing lasts until one of the strategies defeat the competitor and spread all over the network or network states stabilize. In this section, results from agent-based simulations are presented. During analysis we estimated costs of making effective delayed process. The main goal was answer the question to as far we need to increase propagation probability (P P ) to obtain certain success rate (SR) of suppressing process under varying delay steps. Figure 4 shows significance of suppression process. As we can notice, cases with no delay (Di = 0) provides the best performance of Positive Processes. Overall, for cases with no delay suppression campaign achieved 31% coverage. Subsequently differences between Positive Process and Negative Process are grown. Negative Process reached the best performance when the Positive Process was most delayed. It was analysed for eight steps of delay (Di = 8), and for this case overall negative campaign performance is 62.4%. For a more detailed evaluation of the diffusion of Positive Process we figured out three factors, presented in the Fig. 5 . Used propagation probability (P P ), seeds fraction (SF ) and networks (N ) were analysed. Propagation probability causes the biggest increment of coverage performance. Along with propagation probability, coverage performance is increasing as following: 2.00%, 7.88%, 12.10%, 15.98%, 21.44%. A similar relationships can be seen for seeds fraction (SF ) values. However there the growth of performance isn't so drastic. The following results we obtained: 9.72%, 11.10%, 12.13%, 12.86%, 13.59%. In terms of coverage performance N 3 achieves the highest performance 18.43%. The worst outcome was obtained within N 5, it 7.39%. Therefore, it is evidence that effect are with relation to network topology. For sensitivity analysis to determine the key parameters affecting coverage of the positive process the meta-modeling based on the Treed Gaussian Process (TGP) was used. Briefly, TGP is one of the significant machine learning methods, developed by Gramacy [8] . Gramacy et al. further extended TGP to be suitable for sensitivity analysis [9] . Since then, after constructing TGP models, sensitivity analysis can be used to identify key variables in the models by using the variancebased method. We used here two sensitivity indexes: first order and total effects. The first order index represents the main effect and the contribution of each input variable to the variance of the output. The total effects include both main effects and interactions between input variables. The Fig. 6A shows the slopes of the various parameters used in simulations. It provides information on whether the output, performance of the Positive Process, is an increasing or decreasing function of the respective input data. Solid lines are mean values that are within the 95% confidence interval. It was observed that the changes of four parameters SF NP , R NP , R P P , SF P P had only small influence on the output. In addition, with the increase of SF P P and R P P , the main effects caused by the increase of these variables slightly increased, suggesting that the impact of individual differences between SF P P and R P P on changes in the examined networks was slightly improved in such conditions. However, a clear improvement can be seen with the increase of P P P P . Here we see a clearly noticeable increase, which indicates a significant impact of the variable under the existing conditions. Impact of N shows that results were highly dependent on used networks. The increase in the R NP and SF NP variables indicates that the effects caused by their increase have worsened to a lesser extent. It can be clearly seen that increases in delay Di, and P P NP negatively affect the coverage of positive process This means that the variables have a negative impact and their significance deteriorates under the circumstances. In Fig. 6B first order sensitivity indicators quantify changes in output variables suitably caused by individual input variables while in Fig. 6C sensitivity indicators reflect the interactive effects of all input variables on the output variable. Figure 6B clearly shows that P P NP is the main contributor to the network coverage of PP. Di and P P P P are classified as second and third factors respectively contributing to network coverage of positive process. This differs to some extent from the individual effects shown in Fig. 6A , which can be explained by the combined effects of P P NP , Di and P P P P . The role of remaining variables is approximately the same, sharing small values of the network response coverage of positive process. The cumulative effects Fig. 6C increases when we consider the interactions between all variables, especially for P P NP , to a slightly lesser extent for Di. The sensitivity indicators are not sum to one and it indicates that interactive effects between two or more variables are important for the individual assessment. Another step of analysis includes analysis of Success Rate for different propagation probabilities of both processes. Figure 7 shows Success Rates for each pair of probabilities for Positive Process (P P ) and Negative Process (NP ). Success Rate (SR) for propagation probability 0.1 for both processes is marked with RED within the Delay 0 section. If delay takes place it was impossible to obtain same Success Rate without increasing Propagation Probability for P P . For example for Delay 4 it was possible to achieve similar Success Rate for P P P P = 0.2 and for Delay 5 with P P P P = 0.5 (reference cells marked with RED ). Success Rate values marked with GREEN are related to processes competing with Negative Process spreading with P P NP = 0.2. In similar way reference values are marked for P P NP = 0.3, 0.4, 0.5 with colors BLUE , YELLOW and VIOLET respectively. If Positive Process is delayed one, two or three steps (Di = 1, Di = 2 and Di = 3) properly increased propagation probability makes possible obtaining high Success Rate above 80%. It Positive Process starts with delay four or five steps (Di = 3, Di = 4) even with high probability only in few cases Success Rate exceeded 50%. Results show that further delay with Positive Process is resulted dropping Success Rate to low ranges. If delay is longer than five steps it was impossible to obtain Success Rate higher than 15% even if probability of negative process was at the level 0.1 and the positive process was launched with probability 0.5%. Information transmitted with the use of electronic media spreads with high dynamics. Apart from neutral or positive content online social networks can be used for information or content potentially harmful. Misleading information, rumour or textual information may cause panic and lead to bad behaviours. From that perspective suppressing information spreading processes is challenging and important direction in the area of network science, what was confirmed by earlier studies. Possible actions taken against negative content can be based on launching competing campaigns with the main goal for limiting the dynamics and coverage of negative processes. Later the action is taken it can be less effecting and stopping negative content can be problematic. Presented study showed how delays in launching positive process is influencing its performance and ability to reduce negative process. It can be achieved by proper adjusting the parameters of limiting process when compared to negative process. Increasing propagation probabilities increases the ability to cope with the negative process, even if limiting action is taken with delay at the moment when large fraction of network is covered by negative content. Future directions include the role of intervals between messages on campaign performance. Another possible areas include experiments within temporal networks and investigation of the role of changing network topology on performance of limiting actions. The political blogosphere and the 2004 US election: divided they blog The role of social networks in information diffusion Competitive influence maximization in social networks Interacting spreading processes in multilayer networks: a systematic review Limiting the spread of misinformation in social networks Viral marketing for multiple products Gifting and status in virtual worlds Tgp: Bayesian treed gaussian process models. R Package Version Categorical inputs, sensitivity analysis, optimization and importance tempering with TGP version 2, an R package for treed gaussian process models Competing spreading processes on multiplex networks: awareness and epidemics Influence blocking maximization in social networks under the competitive linear threshold model Seeding strategies for viral marketing: an empirical comparison Preventing rumor spreading on small-world networks The multidimensional study of viral campaigns as branching processes Compensatory seeding in networks with varying avaliability of nodes How to run a campaign: optimal control of sis and sir information epidemics Maximizing the spread of influence through a social network KONECT: Hamsterster friendships network dataset The limited capacity model of mediated message processing On the role of centrality in information diffusion in social networks Finding community structure in networks using the eigenvectors of matrices Triadic closure in two-mode networks: redefining the global and local clustering coefficients Clustering in weighted networks What do we really know about how habituation to warnings occurs over time? a longitudinal FMRI study of habituation and polymorphic warnings Siraru rumor spreading model in complex networks Negative influence minimizing by blocking nodes in social networks Maximizing the coverage of information propagation in social networks Immunity of multiplex networks via acquaintance vaccination Competing memes propagation on networks: a network science perspective Maximizing the spread of competitive influence in a social network oriented to viral marketing Acknowledgments. This work was supported by the National Science Centre, Poland, grant no. 2017/27/B/HS4/01216.