key: cord-0008013-uq7bpmbi authors: Rüdiger, Sten; Plietzsch, Anton; Sagués, Francesc; Sokolov, Igor M.; Kurths, Jürgen title: Epidemics with mutating infectivity on small-world networks date: 2020-04-03 journal: Sci Rep DOI: 10.1038/s41598-020-62597-5 sha: 3afe63d6233d0ab45177d612675e889001d7b543 doc_id: 8013 cord_uid: uq7bpmbi Epidemics and evolution of many pathogens occur on similar timescales so that their dynamics are often entangled. Here, in a first step to study this problem theoretically, we analyze mutating pathogens spreading on simple SIR networks with grid-like connectivity. We have in mind the spatial aspect of epidemics, which often advance on transport links between hosts or groups of hosts such as cities or countries. We focus on the case of mutations that enhance an agent’s infection rate. We uncover that the small-world property, i.e., the presence of long-range connections, makes the network very vulnerable, supporting frequent supercritical mutations and bringing the network from disease extinction to full blown epidemic. For very large numbers of long-range links, however, the effect reverses and we find a reduced chance for large outbreaks. We study two cases, one with discrete number of mutational steps and one with a continuous genetic variable, and we analyze various scaling regimes. For the continuous case we derive a Fokker-Planck-like equation for the probability density and solve it for small numbers of shortcuts using the WKB approximation. Our analysis supports the claims that a potentiating mutation in the transmissibility might occur during an epidemic wave and not necessarily before its initiation. the threshold for percolation and thus also promotes the global contagion on networks with this property 5 . We here built on this earlier modeling work for epidemics on small-world networks. We assume a network of nodes which are inhabited by individuals (hosts of the pathogen). Because of transport of individuals between nodes, infecting agents can travel with them to a connected node and infect the node's community. This process is modeled here by the established Susceptible-Infected-Recovered (SIR) model in which, for simplicity, individuals at each node are collectively in one of three states: susceptible (S), infected (I), or recovered or removed (R). Additionally, we take into account the genetic mutations and selectivity that a spreading pathogen experiences. Viruses have very high mutation rates 15 and several mutations can accumulate leading to substantial displacements in their genotype. In fact, mutation rates are so high that the evolution of a virus is interacting with its epidemiological behavior 16 . We here investigate the case that pathogens, by a series of random mutations, undergo changes of their genes that affect their infection or transmission probability. Such mutations of the infectivity occur for instance during adaptation to new environments or as response to antiviral or antibiotic drugs 17 . Recently, the possibility that an adaptive mutation during the West African Ebola epidemic increased transmission between humans was described 18, 19 . We specifically study the situation where the agent's infection probability is initially subcritical (in the sense of percolation theory) but may mutate into a state with higher, supercritical infection probability. We find that in networks without small-world links, the probability of an escape to the supercritical state remains small, even if the mutation probability is large. In contrast, if small-world links are present, a supercritical mutation occurs frequently even for moderate mutation rate. The genetic change then entails a transition to a supercritical infection probability, which in turn causes a global infection of almost all nodes of the network. In contrast to the former case of epidemics assisted by small-world links, we here obtain a regime with a high chance of complete infestation of the host species due to the escape process driven by mutations. For larger numbers of long-range links, i.e., for networks approaching the global mixing of random graphs, the sweeping outbreaks disappear again. Our model comprises a two-dimensional spatial network of nodes that adopt one of the three SIR states. A one-dimensional fitness variable is attached to each node that hosts the virus, i.e., that is in the I-state. As motivated above, we have chosen a genetically dependent fitness variable that directly affects the node-to-node transmission probability. Generally, the concept of the fitness landscape relates the genotype of an organism to its survival and reproductive success. If the fitness landscape is flat, all genotypes of a population have the same chance of replication. If however, the landscape exhibits regions of different height, with peaks surrounded by deep valleys, a population usually climbs uphill by series of small genetic changes. In our model we specifically consider the generic case where we have two peaks and one valley in the fitness space, i.e., states between the maxima are selectively disadvantageous (Fig. 1a) . Thus the non-spatial part of our model is similar to point models of stochastic tunneling through a fitness valley 20-22 . In the case of viral epidemics, the two peaks could, for example, correspond to different specified routes of transmission between individual hosts. For instance, in 23 the possibility of an evolution to a respiratory route in mammals has been studied for the potentially dangerous A/H5N1 Influenza virus. Here, as is typical for viruses 24 , the number of genetic substitutions that are required for this virus to adapt is low (at 3 to 5). In contrast, bacteria are adapting to new environments slower, which may be linked to the fact that individual mutations have a smaller effect and many mutations may be required 25 . Spatial effects on evolution in fitness landscapes have been considered in 26 and were generally found to support generation of mutants even if they are less fit. This may also lead to faster adaptation in multi-peaked fitness landscapes since the mutants explore the fitness space quicker. We here propose a model where the spatial structure is more complex by allowing local spatial contacts and, additionally, far-reaching contacts. SiR model of epidemic spreading. The SIR model is one of the simplest models for transmission of diseases in a population of individuals. Every individual is in either one of three SIR states. The transitions between the states are where λ and μ are the probabilities of infection and recovery per time step, respectively. Consider now a network of locations on which the infection spreads by specified contacts. Every node of the network contains individuals that are collectively in one of the three states. If a node is susceptible and at least one of its connected nodes is infected, it becomes infected as well with probability λ, whereas an infected node recovers with probability μ. The value of λ is determined by the states γ of the infected nodes connected to a susceptible node, see Fig. 1b . During each time step, for every S-node we calculate the probability λ(γ) for any connected I-node and let the S-node become infected according to this probability. If several connected nodes are infected, we randomly shuffle the order by which the transition is tested. Individuals in state R are immune and cannot be infected again. Note that for simplicity we consider λ and μ as probabilities for a (relatively large) discrete time step, which means that we do not obtain exponential waiting time distribution for the transitions. We initially start with a network of susceptible nodes and a single randomly chosen infected site. After a certain time, the system reaches a stable equilibrium where no infected nodes remain. The fraction of recovered nodes depends on the infection and recovery probabilities λ and μ as well as on the topology of the network. For simplicity, we set the recovery probability to μ = 1 so that an infected node is recovered or removed after one time step. www.nature.com/scientificreports www.nature.com/scientificreports/ SiR Model in a Watts-Strogatz-network. To take into account the geographical space om which the disease spreads, we consider a two-dimensional regular square lattice with N nodes and periodic boundary conditions. In such a grid every node has four nearest neighbors and the disease would spread spatially like a wave front. It is known that, in analogy to the problem of percolation on lattices 27 , for a supercritical infection probability λ > 0.5 a nonzero probability for the spreading of the infection exists 28 . Next, every link is rewired to a random node with probability p which generates shortcuts and decreases the average path length. The resulting network is a generalized Watts-Strogatz-Graph 29 with mean degree k = 4 (Fig. 1c) . For very large p close to 1 we obtain a well-mixed random network. It was shown in 5 that rewiring reduces the threshold of percolation in the network so that the critical λ is smaller than 0.5. Mutations of the Infection Probability. In many investigations of SIR on networks, the infection probability λ is a constant of the dynamical process. Now, incorporating genetic adaptation, we assume that the www.nature.com/scientificreports www.nature.com/scientificreports/ infection probability can be varied by mutations. The disease could for instance be a virus that spreads from node to node on the network and can undergo mutations of its genome. Here we assume that the individuals at each node are collectively at the same genetic state. This may appear unrealistic at first; however, it depends indeed on the respective time scales of mutations, fixations and transmissions to determine how close a population is to this scenario. Given the phylogenetic data for instance for the case of Ebola epidemic in 2013-2016, it appears that the genetic variability of the entire virus population of the outbreak is much larger than the variability at each infection site or country 30 . Following this line of thought we consider the limit in which establishment of a strain at each node occurs quickly, so that the local population is collectively characterized by a single virus strain. We will study two scenarios, one for pathogens requiring a small number of mutational steps for adaptation and one for pathogens requiring a large number of steps, where a continuous approximation may be valid: i) In a first case, we introduce a relation λ d (γ) with the mutating genetic variable γ, which can have three values: -1 in the initial state (no mutation), 0 in a neutral state (one mutation), 1 in the supercritical state (two mutations). The transmission probability λ d depends on γ as This mapping is chosen so that it has two maxima of optimal fitness. One peak is set slightly below the critical infection probability of 0.5 so that the epidemic wave typically covers at most a small finite domain. A second peak is chosen well above the critical infection probability corresponding to a well adapted genotype of the species that may typically lead to a global epidemic wave. In keeping with the valley crossing mentioned above, the intermediate state γ = 0 presents a minimum in the fitness, with which many of the nodes do not pass on the pathogen. As an initial condition we choose γ = −1.0 at all network nodes and initially infect one randomly chosen node. Then the infection probability is initially below the critical value at λ = 0.45. If a node i is in state I, the sequence variable γ i mutates in each iteration step with the rates given by the scheme where the -1, 0, 1 are the state values of γ. The parameter χ can be interpreted as a mutation rate parameter that quantifies the probability per time step of the respective transitions. The transmission probability λ(γ i ) changes according to the table above. If a node i infects one of its connected sites, the genetic variable γ i is inherited. This value is then, during the same iteration step, mutated. After an infected node transitions to the R-state, its γ i value remains unchanged. ii) In a second scenario we assume a continuous variable γ and a corresponding function λ c (γ), see Fig. 1a . In the literature on evolutionary dynamics, γ is called the sequence variable and the function λ can be viewed as the fitness landscape in the sequence space 31, 32 . Usually, the sequence space is high-dimensional, but for simplicity we consider here only a one-dimensional one. Discrete case. Given the initial infected site and its infection probability λ, an infection can generally spread through parts of the network. If λ stays constant, there is a chance that the pathogen infects small or large spatial domains, which can be mathematically described by the established percolation theory. However, as a consequence of the mutations, the infection probability changes and enables more substantial evolutions of the epidemic. If, in this process, the genetic variable γ evolves towards 1, the epidemic may be able to spread throughout the entire network since λ is then larger than the critical value of 0.5. Fig. 1d shows an exemplary evolution of the genetic variable γ where most of the network is eventually infected by the mutated pathogen (green and yellow). To quantify the effect, we let the system evolve for 10 3 to 10 6 timesteps for each parameter set. For each run we generate a new network with the given parameters. We stop a simulation when the number of R nodes does not change anymore. We then determine the fraction of those runs in which the R-state covers more than 90% of the network, i.e., the outbreak leads to almost complete coverage. This event will be called a sweep in the following. We find that the capability to sweep the network depends strongly on the structure of the network (Fig. 2a) . If no or few long-range links are present and the genetic mutation rate is weak or moderate, the infection wave quickly dies out and overall infection is rare. However, our simulations reveal that for a rewiring rate p > 0.005 a moderate mutation rate suffices to obtain a large probability of full epidemics. In these epidemics almost all nodes of the network become infected. Note that this does not mean that there are no infections spreading for small rates or without mutations. In fact, without mutations (i.e., for a constant infection probability λ ≈ 0.45) infections can spread. This can happen if, as was mentioned above, shortcuts enable the spreading of infections even if the λ value predicts a subcritical behavior for a pure grid. To discuss this in detail we now determine the share of infected nodes per run and average it over all runs. Fig. 2b shows that the average number of infected nodes is substantially larger than 0 for p > 0.005 and all mutation rates. It contrast to the fraction of sweeps, it does not vanish with decreasing χ. The difference of our scenario with mutations to this former mutationless case of SIR in small world networks is the generation of an explosive giant component in a fraction of the runs. For the usual percolation, close to the transition point, the size of a giant component is smaller than the total population. Accordingly, for small mutation rate it is found that the distribution of the fraction of infected nodes per run has a peak at a value between 0 and 1, see Fig. 3a . This means that in a given run the infection either hits the giant component at, e.g., 50% of www.nature.com/scientificreports www.nature.com/scientificreports/ all nodes, or the contagion dies out. However, for sufficiently frequent mutations, in each of our small world networks there is either an almost complete coverage (>90%) or only a tiny fraction of the nodes becomes infected and removed, see Fig. 3c . This points to a very distinctive feature of epidemic outbreaks with critical mutations: If the average fraction of infected nodes is, e.g., 30%, it could either mean that in each run around 30% of the nodes are infected, or, in an extreme case, in 30% of the runs almost all nodes are infected, while in 70% of the runs most nodes stay healthy. From a public health point of view we think that the second case is the more alarming one, since an outbreak of a deadly virus could lead to extinction of the entire population. Therefore, in our model, with short-cuts and random mutations, the outcomes and assessments of risks are different exactly in this respect that the chance of extinction rises dramatically. The reason for this different behavior in our mutating SIR model is the fact that for sufficiently large mutation rate the sequence variable γ can become 1 and settle at the global extremum of the fitness landscape. Fig. 3d demonstrates this by comparing a case of small χ (blue bar), moderate χ (green) and large χ (red). Only in the last two cases, we obtain escape from the fitness maximum at γ = −1. To understand the causes of the described phenomenon, we now analyze the scaling behavior of the percentage of sweeps in dependence on the mutation rate, χ, and the fraction of shortcuts, p. Fig. 4a presents a double-logarithmic plot in which the sweep fraction is plotted versus 1∕χ. There is a clear linear relation on the log-log scales for small χ, which is characterized by a slope independent on p. With increasing p, the lines are shifted to larger values in sweep fraction. For larger mutation rates, the linear dependence is lost and the fraction of sweeps saturates for all p. For very large p, we note that the trend reverses and much a smaller sweep fraction results for a large range in χ (red curve, p = 0.5). We will now present a theoretical explanation for the observed scaling behavior. This is made easier by splitting the phenomenology of a sweep into occurrence of a critical mutation and a following epidemic wave. We now only consider the probability that during a run at least one critical mutation occurs. Plots of this probability (i.e., the probability that during a run at least one node reaches γ ≥ 0) appear similar to those for the sweep This quadratic dependence on χ can be reasoned in the following way: The probability that at least one of the sites mutates twice to reach γ = 1 is where s is the number of sites belonging to the finite cluster activated by the first infected node. The size of the cluster is distributed according to n(s) ~ s −τ with a universal scaling exponent τ ≈ 2.05 33 . For very small χ we can further approximate P mut ≈ sχ 2 ∕2 and we obtain the mean probability for at least one supercritical mutation: This scaling with χ is the one observed in Fig. 4b ( www.nature.com/scientificreports www.nature.com/scientificreports/ 2 (7) τ − Since 4 − 2τ ≈ − 0.1 we expect a very weak dependence for larger χ, which is indeed the case in our simulation data. The fact that the percolation behavior predicts the χ dependence of the mutation fraction suggests that the increase of P mut with p is driven by the percolation transition. Finally, discussing the probability of a sweep following a mutation, Fig. 4a shows that the fraction of sweeps declines again for p larger than 0.1 even if the number of critical mutations does not shrink. An obvious scenario is that for large p the giant cluster formed by the non-mutated pathogens (γ = −1) is large and spanning the entire domain quick enough so that much of the network is sufficiently covered with the subcritical infection before the supercritical form can spread. This would particularly effect those cases in which the mutation rate is small, such that a critical mutation needs much time to develop. continuous case. For Fig. 1a and is chosen so that it has two maxima, one below and one above the critical infection probability. As before, we start with γ = −1.0 at all network nodes and initially infect one randomly chosen node. If a node i is in state I, the sequence variable γ i mutates in each iteration step by where ξ i is a Gaussian variable with mean zero and standard deviation σ. At first glance, we find a similar qualitative behavior as in the discrete case: sweeps in general occur for larger \sigma and p values, see Fig. 5a . The mutation probability scales now exponentially with several regimes. For very small noise, the mutation probability disappears exponentially with σ (data not shown). Similar to the discrete case, the slopes of these graphs do not depend on p in this regime. www.nature.com/scientificreports www.nature.com/scientificreports/ For larger noise, the scaling depends on the fraction of long bonds. Fig. 5a presents the logarithm of the sweep fraction plotted versus the inverse of σ. There appears an exponential relation for a range of approximately 4 < 1∕σ < 7 with the slopes depending on p. For larger rewiring probabilities p there emerges a slight kink for higher values of 1∕σ. This effect is due to the finite grid size. Simulations with lattices of different sizes show that the large-noise regime extends to higher 1∕σ for larger lattices (Fig. 5c) . Results for lattice sizes 200 × 200 and 300 × 300 agree for 1∕σ values up to 7, which also appear to limit the range of linear fitting. Plots of the mutation probability P mut exhibit exponential scaling, see Fig. 5b . Since we discovered a scaling different from the discrete case, an explanation of the phenomenon should be based on a more complex model. We will approximate the active cluster as a regular tree with a coordination number C (the number of subbranches attached to each node). The evolution of the distribution of γ values on the tree for a given time can then be approximated by the birth-death equation with P = P(γ, t). The expression in the square brackets denotes the local proliferation rate. As shown in the SI Appendix 1, the total probability s of escaping through the value of γ = 0 can be obtained by means of a Wentzel-Kramers-Brillouin (WKB) approximation 1 0 with r(γ) = 1 − Cλ(γ) being the local decay rate. We recover the observed 1∕σ dependence with the slope r x dx ( ) For small p the coordination number C can be determined by equating the integral to the slopes in Fig. 5b . We obtain, for instance, C = 2.06 for p = 0. Our numerical simulations of Eq. (10) showed that the exponential scaling found in the WKB approximation survives for larger C values (data not shown) even if the corresponding physically relevant WKB solution cannot be found. Thus, the model tree network can give a good explanation for the exponential scaling with mutation rate σ. A few words on the limitations of this approximation are in order. The dependence of C on the network properties or, in our case, on p, however, remains phenomenological. Additional numerical analysis shows that the described prevalence of supercritical mutations for larger p does not occur in a grid-like network. SI Appendix, Fig. S1 shows that the probability for supercritical mutations is much smaller for a local network (without short-cuts) even if the mean size of the infected clusters is similarly large as in the small-world case. This suggests that increasing the number of long links in the network helps the mutations by way of (i) increasing the percolation cluster (i.e., lowering the percolation threshold) and (ii) by lifting local blocking of mutated nodes. Thus, we conclude that, besides the increasing percolation cluster, it is the topology of the small-world network that drives the occurrence of mutations and global epidemics in our continuous model. We have shown that the interaction of epidemic transmissions of a mutating pathogen with the structure of a network can produce a transition to a regime of complete infection of a host species spatially distributed on the network. The problem has been framed in terms of stochastic escape from a subcritical transmission probability to a supercritical transmission probability. The escape to the potentially disastrous supercritical regime is facilitated by a small number of long-range links on the spatial grid of hosts. We are thus closing a loop in which transmission enables mutation and mutation enables the spreading of disease on the complete network. Our work suggests that risk assessment should be substantially different depending on whether or not mutations are occurring. Under noisy conditions (i.e., allowing mutations) infections are not extensively worse (in a statistical sense) compared to noise-free condition, as the average percentages of infected sites do not change substantially. But mutations are potentially risky, since they are likely to produce a global outbreak in a given population, which is explosive and fast. This makes the latter situation far more difficult and challenging to control epidemiologically. The effect that we have described rests qualitatively on two aspects. First, the small-world links drive the network close to or beyond the critical point for percolation. Thus, for p large enough, the infected node often initiates a large cluster, where each node can undergo a mutation and the total probability for a supercritical mutation is high. Second, the long-range connections allow the mutated form to emerge and to move freely to uninfected parts of the network, so that the mutation can quickly spread and cover large parts of the network (sweep). The latter, however, only holds if the original type has not covered the entire domain before the mutated type appears. Typically, if p is large, the speed of an epidemic wave is increased. In this situation, the non-mutated type indeed covers large parts of the network before the supercritical pathogens can propagate. Oftentimes, in this situation the non-mutated form remains sparse while covering the network so that the probability of full blown epidemics is much reduced, see Figs. 4a,b. Qualitatively the behavior we describe is similar for the cases of discrete and continuous mutational spaces. The scalings of mutation probabilities, however, are different for the two cases. While in the discrete case the probability to a supercritical mutant state scales with the mutation rate in power law form, we found exponential behavior for the continuous case. For both models, we presented analytical derivations of these relations. The scalings with the small-worldness parameter p, however, are more complex and our discussion in Appendix S2 gives a first hint on the differences of the continuous and discrete cases. For the limit of a continuous mutation space, our simulations suggest that long-range links additionally help to overcome a local blocking to the spreading of pathogens in intermediate genetic states (see Appendix S2 for more details). To fend off disease spreading, our results suggest several approaches. First, in the early stage of the epidemics, one should prevent the critical mutation. This could be achieved by a suppression of long-range links for instance by shutdown of transport hubs or preferential vaccination of regions around long-range connected nodes. In this way the development of isolated regions of the network, where the mutation can proceed without competition, is diminished. Second, after a supercritical pathogen has established, one needs to prevent spreading by removal of links around the supercritical nucleation site. Third, it is also conceivable, that, depending on the topology, an increase of p can result in the spreading of the non-mutated form, thus preventing the infection by the mutated form. This and other questions regarding, for instance, the influence of vaccinations are subject of current work on the problem. (2020) 10:5919 | https://doi.org/10.1038/s41598-020-62597-5 www.nature.com/scientificreports www.nature.com/scientificreports/ It is interesting to compare our results with the recent debate whether a potentiating mutation could occur during an epidemic or whether it usually occurs before the start of the epidemic 34 . For instance, in the case of the Ebola outbreak 2013-2016 it was suggested that one or several mutations identified during the outbreak caused the strong infectivity of the virus 18, 19 . Potentially this question might also determine the chances of a severe mutation of the Ebola virus from a droplet to an airborne transmission route 35 . It has been argued that within-host selectivity provides a much quicker adaptation than between-host selectivity since the time-scale of selectivity is much shorter within a host than between hosts. In the context of our study, the within-host scenario is one of well-mixed dynamics (large p, see Fig. 4a ), where we found a domination of the non-mutated form. Only for intermediate small-worldness do we find the mutated form to be competitive. Our results suggest that the decisive mutation benefits from the small-worldness and may thus arise during the epidemic wave on a network with the suitable properties. This finding substantiates the claims for a mutation during the epidemic 18, 19 . Finally, the relevance of our work for other network types should be studied in the future. Besides the small-worldness, other topological properties as scale-freeness and degree correlation have been shown to play an important role for epidemic spreading 36, 37 and should therefore be taken into account. We expect that a similar mechanism to the one described here for the case of epidemics may also be relevant for the related problems of spreading on networks of other phenomena such as rumors and influences. Estimated global mortality associated with the first 12 months of 2009 pandemic influenza A H1N1 virus circulation: a modelling study WHO: Ebola situation report Networks and the epidemiology of infectious disease Epidemics and percolation in small-world networks Percolation on heterogeneous networks as a model for epidemics Reaction-diffusion processes and metapopulation models in heterogeneous networks Epidemic processes in complex networks Complex networks: structure, robustness and function Unification of theoretical approaches for epidemic spreading on complex networks Networks: an introduction Spatial networks Pattern transitions in spatial epidemics: Mechanisms and emergent properties Classes of small-world networks Epidemiology and ecology of West Nile virus in sub-Saharan Africa Why are RNA virus mutation rates so damn high? Evolutionary analysis of the dynamics of viral infectious disease Virus evolution and transmission in an ever more connected world Ebola virus glycoprotein with increased infectivity dominated the 2013-2016 epidemic Human adaptation of Ebola virus during the West African outbreak Stochastic tunnels in evolutionary dynamics Rapid evolutionary escape by large populations from local fitness peaks is likely in nature Rare events in population genetics: stochastic tunneling in a two-locus model with recombination The potential for respiratory droplet-transmissible A/H5N1 influenza virus to evolve in a mammalian host The evolution and genetics of virus host shifts Understanding the emergence of bacterial pathogens in novel hosts Tipping the mutation-selection balance: Limited migration increases the frequency of deleterious mutants Spread of epidemic disease on networks Collective dynamics of small-world networks The evolution of Ebola virus: Insights from the 2013-2016 epidemic Evolutionary dynamics Empirical fitness landscapes and the predictability of evolution Introduction to percolation theory Did a single amino acid change make Ebola virus more virulent? Understanding ebola virus transmission Infection dynamics on scale-free networks Edge-based epidemic spreading in degree-correlated complex networks This paper was developed within the scope of the IRTG 1740/TRP 2015/50122-0, funded by the DFG/FAPESP. J.K. was supported by the project RF Goverment Grant 075-15-2019-1885. S.R., A.P., F.S., I.M.S. and J.K. designed the study, interpreted the data and wrote the main manuscript text. S.R. and A.P. created the software. S.R. F.S. and I.M.S. derived the equations and analyzed the solutions. I.M.S. and S.R. wrote the supplementary text. The authors declare no competing interests. Supplementary information is available for this paper at https://doi.org/10.1038/s41598-020-62597-5.Correspondence and requests for materials should be addressed to S.R.Reprints and permissions information is available at www.nature.com/reprints.Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.