key: cord-0798623-tg8t9jf0
authors: Li, Chentong; Liu, Jiawei; Zhang, Yingying; Lei, Huan; Xu, Jinhu; Rong, Yao
title: A Markov based model to estimate the number of syphilis cases among floating population
date: 2022-01-24
journal: Infect Dis Model
DOI: 10.1016/j.idm.2022.01.001
sha: 4bd6a898d53b42b79a2ef1d70daa9304c5c02464
doc_id: 798623
cord_uid: tg8t9jf0

Syphilis is a sexually transmitted disease that spreads widely around the world, infecting tens of millions of people every year. In China, syphilis not only causes more than 1 million infections every year, but also has a characteristic spreading pattern: the disease spreads with the migration of the floating population. There have been many investigations and studies on the transmission of syphilis with the floating population in China, but quantitative modeling in this field is very limited. In this paper, based on a Markov process model and datasets collected in Zhejiang Province, China, we construct a new model to analyze the transmission and immigration process of syphilis. The results show that immigrant patients are one of the sources of syphilis infection in Zhejiang province, and that the infection rate is too large to be ignored. By using the PRCC method to analyze the relationship between the parameters and the infected cases, we also find two main effective measures that can control the spread of syphilis and reduce the infection rate: infected persons paying attention to their own behavior, and the use of sexual protection measures. With increasingly frequent exchanges of people among different countries and regions, studying the transmission of diseases with floating populations has become more and more important. The method we use in this paper offers new insight into this issue, providing a quantitative research method that uses the data of diagnosed cases.
All the methods and models in this paper can be extended to studies of other diseases in which immigrant patients should be considered.

Syphilis is a worldwide sexually transmitted disease caused by the bacterium Treponema pallidum. In 2015, about 45.4 million people in the world were infected with syphilis (Vos et al., 2016), of which 6 million were newly infected cases (Newman et al., 2015), resulting in about 107,000 deaths (Wang et al., 2016). In mainland China, the first case of syphilis was found in the year 1505. By 1950, 84% of sex workers and 2-3% of rural residents had suffered from this disease (Chen et al., 2007). In the 1960s, the Chinese government led an elimination campaign against sexually transmitted diseases, and syphilis and other sexually transmitted diseases were virtually eradicated (Chen et al., 2007). However, with the relaxation of government prevention and control measures, new cases of syphilis appeared in 1979 (Chen et al., 2007), and by 2015 the annual incidence rate of syphilis in China had increased to 9.81 per 100,000 people (Wong et al., 2018). Nowadays, syphilis is regarded as a Class B infectious disease in mainland China, and the Ministry of Health of China has established a nationwide surveillance system to control this disease.

In China, sex workers are one of the main sources of transmission of sexually transmitted diseases such as AIDS and syphilis (Li et al., 2009; Zhou et al., 2020). Many of them come from rural areas of poor provinces and become sex workers in the areas of big cities in rich provinces, such as Zhejiang Province, where immigrant workers gather (Ma et al., 2017; Xiao et al., 2013; Zhou et al., 2020). Survey data from Ningbo city, Zhejiang province, show that 81.09% of sex workers come from other provinces (Jiang et al., 2012). Another source of syphilis is the immigrant workers, who also come from rural areas of poor provinces and work in the cities of rich provinces.
Many of them engage in sexual relationships with sex workers or other non-marital partners (Yang et al., 2014), which accelerates the spread of syphilis and makes the disease more difficult to prevent and control. With the rapid development of China's economy and the increase in population mobility, the problem of infectious diseases carried by the floating population cannot be ignored. Some surveys (Shen & Pang, 2008; Xie, 2000) show that the floating population can accelerate the spread of many infectious diseases, such as malaria, AIDS and hepatitis B. The latest data show that the big cities and rich provinces with many immigrant workers (such as Beijing and Zhejiang Province) are also the places where new cases of syphilis are found (Tucker & Cohen, 2011). Although there are many data investigations and studies on the transmission of diseases with the floating population, few works have constructed models to analyze quantitatively how the floating population influences the transmission of a disease such as syphilis.

To address this gap, this paper introduces a method based on a Markov process to estimate the number of infected people in the floating population. Based on a Markov process and Bayes' theorem, this kind of method can generate the estimated number of infections of a disease over time (Sweeting et al., 2005). It has long been used to estimate the HIV incidence rate from diagnosed AIDS cases (Brookmeyer et al., 1994). Birrell et al. (2012) applied this method to new HIV and AIDS diagnoses with the help of data on CD4 counts in England and Wales. Using this method, the authors characterized the time-varying distribution of the time between infection and diagnosis, and also estimated the HIV incidence.
Furthermore, in subsequent work, Birrell and his coauthors (Birrell et al., 2013) used this method to analyze the spread of HIV among the nationwide population of the UK. Based on Birrell's work, Sun et al. (2020) further estimated the cases of HIV and AIDS within different ranges of CD4 counts in Guangxi province, China.

In this paper, we improve the traditional method and introduce a novel Markov model that can estimate the number of infected patients among the floating population. Based on this method, we divide the people newly diagnosed with syphilis into two parts: people infected in the working province and people infected in other provinces. Based on the Markov process of the people who were infected outside the working province and on the infection pattern of syphilis, we construct a new model to describe the infected population within the floating population. In this paper, we refer to Zhejiang province as the working province. Using the likelihood function and the data on diagnosed cases in Zhejiang province (Zhejiang; Zhejiang provincial burea, 1525), we estimate how the floating population affects the spread of syphilis in Zhejiang Province. In the actual calculations, we use the data from January 2009 to October 2020. The samples after October 2020 are dropped to reduce the influence of COVID-19 control strategies on our model.

In this section, we describe how the model is constructed and present the likelihood function used in model fitting. First, a Markov model that describes the probabilities of the immigrant patients' behaviors is constructed. Based on this model, we derive an equation for the expected number of diagnosed cases in each month. Then, based on the Poisson distribution, a likelihood function that links the expected number of new cases to the real data is established. Finally, we fit the model using a standard parameter estimation method and the likelihood function.
All of the parameters and variables used in this section are listed in Table 1 and Table 2.

The state of a person who is infected outside the working province and finally diagnosed in the working province is described at time $t$ by three probabilities: (1) the probability that the person is infected outside the working province but has not immigrated to this province; (2) the probability that the person was infected outside the working province and has already immigrated to the working province but has not been diagnosed; (3) the probability that the person was infected outside the working province and has been diagnosed in the working province. Let $p_0(t)$, $p_1(t)$ and $p_2(t)$ denote these three probabilities. Then the equations can be written as

$$
\begin{cases}
\dfrac{\mathrm{d}p_0(t)}{\mathrm{d}t} = -\alpha\, p_0(t), \\[2mm]
\dfrac{\mathrm{d}p_1(t)}{\mathrm{d}t} = \alpha\, p_0(t) - \delta\, p_1(t), \\[2mm]
\dfrac{\mathrm{d}p_2(t)}{\mathrm{d}t} = \delta\, p_1(t),
\end{cases}
\tag{1}
$$

where $\alpha$ is the immigration rate and $\delta$ is the diagnostic rate. The diagram of this process is shown in Fig. 1. Many of the immigrant workers and sex workers tend to work in the working province for a long time (Yang et al., 2014). Thus, in this Markov process, we do not consider the case where the person migrates back. When the person is infected at time 0, the initial conditions are $p_0(0) = 1$, $p_1(0) = 0$ and $p_2(0) = 0$. Thus, for $\alpha \neq \delta$, the solutions of equation (1) are

$$
p_0(t) = e^{-\alpha t}, \qquad
p_1(t) = \frac{\alpha}{\alpha - \delta}\left(e^{-\delta t} - e^{-\alpha t}\right), \qquad
p_2(t) = 1 - p_0(t) - p_1(t).
$$

Therefore, the probability that $M$ of $N$ people who come from elsewhere have been diagnosed in Zhejiang province by time $t$ after infection is

$$
P(M \mid N, t) = \binom{N}{M}\, p_2(t)^{M} \left(1 - p_2(t)\right)^{N-M},
$$

which is a binomial distribution with expectation $N p_2(t)$. This expectation, and similar binomial processes with the same expectation formula, are used in the following equations.

A person who is infected with syphilis will hardly recover spontaneously; the infection persists until the person is diagnosed and treated with specific therapies (Clement et al., 2014).
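The three-state process of equation (1) is simple enough to check numerically. The sketch below is an illustration only, not the authors' code: the function names `state_probabilities` and `euler_check` are hypothetical, the closed form assumes that the immigration rate (`alpha`) differs from the diagnostic rate (`delta`), and the rates used in the demonstration are the mean estimates reported in the results of this paper.

```python
import math


def state_probabilities(t, alpha, delta):
    """Closed-form state probabilities of the three-state chain.

    Initial state: p0(0) = 1, p1(0) = p2(0) = 0.
    alpha: immigration rate, delta: diagnostic rate (same time unit as t).
    The formula is only valid when alpha != delta.
    """
    if math.isclose(alpha, delta):
        raise ValueError("this closed form assumes alpha != delta")
    p0 = math.exp(-alpha * t)
    p1 = alpha / (alpha - delta) * (math.exp(-delta * t) - math.exp(-alpha * t))
    p2 = 1.0 - p0 - p1  # the three probabilities always sum to one
    return p0, p1, p2


def euler_check(t_end, alpha, delta, steps=200_000):
    """Integrate the same equations with forward Euler as an independent check."""
    h = t_end / steps
    p0, p1, p2 = 1.0, 0.0, 0.0
    for _ in range(steps):
        d0 = -alpha * p0
        d1 = alpha * p0 - delta * p1
        d2 = delta * p1
        p0, p1, p2 = p0 + h * d0, p1 + h * d1, p2 + h * d2
    return p0, p1, p2


if __name__ == "__main__":
    alpha, delta = 9.95, 1.73e-2  # per month, mean estimates reported in the paper
    for months in (1, 12, 60):
        print(months, state_probabilities(months, alpha, delta))
```

Multiplying the third returned value by a cohort size `N` gives the binomial expectation of the number of that cohort diagnosed by time `t`.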
In China, syphilis is legally classified as a national Class B infectious disease, and all newly diagnosed syphilis cases must be reported to the CDC (China, 2018). Based on this situation, the expected number of newly infected people in the patients' working province at time $t$, $I(t)$, is given by equation (2) as the sum of the expected numbers of people who are infected by (1) the diagnosed and not yet healed people, (2) the newly incoming undiagnosed immigrants, and (3) the other undiagnosed people at time $t$. In equation (2), $g_{\tau,t} = e^{-\gamma (t-\tau)}$ is the probability that a person diagnosed at time $\tau$ has neither been healed nor migrated to another province by time $t$, and $\gamma$ is the rate at which infection ability is lost through either recovery or migration. $\lambda(t)$ is the number of immigrants who have already been infected outside their working province at time $t$. $f_{\tau,t} = e^{-\delta (t-\tau)}$ is the probability that a person already infected at time $\tau$ has not been diagnosed by time $t$, and $\beta$ is the infection rate.

The term $S(t)$ is the number of susceptible people in the immigrant patients' working province at time $t$, taken from the population data in the Zhejiang Statistical Yearbook (Zhejiang provincial burea, 1525). Since the statistical yearbook includes only annual data, we obtain monthly data by assuming that the population increase is the same in every month of a year and interpolating between the populations of two adjacent years. Meanwhile, the number of susceptible people (tens of millions) is much larger than the number of syphilis cases (thousands), so the change in the number of susceptible people caused by infected cases is ignored in the actual calculation.
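The monthly interpolation of the yearbook population can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `monthly_population` is hypothetical, each annual figure is assumed to refer to January of its year (the text does not say which month the yearbook figure refers to), and the numbers in the demonstration are made up.

```python
def monthly_population(yearly):
    """Interpolate annual population counts to monthly values.

    yearly: dict mapping consecutive years to population counts.
    Returns a dict mapping (year, month) to the interpolated population,
    assuming the same increase in every month between two adjacent years.
    Assumption: each annual figure is attached to January of its year.
    """
    years = sorted(yearly)
    series = {}
    for y0, y1 in zip(years, years[1:]):
        if y1 != y0 + 1:
            raise ValueError("years must be consecutive")
        step = (yearly[y1] - yearly[y0]) / 12.0  # equal monthly increase
        for m in range(12):
            series[(y0, m + 1)] = yearly[y0] + step * m
    # The last year only contributes its own January anchor point.
    series[(years[-1], 1)] = float(yearly[years[-1]])
    return series


if __name__ == "__main__":
    # Hypothetical population counts, for illustration only.
    demo = monthly_population({2009: 1200.0, 2010: 1320.0})
    for key in sorted(demo):
        print(key, demo[key])
```

Because the susceptible population is several orders of magnitude larger than the number of cases, the interpolated series can be used directly as $S(t)$ without subtracting infected people, as the text notes.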
The function $\mu(t)$ represents the expected number of diagnosed people at time $t$ and is given by equation (3) as the sum of the expected numbers of residents and new immigrants who are diagnosed at time $t$.

Within the month around the Spring Festival holiday (Jiang et al., 2015; Yao et al., 2019), many workers, including sex workers, go back to their hometowns and stop working until the end of the Spring Festival, so the infection rate $\beta$ becomes smaller during that period. Thus, we rewrite the infection rate as $p(t)\beta$, where

$$
p(t) =
\begin{cases}
p, & \text{if } t \text{ is the month around the Spring Festival holiday}, \\
1, & \text{otherwise},
\end{cases}
$$

and $0 < p < 1$ is the parameter that measures the effect of the Spring Festival on the infection rate.

Based on the Poisson distribution, the probability of observing $D(t_i)$ cases at time $t_i$ given the model parameters can be written as

$$
P\left(D(t_i) \mid \theta\right) = \frac{\mu(t_i)^{D(t_i)}\, e^{-\mu(t_i)}}{D(t_i)!},
$$

where $\theta = \{\lambda(t), \alpha, \beta, \delta, \gamma, p, \mu(t), I(t)\}$ is the set of parameters and terms in model (3), $N$ is the number of data points, and $t_i$, $i = 1, 2, \ldots, N$, are the time points in the data. Hence the likelihood function, in logarithmic form, can be written as

$$
\log L(\theta) = \sum_{i=1}^{N} \left[ D(t_i) \log \mu(t_i) - \mu(t_i) - \log \Gamma\left(D(t_i) + 1\right) \right],
\tag{4}
$$

where $\log \Gamma(x)$ is the log Gamma function. The model is fitted on the basis of equation (4).

In the numerical calculation, we use the date 12 months before the present time to replace the negative infinity in the integral formulas (2) and (3), so that we obtain an approximate estimate while increasing the calculation efficiency. Under that approximation, the samples collected from January 2009 to December 2009 are used as the known diagnosed cases and as the prior of $I(t)$ (that is, the initial values of the estimation), and formula (4) is used to fit the other 9 years' samples.

$p_0(t)$: The probability that a person is infected outside his working province but has not immigrated to this province.
$p_1(t)$: The probability that a person was infected outside his working province and has already immigrated to the working province but has not been diagnosed.
$p_2(t)$: The probability that a person was infected outside his working province and has been diagnosed in the working province.
$\lambda(t)$: The number of immigrants who have already been infected outside their working province at time $t$.
$S(t)$: The number of susceptible people in the patients' working province at time $t$.
$I(t)$: The expected number of newly infected cases in the patients' working province at time $t$.
$\mu(t)$: The expected number of newly diagnosed cases in the patients' working province at time $t$.
$D(t)$: The observed number of newly diagnosed cases in the patients' working province at time $t$.

The function $\lambda(t)$ is treated as a piecewise function that is constant within every three months (January to March, April to June, July to September, and October to December each have their own constant in each year). All of the source code is available at the authors' GitHub (https://github.com/ChentongLi/Migration-population-estimation).

One of the data sources is the official website of the Zhejiang Provincial Center for Disease Control and Prevention (Zhejiang). According to the requirements of China's infectious disease prevention and control, the CDC in each province must count and announce the number of cases of Class A and Class B infectious diseases each month, and syphilis is a Class B infectious disease. The other source, for the number of susceptible people $S(t)$, is the official website of the Zhejiang Provincial Statistics Bureau (Zhejiang provincial burea, 1525). In China, the statistical bureau of each province announces the number of residents in the previous year at the beginning of each year.

The estimated infection rate $\beta$ is expressed per healthy person per month in Zhejiang province, and its 95% confidence interval is $(1.43 \times 10^{-9}, 1.92 \times 10^{-9})$. Considering the whole population of Zhejiang province (around 30 million residents (Zhejiang provincial burea, 1525)), although the infection rate $\beta$ is small, it can still cause more than 30,000 diagnosed patients per year.
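The estimates reported in this section rest on a Poisson likelihood with a log Gamma term. A minimal sketch of that log-likelihood is given below, assuming the expected monthly counts have already been computed by the model; the function name `poisson_log_likelihood` and the numbers in the demonstration are hypothetical and are not taken from the authors' repository.

```python
import math


def poisson_log_likelihood(observed, expected):
    """Poisson log-likelihood of monthly case counts.

    observed: reported numbers of newly diagnosed cases, one per month.
    expected: model-based expected numbers of diagnosed cases for the same months.
    Each term is D * log(mu) - mu - log Gamma(D + 1).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for d, mu in zip(observed, expected):
        if mu <= 0:
            raise ValueError("expected counts must be positive")
        total += d * math.log(mu) - mu - math.lgamma(d + 1)
    return total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    data = [2510, 2380, 2745]
    close_fit = [2500.0, 2400.0, 2700.0]
    poor_fit = [2000.0, 2000.0, 2000.0]
    print(poisson_log_likelihood(data, close_fit))
    print(poisson_log_likelihood(data, poor_fit))
```

Using `math.lgamma` instead of an explicit factorial keeps the computation stable for monthly counts in the thousands, which is presumably why the likelihood is stated with the log Gamma function.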
Disease prevention and control departments need to consider measures such as the distribution of condoms to reduce the infection rate and control the spread of syphilis. The mean value of the parameter $p$, which measures the Spring Festival effect, is 0.12 (95% confidence interval (0.03, 0.20)). This shows that the Spring Festival reduces the spread of syphilis in China, because people seldom have sexual relationships with sex workers during this period. The mean values of the Markov process parameters in (1), namely the immigration rate $\alpha$, the diagnostic rate $\delta$ and the rate of infection ability loss $\gamma$, are 9.95 per month (95% confidence interval (6.05, 13.88)), $1.73 \times 10^{-2}$ per month (95% confidence interval $(1.64 \times 10^{-2}, 1.79 \times 10^{-2})$), and 1.92 per month (95% confidence interval (1.15, 2.71)), respectively. These parameters reflect the immigration and infection process of syphilis, and can be used in other related works that simulate the immigration of workers and syphilis transmission among healthy people.

The partial rank correlation coefficients (PRCC) between the model parameters and the mean number of infected cases from 2010 to 2019 (MI) are also calculated in this paper and are shown in Fig. 4. The samples for calculating the PRCC are drawn from the 95% confidence intervals of the model parameters $\alpha$, $\beta$, $\delta$, $\gamma$ and $p$, which are generated by the MCMC method. The result shows that the parameters $\delta$ and $\gamma$ have negative effects on the mean infected number, while the other parameters have positive effects. The parameter $p$, which reflects human behavior, has the greatest impact on the mean infection number over these years. This means that the most effective way to control syphilis is for everyone to control their own behavior. The second greatest factor in disease control is the infection rate $\beta$, which means that measures to decrease the infection rate, such as the use of condoms, can have a great impact on the control of syphilis.
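For readers unfamiliar with PRCC, the computation can be sketched in plain Python. This is a generic illustration, not the authors' code: it rank-transforms every sampled parameter and the model output, then reads the partial correlations from the inverse of the rank correlation matrix. The function names and the toy response used in the demonstration are hypothetical.

```python
import math
import random


def ranks(values):
    """Ranks from 1 to n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def prcc(parameter_columns, output):
    """PRCC of each parameter with the output, given all other parameters."""
    ranked = [ranks(c) for c in parameter_columns] + [ranks(output)]
    precision = invert(correlation_matrix(ranked))
    y = len(ranked) - 1
    return [-precision[j][y] / math.sqrt(precision[j][j] * precision[y][y])
            for j in range(y)]


if __name__ == "__main__":
    random.seed(1)
    x1 = [random.random() for _ in range(300)]
    x2 = [random.random() for _ in range(300)]
    # Hypothetical toy response: increases with x1 and decreases with x2.
    y = [3 * a - 2 * b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]
    print(prcc([x1, x2], y))
```

In the setting of this paper, the parameter columns would hold the sampled values of the five model parameters and the output would be the mean number of infected cases computed for each sample; a positive coefficient then indicates that increasing the parameter raises the mean infected number, and a negative one that it lowers it.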
The estimated numbers of patients infected in Zhejiang province are shown in Fig. 5. The black line shows the mean value in each month and the sky-blue shading shows the 95% confidence intervals. This figure shows the transmission of syphilis in Zhejiang province. Compared with other years, more people were infected in Zhejiang province in 2010 and 2011. From 2012 to 2019, the number of infected people in each month does not show an obvious difference, and the differences between years are narrow. The figure also shows that the Spring Festival leads to fewer infections, since at this time people pay more attention to their behavior.

The estimated numbers of immigrants who were infected outside Zhejiang province and diagnosed in Zhejiang province in every three months (the piecewise function $\lambda(t)$) are illustrated in Fig. 6. The distribution of the immigrant syphilis patients shows no clearly regular pattern. The first three months of each year always have the lowest number, and the number of patients in the second half of each year is always larger than that in the first half. The number of infected people in the warm seasons is sometimes larger than that in the cold seasons. The number of immigrant syphilis patients peaks in the period from October to December 2018, with a mean value of 3535 and a 95% confidence interval of (2641, 4437). Comparing the result shown in this figure with the numbers illustrated in Fig. 5, immigrant patients are one of the main sources of syphilis in Zhejiang province; hence the national disease control department needs to take effective action to supervise these key populations (especially the immigrant sex workers).

C. Li, J. Liu, Y. Zhang et al. Infectious Disease Modelling 7 (2022) 243-251
[Figure label: The number of newly arriving residents with syphilis in each time interval]

In this paper, based on a Markov process method and the data on diagnosed cases of syphilis in Zhejiang province, we estimate the possible number of syphilis patients among the floating population. The result shows that immigrant patients are one of the main sources of syphilis in Zhejiang province, implying that the government should pay more attention to the immigrant sex workers. From the estimated parameters, we find that the Spring Festival decreases the transmission of syphilis. We also find that the number of infected cases in Zhejiang province remains almost stable across months. The PRCC analysis suggests two main measures to control syphilis: (1) infected people paying attention to their behavior and (2) practical ways of reducing the infection rate, such as the wide use of condoms among sex workers.

By using the Markov process and infection equations to describe the immigration and infection processes of syphilis patients, the model is constructed within a floating population. This method can estimate not only the number of infected cases but also the number of floating patients from the data of diagnosed cases, which improves the estimation ability of the traditional method. With the rapid development of the modern economy and increasingly frequent exchanges of people among different countries and regions, studying the disease status within floating populations has become more and more important. The method we use in this paper offers new insight into this problem, providing a quantitative research basis built on the diagnosed cases. All of the methods mentioned in this paper can be extended to other disease studies in which immigrant patients should be considered.
Other than the on-site investigation of the spread of diseases and the patients in the floating population (Jiang et al., 2012), a quantitative analysis of syphilis has been carried out. The number of new infection cases and patients among the immigrant population, as well as the transmission and immigration parameters, have been estimated in this paper by using the data on diagnosed cases in each month. These quantitative values can be used to guide how to control diseases more accurately in reality and to deepen people's understanding of diseases. Compared with the model in (Sun et al., 2020), we have innovations in the construction of the model and the likelihood function in terms of calculation details. Based on the Markov process model, we have not only considered the factor of the floating population but also modified the likelihood function to make it more in line with the issues considered in this paper.

For the model described in this article, the environmental factors in the spread of the disease are ignored because it is a sexually transmitted disease. However, air-transmitted diseases such as influenza and vector-transmitted diseases such as dengue fever are sensitive to the effects of temperature and humidity, and can also migrate with the migration of the population. Therefore, to meet more practical needs in disease studies, a more comprehensive model incorporating environmental factors should be considered in future work.

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

MCMC techniques for parameter estimation of ODE based models in systems biology
Estimating trends in incidence, time-to-diagnosis and undiagnosed prevalence using a CD4-based Bayesian back-calculation
HIV incidence in men who have sex with men in England and Wales 2001-10: A nationwide population study. The Lancet Infectious Diseases
AIDS epidemiology: A quantitative approach
Syphilis in China: Results of a national surveillance program
Treatment of syphilis: A systematic review
Aerosol composition and sources during the Chinese Spring Festival: Fireworks, secondary aerosol, and holiday effects
Investigation on the prevalence of AIDS among female sex workers in a supervision place in Ningbo city, Zhejiang province. Disease Surveillance
Characteristics and determinants of sexual behavior among adolescents of migrant workers in Shanghai (China)
Consistent condom use and its correlates among female sex workers at hair salons: A cross-sectional study in Zhejiang province
Global estimates of the prevalence and incidence of four curable sexually transmitted infections in 2012 based on systematic review and global reporting
Analysis on the characteristics of infectious diseases among floating population in Huzhou city, Zhejiang province
Declining trend in HIV new infections in Guangxi, China: Insights from linking reported HIV/AIDS cases with CD4-at-diagnosis data
Bayesian back-calculation using a multi-state model with application to HIV
China's syphilis epidemic: Epidemiology, proximate determinants of spread, and control responses
Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990-2015: A systematic analysis for the global burden of disease study
Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980-2015: A systematic analysis for the global burden of disease study
Distribution of reported syphilis cases in south China: Spatiotemporal analysis
Association rules based approach to AIDS knowledge acquisition among female sex workers and its relationship with demographic characteristics
Epidemiological characteristics of infectious diseases among floating population in Zhejiang province from 1997 to 1998
Reflections on the migrant workers' sexual morality in the process of urbanization: Overview on the migrant workers non-martial sexual behaviors
The effects of firework regulation on air quality and public health during the Chinese Spring Festival from 2013 to 2017 in a Chinese megacity
Zhejiang provincial bureau of statistics. Statistical yearbook of Zhejiang province
The prevalence and correlates of oral sex among low-tier female sex workers in Zhejiang province