key: cord-0878635-7s622m2h authors: Hu, Xue-qin; Quirchmayr, Gerald; Winiwarter, Werner; Cui, Meng title: Influenza early warning model based on Yunqi theory date: 2012-04-02 journal: Chin J Integr Med DOI: 10.1007/s11655-012-1003-4 sha: 6cc3b34d1d43c90f8fb6f8b02c68a0df8ffb7f20 doc_id: 878635 cord_uid: 7s622m2h
OBJECTIVE: To establish an early warning model that simulates influenza outbreaks based on weather conditions and Yunqi theory, an ancient calendar theory of Chinese medicine (CM).
METHODS: Tianjin, a northeastern city in China, was chosen as the study region, and the influenza-like illness attack rate (ILI%) was used to set the baseline and warning line that determine the severity of an influenza epidemic. An influenza early warning model was then constructed based on rough set theory and support vector machines (RS-SVM), and the relationship between influenza and meteorology was explored by analyzing the monitoring data.
RESULTS: The model predicted well, achieving 81.8% accuracy when the data were grouped into three levels representing no danger, danger of a light epidemic, and danger of a severe epidemic. The test results showed that influenza outbreaks were more likely when the host qi and guest qi were not balanced.
CONCLUSIONS: Influenza outbreaks relate closely to temperature, humidity, visibility, and wind speed, which is consistent with part of CM doctrine. The result also indicates that there is some reasonable evidence for Yunqi theory.
Yunqi theory is a unique theory with which the ancients studied the relationship between climate change and disease. Guided by the idea of correspondence between humans and nature, it associates meteorological changes in nature with the state of the human body in order to predict the occurrence of disease, and it proposes a cyclical model for the prevention and treatment of diseases.
(1) Since the severe acute respiratory syndrome (SARS) epidemic in 2002, some scholars have used this theory to test and verify the trend of SARS. Gu (2) concluded that the drought and the high temperature in 2000 were the two main climate factors that caused the epidemic outbreak. This is consistent with the Yunqi prediction that "an epidemic will develop in three years." Since then, the theory has once again attracted interest and attention. In this paper, in order to build a meteorological prediction model of influenza based on Yunqi theory, we first briefly introduce two pieces of background related to our work.
Influenza is a highly contagious infectious disease characterized by sudden onset, rapid spread, serious infection, and regional clustering. In this paper, influenza was diagnosed based on the World Health Organization (WHO) definition of influenza-like illness (ILI), which states that "fever 100 °F (37.8 ℃), oral or equivalent, and cough or sore throat, and in the absence of other laboratory confirmed evidence." Previous research on the health effects of climate change has mainly focused on meteorological indicators such as temperature and humidity. In this paper, however, we added a new indicator to our early warning model: the relationship between host qi and guest qi based on Yunqi theory.
So far, China has 441 network laboratories and 556 national influenza surveillance sentinel hospitals, and the surveillance network covers all provinces. Physician-diagnosed ILI is considered a widely applicable alternative to influenza virus isolation, whereas techniques for detecting influenza viruses have not been widely used at clinical hospitals in China. Therefore, this paper applied the ILI attack rate (ILI%) as the baseline and warning line to determine the severity of an influenza epidemic.
We aimed to forecast influenza early by observing the warning line. The ILI% data were obtained from the Chinese Center for Disease Control and Prevention (CDC), and the monitoring period ran from October 3, 2005 to September 20, 2009 (a total of 207 weeks). ILI% was calculated as
$$\mathrm{ILI\%} = \frac{\text{number of ILI cases}}{\text{number of outpatient and emergency cases}} \times 100\%$$
When cases that met the case definition of ILI were found, designated staff within China CDC's medical institutions were required to fill in the case report forms for infectious disease within the specified time limit. Therefore, the data had good accuracy and representativeness. The epidemic curve of weekly reported ILI% is shown in Figure 1.
Rough set (RS) theory cannot handle continuous attributes directly; therefore, we first had to classify ILI%. We set the baseline and the warning line as the dividing lines between groups and classified the influenza data into three groups: light, moderate, and serious. The baseline and warning line were set as follows. We ran normality tests on the ILI% data from the epidemic periods of 2005-2009 (November to April of the following year) in PASW Statistics 17.02; if the data were normally distributed, ±2s was chosen as the warning line; if not, the 75th percentile was used. Because the distribution of ILI% was not normal, we took 17.7% (P75) as the warning line. The same principle was applied to the baseline. The ILI% data from the non-epidemic periods (May to October) were tested for normality; if they were normally distributed, ±2s was chosen as the baseline; if not, the 75th percentile was used. Through this statistical analysis, we took 11.8% (P75) as the baseline. A non-epidemic period lay below the baseline, a serious epidemic lay above the warning line, and a light epidemic period lay between the two lines. The groups of influenza data are shown in Table 1.
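The grouping described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the thresholds are the 11.8% baseline and 17.7% warning line reported in the text, and the treatment of values that fall exactly on a line is an assumption.

```python
BASELINE = 11.8      # baseline ILI% reported in the text
WARNING_LINE = 17.7  # warning-line ILI% reported in the text


def ili_percent(ili_cases, outpatient_emergency_cases):
    """ILI% = ILI cases / outpatient and emergency cases * 100."""
    if outpatient_emergency_cases <= 0:
        raise ValueError("the number of outpatient and emergency cases must be positive")
    return 100.0 * ili_cases / outpatient_emergency_cases


def epidemic_level(ili_pct, baseline=BASELINE, warning=WARNING_LINE):
    """Map a weekly ILI% to 0 (non-epidemic), 1 (light) or 2 (serious).

    Assumption: a value exactly on the warning line still counts as light.
    """
    if ili_pct < baseline:
        return 0
    if ili_pct <= warning:
        return 1
    return 2


# Hypothetical week: 150 ILI cases among 1000 outpatient and emergency visits.
example_pct = ili_percent(150, 1000)
example_level = epidemic_level(example_pct)
```

The three integer levels play the role of the decision attribute that the later rough set and SVM steps try to predict.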
Meteorological data were downloaded from the National Meteorological Information Center (http://cdc.cma.gov.cn/index.jsp). They were daily monitoring data from the same period, including mean temperature, dew point, mean humidity, mean sea level pressure, mean visibility, and mean wind speed. A total of 14 meteorological factors were selected in this paper. To correspond with the weekly ILI% reports, the meteorological elements were also processed into weekly averages.
According to Yunqi theory, one year can be divided into six seasons, and each season has two qi names: host qi and guest qi. (3) There are three types of relationships between these two kinds of qi, which are listed in Table 2. The Yunqi situation of the year 2006 is listed in Table 3 as an example.

Table 2. Relationship between host qi and guest qi
Relationship between host qi and guest qi | Type
The host qi and guest qi were balanced | 0
The host qi and guest qi were not balanced and antagonistic | -1
The host qi and guest qi were not balanced but still in well-being | 1

Rough set (RS) theory, introduced in the early 1980s, (4) is a mathematical tool for dealing with vagueness and uncertainty. One of its main advantages is that it needs no preliminary or additional information about the data. Attribute reduction is a key problem in RS theory and its applications. A reduct is a minimal attribute subset of the original data that has minimal redundancy but retains the same discernibility power as the full attribute set. (5) In this paper, we had 14 conditional attributes, some of which were redundant. Therefore, we used attribute reduction based on RS theory to obtain a minimal attribute subset of six attributes. Using this minimal subset as the input nodes of our model improved the classification efficiency of the support vector machine (SVM).
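To make the notion of a reduct concrete, the sketch below searches a tiny, made-up decision table by brute force. It is only an illustration under stated assumptions: the paper computes reducts with a genetic algorithm, whereas this example enumerates subsets exhaustively, and the attribute names, values, and helper functions are all hypothetical.

```python
from itertools import combinations


def positive_region_size(rows, attrs, decision):
    """Count objects whose values on `attrs` determine the decision uniquely."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, []).append(row[decision])
    return sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)


def find_reducts(rows, attrs, decision):
    """Return all minimal attribute subsets that keep the full positive region."""
    full = positive_region_size(rows, attrs, decision)
    reducts = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            # Skip supersets of an already found reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if positive_region_size(rows, subset, decision) == full:
                reducts.append(subset)
    return reducts


# Made-up, already discretized toy table (not data from the study).
toy_table = [
    {"temp": "low", "humidity": "mid", "wind": "low", "level": "serious"},
    {"temp": "low", "humidity": "high", "wind": "low", "level": "serious"},
    {"temp": "high", "humidity": "mid", "wind": "low", "level": "none"},
    {"temp": "high", "humidity": "high", "wind": "high", "level": "none"},
]
toy_reducts = find_reducts(toy_table, ["temp", "humidity", "wind"], "level")
```

In this toy table the decision depends on temperature alone, so the only reduct is the single attribute `temp`. Exhaustive search grows exponentially with the number of attributes, which is one reason heuristic searches such as genetic algorithms are used on larger tables.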
Meteorological factors take continuously changing values within a certain range; therefore, these data have to be discretized first. The raw data were discretized with the equal frequency binning technique, using three intervals to obtain the break point set of the decision table. A genetic algorithm was used to compute the attribute reducts. We used the algorithm described by Øhrn, et al, (6) which supports both cost information and approximate solutions. The algorithm's fitness function f(B) is
$$f(B) = (1-\alpha)\times\frac{\mathrm{cost}(A)-\mathrm{cost}(B)}{\mathrm{cost}(A)} + \alpha\times\min\left\{\varepsilon,\ \frac{\left|\{S\in\mathcal{S} : S\cap B\neq\emptyset\}\right|}{|\mathcal{S}|}\right\}$$
where $\mathcal{S}$ is the set of sets corresponding to the discernibility function. The subsets B of A were found through the evolutionary search driven by the fitness function. The parameter α defines the weighting between subset cost and hitting fraction, while ε is relevant in the case of approximate solutions. The result of attribute reduction is shown in Figure 2. After consulting experts and combining the results with their expertise, we selected the first reduct, which included six attributes. Figure 2 shows that Chinese medicine (CM, Yunqi theory) was included in every reduct, from which it can be conjectured that the Yunqi factor and influenza might be correlated.
Based on the result of attribute reduction, 104 decision rules generated by the genetic algorithm were acquired. Basic filtering reduced the rule set to 27 rules. Three examples of decision rules are listed in Table 4. As seen in Table 4, influenza epidemics had certain characteristics, namely low temperature, low visibility, suitable humidity, low wind speed, and host qi and guest qi that were not balanced but still in well-being. Some of our results were consistent with other studies. For example, a study by Lowen, et al (7) indicated that influenza virus transmission depends on relative humidity and temperature, especially relative humidity.
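The equal frequency discretization that precedes attribute reduction can be illustrated as follows. This is a minimal sketch, not the ROSETTA implementation: the function names are hypothetical, cut points are placed at midpoints between neighboring sorted values (an assumption), and heavily tied data could produce duplicate cut points that this sketch does not handle.

```python
def equal_frequency_cuts(values, n_bins=3):
    """Return n_bins - 1 cut points that split the values into equally sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least as many values as bins")
    cuts = []
    for k in range(1, n_bins):
        idx = k * n // n_bins
        # Assumption: place the cut halfway between the two neighboring values.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def discretize(value, cuts):
    """Map a continuous value to the index of its interval (0 = lowest)."""
    return sum(value > c for c in cuts)


# Hypothetical weekly mean temperatures, not data from the study.
toy_temperatures = [1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_cuts = equal_frequency_cuts(toy_temperatures, n_bins=3)
toy_codes = [discretize(t, toy_cuts) for t in toy_temperatures]
```

With three bins, every attribute is reduced to the codes 0, 1, and 2, which can be read as low, medium, and high when interpreting the decision rules.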
Based on our tests, we concluded that visibility and wind speed played important roles in the influenza epidemic. As the Chinese economy has boomed in recent years, urban air pollution has worsened, and low visibility has gradually become one of the accelerators of respiratory diseases. For example, Brunekreef, et al (8) found that exposure to pollutants such as airborne particulate matter and ozone has been associated with increases in mortality and hospital admissions due to respiratory and cardiovascular diseases. We also found that low wind speed accelerated the flu epidemic. Low wind speed probably slows the air flow and leads to a higher concentration of virus.
In this paper, RS was used for information preprocessing to obtain the reduced attribute set, and SVM was then applied for influenza classification and prediction. During training, the attribute set obtained by RS reduction constituted the input layer of the network. The individual steps of influenza prediction are described in the following five sections.
We formed a new set D_SVM from the new attribute set and the corresponding raw data D_RST. Then, we normalized the new attribute set with
$$x = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}$$
where x is the normalized attribute value and x_i is the attribute value before normalization; max(x_i) and min(x_i) are the maximum and minimum values of the ith attribute. Sixty-three percent of the data were randomly selected from D_SVM as the training set D_SVM_Tr. Cross-validation was used in training, the penalty factor was set to C=5, and the radial basis function was chosen as the kernel function. The remaining 37.0% of the data formed the test set D_SVM_Ts, which was classified by the SVM classifier to output the classification results. The classification prediction test was done on the ROSETTA platform based on RS theory.
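The normalization and the random split into training and test sets can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' MATLAB code: the function names and the fixed random seed are hypothetical, constant columns are mapped to 0.0 by choice, and the SVM training itself (RBF kernel, C=5) is left out.

```python
import random


def min_max_normalize(column):
    """Scale one attribute column to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant attribute carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def split_train_test(n_samples, train_fraction=0.63, seed=0):
    """Randomly split sample indices into a training and a test part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 207 weekly records, as in the study period described earlier.
train_idx, test_idx = split_train_test(207)
scaled = min_max_normalize([2.0, 4.0, 6.0])
```

With 207 weekly records, a 63% training share leaves 77 test samples, which matches the denominator of the accuracy reported for the model.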
Following the process described above, the classification prediction was completed on the MATLAB 7.8.0 platform. An accuracy of 81.8% (63/77) was achieved. This was better than running the prediction with RS directly on the ROSETTA platform, which reached only 61.3%. The classifications of the individual groups are shown in Figure 3, and the prediction accuracy for each group is listed in Table 5. Our experimental results suggest that the prediction accuracy of RS-SVM, although higher than that of RS, was not very high. Scarce (and possibly noisy) data might be the reason. We focused on the second and third groups of data to analyze the cause of the classification errors. The analysis suggests that some objective factors led to these errors. For example, erroneous data item No. 72 actually belonged to the third group, but our model assigned it to the second group. That week ran from May 14, 2007 to May 20, 2007. The meteorological records for the week show that the maximum wind speed was higher than usual for the period, with the instantaneous wind speed reaching 50 km/h or more on 4 consecutive days. The weather records also show that a number of dust storms occurred in Tianjin during this week. Therefore, we speculate that this particular error is related to the dust storms. In addition, from the data for the erroneously classified items 22 and 26, we suspect that these errors may have been caused by clinicians' diagnostic errors, since influenza is easily confused with the common cold or other upper respiratory tract infections. Low reported influenza figures will lead the RS-SVM model to misjudge the second group as the third.
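The reported overall figure can be checked with simple arithmetic, and per-group accuracy can be computed in the same way. The helper names and the toy labels below are hypothetical and are not the study's predictions; only the 63 correct classifications out of 77 test samples come from the text.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted group equals the true group."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_group_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true group."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {group: hits[group] / totals[group] for group in totals}


# Reported result: 63 of 77 test samples classified correctly.
reported_pct = round(100 * 63 / 77, 1)

# Made-up labels for illustration only (0 = none, 1 = light, 2 = serious).
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 1, 2, 2, 1]
toy_by_group = per_group_accuracy(toy_true, toy_pred)
```

Per-group accuracy is useful here because the error analysis concentrates on confusion between the second and third groups, which an overall figure alone would hide.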
In summary, this paper collected continuous ILI% data and related meteorological data from Tianjin from 2005 to 2009. The relationship between influenza and weather was then analyzed based on RS theory. Finally, an RS-SVM early warning model was built. The main findings are as follows. (1) From the results of RS attribute reduction, we conclude that ILI% was closely related to mean temperature, mean humidity, minimal visibility, maximal wind speed, and mean wind speed. Influenza spread was also closely related to Yunqi theory, a finding that may help to develop the theory of CM. (2) Some decision rules for influenza epidemics were established from the meteorological factors. For example, when mean temperature was below 4.6 ℃, mean humidity was between 64 and 87, maximal wind speed was below 18.7 km/h, mean wind speed was below 7.2 km/h, and the host qi and guest qi were not balanced but still in well-being, a serious influenza epidemic season was highly likely. (3) An RS-SVM influenza early warning model was established, and it performed better than the RS prediction model. This shows that an RS-SVM model for influenza prediction is feasible and can produce satisfactory results.
In future work, we will improve our study in the following areas. (1) For ILI% data processing, we will use the average ILI% of 3 weeks (the week itself and the weeks before and after it) as the ILI% value of 1 week. This will compensate to some extent for data bias (such as holidays) and help stabilize the monitoring data. (2) So far, we have applied only a linear warning line. In the future, we will try a non-linear warning line to see whether it yields a more objective analysis of the influenza trend and increases the sensitivity of the warning.
(3) Since ILI% is not entirely consistent with influenza pandemics, we need to find a better measure for cases in which ILI% data are inconsistent with the rate of influenza virus isolation and with influenza outbreaks. (4) In this paper, we used only one Yunqi factor in our predictive model; in the future, we hope to apply more Yunqi factors, such as pestilence every 3 years, qi going too far, and qi not going far enough, to disease forecasting and prevention. Through such work, we hope to open a new door for this ancient theory and to promote and develop it. In short, we hope our research can help public health officials and professional medical staff respond better to regional, seasonal influenza. Our results can be tested further through additional laboratory, epidemiological, and other modeling studies.
1. Practical doctrine of Yunqi theory: climate and disease prevention
2. Yunqi theory in Canon of medicine and disease prediction
3. Yunqi theory of traditional Chinese medicine. Beijing: Beijing Science and Technology Press
4. Rough set theory and its applications
5. Attribute reduction in decision-theoretic rough set models
6. Rough sets in knowledge discovery 2: applications, case studies and software systems
7. Influenza virus transmission is dependent on relative humidity and temperature
8. Air pollution and health
We are grateful for the guidance and help from the Chinese Center for Disease Control and Prevention.