key: cord-0691120-2cj8aeay
authors: Shim, Kyu Seok; Kim, Min Hyung; Shim, Choong Nam; Han, Minkyu; Lim, In Seok; Chae, Soo Ahn; Yun, Sin Weon; Lee, Na Mi; Yi, Dae Yong; Kim, Hyery
title: Seasonal trends of diagnosis of childhood malignant diseases and viral prevalence in South Korea
date: 2017-11-08
journal: Cancer Epidemiol
DOI: 10.1016/j.canep.2017.11.003
sha: 1a5faed652d6fb7dc9e86aeee352270c9b0f62d3
doc_id: 691120
cord_uid: 2cj8aeay

BACKGROUND: Several studies have reported a seasonal trend in the diagnosis of childhood cancer, suggesting the influence of seasonal factors such as infection. The present study aimed to analyze the diagnosis pattern of childhood malignant diseases using public health data, and to compare this pattern with seasonal viral infection trends.

METHOD: Using the open data source of the Health Insurance Review and Assessment Service, we extracted data regarding all patients under 21 years of age who had any cancer, aplastic anemia, or myelodysplastic syndrome between September 2009 and December 2013. The positive detection rates of 11 viruses were collected from the surveillance data of the Korea Centers for Disease Control and Prevention, and seasonality analyses were conducted on both datasets.

RESULTS: In total, 9085 patients were diagnosed with malignant disease during the study period; there were about 175 new cases per month on average. Monthly stacked time series by year showed an apparent seasonal variation, with the highest monthly average in January (236) and the lowest in September (120). In winter, significantly more patients were diagnosed with acute lymphoblastic leukemia, acute myeloid leukemia, neuroblastoma, and Hodgkin's lymphoma than in other seasons. There was a temporal correlation between the diagnostic trends of several diseases and the recent prevalence of human parainfluenza virus.
CONCLUSION: This study tentatively suggests that the diagnosis of childhood malignancy follows a seasonal trend in Korea, and has a possible correlation with viral prevalence in several diseases. Further long-term analysis of epidemiological data is needed to explore possible causality.

Malignant tumors of childhood are the most common disease-related cause of death among patients younger than 20 years old in Korea [1]. Annually, 16.6 per 100,000 children are diagnosed with malignancies; this accounts for about 1% of all cancer patients. Despite decades of research, the etiology of childhood cancer remains largely unknown. However, one hypothesis suggests that pediatric cancer is related to infection. In particular, precursor B cell acute lymphoblastic leukemia (ALL), several subtypes of lymphoma, and nasopharyngeal cancer are reportedly associated with viral infection [2, 3, 4, 5].

Several studies have investigated temporal variations in the diagnosis of childhood cancer. Summer peaks of ALL diagnosis have been observed amongst both children and adults in the United Kingdom [6]. In the United States, summer peaks of ALL, hepatoblastoma, and rhabdomyosarcoma diagnosis have been observed amongst children, and a winter peak of central nervous system (CNS) tumor diagnosis has been observed amongst children in the Southern USA [7]. In Denmark, seasonal variation in diagnosis, with a peak in March, was found amongst children and adolescents with Hodgkin's lymphoma (HL) [8]. According to a recent study in Northern England, there was statistically significant sinusoidal variation in girls for all lymphomas (peak in March) and HL (peak in January), and in boys for osteosarcoma (peak in October) [9]. Such variations in the rate of diagnosis may arise from seasonal differences in environmental risk factors, such as infections.
That is, childhood cancer might develop in a two-step process: exposure to infection may be the "second hit" that promotes overt malignancy in children with underlying genetic vulnerability [4, 5]. If a triggering infection of this kind exhibited a seasonal pattern, as influenza and gastroenteritis do, some seasonal variation would be expected in the date of childhood cancer diagnosis. Accordingly, we searched public health data for evidence of (1) any seasonal variation in the diagnosis of childhood malignant diseases, and (2) any correlation between these trends and the prevalence of a specific virus. No previous studies have reported seasonal trends in pediatric malignant disease diagnosis in Korea. This study aimed to analyze the diagnosis pattern of pediatric malignant disease, and to compare the pattern with seasonal trends in viral infection.

We extracted data regarding all pediatric malignant diseases from the open data source of the Health Insurance Review and Assessment Service (HIRA). South Korea has a universal health coverage system, and the National Health Insurance covers approximately 98% of the South Korean population. Claims data are collected by the HIRA when healthcare service providers in South Korea seek reimbursement for services that are covered by the National Health Insurance Corporation. The HIRA claims data are accessible to approved researchers who submit a profile of their research. They contain the information of 46 million patients per year, which accounts for 90% of the total population of South Korea. The claims data include information regarding patients' diagnoses, hospitalization, and prescription drugs, and constitute a valuable resource for healthcare research. In Korea, the disease code registered in the HIRA database is entered directly by the patient's primary care physician.
Since 2005, the "National cancer registration and reimbursement program" has been implemented in Korea; it reimburses 90% of the medical expenses of patients with malignant diseases. All patients with malignant diseases should therefore be registered in the database after a definite diagnosis, so the diagnoses recorded in the HIRA database can be expected to be relatively accurate. Using these open claims data, we used selected diagnostic codes to extract the data of patients with pediatric malignant diseases. The source population comprised all patients under 21 years of age who had been diagnosed with cancer, aplastic anemia (AA), or myelodysplastic syndrome (MDS), as defined by the International Classification of Diseases (ICD, 10th revision), between September 2009 and December 2013 (Table 1). Data selection and mining were conducted using SAS (version 9.2) and R (3.2.2; http://www.r-project.org/).

The Korea Centers for Disease Control and Prevention collect data regarding pathogens that cause acute respiratory infection and enteric diseases. During each year of the study period, more than 1000 respiratory specimens were collected from about 100 participating hospitals across the country, and the causative pathogens were identified using standardized diagnostic procedures in a central laboratory. In addition, the prevalence of enteric pathogens was under weekly surveillance, with cohorts comprising 17 local environmental and health institutes, as well as 105 participating hospitals. Each week, the percentage of positive cases among all specimens was reported for each pathogen. We calculated the average monthly positive detection rates (PDRs) of 7 respiratory viruses (adenovirus, parainfluenza virus, respiratory syncytial virus, influenza virus, coronavirus, rhinovirus, and bocavirus) and 4 acute diarrhea viruses (adenovirus, rotavirus, norovirus, and astrovirus). The PDR data were collected from 2009 to 2013.
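As a minimal sketch of the monthly PDR calculation described above, the following Python snippet averages weekly percent-positive values into one value per month. The article does not state whether the weekly percentages were averaged directly or weighted by the number of specimens, so the unweighted mean used here is an assumption. The function name, the input layout, and the example values are hypothetical and are not surveillance data.

```python
from collections import defaultdict
from statistics import mean


def monthly_pdr(weekly_reports):
    """Average weekly percent-positive values into monthly PDRs.

    weekly_reports: iterable of (year, month, percent_positive) tuples,
    one per surveillance week (hypothetical input layout).
    Returns {(year, month): unweighted mean percent positive}.
    """
    by_month = defaultdict(list)
    for year, month, percent in weekly_reports:
        by_month[(year, month)].append(percent)
    return {key: mean(values) for key, values in sorted(by_month.items())}


# Made-up weekly values for a single virus, for illustration only.
example_weeks = [
    (2010, 5, 4.0), (2010, 5, 6.0), (2010, 5, 8.0), (2010, 5, 6.0),
    (2010, 6, 10.0), (2010, 6, 12.0), (2010, 6, 11.0), (2010, 6, 11.0),
]
pdr = monthly_pdr(example_weeks)
print(pdr)
```

If specimen counts per week were available, a weighted mean (total positives divided by total specimens per month) would be a reasonable alternative.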
Because seasonality is a general component of monthly time series, we used the autoregressive integrated moving average (ARIMA) modeling approach to build a model of seasonal variation in the diagnosis of childhood cancer. The seasonal ARIMA model assumes that the current observation is related to past observations. The general multiplicative form of the model is written as ARIMA (p, d, q) × (P, D, Q). This model has non-seasonal autoregressive and moving average parameters of orders p and q, seasonal autoregressive and moving average parameters of orders P and Q, and non-seasonal and seasonal differencing parameters of orders d and D, respectively. To determine the general form of the model to be fitted, the residual autocorrelation function (ACF) was examined. Based on the ACF graphs, different candidate ARIMA models were identified for model selection (Fig. 1, Supplemental Fig. 1). The model with the minimum Akaike's information criterion (AIC) was selected as the best-fit model.

The Granger approach [10] was used to investigate how much of the current value of a time series y could be explained by past values of y, and whether adding lagged values of another series x improved the explanation. EViews version 5.0 (http://www.eviews.com/), software originally developed for economics but useful in other statistical applications, was used to analyze the data. The null hypotheses were that series x does not Granger-cause series y in the first regression, and that y does not Granger-cause x in the second regression. Seasonal differences in newly diagnosed childhood cancer were analyzed using the chi-square test, and P-values < 0.05 were considered statistically significant.

To carry out time series modeling, we fitted an ARIMA model to the residuals of a third-order polynomial time-trend model with a significant trend. Table 3 shows the parameters of the ARIMA models in newly diagnosed childhood malignant disease patients, and the AIC parameters of the model.
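The Granger approach described in the methods can be illustrated with a short, self-contained Python sketch. It compares a restricted regression (y on its own lag) with an unrestricted one (y on its own lag plus lagged x) and returns the F statistic for the added lagged-x terms. This is not the EViews procedure the authors used, whose settings are not given in the article. The function names are hypothetical, the series are synthetic, and the series length of 52 only mirrors the number of months in the study period.

```python
import random


def solve_linear(a, b):
    """Solve the square system a * coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def rss(design, target):
    """Residual sum of squares of the least-squares fit of target on design."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, target)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, t in zip(design, target))


def granger_f(x, y, lag=1):
    """F statistic for H0: lagged x does not help predict y beyond lagged y."""
    times = range(lag, len(y))
    target = [y[t] for t in times]
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in times]
    unrestricted = [row + [x[t - i] for i in range(1, lag + 1)]
                    for row, t in zip(restricted, times)]
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - len(unrestricted[0])
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic series: y follows x with a one-month delay plus small noise.
rng = random.Random(0)
n_months = 52
x = [rng.gauss(0.0, 1.0) for _ in range(n_months)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0)
             for t in range(1, n_months)]

f_xy = granger_f(x, y)  # does x help predict y?
f_yx = granger_f(y, x)  # does y help predict x?
print(f_xy, f_yx)
```

In practice the F statistic would be compared against an F distribution with (lag, dof) degrees of freedom to obtain a P-value; that step is left out here to keep the sketch free of external dependencies.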
Table 1 shows the classification of disease groups, and Supplemental Fig. 2 shows a monthly time series of each disease group. Seasonal trends were apparent in acute myeloid leukemia (AML), non-Hodgkin's lymphoma (NHL), and especially ALL and germ cell tumors (GCTs). In the case of ALL, the number of diagnoses was lowest in September and then increased towards a peak in December and January. To ascertain seasonal differences in newly diagnosed childhood cancer by disease category, we carried out a chi-square test, which showed that AML, HL, and neuroblastoma (NBL) had statistically significant seasonal differences (Table 4). That is, significantly more patients were diagnosed with AML or NBL in winter, and HL was diagnosed more in winter than in autumn.

Supplemental Table 1 outlines the PDRs of viruses. The PDRs of most viruses showed apparent seasonal variation. Specifically, the PDR of the parainfluenza virus was highest from May to August. That of adenovirus was highest in September, and those of respiratory syncytial virus and norovirus were highest from November to December. Influenza and norovirus had their highest PDR from January to April, while the PDR of coronavirus was highest from December to February. Finally, the bocavirus PDR was highest from April to June. To assess approximate viral trends, the monthly PDRs of all viruses were summed; this total PDR was lowest in September and highest in December.

If the period in which a virus is prevalent affected cancer diagnosis, the prevalence of that virus would be expected to increase before a peak in cancer diagnosis. Thus, a Granger causality test was conducted between the virus PDR data and the cancer diagnostic data 1 to 2 months later. The results of this Granger causality test are shown in Table 5.
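To make the seasonal chi-square comparison concrete, here is a rough Python sketch that pools the monthly counts of newly diagnosed cases for the four complete calendar years (2010-2013, taken from the monthly table in this article) into seasons and tests them against equal expected counts. Several points are assumptions rather than the authors' procedure: the season boundaries (winter = December to February, and so on) are not defined in the article, the goodness-of-fit form with equal expected counts is only one way to set up the test, and the authors report their chi-square results by disease category (Table 4), not for the pooled total computed here.

```python
import math

# Monthly counts of newly diagnosed cases (January..December) for the four
# complete calendar years, copied from the monthly table in this article.
MONTHLY_CASES = {
    2010: [276, 202, 197, 166, 174, 197, 230, 203, 155, 119, 126, 277],
    2011: [204, 196, 218, 165, 155, 157, 219, 186, 138, 108, 135, 210],
    2012: [238, 167, 182, 169, 155, 144, 206, 99, 28, 106, 258, 181],
    2013: [227, 176, 160, 143, 149, 155, 186, 128, 48, 80, 153, 201],
}

# Assumed season definition (not stated in the article).
SEASONS = {
    "winter": (12, 1, 2),
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
}


def season_totals(monthly_cases):
    """Sum the monthly counts of every year into one total per season."""
    totals = {name: 0 for name in SEASONS}
    for counts in monthly_cases.values():
        for name, months in SEASONS.items():
            totals[name] += sum(counts[m - 1] for m in months)
    return totals


def chi_square_uniform(observed):
    """Goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    half = x / 2
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half)


totals = season_totals(MONTHLY_CASES)
chi2 = chi_square_uniform(list(totals.values()))
p_value = chi_square_sf_df3(chi2)  # four seasons give 3 degrees of freedom
print(totals, chi2, p_value < 0.05)
```

The closed-form tail probability applies only to 3 degrees of freedom; a general implementation would use a statistics library instead.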
Among the seven respiratory viruses analyzed, the period of human parainfluenza virus prevalence was related to the diagnostic data of 9 disease categories after 1 month, as well as to the summed data (ALL, AML, bone tumors, CNS tumors, GCT, NHL, nasopharyngeal cancer, retinoblastoma, sarcoma, and all childhood cancers). After 2 months, it was related to the data for ALL, AML, bone tumors, CNS tumors, NHL, retinoblastoma, and all childhood cancers. In addition, respiratory syncytial virus was related to the HL and NBL data after 1 month and to the aplastic anemia and AML data after 2 months. Among the enteric viruses, astrovirus was related to newly diagnosed sarcoma after 1-2 months.

This study was carried out to investigate whether the diagnosis of childhood malignant diseases follows any seasonal trend, and whether there is any correlation between this trend and the timing of prevalence of specific viruses. Seasonal trends were apparent in AML, NHL, and especially in ALL and GCTs. These trends suggest that viruses or allergic causes, which have a seasonal prevalence pattern throughout the year, might affect the occurrence of childhood cancer.

Previous studies of the seasonal patterns of childhood cancer have mainly focused on ALL, and several have reported a summer peak in the diagnosis of childhood ALL [6, 7, 11, 12]. Another investigation found a winter peak in childhood ALL diagnosis [13]. Harris et al. compared the seasonal risk of childhood ALL between cases diagnosed in the northern USA (greater than 40°N latitude) and those diagnosed in the southern USA (less than 40°N latitude) [14]. They found complex, trimodal patterns, with seasonal peaks in April, August, and December at northern latitudes, and in February, July, and October in the southern locations.
They suggested that these peaks coincided with seasonal elevations in allergic and infectious processes: the indexes of tree and grass pollen are higher in the spring, and that of ragweed pollen is higher in the summer; conversely, influenza is more prevalent in the winter. In addition to ALL, a cancer seasonality study involving adolescents in the United Kingdom revealed significant evidence of seasonality in the diagnosis of HL, with a peak in November and December [11]. These findings are more consistent with an infectious exposure than with any other environmental exposure. Variation in the peak seasons could be caused by climate changes, which clearly influence the timing of particular infectious outbreaks [5].

Monthly number of newly diagnosed patients, by year.

Year   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  Ave.
2009     -    -    -    -    -    -    -    -  231  242  207  253   233
2010   276  202  197  166  174  197  230  203  155  119  126  277   194
2011   204  196  218  165  155  157  219  186  138  108  135  210   174
2012   238  167  182  169  155  144  206   99   28  106  258  181   161
2013   227  176  160  143  149  155  186  128   48   80  153  201   151
Ave.   236  185  189  161  158  163  210  154  120  131  176  224   175

Mon, month; Ave., average.

In our study, childhood ALL showed a seasonal trend: diagnoses increased in the winter, followed by a decreasing trend until September, when the fewest cases were diagnosed. Childhood ALL is relatively well known to be associated with infection, with a peak age of onset between 2 and 5 years. The disease is associated with industrialization and modern societies, and its increased prevalence has led to the formulation of two infection-based hypotheses: Kinlen's "population-mixing" hypothesis, and Greaves' "delayed-infection" hypothesis [15, 16]. Kinlen's hypothesis predicts that clusters of childhood ALL cases result from the exposure of susceptible individuals to common, but fairly innocuous, infections after population-mixing with carriers [15].
The delayed-infection hypothesis of Greaves is based on a minimal "two-hit" model; it suggests that some susceptible individuals have a prenatally acquired, pre-leukemic allele, and that these individuals had little or no exposure to common infections early in life because they live in an affluent, hygienic environment [16]. Such insulation from infection predisposes the immune system of these individuals to aberrant or pathological responses after subsequent or delayed exposure to common infections at an age when there is increased lymphoid-cell proliferation.

It is notable that the diagnosis of childhood AML, HL, NHL, and NBL increased during the winter in our study. In particular, there have been limited epidemiologic data on NBL. Recently, an Italian neuroblastoma cohort study reported that stage 4S NBL had a significant peak among July births; however, no trend in the date of diagnosis was found [17]. In the present study, we could not classify the patients' stages; however, stage 4S NBL accounts for less than 5% of all NBL cases, so our seasonal NBL pattern is most likely distinct from that reported in the Italian study [18]. With regard to AML, while ALL and AML are biologically different diseases with distinct etiologies [19], it is possible that different infectious agents play a role in the development of both. Our preliminary evidence for seasonal peaks in several diseases warrants further investigation. We also found significant July and winter peaks in the diagnosis of GCTs. No GCT time series have been published in Western countries; however, our results suggest that seasonal factors could influence the development of GCT.

Infection is known to affect cancer development, but only a few viruses have a confirmed causal relationship. Approximately 18% of the global cancer burden has been attributed to infectious agents, with estimates ranging from 7% in developed countries to about 22% in developing countries [20].
Chronic infections by the hepatitis B and C viruses, human papilloma virus, and Helicobacter pylori are reportedly responsible for approximately 15% of all human cancers, especially in adults. Malignancies known to be associated with infection have been considered to result from the prolonged latency that occurs in chronic infections. In general, pathogenic infections could be necessary, but not sufficient, for the initiation and progression of many cancers. That is, cancer initiation may require additional co-factors, including secondary infections [5]. Therefore, in patients with chronic infection by one agent, a secondary co-infection with another agent may be an important co-factor in cancer initiation and progression. Co-infections are relatively common in areas with a high prevalence of infectious agents, especially in developing countries. These co-infections can cause an imbalance in the host immune system by affecting the persistence of, and susceptibility to, malignant infections.

In this study, we used the surveillance data for viruses from about 100 participating hospitals across the country. The participating hospitals are located in all 16 provinces of the country, and include primary medical centers as well as university hospitals. Therefore, these data could reflect the nationwide viral prevalence and infection trends in Korea. No respiratory or enteric viruses have yet been postulated as risk factors in cancer. However, in the present study, a period of high parainfluenza virus prevalence was temporally associated with the diagnosis of several diseases. This temporal association does not confirm a causal relationship between viral infection and cancer development, and it has not been reported that parainfluenza virus affects tumorigenesis. Such correlations might occur because frequent hospital visits during viral infections could lead to the diagnosis of malignant diseases.
However, there were several disease groups without seasonality, and GCT, which showed apparent seasonality, had a summer peak that did not correspond to any viral prevalence trend in this study. Thus, hospital visits alone cannot explain all the data in this study. In addition, altered cellular immune responses, such as T cell dysfunction, occur during parainfluenza virus infection [21], which might contribute to host immune dysregulation and cancer susceptibility. If infection affects tumorigenesis, the seasonal pattern of childhood cancer diagnosis would vary by region and country according to differing infection patterns. Further longitudinal, multifactorial studies are needed to elucidate infection-related etiological factors.

The present study was the first to investigate seasonal trends in childhood malignancy diagnosis in Korea. However, unavoidable limitations need to be considered. First, the present study was an ecological analysis. We had no information at the individual level regarding potential confounding factors. Second, we had no information regarding the month of onset of clinical symptoms, which may be a more important indicator because there may be considerable variability in the time between the first appearance of symptoms and diagnosis. Third, there may have been selection bias: although duplicated patients were excluded, it is possible that some of the patients had been diagnosed before the study period. This selection bias is inevitable in cross-sectional cohort studies; however, longer-term studies could minimize such bias. Fourth, this type of study cannot evaluate exposure to specific infectious agents, and individual exposures to certain viruses could not be confirmed in the open data source. Finally, because the etiology of most cancers is likely multifactorial, not all cases would necessarily have an infectious cause.
However, such combined effects would probably result in an underestimate, rather than an overestimate, of any putative seasonal influence.

In conclusion, this study tentatively suggests that the diagnosis of childhood malignancy follows a seasonal trend in Korea, and has a possible correlation with viral prevalence in several diseases. Further long-term analysis of epidemiological data is needed to explore possible causality.

H Kim designed the study, and collected, reviewed, and analyzed data; KS Shim, MH Kim, and CN Shim wrote the paper; M Han performed statistical and bioinformatic analysis; IS Lim, SA Chae, SW Yun, and NM Lee contributed to data collection; DY Yi organized the writing process. All authors made substantial contributions to the concept and design of the study, and gave final approval for the manuscript to be submitted.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

No conflict of interest relevant to this article was reported.

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.canep.2017.11.003.
[1] Incidence and Survival of Childhood Cancer in Korea
[2] Pattern of Epstein-Barr virus association in childhood non-Hodgkin's lymphoma: experience of University of Malaya Medical Center
[3] Epstein-Barr virus infection in the pathogenesis of nasopharyngeal carcinoma
[4] Acute lymphoblastic leukaemia
[5] An infectious aetiology for childhood acute leukaemia: a review of the evidence
[6] Seasonality in the diagnosis of acute lymphocytic leukaemia
[7] Seasonal variations in the diagnosis of childhood cancer in the United States
[8] Seasonal variation in month of birth and diagnosis in children and adolescents with Hodgkin disease and non-Hodgkin lymphoma
[9] Season of birth and diagnosis for childhood cancer in Northern England
[10] Investigating causal relations by econometric models and cross-spectral methods
[11] Seasonal variations in the onset of childhood leukaemia and lymphoma
[12] Season of birth and diagnosis of children with leukaemia: an analysis of over 15 000 UK cases occurring from 1953-95
[13] Seasonal variations in the onset of childhood leukemia/lymphoma
[14] The seasonal risk of pediatric/juvenile acute lymphocytic leukemia in the United States
[15] Infections and immune factors in cancer: the role of epidemiology
[16] Infection, immune responses and the aetiology of childhood leukaemia
[17] Seasonal variations of date of diagnosis and birth for neuroblastoma patients in Italy
[18] Clinico-epidemiology of neuroblastoma in north east Egypt: a 5-year multicenter study
[19] Epidemiology of childhood leukemia, with a focus on infants
[20] Multiple infections and cancer: implications in epidemiology
[21] Altered function in CD8+ T cells following paramyxovirus infection of the respiratory tract