key: cord-0290701-7deizccd authors: Simmachan, T.; Lerdsuwansri, R.; Wongsai, N.; Wongsai, S. title: Modelling road accident fatalities with underdispersion and zero-inflated counts date: 2022-05-16 journal: nan DOI: 10.1101/2022.05.13.22275063 sha: 0237a6607f876432317af1a38047b0b853135ea4 doc_id: 290701 cord_uid: 7deizccd

Background Thailand ranked second in the world in 2013 for its road accident fatality (RAF) rate, with 36.2 deaths per 100,000 population. Over the past decade, the number of road traffic accidents (RTAs) during the Songkran festival, the traditional Thai New Year, has been markedly higher than on normal days, but few studies have examined this as an effect of the festival. The objective of this study was to investigate factors contributing to RAFs using various count regression models.

Methods Data on 20,229 accidents in 2015 were collected from the Department of Disaster Prevention and Mitigation, Thailand. Poisson, Conway-Maxwell-Poisson (CMP), and their zero-inflated versions were applied to analyze factors associated with the number of fatalities in an accident.

Results RAFs in Thailand follow a count distribution with both underdispersion and excess zeros, which is a rare combination. The best-fitting model, the zero-inflated CMP (ZICMP) regression model, identifies road characteristics, weather conditions, environmental conditions, and month as significant predictors of the number of fatalities in an accident. The model consists of a count part, which captures both the non-excess zeros and the death counts, and a zero part, which represents the considerable number of zeros during the festival months. The estimated proportion of the zero part is 0.275, accounting for 5,563 non-fatal accidents. More specifically, the excess of accidents with no deaths can be explained by the month factor. The mean number of fatalities was lower in the festive periods than in other months, and was highest in November.
Conclusion For a long time, Thai authorities have put considerable effort and resources into improving road safety during the festival weeks, but these efforts have often failed. This study indicates that people's risk perception and public awareness of RAFs are misled. Instead, the authorities should promote road safety nationwide to raise society's awareness of everyday personal safety and the safety of others.

[...] Table 1. Poisson regression has a limitation in its variance assumption. The expected death count at the $i$th accident ($\mu_i$) was calculated using the log link $\mu_i = \exp(\mathbf{x}_i^\top \boldsymbol{\beta})$ to ensure that $\mu_i > 0$, where $\mathbf{x}_i$ is the vector of covariates and $\boldsymbol{\beta}$ is the vector of estimable coefficients. Given the p.m.f., link function, and an assumption of independent observations, the log-likelihood function for observation $i$, with observed death count $y_i$, is given as

$$\ell_i(\boldsymbol{\beta}) = y_i \mathbf{x}_i^\top \boldsymbol{\beta} - \exp(\mathbf{x}_i^\top \boldsymbol{\beta}) - \log(y_i!). \tag{1}$$

Summing over the $n$ observations, the log-likelihood function is

$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \mathbf{x}_i^\top \boldsymbol{\beta} - \exp(\mathbf{x}_i^\top \boldsymbol{\beta}) - \log(y_i!) \right]. \tag{2}$$

The CMP distribution is a generalization of the Poisson distribution that can model both underdispersed and overdispersed data. This distribution was originally proposed by [27], but its implementation was contributed by [28] [29]. The p.m.f., mean and variance are defined in Table 1.
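The CMP p.m.f. itself is not reproduced in this text, so the following is only a minimal sketch under the standard parameterization, $P(Y = y) = \lambda^y / \left((y!)^\nu Z(\lambda, \nu)\right)$ with $Z(\lambda, \nu) = \sum_{j \ge 0} \lambda^j / (j!)^\nu$. The symbols $\lambda$ and $\nu$, the function names, the illustrative parameter values, and the truncation of the infinite sum at `max_count` are all assumptions made for illustration and are not the authors' code. The sketch shows that $\nu = 1$ recovers the Poisson case, that $\nu > 1$ gives a variance below the mean (underdispersion), and how a zero-inflation proportion adds extra zeros on top of the count part.

```python
import math


def cmp_pmf(lam, nu, max_count=200):
    """Return [P(Y=0), ..., P(Y=max_count)] for a CMP(lam, nu) distribution.

    Assumption: the infinite normalizing sum Z(lam, nu) is truncated at
    max_count, which is adequate only when the tail mass is negligible.
    """
    # Work in log space: log(lam^j / (j!)^nu) = j*log(lam) - nu*log(j!)
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_count + 1)]
    peak = max(log_terms)  # subtract the peak to avoid overflow
    weights = [math.exp(t - peak) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


def zicmp_pmf(pi_zero, lam, nu, max_count=200):
    """Zero-inflated CMP: extra zeros with probability pi_zero,
    otherwise a draw from the CMP count part (hypothetical helper)."""
    mixed = [(1.0 - pi_zero) * p for p in cmp_pmf(lam, nu, max_count)]
    mixed[0] += pi_zero
    return mixed


def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


# Illustrative values only: lam = 2.0 is arbitrary, not an estimate from the paper.
for nu in (0.5, 1.0, 2.0):
    mean, var = mean_and_variance(cmp_pmf(lam=2.0, nu=nu))
    label = "under" if var < mean else "over or equi"
    print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f} ({label}dispersed)")

# 0.275 is the zero-part proportion reported in the abstract; lam and nu are not.
zi = zicmp_pmf(pi_zero=0.275, lam=2.0, nu=2.0)
print(f"P(Y=0) with zero inflation: {zi[0]:.3f}")
```

Working in log space with `math.lgamma` keeps the factorial terms from overflowing; a fitted model would instead estimate $\lambda$, $\nu$, and the zero-inflation proportion from covariates.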
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted May 16, 2022. https://doi.org/10.1101/2022.05.13.22275063

If the mean of death counts is greater (less) than its variance, underdispersion (overdispersion) is observed. According to the relationship between the mean and variance of the CMP distribution, [...]

To assess model adequacy, the generalized Pearson $\chi^2$ statistic is used. It is computed as

$$\chi^2 = \sum_{i=1}^{n} \frac{\left(y_i - E(Y_i)\right)^2}{E(Y_i)}.$$

The numerator is the squared difference between the observed death count and the expected value of the death count, and the denominator is the expected value of the death count. In large samples, the distribution of this statistic is approximately chi-squared with $n - k$ degrees of freedom, where $n$ is the total number of observations and $k$ is the number of estimated parameters including the intercept. A small value of the statistic leads to the decision not to reject the null hypothesis, and failing to reject the null hypothesis indicates that the model is adequate.

[...] Thailand with a long holiday period of more than 5 days.
However, it is noticeable that the fatalities per accident per month were lower in these festival months than in other months. The highest proportion, 0.396, was seen in November, and the lowest, 0.165, in April. A greater number of non-fatal injuries was seen during the celebrations, which plays a crucial role in the excess of zeros in the data. Six explanatory variables were used in estimating road accident fatalities (Table 1).

Coefficients and standard errors are shown in Table 4 and 95% confidence intervals in [...]

Model Selection

The most widely used model for analyzing count data is the Poisson model. The Poisson model is limited in that it assumes a unit variance-to-mean ratio. In road safety data, this restriction may be violated because the observed count may present a variance that is greater or smaller than expected, [...]

[...] response.
Since the underdispersion is not accounted for, the classical models produce a lower standard error of the estimated parameters than expected. We highlight that having many zeros does not necessarily mean that a zero-inflated model is needed. It depends on which issues of concern need to be addressed and on whether associated factors can be found that capture the zero inflation. A key finding is that the month of the year can distinguish between the death counts and the extra zero death counts, referred to as uncommon nonfatal accidents in our case. Using [...]

References

Global status report on road safety 2015
Modelling road accident fatalities in Thailand and other Asian countries
Global Road Safety Performance Targets
Global Plan for Decade of Action for Road Safety
Division of Injury Prevention, Department of Disease Control, Ministry of Public Health, Thailand
Injury Data Collaboration Center
Prevalence of fatality and associated factors of road traffic accidents among victims reported to Burayu town police stations, between
Department of Disaster Prevention and Mitigation, Ministry of Interior
Road Traffic Injuries in Thailand: Current Situation
Helmet use and associated factors among Thai motorcyclists during Songkran festival
Non-seatbelt use and associated factors among Thai drivers during Songkran festival
Outcomes of emergency medical service usage in severe road traffic injury during Thai holidays
Determination of the impact of rainfall on road accidents in Thailand. Heliyon
Analysis of crash frequency and crash severity in Thailand: Hierarchical structure models approach. Sustain
Road traffic injuries in Thailand and their associated factors using Conway-Maxwell-Poisson regression model
A Poisson Regression Model of Highway Fatalities
An analysis of factors influencing accidents on road bridges in Norway
Statistical modeling of numbers of human deaths per road traffic accident in the Oromia region, Ethiopia. PLoS One
A flexible zero-inflated model to address data dispersion
31. van den Broek J. A Score Test for Zero Inflation in a Poisson Distribution
Modern Applied Statistics with S. Fourth edition
Regression models for count data in R
Package COMPoissonReg. Package "COMPoissonReg"
Economic development and road traffic injuries and fatalities in Thailand: An application of spatial panel data analysis
Modeling Count Data in Risk Analysis and Reliability Engineering
Development and application of traffic accident density estimation models using kernel density estimation
Comparison of severity of motorcyclist injury by crash types
Latent class analysis of factors that influence weekday and weekend single-vehicle crash severities