key: cord-0434063-no5buhev
authors: Guitart, Anna; Río, Ana Fernández del; Periáñez, África; Bellhouse, Lauren
title: Midwifery Learning and Forecasting: Predicting Content Demand with User-Generated Logs
date: 2021-07-06
journal: nan
DOI: nan
sha: 25bf181c76ff336f766d1e4ef1d5b8bd06969766
doc_id: 434063
cord_uid: no5buhev

DSHealth 2021, August 14-18, 2021

Every day, 800 women and 6,700 newborns die from complications related to pregnancy or childbirth. A well-trained midwife can prevent most of these maternal and newborn deaths. Data science models, together with logs generated by users of online learning applications for midwives, can help to improve their learning competencies. The goal is to use these rich behavioral data to push digital learning towards personalized content and to provide an adaptive learning journey. In this work, we evaluate various forecasting methods to determine the interest of future users in the different kinds of content available in the app, broken down by profession and region.

The rapid expansion of mobile health applications in low- and middle-income countries, and the large volume of data generated by their users, have created unprecedented opportunities for applying artificial intelligence (AI) to improve individual and population health [14, 27]. The application of data science models to the behavioral logs that frontline healthcare workers and patients generate in digital tools can lead to improvements in clinical research and practice, and in health service delivery. Public health can also use big datasets to promote healthy habits and improve self-management, by providing people with health and well-being plans based on their particular medical and social circumstances [18, 20].

Every year 300,000 women and 5 million newborns die of causes related to pregnancy and childbirth [8, 11]. Additionally, for every maternal death approximately 20 women suffer serious birth injuries [28]. Nearly all of these deaths and disabilities occur in low- and middle-income countries, and almost 90% of them could be prevented if the woman gave birth with qualified assistance from a skilled midwife [19]. Additionally, 80% of all newborn deaths result from conditions which are preventable and treatable, and for which proven, cost-efficient interventions exist [21]. Almost all intrapartum and many antepartum stillbirths could be prevented with quality essential childbirth care and antenatal care [16].

Here we show an analysis of the logs from skilled birth attendants using the Safe Delivery App [10], a digital training and learning mobile application developed by the Maternity Foundation, a nonprofit that develops digital learning tools to ensure all women and newborns have a safe childbirth [9]. This work represents a step towards content personalization for midwives, in a sector that has traditionally been left out of big technological developments. Forecasting the demand for learning content by profession and region can lead to a better understanding of user habits and improve the management of campaigns [29]. We apply several forecasting methods to evaluate their accuracy and production feasibility, with the aim of using the outcomes for future experimentation and incentive analysis. Previous studies using similar methodologies and user logs can be found in [7, 12].
We compare the performance of different time series forecasting methods in predicting the daily demand (in number of users) per type of content (module, in the app's language) and user's profession. Training and prediction were performed using the GluonTS [1], Keras [6] and MXNet [5] Python libraries and the forecast R package [15].

1.1.1 Seasonal naïve forecaster. The seasonal naïve forecast model [1] was used as a benchmark. Its forecasts are given by the observed values at the equivalent time points of the previous season. For prediction lengths larger than a season, the season is repeated multiple times, whereas for time series shorter than a season, the mean observed value is used as the prediction.

1.1.2 SARIMA forecaster. The SARIMA model [3] was used as an additional benchmark, as it is one of the best performing and most widely used classical approaches to time series analysis and forecasting. At each time step, the time series value is a combination of regular and seasonal autoregressive polynomials (where the value depends on previous values) and moving average polynomials (where the value depends on previous errors). In addition, the original time series can be differenced as many times as needed to make it stationary.

The last decade has seen the rapid growth of deep neural network architectures to tackle a great variety of problems [17], due to increasing computational resources and data availability, as well as improved methodology. One shortcoming of this approach is its difficulty in including categorical features, due to their lack of continuity. Entity embedding [13] can be used to effectively learn the representation of categorical variables in multidimensional spaces, increasing their continuity and thus providing an intelligent way of using them as features in deep learning models. In particular, it overcomes the problems faced by the more traditional approach of one-hot encoding, namely the need for excessive computational power and its tendency to overfit.
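As an illustration of the seasonal naïve benchmark from Section 1.1.1, its forecasting rule can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the GluonTS implementation used in the experiments: the function name `seasonal_naive_forecast` and its parameters are hypothetical, and the weekly season of 7 days is chosen only for the example.

```python
from statistics import mean
from typing import List, Sequence


def seasonal_naive_forecast(
    history: Sequence[float],
    season_length: int,
    prediction_length: int,
) -> List[float]:
    """Forecast by repeating the last observed season (hypothetical helper)."""
    if season_length <= 0 or prediction_length < 0:
        raise ValueError("season_length must be positive and prediction_length non-negative")
    if not history:
        raise ValueError("history must contain at least one observation")

    # Series shorter than one season: fall back to the mean observed value.
    if len(history) < season_length:
        return [mean(history)] * prediction_length

    # Otherwise copy the last full season, wrapping around when the
    # prediction length exceeds one season.
    last_season = list(history[-season_length:])
    return [last_season[step % season_length] for step in range(prediction_length)]


# Two weeks of illustrative daily counts (made-up numbers).
example_history = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
example_forecast = seasonal_naive_forecast(example_history, season_length=7, prediction_length=10)
```

With a weekly season, a 10-day forecast reuses the last 7 observed values and then wraps around to the start of that same week.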
1.1.4 Autoregressive recurrent networks (DeepAR). The use of autoregressive recurrent networks to simultaneously predict many time series was introduced in [25]. The method trains either long short-term memory (LSTM) or gated recurrent unit (GRU) networks, where the inputs at each time step are the covariates, the target value from the previous time step (which makes it autoregressive) and the previous output of the network (which makes it recurrent). A global model is learned from all the time series and can then be used to generate probabilistic forecasts for the individual time series, each with its own individual distribution. This technique has been previously used in connection with healthcare in [22].

1.1.5 Low-rank Gaussian copula processes. This is a multivariate approach with deep learning elements, described as GP-Copula in [24]. It combines a time series model based on autoregressive recurrent networks with a Gaussian copula process to parametrize the output distribution. This copula has a low-rank structure in order to keep the number of parameters and the computational complexity within reasonable bounds.

Our dataset comprised user logs extracted from Maternity Foundation's Safe Delivery App. This app targets skilled birth attendants around the world, empowering them to provide a safer birth for mothers and newborns through evidence-based and up-to-date clinical guidelines on maternal and neonatal care, including the core components or "signal functions" of Basic Emergency Obstetric and Newborn Care. We computed the number of professionals that accessed a particular module per day and took these counts as a proxy for the demand for that specific content. Figure 1 presents the time series for various modules in the case of nurses. Even though all modules show a similar overall usage trend, each series exhibits a different scale and usage pattern. Similar series are obtained for the other professions.
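The construction of these daily demand series can be sketched as follows. The log schema (user id, profession, module, ISO timestamp), the function name `daily_user_counts` and the example values are all assumptions made for illustration; the actual format of the Safe Delivery App logs may differ.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

# Hypothetical log record: (user_id, profession, module, ISO 8601 timestamp).
LogEvent = Tuple[str, str, str, str]


def daily_user_counts(
    events: Iterable[LogEvent],
) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Count distinct users per day for each (profession, module) pair."""
    users: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for user_id, profession, module, timestamp in events:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        # A set ensures a user opening the same module twice in a day counts once.
        users[(profession, module, day)].add(user_id)

    series: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(dict)
    for (profession, module, day), user_ids in sorted(users.items()):
        series[(profession, module)][day] = len(user_ids)
    return dict(series)


# Made-up events for illustration only.
example_events = [
    ("u1", "nurse", "module_a", "2020-06-01T08:00:00"),
    ("u1", "nurse", "module_a", "2020-06-01T19:30:00"),
    ("u2", "nurse", "module_a", "2020-06-01T09:15:00"),
    ("u2", "nurse", "module_a", "2020-06-02T10:00:00"),
    ("u3", "midwife", "module_b", "2020-06-01T11:00:00"),
]
example_counts = daily_user_counts(example_events)
```

Days on which nobody accessed a module are simply absent from the result; a forecasting pipeline would typically fill them with zeros before training.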
Our goal is to predict the app usage per profession, in order to personalize the content and get a better grasp of usage dynamics. Specifically, we predicted the daily values of the usage time series for each month and each module-profession combination, training the model with all the available data until the end of the previous month. Cross-validation was performed using a rolling window [4, 23] from 2020-06-01 to 2020-12-01, considering all historical data before the prediction date for the training samples and 30 days of data before the prediction date for the test sample (training samples were split into training and validation sets). The final configuration was selected as the one that got the best results in the cross-validation rolling-window process.

All models used the profession and module as categorical features, and the day of the month, day of the week, month and year as covariates. The inclusion of COVID-19-related covariates and information on the Safe Delivery App training for healthcare professionals was tested but did not result in any clear improvement.

Regarding the specifications of each method, the SARIMA forecasting was performed using the auto-ARIMA functionality, which means that all combinations of regular polynomials up to degree 5, seasonal polynomials up to degree 2, up to 2 regular differences and up to 1 seasonal difference were tried, using the Akaike Information Criterion to select the best of them. Our neural network with categorical embeddings had 3 fully connected layers with 1000, 500 and 1 cells; the activation function for the dense layers was ReLU for the first and a sigmoid for the second layer, and we used the mean absolute error as the loss function and Adam as the optimization method. The best performing DeepAR model was found to be that using 20 2-layer LSTM cells, a negative binomial distribution, a dropout rate of 0.01, 300 epochs and a training batch size of 30.
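The rolling-window evaluation described above can be sketched with the standard library alone. This is a hedged illustration, not the authors' pipeline: the helper names `month_starts` and `rolling_splits` are hypothetical, and the sketch assumes that each 30-day test window starts at the monthly prediction date, so that it never overlaps the training data (in line with training on everything up to the end of the previous month). The offsets would need adjusting if the evaluation window is defined differently.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def month_starts(first: date, last: date) -> Iterator[date]:
    """Yield `first` and then the first day of each following month up to `last`."""
    current = first
    while current <= last:
        yield current
        # Move to the first day of the next month, rolling over the year in December.
        year = current.year + current.month // 12
        month = current.month % 12 + 1
        current = date(year, month, 1)


def rolling_splits(
    days: List[date], first: date, last: date, horizon: int = 30
) -> Iterator[Tuple[date, List[date], List[date]]]:
    """Yield (prediction_date, training_days, test_days) for each monthly origin."""
    for origin in month_starts(first, last):
        train = [d for d in days if d < origin]  # all history before the origin
        test_end = origin + timedelta(days=horizon)
        test = [d for d in days if origin <= d < test_end]  # assumed test window
        yield origin, train, test


# Illustrative calendar covering 2020; the real data span is not assumed here.
all_days = [date(2020, 1, 1) + timedelta(days=i) for i in range(366)]
splits = list(rolling_splits(all_days, date(2020, 6, 1), date(2020, 12, 1)))
```

Each split could then be passed to a forecaster, with the training days further divided into training and validation sets as described in the text.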
The selected GP-Copula variant had exactly the same settings as the DeepAR model, except that only 5 epochs were considered, as this method is much more computationally intensive and the model was already reaching convergence.

Most of the forecasting models evaluated were able to capture the trend of the time series, with results differing mainly in the estimation of the daily patterns specific to each time series. Results are summarized in Table 1, displayed for each model and for each prediction month. We can observe that for more recent months the scores tend to be lower, partially due to the enlargement of the historic data used in the training sample.

Figures 3 and 4 show an example of the forecasts for each model. They illustrate that, while the performance of the DeepAR and GP-Copula methods is relatively similar (left panel), the former shows a tighter 50% prediction interval that better fits the shape of the actual series. The ARIMA model also produced remarkably accurate predictions, which accounts for its widespread use even now that more sophisticated methods are available, although it shows a larger forecast uncertainty (as shown by the wider interval). The forecasts for the other evaluated models are displayed in the right panel.

The GP-Copula model trained over just 5 epochs performs similarly to the DeepAR model trained over 300 epochs, though it still needs more time and resources. For that reason, DeepAR would be the preferred option in a production environment. However, if higher accuracy were critical and there were no constraints on computational time and resources, the use of GP-Copula with an increased number of epochs would be justified.

Overall, we found that the DeepAR and GP-Copula deep learning models are the most accurate for daily forecasting of the content demand.
This result holds across different contents (modules) and user types (professions), as these two models show less error variability in the overall results of each individual time series. Other models such as Facebook's Prophet [26] and DeepVAR (the simplest multivariate extension of DeepAR) [24], both with their default settings, were also tested but performed worse than the ARIMA benchmark, so they were not included in the analysis.

Although the evaluated dataset corresponds to India, this methodology shows potential to be applied to different countries or geographical areas, and also to additional contents. DeepAR constitutes a generalizable model that can correctly capture the trend behavior of the time series and anticipate user demand for particular content depending on the user profile. We provided a solution that could be used in operational settings to get real-time demand estimates, due to the flexibility and speed of the model implementation.

All data used in this analysis comes from the Safe Delivery App logs and belongs to the Maternity Foundation. For inquiries regarding its use, please contact them at mail@maternity.dk.

GluonTS: Probabilistic and Neural Time Series Modeling in Python
A New Typology Design of Performance Metrics to Measure Errors in Machine Learning Regression Algorithms
Time Series Analysis, Forecasting and Control
Evaluating time series forecasting models: An empirical study on performance estimation methods
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
A Time Series Approach to Player Churn and Conversion in Videogames
United Nations. Interagency Group for Child Mortality Estimation. 2020. Levels & Trends in Child Mortality: Report 2020: Estimates Developed by the UN Inter-Agency Group for Child Mortality Estimation. United Nations Children's Fund
Maternity Foundation. 2021. Maternity Foundation
Maternity Foundation. 2021. Safe Delivery App
Cost of Ending Preventable Maternal Deaths
Forecasting Player Behavioral Data and Simulating in-Game Events
Entity Embeddings of Categorical Variables
Artificial intelligence for global health
Automatic Time Series Forecasting: The forecast Package for R
Stillbirths: rates, risk factors, and acceleration towards 2030
Deep Learning
Digital health data-driven approaches to understand human behavior
Potential impact of midwives in preventing and reducing maternal and neonatal mortality and stillbirths: a Lives Saved Tool modelling study
Big data and data science in health care: What nurses and midwives need to know
Every newborn: an action plan to end preventable deaths
Covid-19: A comparison of time series methods to forecast percentage of active cases per population
Consistent cross-validatory model-selection for dependent data: hv-block cross-validation
High-dimensional multivariate forecasting with low-rank Gaussian Copula Processes
DeepAR: Probabilistic forecasting with autoregressive recurrent networks
Forecasting at scale
Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings?
United Nations Population Fund World Health Organization, UNICEF and The World Bank
Forecasting medical device demand with online search queries: A big data and machine learning approach

The authors wish to thank Javier Grande and Wei Xiang Low for their careful review of the manuscript. This work was supported, in whole or in part, by the Bill & Melinda Gates Foundation INV-022480. Under the grant conditions of the Foundation, a Creative Commons Attribution 4.0 Generic License has already been assigned to the Author Accepted Manuscript version that might arise from this submission.