key: cord-0219847-wkffgoh3 authors: Verbelen, Roel; Antonio, Katrien; Claeskens, Gerda; Crevecoeur, Jonas title: Modeling the occurrence of events subject to a reporting delay via an EM algorithm date: 2019-09-18 journal: nan DOI: nan sha: efabadba85e5806f186506318a32cad269a784f8 doc_id: 219847 cord_uid: wkffgoh3 A delay between the occurrence and the reporting of events often has practical implications such as for the amount of capital to hold for insurance companies, or for taking preventive actions in case of infectious diseases. The accurate estimation of the number of incurred but not (yet) reported events forms an essential part of properly dealing with this phenomenon. We review the current practice for analysing such data and we present a flexible regression framework to jointly estimate the occurrence and reporting of events. By linking this setting to an incomplete data problem, estimation is performed via an expectation-maximization algorithm. The resulting method is elegant, easy to understand and implement, and provides refined insights in the nowcasts. The proposed methodology is applied to a European general liability portfolio in insurance. This paper reviews and extends the literature on statistical models for problems where individuals (or objects) under study experience two events. The first, also called the initiating or primary, event occurs at time x, and the second, the so-called secondary or consequent, event only occurs at a later time s ≥ x. The presence of the delay u = s − x between the two events leads to statistical challenges, because the currently observed number of primary events is right-censored while observation delays are right-truncated, see e.g. the early contributions by Lagakos et al. (1988) , Kalbfleisch and Lawless (1989) , Harris (1990) and for an introduction to the field. The term back-calculation (Brookmeyer and Gail, 1988; Bacchetti et al., 1993) refers to the reconstruction of the past history of first events that must have occurred to give rise to the observed pattern of second event cases, under the assumption of a known delay distribution. Nowadays, nowcasting is often used for estimating the current number of first events using only the available partial information on the reported or registered secondary events (see Höhle and an der Heiden, 2014; van de Kassteele et al., 2019; Bastos et al., 2019, for recent examples of nowcasting problems in epidemiology). Such delays occur in different ways and in a variety of subject areas. In an insurance setting a claim is only reported some time after its occurrence, because the damage was not 1 immediately noticed or the insured needed some time to file the claim to the insurance company. A proper estimation of these unreported claims is important, since financial regulations force insurance companies to hold sufficient capital reserves to be able to fulfill their future liabilities with respect to such claims. Jewell (1989) sketches first contributions to the modelling of these occurred-or incurred-but-not-reported (OBNR or IBNR) claims in an insurance context. In disease modelling at least two examples of these delays are studied in the literature. In the first, x represents the time of diagnosis of a case (or another relevant event, like hospital admission in Donker et al., 2011) and s is the time of reporting of the case to the organization that coordinates the surveillance of the disease. 
This reporting delay may result from various processes including logistics, or the time to complete a test and to report a confirmed case in a health database. Statistical surveillance systems for the detection of outbreaks of infectious diseases have to properly adjust for reporting delays in order to take timely preventive action (see e.g. Farrington et al., 1996; Noufaily et al., 2015, 2016). In the second example x measures the time of infection (with a virus, for example) and s is the time of onset of disease symptoms. Insight into the distribution of the incubation time u = s − x is important to estimate the current number of infections in a population. In reliability engineering and quality management, the statistical analysis of warranty data requires taking the time between the failure and its reporting into account in order to predict the number of future warranty claims from all units in service (see e.g. Wu, 2013). The contribution of our paper is threefold. First, Section 2 sketches (a selection of recent) contributions on nowcasting that appeared in the actuarial, statistical or epidemiological literature. Our literature overview stretches across multiple disciplines and structures these papers along the time scale used in the modelling framework, where we distinguish between models for events in continuous time and models for data aggregated over a coarser discrete time grid. Second, we contribute to the literature by proposing in Section 3 a flexible yet practical modelling and estimation framework capable of dealing with any parametric structure for both the occurrence as well as the reporting process. We employ an expectation-maximisation (EM) method (Dempster et al., 1977) to jointly estimate the occurrence and reporting delay of the events in the presence of covariates, allowing us to acquire the necessary insights into the dynamics of both. Third, Section 4 presents the results of applying the model in a case study where out-of-time evaluations are used to assess the predictive performance of the method. Moreover, we benchmark our results against the findings obtained with a selection of modelling strategies from the literature overview. Section 5 concludes. The supplementary document contains some additional results regarding the case study.

The event of interest (e.g. the failure of a product, the occurrence of an insured event or the diagnosis of a disease) happens at time X, though it is only reported or observed at a later time S. The reporting delay is then U = S − X. Figure 1 represents the occurrence of multiple events over time. When the reporting occurs before τ, the observation is complete (events 1 and 3 in Figure 1), while reporting after τ corresponds with a currently unreported case (events 2 and 4 in Figure 1). At time τ the analyst has to predict or evaluate the number of events that incurred in the past, but will only be reported in the future. This is challenging because the analyst faces incomplete data, with no information available for the unreported cases. Leaving the continuous time setting, the dashed grid in Figure 1 pictures the aggregation of events when choosing a coarser discrete time scale. The objective is to use the observed, reported events in the upper triangle ∆^r = {(X, U) | X ≤ τ, U ≤ τ − X} to predict unreported events in the lower triangle ∆^{nr} = {(X, U) | X ≤ τ, U > τ − X}.
When data are collected in (almost) real time, predicting the number of unreported events reduces to estimating the two-dimensional density f_{X,U} for the occurrence and reporting of events in [0, τ] × [0, τ]. Hereto, only the observations on the upper triangle in Figure 1 are available. Martínez Miranda et al. (2013) and more recently Hiabu et al. (2021) estimate this density with a two-dimensional kernel density estimator with support on the upper triangle. By assuming a multiplicative kernel, the local linear density estimate can be extrapolated to the lower triangle, which then leads to a forecast for the occurred but not yet reported events. In many applications, we are interested in the marginal densities, i.e. the occurrence intensity of events on the one hand and the reporting delay distribution on the other hand. This interest in the marginal densities is commonly translated into the following assumptions: (A1) the event counting process N(x) = ∑_{i≥1} I{X_i ≤ x}, for x ≥ 0, follows an inhomogeneous Poisson point process with an intensity λ(x; α), which depends on some parameter vector α; (A2) conditional on the occurrence time X, the reporting delay U follows a positive continuous distribution with density f_{U|X}(·; θ) and parameter vector θ, and the reporting delay U is independent of the event counting process N(x). Under these assumptions the observed data consist of the reported events in the upper triangle. The associated log-likelihood of these reported events becomes

ℓ(α, θ) = ∑_{i=1}^{n^r} { log λ(x_i; α) + log f_{U|X}(u_i | x_i; θ) } − ∫_0^τ λ(x; α) F_{U|X}(τ − x | x; θ) dx,    (1)

where the sum runs over the n^r reported events with occurrence times x_i and reporting delays u_i. Optimization of this likelihood is complicated by the joint presence of λ(·; α) from the occurrence process and F_{U|X}(·|·; θ) from the reporting process, prohibiting a split of the likelihood. The likelihood in (1) already appeared in Kalbfleisch and Lawless (1989), under the assumption of independence between the time of the initiating event and the duration of the delay. Two strategies for optimizing this likelihood commonly appear in the literature. First, a strand of research directly optimizes this likelihood, despite its complexity. For example, Kalbfleisch and Lawless (1989) use a parameterized Poisson process for infections and some parametric models for the incubation time, allowing the joint estimation of the occurrence and reporting processes in continuous time. In actuarial science Haastrup and Arjas (1996) present a Bayesian and Wahl (2019) a frequentist approach with the likelihood in (1) as focus. The latter contributions build upon the fundamental ideas for the IBNR problem in actuarial science proposed by Jewell (1989), Norberg (1993) and Norberg (1999). A second strategy opts for simplicity by proposing a (heuristic) two-stage approach. For example, Antonio and Plat (2014) fit a parametric distribution to the observed reporting delays in a first step and then plug in the estimated reporting delay distribution when estimating the parameters in the thinned Poisson process for the occurrence of reported events. The log-likelihood of the reporting process becomes

∑_{i=1}^{n^r} { log f_{U|X}(u_i | x_i; θ) − log F_{U|X}(τ − x_i | x_i; θ) },

which conditions on the event being reported before the evaluation date τ. Given the occurrence time X = x of an observed event, its reporting delay is a right-truncated variable U with truncation point τ − x. Lagakos et al. (1988) and Lawless (1994) (among others) focus on estimating the reporting delay under right-truncation by inverting the direction of time, effectively transforming the right- into left-truncated data. Standard statistical methods for left-truncated data, such as the Cox proportional hazards model (Cox, 1972), can then be used to model the reporting of events.
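To make this first, delay-estimation stage concrete, the following minimal R sketch maximizes a right-truncated delay likelihood of the above form. The Weibull family is chosen purely for illustration, and x, u and tau are illustrative names for the occurrence times, the observed delays and the evaluation date.

# Sketch (illustration only): ML estimation of a right-truncated reporting delay
# distribution, here Weibull; 'x' are occurrence times, 'u' the observed delays
# (assumed strictly positive) and 'tau' the evaluation date.
trunc_delay_loglik <- function(par, x, u, tau) {
  shape <- exp(par[1]); scale <- exp(par[2])         # enforce positivity
  sum(dweibull(u, shape, scale, log = TRUE) -
      pweibull(tau - x, shape, scale, log.p = TRUE)) # truncation correction
}
# fit <- optim(c(0, 0), trunc_delay_loglik, x = x, u = u, tau = tau,
#              control = list(fnscale = -1))         # maximize the log-likelihood

The fitted delay distribution can then be plugged into the thinned Poisson likelihood for the occurrence process, which is the second stage of the approach in Antonio and Plat (2014).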
Badescu et al. (2016, 2019) and Avanzi et al. (2016) follow a strategy similar to Antonio and Plat (2014), but model the event occurrence process as a marked Cox process to allow for overdispersion and serial dependency. Along this strand Verrall and Wüthrich (2016) decouple the full likelihood in (1) by considering a plug-in estimate for the weekly periodic occurrence pattern of insurance claims, followed by estimating parametric distributions for the reporting delay.

2.2.1 Aggregating events towards a coarser time scale. Events are usually not registered in continuous, but in discrete time periods. Such discrete event counts result from aggregating the real-time events towards a coarser time scale. Figure 1 sketches this aggregation from continuous to discrete time data. The day is the natural (discrete) time unit in many administrative systems. Daily data are often aggregated to weekly, monthly, quarterly or yearly event data, even though many granular data insights may get lost in this aggregation process. For example, Becker and Cui (1997) consider quarterly periods as discrete time units and model the delay in entering AIDS diagnoses in a surveillance system. Insurance studies traditionally use quarterly or yearly claim counts (Wüthrich and Merz, 2008). In our notation and in the case study presented in Section 4 we consider event counts on a daily level, even though our notation is generic and may refer to any other time unit of interest (e.g., weeks, months or years). The number of events which occurred on day t is denoted by N_t, where the integer t indicates the occurrence date and ranges from 1 to τ. The number of these events which have been reported after precisely d days is denoted by N_{td}, such that N_t = ∑_{d=0}^{∞} N_{td}. Only those events that are reported before or at the evaluation date τ are observed, causing N_t to be right-censored and the reporting delay distribution of the observed events occurring on day t to be right-truncated at τ − t. Throughout this paper we denote the reported, and therefore observed, number of events which occurred on day t by

N^r_t = ∑_{d=0}^{τ−t} N_{td},

the number of events that have happened on day t but are not yet reported by

N^{nr}_t = ∑_{d=τ−t+1}^{∞} N_{td},

and the total number of such unreported events over all occurrence days by N^{nr} = ∑_{t=1}^{τ} N^{nr}_t. In discrete time nowcasting the overall objective is to use the observed events in N^r = {N_{td} | t = 1, . . . , τ; d = 0, . . . , τ − t} to predict the unreported events in the lower triangle. A convenient way to represent these event counts N_{td} is with a two-way contingency table where the rows correspond to occurrence time t and the columns indicate the reporting delay d. Since we can only observe what has been reported, an incomplete two-way contingency table results where the upper left-hand triangle is observed and the lower right-hand triangle is empty and has to be predicted. Table 1 depicts the structure of such a triangle with reported events, often called a run-off triangle in actuarial literature or a reporting triangle in epidemiology.

[Table 1: Triangular display with aggregated event counts; rows correspond to the occurrence time t and columns to the reporting delay d. Only the event counts in the upper triangle are observed, whereas the occurred but not reported counts in the lower triangle have to be predicted. The coloured cells in the triangle represent the aggregation of events registered in continuous time in Figure 1 towards a coarser discrete time scale.]

The shaded regions in Figure 1 and Table 1 visualize the aggregation of events registered in continuous time into a single cell or observation in this triangle. The dimension of the triangle depends on the granularity of the discretization. Smaller bin sizes correspond to larger triangles, which usually result in more complex models to capture the heterogeneity present in the data.
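As a small illustration of this data structure, the following R sketch tabulates such a reporting triangle from individual records; the data frame events and its columns occ and delay are hypothetical names.

# Sketch: build a reporting (run-off) triangle of counts N_td from individual
# records; 'events' is a hypothetical data frame with integer columns 'occ'
# (occurrence day t = 1, ..., tau) and 'delay' (reporting delay d >= 0).
tau <- 10                                       # evaluation day, for illustration
reported <- subset(events, occ + delay <= tau)  # only reported events are observed
triangle <- table(factor(reported$occ,   levels = 1:tau),
                  factor(reported$delay, levels = 0:(tau - 1)))
lower <- outer(1:tau, 0:(tau - 1), function(t, d) t + d > tau)
triangle[lower] <- NA                           # unobserved cells, to be predicted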
A first modelling strategy directly specifies a count regression model for the observed counts N_{td} in N^r. Most often these counts are assumed to be independent and Poisson distributed, i.e., N_{td} ∼ POI(λ_{td}), or, in case of overdispersion, a negative binomial distribution is used, i.e., N_{td} ∼ NB(λ_{td}, φ), where λ_{td} is the reporting intensity. In epidemiology, van de Kassteele et al. (2019) model the two-dimensional surface in Table 1 using bivariate P-splines with a smooth reporting intensity λ_{td} and day-of-the-week effects expressed as deviations from it. This intensity is modelled via a tensor product B-spline basis with pairwise products B_i(t)B_j(d), where the basis functions B_i(t) capture the effect of occurrence time and the B_j(d) model the reporting delay. For the day-of-the-week effects dummy variables are included in the predictor, taking the value 1 if a certain combination of t and d corresponds to a specific weekday, and 0 otherwise. The smooth intensity and day-of-the-week effects are estimated simultaneously using a penalized negative binomial regression model and the iterative reweighted least squares algorithm. Bastos et al. (2019) correct for reporting delays in disease surveillance data with a negative binomial regression model for the N_{td} that takes spatiotemporal variation into account. In their Bayesian framework the occurred-but-not-reported cases are estimated from the resulting posterior predictive distribution. A similar Bayesian modelling strategy is proposed in McGough et al. (2020), which is then applied in Greene et al. (2021) to nowcast COVID-19 cases in New York City. In actuarial science the so-called chain ladder method is probably the most widely used technique to predict numbers of unreported claims. In this method the incremental event counts N_{td} are independently Poisson distributed (Hachemeister and Stanard, 1975; Renshaw and Verrall, 1998) with E[N_{td}] = λ_{td} = λ_t · p_d for all t = 1, . . . , τ and d = 0, . . . , τ − 1, where λ_1, . . . , λ_τ > 0 and p_0, . . . , p_{τ−1} > 0 with sum constraint p_0 + . . . + p_{τ−1} = 1. The latter assumes all claims to be reported by the end of the delay period τ − 1. Beyond actuarial science, a similar modelling strategy was also proposed in (for instance) the disease modelling examples in Kalbfleisch and Lawless (1989) and Becker and Cui (1997). The log-likelihood corresponding to the Poisson model formulation of the chain ladder method (also referred to as the Poisson log-linear model in Sellero et al., 1996) is

log L(Ψ; N^r) = ∑_{t=1}^{τ} ∑_{d=0}^{τ−t} { N_{td} log(λ_t p_d) − λ_t p_d } + c,

where we denote Ψ = {λ_1, . . . , λ_τ, p_0, . . . , p_{τ−1}} and c is a constant not depending on the model parameters. Maximizing the log-likelihood requires solving the following set of equations for Ψ,

∑_{d=0}^{τ−t} N_{td} = λ_t ∑_{d=0}^{τ−t} p_d, for t = 1, . . . , τ,    and    ∑_{t=1}^{τ−d} N_{td} = p_d ∑_{t=1}^{τ−d} λ_t, for d = 0, . . . , τ − 1,

subject to all elements of Ψ being positive and ∑_{d=0}^{τ−1} p_d = 1. The Poisson maximum likelihood conditions equate the sums of the claim counts in each row and column of the observed upper triangle N^r to their expected value counterparts. Mack (1991, 1993) points out that this set of equations can be solved recursively due to the triangular structure.
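For concreteness, a minimal R sketch of this recursive, development-factor solution follows, assuming N is a τ × τ matrix of counts with NA entries in the unobserved lower triangle; the same factors reappear in (12) when initialising the EM algorithm of Section 3.1.

# Sketch: classical chain ladder on a tau x tau count matrix 'N' (rows: occurrence
# day t, columns: delay d = 0, ..., tau - 1) with NA in the unobserved cells.
tau <- nrow(N)
C <- t(apply(N, 1, cumsum))                  # cumulative counts; NA propagates
f <- sapply(2:tau, function(j) {
  rows <- 1:(tau - j + 1)                    # occurrence days with column j observed
  sum(C[rows, j]) / sum(C[rows, j - 1])      # development factor from column j-1 to j
})
for (t in 2:tau) {                           # fill the lower triangle recursively
  for (j in (tau - t + 2):tau) C[t, j] <- C[t, j - 1] * f[j - 1]
}
N_pred <- cbind(C[, 1], t(apply(C, 1, diff)))  # back to incremental counts N_td

The forecasts in the lower triangle coincide with those of the Poisson maximum likelihood formulation above, a classical equivalence result.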
A more standard approach to estimate the parameters is to formulate the model as a generalized linear model and use a numerical optimizer (see Taylor, 2000; England and Verrall, 2002; Wüthrich and Merz, 2008) . An alternative approach explicitly models the underlying occurrence and reporting processes, in line with the strategy outlined in the continuous time setting. The framework is then hierarchical where the N t follow an occurrence model and the N td |N t are multinomially distributed. In discrete time the assumptions (A1)-(A2) become (A1') The daily total event counts N t for t = 1, . . . , τ are independently Poisson distributed with intensity λ t = exp(x t α), where x t is the vector of covariate information corresponding to occurrence day t and α is a parameter vector. (A2') Conditional on N t , the event counts N td for d = 0, 1, 2, . . ., are multinomially distributed with probabilities p td = p td (θ, x td ). These reporting probabilities do not depend on the number of events that occurred on day t, sum to one and are modeled with parameter vector θ and covariate information x td . The discrete occurrence intensities and reporting probabilities can be retrieved from the continuous time assumptions in (A1), (A2), when we assume λ t to be piecewise constant between integer time points and properly integrate the reporting density f U |X (·|·; θ). That is, The split in cases with reporting delay zero and a strictly positive delay follows from the different shapes (i.e., the triangle in light gray versus the parallelogram in dark gray) when aggregating continuous data in Figure 1 . Note that the chain-ladder method discussed in Section 2.2.2 fits in the framework outlined by (A1')-(A2') if we impose the sum constraint p 0 + . . . + p τ −1 = 1. The latter assumes all claims to be reported by the end of delay period τ −1, an assumption typically made in an insurance context. The chain-ladder method assumes a stationary reporting process since the probabilities p d do not depend on the occurrence period t. We bundle the parameters to be estimated in Θ = (α, θ). Based on the model assumptions (A1')-(A2') and the thinning property of Poisson random variables, the daily event counts N td are independently Poisson distributed with intensities λ t (x t α) · p td (θ, x td ) for t = 1, . . . , τ and d = 0, 1, 2, . . .. Below we (often) use the more compact notation λ t (x t α) · p td (θ, x td ) = λ t · p td to make our writing more concise. In particular, the observed event count N r t on day t is Poisson distributed with intensity λ t · p r t where p r t = τ −t d=0 p td and the unreported event count N nr t is Poisson distributed with intensity λ t · p nr t where p nr t = 1 − p r t . Conditional on N r t , the observed daily event counts {N td | d = 0, 1, . . . , τ − t} are multinomially distributed with parameters N r t and {p td /p r t | d = 0, 1, . . . , τ − t}, since we have to account for the right-truncation of the reporting delay. The likelihood of the observed data can then be written as the product of a Poisson likelihood and a multinomial likelihood, 7 Equivalently, the likelihood can also be constructed by treating the daily event counts as rightcensored, since the number of unreported events is unknown, Indeed, this expression reduces to (4) by rewriting the sum over n using the Taylor expansion for the exponential function. 
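To make the thinning property used in this construction concrete, a small simulation sketch in R with purely illustrative intensities and reporting probabilities compares the observed daily totals with their means λ_t p^r_t.

# Sketch: simulate daily counts under (A1')-(A2') and compare the observed totals
# N^r_t with lambda_t * p^r_t; all parameter values are illustrative.
set.seed(1)
tau    <- 30
lambda <- rep(100, tau)                          # occurrence intensities
p      <- dgeom(0:(tau - 1), prob = 0.3)         # reporting delay probabilities p_d
p      <- p / sum(p)                             # renormalised for the sketch
N_td   <- sapply(1:tau, function(t) rmultinom(1, rpois(1, lambda[t]), p))
obs    <- outer(0:(tau - 1), 1:tau, function(d, t) t + d <= tau)
N_r    <- colSums(N_td * obs)                    # observed daily totals N^r_t
p_r    <- sapply(1:tau, function(t) sum(p[1:(tau - t + 1)]))
cbind(observed = N_r, expected = lambda * p_r)[25:30, ]

For the most recent occurrence days the expected observed totals drop markedly, which is exactly the right-truncation effect that the Poisson part of the likelihood in (4) accounts for.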
The corresponding log-likelihood equals Similar to the continuous time setting, the log-likelihood in (5) contains terms which depend on the parameters of both the Poisson model for event occurrences (appearing in λ t ) and the reporting delay distribution (appearing in p td and p r t ). This complicates direct maximum likelihood estimation as it prevents separate optimization with respect to each of these parameter components. In line with the continuous time literature review in Section 2.1, two strategies for optimizing this likelihood are generally applied. A first strand puts focus on direct optimization of this likelihood using a standard numerical method such as Newton-Raphson. This is feasible, but we cannot rely on standard statistical modelling routines and we need to derive (analytically or numerically) the gradient and Hessian of the log-likelihood in (5). Höhle and an der Heiden (2014) propose a Bayesian approach for the joint modelling of the occurrence and delay process for discrete data collected in epidemiology. Günther et al. (2021) then apply this Bayesian model to construct nowcast estimates for the COVID-19 pandemic in Bavaria. In a second strand, a (heuristic) two-stage approach keeps the likelihood tractable by putting focus on the modelling of the reporting process, while using a nonparametric estimator for the occurrence process. Examples are Lawless (1994) Our literature overview for both continuous as well as discrete time models puts focus on either the direct optimization of the likelihood or the use of heuristic two-stage procedures. In the absence of right-truncation the complete likelihood analogue of (1) (continuous) or (5) (discrete) would split into an occurrence and a reporting likelihood and become tractable. This immediately suggests the use of the expectation-maximisation (EM) algorithm (Dempster et al., 1977) to handle the missing data problem, as is recognized by a series of contributions on nowcasting. Brookmeyer and Gail (1988) estimate a piecewise constant infection density while assuming a Weibull distribution for the incubation time, independent of the time of infection. With an iterative procedure similar to EM, Harris (1990) fits both categorical as well as continuous-time models for the incidence of infections and their incubation time. Pagano et al. (1994) use the EM algorithm but only focus on the estimation of the reporting delay distribution, while Bacchetti et al. (1993) and Bellocco and Marschner (2000) assume a known (incubation) distribution for the delays. These papers propose rather simple structures for the occurrence and reporting processes, e.g. a stationary reporting delay distribution, or categorical models with separate parameters for each t and d. In the next section we present a flexible yet practical estimation framework capable of dealing with any parametric structure for the occurrence and reporting processes. Because data are collected over discrete time units in administrative systems, we propose our general model for the intensities λ t (x t α) and the reporting delay probabilities p td (θ, x td ) in (A1')-(A2'). This framework not only allows for an easy calibration of each of 8 the models from Section 2.2.3, but also facilitates the estimation of new occurrence and reporting process specifications which were previously considered too difficult for joint optimization. Aiming for flexibility, we incorporate covariates in both the occurrence process of events and the reporting delay distribution. 
We capture evolutions over time or seasonal trends by including fluctuations in both the event counts and their reporting delays by month, day of the month or day of the week of the occurrence date. Additionally, one can also model relationships with external covariates which might influence the events, such as economic circumstances, business cycles and weather conditions. Day-specific particularities in the reporting delay (such as holiday effects) can be modeled using designated day probabilities. The case study developed in Section 4 convincingly illustrates this, by letting the data set assist us in choosing an appropriate occurrence and reporting delay structure.

3 An EM algorithm for the joint estimation of the occurrence and reporting delay of events

Starting from the log-likelihood in (5) we choose to treat the truncation as a missing data problem and employ an EM algorithm to simplify maximum likelihood parameter estimation. Consider the complete version of the data which augments the observed daily event counts from the upper part of the triangular display in Table 1 with the unknown values of the future event counts in the lower triangle. Given the complete data N, we construct the complete log-likelihood function

log L_c(Θ; N) = ∑_{t=1}^{τ} { N_t log λ_t(x_t'α) − λ_t(x_t'α) } + ∑_{t=1}^{τ} ∑_{d=0}^{∞} N_{td} log p_{td}(θ, x_{td}),    (6)

which allows for a separate estimation of the parameters α of the event occurrence model (appearing in λ_t(x_t'α)) and θ of the reporting delay distribution (appearing in p_{td}(θ, x_{td})). From a numerical point of view, we propose to deal with the infinite sums over the reporting delay d in (6) by collecting all delays of at least τ days in a single term. This implies grouping all events reported with a delay of at least τ days in a remainder category. The latter are the events reported beyond the right boundary of the reporting triangle in Table 1. The complete log-likelihood still factorizes into occurrence and reporting contributions and becomes

log L_c(Θ; N) = ∑_{t=1}^{τ} { N_t log λ_t(x_t'α) − λ_t(x_t'α) } + ∑_{t=1}^{τ} [ ∑_{d=0}^{τ−1} N_{td} log p_{td}(θ, x_{td}) + ( ∑_{d=τ}^{∞} N_{td} ) log ( 1 − ∑_{d=0}^{τ−1} p_{td}(θ, x_{td}) ) ].    (7)

The EM algorithm exploits the simpler form of L_c(Θ; N) by iterating between computing expectations in an E-step and maximization in an M-step. Applied to our setting, the numbers of unreported events are replaced by their expected values in the E-step and the log-likelihood of the augmented data is maximized in the M-step. While numerical optimization is required in the M-step, the parameters of the event occurrence model can be estimated separately from the reporting delay parameters and standard software routines can be utilized in the absence of truncation. We discuss the steps of the EM algorithm in more detail for the kth iteration.

E-step. We take the conditional expectation of the complete log-likelihood (7) given the observed data N^r and using the current estimate Θ^{(k−1)} of the parameter vector Θ:

Q(Θ; Θ^{(k−1)}) = E[ log L_c(Θ; N) | N^r; Θ^{(k−1)} ].    (8)

This requires computing the expected values of the future event counts

N_{td}^{(k)} = E[ N_{td} | N^r; Θ^{(k−1)} ] = λ_t^{(k−1)} p_{td}^{(k−1)},    for t + d > τ,    (9)

for t = 1, . . . , τ, and the total daily event counts are

N_t^{(k)} = E[ N_t | N^r; Θ^{(k−1)} ] = N^r_t + λ_t^{(k−1)} p_t^{nr,(k−1)},

where λ_t^{(k−1)}, p_{td}^{(k−1)} and p_t^{nr,(k−1)} denote the occurrence intensities and reporting probabilities evaluated at Θ^{(k−1)}. The terms in (8) depend on the observed event counts and the parameter values in iteration k − 1 and these will therefore be treated as constants in the M-step where the maximization is with respect to the parameter vector Θ.

M-step. We maximize the expected value (8) of the complete data log-likelihood obtained in the E-step with respect to the parameter vector Θ. In order to optimize (8) with respect to α as defined in model assumption (A1'), we have to maximize the terms related to the event occurrence model,

∑_{t=1}^{τ} { N_t^{(k)} log λ_t(x_t'α) − λ_t(x_t'α) },    (10)

which is a weighted Poisson log-likelihood, possibly including an offset term (to include an exposure-to-risk measure, as illustrated in the case study covered in Section 4). The parameter values optimizing (10) are denoted by α^{(k)}.
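For concreteness, the following minimal R sketch performs one such EM iteration in the simplified setting of a stationary reporting delay distribution (p_d free for d = 0, . . . , τ − 1 and summing to one) and a log-linear occurrence model with an exposure offset; all object names are illustrative, and in the general model the closed-form update of p below is replaced by the reporting M-step for θ discussed next.

# Sketch: one EM iteration under (A1')-(A2') with a stationary reporting delay.
# 'N' is the tau x tau count matrix with NA in the unobserved lower triangle,
# 'X' a data frame with occurrence-day covariates, 'expo' the exposure e_t,
# 'lambda' and 'p' the current values of the Poisson means and delay probabilities.
em_step <- function(N, X, expo, lambda, p) {
  tau <- nrow(N)
  N_hat <- N
  for (t in 2:tau) {                            # E-step, cf. (9)
    d_mis <- (tau - t + 2):tau                  # unobserved delay columns for day t
    N_hat[t, d_mis] <- lambda[t] * p[d_mis]
  }
  # M-step, occurrence part (10): Poisson regression for the expected daily totals
  # (non-integer responses from the E-step only trigger a harmless warning)
  fit <- glm(rowSums(N_hat) ~ ., data = X, family = poisson(), offset = log(expo))
  # M-step, reporting part: closed form under a stationary delay distribution
  list(lambda = as.numeric(fitted(fit)), p = colSums(N_hat) / sum(N_hat), fit = fit)
}

Iterating em_step and monitoring the relative change of the observed-data log-likelihood in (5) yields the full estimation routine; starting values can be taken from the chain ladder initialisation described below.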
Updating the estimates for the parameters θ of the reporting delay distribution requires the maximization of

∑_{t=1}^{τ} [ ∑_{d=0}^{τ−1} N_{td}^{(k)} log p_{td}(θ, x_{td}) + ( ∑_{d=τ}^{∞} N_{td}^{(k)} ) log ( 1 − ∑_{d=0}^{τ−1} p_{td}(θ, x_{td}) ) ].    (11)

We now have to optimize a right-censored likelihood with right-censoring at delay τ. Depending on the problem at hand, the likelihood can be simplified by omitting the censoring term when the reporting of events with a delay of more than τ − 1 days is highly uncommon, see our discussion in the case study of Section 4.

Initial step. The first E-step of the EM algorithm with k = 1 requires a starting value Θ^{(0)} for the parameter set. Our strategy is to first apply the chain ladder method (see Section 2.2) on the daily event counts to obtain initial predictions N̂_{td} of the future event counts. Then, we initialize Θ by applying an M-step based on these initial event count estimates. More specifically, we use the Wüthrich and Merz (2015) formulation of the chain ladder method in terms of the chain ladder development factors, allowing for a fast and practical implementation. Therefore, we define the cumulative event counts as C_{td} = ∑_{j=0}^{d} N_{tj} and estimate the development factors of the chain ladder method as

f̂_d = ( ∑_{t=1}^{τ−d} C_{td} ) / ( ∑_{t=1}^{τ−d} C_{t,d−1} ),    for d = 1, . . . , τ − 1.    (12)

The chain ladder method applies these development factors to the latest cumulative event count in each row to produce forecasts of future cumulative event counts:

Ĉ_{td} = C_{t,τ−t} · f̂_{τ−t+1} · · · f̂_d,    for d = τ − t + 1, . . . , τ − 1.

We use these chain ladder estimates for the cumulative event counts to initialize the expected incremental event counts as N̂^{(0)}_{td} = Ĉ_{td} − Ĉ_{t,d−1} (with Ĉ_{t,τ−t} = C_{t,τ−t}) for t = 1, . . . , τ and apply an M-step, as outlined above with k = 0, to find decent starting values Θ^{(0)}. For delays beyond the maximum observed delay (i.e. d > τ − 1) we put the initial values N̂^{(0)}_{td} equal to zero.

Convergence. The log-likelihood (5) increases with each EM iteration (Dempster et al., 1977). Given proper starting values, the sequence Θ^{(k)} converges to the maximum likelihood estimator (MLE) of Θ corresponding to the incomplete data log-likelihood log L(Θ; N^r) in (5). The stopping criterion we apply is based on the relative change in the log-likelihood. Namely, we iterate until the absolute value of {log L(Θ^{(k)}; N^r) − log L(Θ^{(k−1)}; N^r)} / {0.1 + log L(Θ^{(k)}; N^r)} becomes smaller than 10^{−8}. The parameter vector estimate upon convergence is denoted by Θ̂.

It is expected that when the number of parameters is finite, asymptotic normality of the estimators holds under quite general conditions, though they depend on the specific problem settings. However, if the number of parameters increases with τ, such as in a chain ladder model, the asymptotic properties of the maximum likelihood estimators are difficult to obtain and require a more detailed and rigorous study. While an EM algorithm by itself does not output an estimated covariance matrix of the parameter estimators, with some additional effort, such a matrix may be constructed. Louis (1982) showed how the observed information matrix can be expressed in terms of the gradient and second derivative of the complete data log-likelihood function. This approach avoids working with the incomplete data log-likelihood function and leads to analytic expressions which can be computed using the model specifications. The supplemented EM algorithm (Meng and Rubin, 1991) provides a computational, iterative approach using the negative second Hessian matrix of the expected complete-data log-likelihood. Oakes (1999) explains how the matrix of second derivatives of the observed data log-likelihood can be obtained using the derivatives of the function Q, see (8), and provides expressions for exponential family models.
For models for which analytical results are too hard to obtain, numerical derivatives could be used instead. 11 If one works with the log-likelihood from (5), in principle all known methods for likelihood estimation are applicable, in particular the use of information criteria such as AIC (Akaike, 1973) and BIC (Schwarz, 1978) for variable selection, see Claeskens and Hjort (2008, Chap. 2, 3) for more explanation about their use. However, when an EM algorithm is used for estimation, care needs to be taken since at convergence the function Q of (8) is not the maximized loglikelihood. Cavanaugh and Shumway (1998) developed AICcd which allows to directly work with Q though it requires an adjustment to the 'penalty' part of the AIC. Such an adjustment is used by Claeskens and Consentino (2008) for variable selection with missing covariate data, and by Dirick et al. (2015) for selection in multiple event mixture cure models. The complete data AIC value for use with an EM algorithm is defined by with the square matrices where I o estimates the limiting covariance matrix of Θ, for which methods such as described in Section 3.2 may be used. For model selection purposes, the AICcd values are computed for each considered model and the model with the smallest value of AICcd gets selected. The matrix I c ( Θ; N r ) is also the main ingredient for model diagnostic measures after using an EM algorithm, see Zhu et al. (2001) and Barreto-Souza and Simas (2017) . To define the generalized Cook's distance to investigate the influence of a single observation, thus of a single N td with t = 1, . . . , τ and d = 0, . . . , τ − 1, we use a one-step approximation as in Zhu et al. (2001) to avoid refitting the model. DefineQ Large values should encourage further investigation of those observations. We refer to the mentioned references for other influence measures that can be used with an EM algorithm. Using the estimated parameter vector Θ, we predict the number of daily unreported events in the lower triangle of Table 1 . Point estimators for all N td ∈ N nr can be obtained using the expected values N td = λ t p td . Similarly, the total number of unreported events per day is estimated by N nr t = ∞ d=τ −t+1 N td = λ t p nr t and the total number of unreported events over all occurrence days by N nr = τ t=1 N nr t . Moreover, under the model assumptions (A1')-(A2'), the future daily number of events N td (t ≤ τ, t+d > τ ) are independently Poisson distributed and we thus have that N td ∼ Poisson(λ t p td ), N nr t ∼ Poisson(λ t p nr t ) and N nr ∼ Poisson ( τ t=1 λ t p nr t ). This allows us to construct prediction intervals and to make probabilistic statements concerning the number of unreported events after replacing the intensities by their maximum likelihood estimates. With a model defined on a daily level (by means of example) these unreported events can be divided into daily nowcasts by occurrence date or by reporting date, as we demonstrate in the case study in Section 4. The latter is of particular interest when using our model in practice 12 as it gives the analyst a refined view on the future reporting times (at daily level or aggregated e.g. by future reporting weeks or months). An out-of-time evaluation is then a valuable tool to assess the predictive performance of a proposed nowcasting model. 
Hereby, we restrict the time window, say to events with occurrences 1 ≤ t ≤ τ < τ and reporting delays d ≥ 0 and t + d ≤ τ , calibrate the model with the adjusted evaluation date τ and use it to nowcast the events with 1 ≤ t ≤ τ and τ ≥ t + d > τ . These nowcasts can then be compared to the actual observed reporting of events. We are particularly in favor of performing a moving window out-of-time evaluation, where we repeat the single out-of-time evaluation multiple times with different evaluation dates τ . This not only allows to assess the stable predictive performance of a model across multiple evaluation dates, but also allows to verify possible changes in the parameters when calibrated over different subsets of data. When such (substantial) changes are detected, the use of change-points in the modelling of the occurrence and reporting processes may be an interesting direction to explore, see Tabnak et al. (2000) for an example in modelling reporting delays in AIDS surveillance data. The chain ladder introduced in Section 2.2 imposes a multiplicative structure on the reporting intensity, with a stationary reporting delay distribution. Classic ways to estimate this model either rely on a log-linear Poisson regression model or use the development factors discussed in the initialization step of an EM algorithm (see Section 3.1). We now revisit the estimation of the chain ladder method and discuss how its parameters can equivalently be estimated using our proposed estimation framework. The complete log-likelihood related to the chain ladder method is similar to (6) with the difference that the reporting delay probabilities p d do not depend on the occurrence period t and the sum over d runs until τ − 1. The latter is linked to the fact that the chain ladder method does not allow for extrapolation beyond the range of data (cfr. England and Verrall, 2002 , for extensions of the classical chain ladder method which involve the estimation of tail factors). The E-step is the same as (9) In case the chain ladder factors (12) are used in the initial step (see Section 3.1), such that in fact N (0) td = λ t p d for t = 2, . . . , τ and d = τ − t + 1, . . . , τ − 1 due to the equivalence with the Poisson model, convergence is reached immediately the first time we apply the M-step above. Indeed, we then obtain where we used (2). Similarly, using (3), we find Using an EM algorithm, the iterative steps are easy and intuitive and, upon convergence, the same parameter estimators for λ t and p d are obtained compared to direct maximum likelihood optimization. When structuring the occurrence and reporting delay parameters (as we propose in Section 2.2.3), the model can no longer be solved analytically nor formulated as a generalized linear model. An EM algorithm then offers an elegant solution. 13 We analyze a data set with the occurrence and reporting dates of claims in a portfolio of general liability insurance policies for private individuals from a European insurance company. The goal is to predict (or: nowcast) the number of claims that already incurred in the past, but are not yet reported to the insurance company. Detailed claim and policy information is available from January 2000 until August 2009. This includes the occurrence date of a claim, the time between occurrence and reporting of the claim to the insurance company and an exposure-to-risk measure. The online supplement further details the exposure measure that is available in this dataset. 
To enable out-of-time prediction, we restrict our analysis to claims that have occurred between January 1, 2000 and August 31, 2004. We set the new evaluation date, say τ , at the end of this time window, on August 31, 2004 and want to estimate the total incurred but not reported (IBNR) claim count, as well as the reporting dates of these IBNR claims. Based on the full data set until August 2009, 176 671 claims have occurred during this time window. Due to a reporting delay, only 174 624 of these have been reported by the evaluation date, as visualized in Figure 6 in the online supplement. The remaining 2047 claims are IBNR claims, i.e. claims which have occurred between January 2000 and August 2004 but have only been reported after the evaluation date and before the end of the observation period. In line with Section 2.2.3, we work in discrete time and explicitly model the occurrence and reporting processes. Occurrence model. Let N t be the total insurance claim counts that occur on day t, for t = 1, . . . , τ where t = 1 is Jan 1, 2000 and t = τ refers to August 31, 2004. We model the N t as independent Poisson distributed random variables with intensity λ t , see assumption (A1'), structured as where e t is the exposure on day t, month(t) indicates the month, dow(t) the day of the week and dom(t) the day of the month to which t belongs. We also include parameters for occurrence days t on January 1 and December 31, because on these days many claims occur. Reporting model. Figures 7a and 7b in the online supplement illustrate how reporting probabilities decrease over time and are low on Saturdays and Sundays. Supported by these empirical findings we structure reporting delay as the product of week probabilities and day (or intra-week) probabilities p td (θ, x td ) = p W tw (θ, x t ) · p intra td . Here, p W tw denotes the probability of reporting an event from occurrence day t in the wth week after occurrence. The intra-week reporting probabilities are denoted with p intra td and sum to 1 over the days within the reporting week. The reporting week probabilities (φ + µ t ) φ+w for w = 0, 1, 2, . . . , are modeled using the probability mass function of a negative binomial distribution with expected value µ t = exp(x t θ) and variance µ t + µ 2 t /φ, where φ is the dispersion parameter and x t is the covariate vector corresponding to occurrence day t. Figure 7b in the supplement indicates that the negative binomial distribution is a good fit. We structure the µ t as follows, including parameters for occurrence day t on January 1 and December 31. A first modelling strategy for the reporting delay day probabilities symbolically writes these probabilities as p intra td = P(dow(t), wday(t, t + d)) , where wday(t, t + d) orders the working days within the reporting week with separate levels for Saturday and Sunday. For example, when the claim occurred on a Thursday, Friday is wday2 = wday(t, t + 1) and Monday is wday3 = wday(t, t + 2). The 7 × 7-matrix P then contains the day probabilities related to the first week. Each element in P is between 0 and 1 and all row sums equal 1. Alternatively, we project the seven intra-week reporting probabilities p intra td to six probabilities, inspired by the reverse time strategy of . This allows to leave the sum-to-one restriction on the intra-week reporting probabilities. Moreover, we can now include covariate information related to both the day of occurrence t and the actual reporting date t + d. 
We define with w = d 7 (the reporting week after occurrence) and j = d − 7w the intra-week day of reporting. Expression (16) takes the form of a discrete time hazard rate, but due to the reverse time strategy we now condition on failure before day 7w + j instead of survival. We structure logit(q t,d ) = x td γ = γ 0 + γ workdays(t,t+d) + γ dow(t+d) + γ holiday(t+d) , where workdays(t, t + d) counts the number of elapsed working days in the current reporting week, i.e. excluding the weekend and holidays, and holiday(t + d) indicates whether t + d is a holiday. Reporting on holidays is exceptional in our data set. The inclusion of dummy variables for holidays allows to capture this empirical fact in the intra-day reporting probabilities, while distinguishing between national holidays on which all companies are closed and two unofficial holidays (New Year's Eve and Good Friday). We also study some competing modelling strategies proposed in the literature. We consider two versions of the models discussed in Section 2.2.2 where the N td are independent Poisson distributed random variables with mean λ td structured as or λ td = e t · exp α t + β holiday(t+d) + β holiday(t+d+1) + β dow(t+d) + β delay(d) . ( 19) 15 While (18) uses covariates capturing relevant information from t, (19) includes a parameter α t for each t. Both model specifications structure the information on the reporting delay with a parameter β delay(d) for each observed delay (see e.g. Bastos et al., 2019) as well as β dow(t+d) referring to the day of the week of the effective reporting date t + d. The β holiday(t+d) and β holiday(t+d+1) correspond to reporting on a holiday, or the day after. No explicit reporting delay distribution is proposed, but instead a large number of parameters is used. We also calibrate the chain ladder method using a yearly grid. That is, where λ year(t) is the effect of the year of occurrence to which day t belongs. Reporting delay is modelled independently from the occurrence, with a parameter per year of reporting, thus p year(t+d)−year(t) describes the effect of the reporting year of the claims that occurred on day t and were reported with d days of delay. For the models proposed in Section 4.2, we use the EM algorithm of Section 3 to estimate the parameters in the occurrence and reporting processes. For the models in Section 4.3, maximum likelihood estimates are obtained with the glm4 routine in the R package MatrixModels (Bates and Maechler, 2015) . To grasp the insights obtained via an explicit specification of the occurrence and reporting processes, we focus on the maximum likelihood estimates of the parameters in the occurrence model (13) combined with the reporting model with reporting week probabilities in (14) and intra-week reporting probabilities in (15). We evaluated the impact of the censoring term in the reporting delay likelihood used in the M-step, see (11). We obtained similar parameter estimates when omitting the reporting of events with a delay of more than τ −1 days compared to the estimates obtained under the simplified likelihood that drops the censoring term in (11) (comparison not shown). Therefore, the results below are obtained with the simplified reporting likelihood. The parameter estimates of some alternative specifications are deferred to Section A.3 in the online supplement. The effects related to the categorical predictors in the Poisson occurrence model specified in (13) are visualized in Figure 2 . 
Figure 8 in the online supplement shows the parameter estimates for the negative binomial regression model of the reporting delay in weeks, while Table 2 in the online supplement collects the estimated intra-week probabilities from (15). The maximum likelihood estimates are shown along with Bonferroni-adjusted simultaneous 95% confidence intervals based on the inverse of the expected information matrix. The intercept in the Poisson model is estimated as −5.965 with 95% confidence interval [−6.020, −5.910]. Figure 2 reveals a seasonal pattern in which the number of claims rises in the middle of the year and falls around the year end. Claims most likely occur in June and least likely in December with an estimated difference in expected value of 42%. The calibrated day of the week effect shows an increase in the expected number of claims on Saturdays and a slight decrease on Tuesdays and Thursdays. The categorical effect of the day of the month shows a remarkable pattern which is similar in both the claim occurrence and the reporting delay model. On the 1st and 15th, the number of claims as well as the reporting delays have significantly higher expected values. To a lesser extent, this is also present for the 5th, 10th, 20th, 25th, and 30th or 31st day of each month. This pattern can most likely be explained by rounding errors of the occurrence date when insureds have to report a claim which took place several weeks or months ago. Since this misreporting of dates is more likely to occur for claims which are only reported after a longer time period, we simultaneously see an increase in the expected reporting delay for claims occurring on these rounded month days. There is a high number of occurrences on December 31 and January 1 with on average 112%, respectively 135%, more claims. However, the reporting delay of these claims is not significantly different from the other ones.

The main goal of our proposed models is to estimate the number of unreported claims. Using an out-of-time evaluation with τ = August 31, 2004, we know there are 2047 IBNR claims in the full data set, which runs until August 2009, of which we know the corresponding occurrence date and reporting delay. We first demonstrate the insights that can be derived from a nowcasting model using the occurrence specification in (13) and the reporting model in (14) combined with (15). Calibrated on the data from January 1, 2000 to August 31, 2004 this model estimates the total number of IBNR claims as N̂^{nr} = 2055.8, which is close to the actual count. Moreover, the distributional assumptions of our model can be used to provide a 95% prediction interval given by [1697, 2145], see Section 3.4. Furthermore, since the model is defined on a daily level, the total IBNR prediction can be divided into daily forecasts by occurrence date and by reporting date. To illustrate this strong point of our model, we predict the IBNR claim counts by reporting date in Figure 3. Figure 9 in the online supplement shows a similar split of the IBNR claims by occurrence dates. In Figure 3 we disperse the total predicted IBNR claim count according to the date on which the claims will be reported to the insurer. It means we now focus on estimating ∑_{t=1}^{τ} N_{t,ρ−t} for ρ = τ + 1, τ + 2, . . ., i.e. the number of unreported claims reported on day 1, 2, . . . of the out-of-time period. This forms an appealing way to use our model in practice as it gives the insurer a refined view on the reporting times.
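As a small illustration of the prediction step of Section 3.4, the following R sketch computes the IBNR point estimate and a Poisson-based 95% interval; lambda_hat and p_nr_hat are illustrative names for the fitted λ̂_t and p̂^{nr}_t, and the resulting interval ignores parameter estimation uncertainty.

# Sketch: point estimate and a 95% Poisson prediction interval for the total IBNR
# count, using that N^nr is Poisson with mean sum_t lambda_t * p^nr_t (Section 3.4).
mu_ibnr <- sum(lambda_hat * p_nr_hat)      # estimated expected number of IBNR claims
qpois(c(0.025, 0.975), lambda = mu_ibnr)   # interval bounds, parameter uncertainty ignored
# The same construction applies per occurrence day (lambda_hat * p_nr_hat) or per
# future reporting day, by summing the relevant cell intensities lambda_t * p_td.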
The predictions on a daily level in Figure 3a are accompanied by 95% simultaneous prediction intervals and range from September 1, 2004, until November 7, 2004, i.e. the first two months following our training period. When compared to the out-of-time actual values, the forecasts clearly capture the downward trend in the reporting of IBNR claims and the near absence of reporting during weekends. This is primarily due to the day probabilities in our model which reflect the day-specific aspects of the reporting delay. In Figure 3b (resp. 3c) the reporting dates are grouped by weeks (resp. months) after the evaluation date and the IBNR claim counts are predicted for the next 26 weeks (resp. 12 months). We notice how, also over longer time spans, the predictions by reporting week or month follow the pattern observed in the actual unreported counts.

We compare the performance of different model specifications via a moving window out-of-time evaluation. We adjust the evaluation date τ, which was previously chosen to be August 31, 2004, to any date in between August 31, 2003, and August 31, 2004. For each such τ, we refit the model based on the observed data by that date, and produce an estimate of the total unreported count N^{nr} = ∑_{t=1}^{τ} N^{nr}_t corresponding to claims that occurred before or at τ. Figures 4a and (zoom in) 4b show the predictions obtained with the occurrence specification in (13) and the reporting model in (14) and (15), as well as the actual total IBNR claim counts based on the full data set. Overall, the estimates follow the seasonal pattern quite well. The deviations from the observed trend in the first months of 2004 might be explained by variations from year to year in the effect of the month on the occurrence and reporting of claims. In our model, we assume the seasonal monthly pattern to be the same over the different years. As a result, the parameter estimates are averaged values. The model could potentially be refined (by including extra covariates) if expert knowledge is available on how changes in e.g. product design and conditions, the claims handling process, the business environment or legislation may impact the claim arrival process or the reporting delay. We notice a seven-day pattern in the number of unreported claims in Figure 4b, which results from the delayed reporting during the weekend. This aspect is incorporated in our model through the intra-week reporting probabilities in (15). Moreover, we observe an increase in the number of IBNR claim counts around the end of the year. This is due to the fact that the insurance company is closed around the holidays, preventing any claims from being reported at that time. This effect is only partially captured by the dummy variables for the occurrence dates jan1 and dec31 in (13) and (14). We compare these predictions to two other model specifications that benefit from an EM algorithm for calibration. Figure 5a shows predictions obtained with the occurrence specification in (13) and the reporting model in (14) and (17). In the specification of the intra-week probabilities this model designates dummy indicators for reporting on national and unofficial holidays in (17), which leads to clear improvements around the end-of-year holidays as compared to Figure 4.
[Figure 5: (a) occurrence model (13) with reporting model (14) and (17); (b) yearly chain ladder method specified in (20); (c) direct specification of the reporting intensity in (18); (d) direct specification of the reporting intensity in (19). Red triangles: data, blue circles: model.]

The yearly chain ladder method is used in Figure 5b. While this method overall performs well, detailed insights on the number of reportings in e.g. the next days, weeks or months cannot be deduced from this model calibrated on data aggregated into yearly grids. Moreover, each year ends on a different day of the week. Since the chain ladder averages the reporting probability over past years, the estimated 7-day week pattern in the predictions is slightly misaligned with the actual pattern in the data. Crevecoeur et al. (2019) show that when more years of data are available the 7-day pattern completely disappears from the yearly chain ladder predictions, which then results in a systematic underestimation of the IBNR claim count on Saturdays and Sundays. Experiments to capture the week pattern with a more granular chain ladder method resulted in a sharp loss in the overall performance (not shown). The nowcasting models illustrated in Figures 5c and 5d directly specify a structure for the reporting intensity. In (18) (shown in panel 5c) the information on the occurrence date t is structured via a set of covariates, whereas panel 5d calibrates a parameter for each occurrence date t (as in (19)). With this type of model it is more difficult to capture insights from the calibrated parameters, as there is no explicit distinction between the dynamics in the occurrence and the reporting process. On the one hand, the model with a structured representation of the influence of the occurrence dates t does not sufficiently capture the actual pattern in the data. The model specification with a parameter for each occurrence date t on the other hand leads to volatile nowcasts, e.g., one prediction was removed (96 995 predicted IBNR claims on Sunday, January 11, 2004). Since reporting on a Sunday is extremely rare in our data set the occurrence and immediate reporting of a claim on this date resulted in an unreasonably high λ_t estimate.

We review and structure the literature on nowcasting the occurrence of events subject to a reporting or observation delay. Our literature overview bridges multiple disciplines and incorporates papers from the actuarial, statistical and epidemiological literature. We propose a general modelling and estimation framework capable of dealing with any parametric structure for both the occurrence as well as the reporting process. The framework uses regression models for count data and treats the right truncation of the reporting delays as a type of missing data. Applying an EM algorithm strongly simplifies maximum likelihood estimation as it allows for the use of standard statistical software to fit the regression models. As an example, we demonstrate how the parameter estimators of a classical method for nowcasting with aggregated data, also known as the chain ladder method, can be obtained using an EM approach. We investigate the performance of our proposed framework on an insurance case study where the focus is on predicting the number of incurred but not (yet) reported claims in a European portfolio of general liability insurance policies for private individuals.
We benchmark the predictions obtained with our proposed strategy against the results from other nowcasting models that directly structure the reporting intensity and can be calibrated with standard statistical routines. The presented model provides a better understanding of the claim occurrence as well as reporting delay process and leads to a refined view on the future reporting times (at daily level or aggregated e.g. by future reporting weeks or months). We indicate some possible directions for future research. First of all, we would like to 20 stress that the provided estimation framework involving an EM algorithm is applicable in a wide context (e.g. epidemiology and reliability engineering) whenever cases are only reported after a delay. This provides a more desirable alternative over the ad hoc methods or twostep approaches used earlier in the literature. The estimation procedure described in Section 3 is readily applicable to other contexts after specifying a suitable parametric model for the reporting delay probabilities. It would then be interesting to explore different distributional assumptions for the daily total event counts and the reporting delay structure. The reporting delay can be easily altered within the given framework to, for instance, a zero-inflated or hurdle distribution or a more heavy-tailed distribution. Relaxing the Poisson assumption for the daily total event counts is also feasible but might complicate the E-step in which we now relied on the thinning property of Poisson distributions. The EM framework is however compatible with latent underlying processes affecting the occurrence of events such as hidden Markov models or shot noise process (see e.g. Badescu et al., 2019; Avanzi et al., 2016) . Another promising approach would be to investigate how time series models for counts (see Jung and Tremayne, 2011 , for an overview) could be introduced in this setting. Assisted by the data we specify a structure for the reporting probabilities p td (θ, x td ) introduced in assumption (A2'). The barplot in Figure 7a shows the empirical reporting probabilities in the first 28 days after occurrence for claims that occurred on a Monday. Reporting probabilities decrease over time and are low on Saturdays and Sundays. The claims pictured in Figure 7a all occurred on a Monday. Therefore the first weekend corresponds to delays of 5 (Saturday) and 6 (Sunday) days. Following this intra-week pattern, we structure reporting delay as the product of week probabilities and day (or intra-week) probabilities: Here p W tw denotes the probability of reporting an event from occurrence day t in the w-th week after occurrence. The intra-week reporting probabilities are denoted with p intra td . The latter take values between 0 and 1 and sum to 1 over the days within the reporting week. Figure 7b indicates that the negative binomial distribution is a good fit for the empirical week probabilities. Complementary to the results shown in the paper, we give a detailed overview and discussion of the parameter estimates obtained for various model specifications. A.3.1 Occurrence and reporting processes: reporting week and intra-week probabilities We focus on the occurrence model (13) in combination with a reporting specified by the reporting week probabilities in (14) and the intra-week reporting probabilities in (15). We report the estimated day probabilities from matrix P in Table 2 . Only a small fraction of claims is being reported on Saturdays and nearly none on Sundays. 
In fact, in the entire observed part of the data, only 3 claims have been reported on a Sunday. The maximum likelihood estimates of the parameter vector $\alpha$ from the claim occurrence model are shown in Figure 2 in the paper. Next to these, the maximum likelihood estimates of the parameter vector $\theta$ used in the negative binomial regression model of the reporting delay in weeks are displayed in Figure 8. All estimates are shown along with 95% confidence intervals based on the inverse of the expected information matrix. Simultaneous confidence intervals are constructed in these graphs using the Bonferroni correction to adjust for multiple comparisons. For completeness, we also report that in the negative binomial model the intercept is estimated as 1.867 (with 95% confidence interval [1.717, 2.018]) and the dispersion parameter $\phi$ as 0.177.

[Figure 8: Maximum likelihood estimates and 95% simultaneous confidence intervals for $\theta$ corresponding to the categorical effects of the month, the day of the week and the day of the month of the occurrence date in the negative binomial reporting delay model.]

We illustrate how the occurrence model (13) in combination with the reporting structure defined in (14) and (15) is used to forecast the number of unreported claims for past occurrence dates. In Figure 9a we plot point estimates and 95% simultaneous prediction intervals for $N^{nr}_t$, with $t$ corresponding to occurrence dates between July 1, 2004, and August 31, 2004, i.e. the last two months of our training period. The predictions follow the same trend as the actual IBNR claim counts derived from the full data set until August 2009. We notice how IBNR claims are elevated on the first day and in the middle of each month, in line with our earlier findings. In Figure 9b (resp. Figure 9c) we group the occurrence dates by weeks (resp. months) prior to the evaluation date and show the IBNR claim count predictions corresponding to the past 26 weeks (resp. 12 months). We notice how, also over longer time spans, the predictions by occurrence week or month follow the pattern observed in the actual unreported counts.

A.5 Modelling the intra-week probabilities using a reverse time strategy

We consider the occurrence model specified in (13) in combination with a reporting delay distribution with reporting week probabilities in (14) and intra-week probabilities obtained via the reverse time strategy in the presence of covariates, as specified in (16). Figure 10 shows the estimates for the occurrence model in (13), the estimates for the reporting week probabilities are shown in Figure 11 and the intra-week reporting probabilities obtained via the reverse time strategy are in Figure 12.

We now consider two nowcasting models that directly structure the reporting intensity. These models are discussed in Section 2.2.2 of the paper, where the $N_{td}$ are independent Poisson distributed random variables with mean $\lambda_{td}$. Figure 13 shows the parameter estimates for the specification from (18) and Figure 14 displays the parameter estimates for the regression structure in (19). In addition to the results shown in Figure 5 in the paper (with evaluation dates between November 15 and February 15, 2004), we show the estimates of the total IBNR claim count for moving evaluation dates in Figure 15. Figure 15a shows predictions obtained with the occurrence specification in (13) and the reporting model in (14) and (17).
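The forecasts discussed above combine fitted occurrence intensities with fitted reporting probabilities. The sketch below shows, under assumptions that simplify the paper's setup, how such point predictions and crude intervals could be obtained: by the Poisson thinning property, the IBNR count for occurrence day $t$ is again Poisson, with mean equal to the occurrence intensity times the probability of still being unreported at the evaluation date. The function ibnr_prediction, the constant intensities, the delay distribution and the pointwise (rather than simultaneous) intervals are all illustrative choices, not the construction used in the paper.

```python
import numpy as np
from scipy.stats import poisson

def ibnr_prediction(lam, p_day, eval_day):
    """Predict the not-yet-reported count for each occurrence day t <= eval_day.

    lam      : array of occurrence intensities lambda_t, one per occurrence day
    p_day    : array of reporting delay probabilities p_d, d = 0, 1, ...
    eval_day : index of the evaluation date tau on the same daily grid

    By the Poisson thinning property, the IBNR count for occurrence day t is
    Poisson with mean lambda_t * P(delay > eval_day - t).
    """
    t = np.arange(eval_day + 1)
    cum_p = np.cumsum(p_day)
    # Probability that a claim from day t is still unreported at the evaluation date.
    p_unreported = 1.0 - cum_p[np.minimum(eval_day - t, len(p_day) - 1)]
    mu_ibnr = lam[: eval_day + 1] * p_unreported
    # Pointwise 95% prediction intervals from the Poisson quantile function.
    lower = poisson.ppf(0.025, mu_ibnr)
    upper = poisson.ppf(0.975, mu_ibnr)
    return mu_ibnr, lower, upper

# Toy usage with made-up intensities and a geometric-like delay distribution.
lam = np.full(60, 100.0)
p_day = 0.5 ** np.arange(1, 31)
p_day = p_day / p_day.sum()
mu, lo, hi = ibnr_prediction(lam, p_day, eval_day=59)
print(round(mu.sum(), 1), int(lo[-1]), int(hi[-1]))
```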
Results obtained with the yearly chain ladder method are displayed in Figure 15b. Panel 15c shows the estimates for the IBNR claim counts obtained with a nowcasting model that directly specifies the reporting intensity along (18), whereas the model leading to panel 15d uses (19).

[Truncated figure captions: the 95% simultaneous confidence intervals in these figures are constructed using the Bonferroni correction and the inverse of the expected information matrix.]

References

Information theory and an extension of the maximum likelihood principle
Micro-level stochastic loss reserving for general insurance
A micro-level claim count model with overdispersion and reporting delays
Backcalculation of HIV infection rates
A marked Cox model for the number of IBNR claims
A marked Cox model for the number of IBNR claims: Estimation and application
Improving estimation for beta regression models via EM-algorithm and related diagnostic tools
A modelling approach for correcting reporting delays in disease surveillance data
MatrixModels: Modelling with Sparse And Dense Matrices
Estimating a delay distribution from incomplete data, with application to reporting lags for AIDS cases
Joint analysis of HIV and AIDS surveillance data in back-calculation
A method for obtaining short-term projections and lower bounds on the size of the AIDS epidemic
An Akaike information criterion for model selection in the presence of incomplete data
Variable selection with incomplete covariate data
Model selection and model averaging
Regression models and life-tables
Modeling the number of hidden events subject to observation delay
Maximum likelihood from incomplete data via the EM algorithm
An Akaike information criterion for multiple event mixture cure models
Nowcasting pandemic influenza A/H1N1 2009 hospitalizations in the Netherlands
Stochastic claims reserving in general insurance
A statistical algorithm for the early detection of outbreaks of infectious disease
Nowcasting the COVID-19 pandemic in Bavaria
Nowcasting for real-time COVID-19 tracking in New York City: An evaluation using reportable disease data from early in the pandemic
Claims reserving in continuous time: a nonparametric Bayesian approach
IBNR claims count estimation with static lag functions
Reporting delays and the incidence of AIDS
Bayesian nowcasting during the STEC O104:H4 outbreak in Germany
Smooth backfitting of proportional hazards with multiplicative components
Predicting IBNYR events and delays: I. Continuous time
Useful models for time series of counts or simply wrong ones?
Methods for the analysis and prediction of warranty claims
Inference based on retrospective ascertainment: An analysis of the data on transfusion-related AIDS
Regression models for right truncated data with applications to AIDS incubation times and reporting lags
Nonparametric analysis of truncated survival data, with application to AIDS
Adjustments for reporting delays and the prediction of occurred but not reported events
Finding the observed information matrix when using the EM algorithm
A simple parametric model for rating automobile insurance or estimating IBNR claims reserves
Distribution-free calculation of the standard error of chain ladder reserve estimates
Continuous chain ladder: Reformulating and generalizing a classical insurance problem
Nowcasting by Bayesian smoothing: A flexible, generalizable model for real-time epidemic tracking
Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm
Prediction of outstanding liabilities in non-life insurance
Prediction of outstanding liabilities II. Model variations and extensions
Detection of infectious disease outbreaks from laboratory data with reporting delays
Modelling reporting delays for outbreak detection in infectious disease data
Direct calculation of the information matrix via the EM algorithm
Regression analysis of censored and truncated data: Estimating reporting-delay distributions and AIDS incidence from surveillance data
A stochastic model underlying the chain-ladder technique
Estimating the dimension of a model
Reporting delay: a review with simulation study and application to Spanish data
A change-point model for reporting delays under change of AIDS case definition
Nowcasting the number of new symptomatic cases during infectious disease outbreaks using constrained P-spline smoothing
Understanding reporting delay in general insurance
Explicit moments for a class of micro-models in non-life insurance
A review on coarse warranty data and analysis
Stochastic claims reserving methods in insurance
Stochastic Claims Reserving Manual: Advances in Dynamic Modeling
Case-deletion measures for models with incomplete data

The authors are grateful to the editors, associate editor and the referee for the valuable comments and suggestions.

Supplementary material for "Modeling the occurrence of events subject to a reporting delay via an EM algorithm"

This Appendix collects additional figures and tables regarding the case study on insurance nowcasting, discussed in Section 4 of our paper entitled Modeling the occurrence of events subject to a reporting delay via an EM algorithm. We analyze a data set with the occurrence and reporting dates of claims in a portfolio of general liability insurance policies for private individuals from a European insurance company. The data set used in this case study has also been studied in, e.g., Antonio and Plat (2014), with a stochastic model for the development of claims after reporting, and Crevecoeur et al. (2019), who also nowcast the IBNR claims but focus exclusively on the dynamics in the reporting process. Exposure is available in this data set by month from January 2000 onwards, expressed as earned exposure, i.e. the fraction of policies actually exposed to risk during the period. This means that a policy providing insurance coverage during only 10 days in January will contribute 10/365th to the exposure of that month.
Earned exposure is not available at a daily level, so we transform the monthly exposure to daily exposure by dividing by the number of days in each month. This accounts for the varying month lengths. To enable out-of-time prediction, we restrict our analysis to claims that have occurred between January 1, 2000, and August 31, 2004. We then study the nowcasting problem using an evaluation date, say $\tau$, at the end of this time window, i.e. August 31, 2004, and we want to estimate the total incurred but not reported (IBNR) claim count, as well as the reporting dates of these IBNR claims. Based on the full data set until August 2009, 176 671 claims have occurred during the time window between January 1, 2000, and August 31, 2004. Due to a reporting delay, only 174 624 of these have been reported by the evaluation date, as depicted in blue in the daily run-off triangle in Figure 6. The remaining 2 047 are IBNR claims, i.e. claims which have occurred between January 2000 and August 2004 but have only been reported after the evaluation date and before the end of the observation period. These are graphically illustrated in red in Figure 6.
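The exposure conversion described above is a simple bookkeeping step; the short pandas sketch below shows one way to carry it out. The table, the column names and the exposure values are hypothetical; only the division by the number of days in each month mirrors the procedure in the text.

```python
import pandas as pd

# Hypothetical monthly earned exposure table; column names and values are illustrative only.
monthly = pd.DataFrame({
    "month": pd.PeriodIndex(["2000-01", "2000-02", "2004-02"], freq="M"),
    "earned_exposure": [812.4, 830.1, 1050.7],
})

# Spread each monthly exposure uniformly over the days of that month,
# which accounts for the varying month lengths (28/29/30/31 days).
monthly["days_in_month"] = monthly["month"].dt.days_in_month
monthly["daily_exposure"] = monthly["earned_exposure"] / monthly["days_in_month"]

# Expand to one row per calendar day, ready to be joined with daily claim counts.
daily = monthly.loc[monthly.index.repeat(monthly["days_in_month"].to_numpy())].copy()
daily["date"] = [
    day
    for period, n_days in zip(monthly["month"], monthly["days_in_month"])
    for day in pd.date_range(period.start_time, periods=n_days, freq="D")
]
daily = daily[["date", "daily_exposure"]].reset_index(drop=True)
print(daily.head())
```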