Modelplasticity and Abductive Decision Making
Subhadeep Mukhopadhyay
2022-03-06

'All models are wrong but some are useful' (George Box 1979). But how do we find those useful ones starting from an imperfect model? How do we make informed data-driven decisions equipped with an imperfect model? These fundamental questions arise in virtually all empirical fields, including economics, finance, marketing, healthcare, climate change, defense planning, and operations research. This article presents a modern approach (built on two core ideas: abductive thinking and the density-sharpening principle) and practical guidelines to tackle these issues in a systematic manner.

  Uncertainty of knowledge × Knowledge of uncertainty = Usable knowledge.   (1)

A decision analyst confronts data X_1, ..., X_n equipped with a tentative (imprecise and uncertain) probabilistic model f_0(x) of the underlying phenomena. The challenge then boils down to effectively using the misspecified model f_0(x) to learn from data and to apply that knowledge for informed decision-making. Rao's uncertainty principle (1) suggests a three-staged approach, which we call the 'tripod of model-building'. The purpose of this article is to describe a general statistical theory that permits us to implement this three-staged model-building procedure for data analysis and decision-making.

'All analysts approach data with preconceptions. The data never speak for themselves. Sometimes preconceptions are encoded in precise models. Sometimes they are just intuitions that analysts seek to confirm and solidify. A central question is how to revise these preconceptions in the light of new evidence.'
- Heckman and Singer (2017)

Empirical scientific inquiry typically starts with a simple yet believable model of reality (model-0) and aims to sharpen existing knowledge by gathering new observations. We observe a random sample X_1, ..., X_n approximately distributed as F_0, by which we mean that F_0 is an 'approximately correct' structured provisional model for X, given to us by subject-matter experts. We would like to intelligently use the misspecified f_0(x) to extract new knowledge from the data.

Creating knowledge-guided statistical models. The core mechanism of our process involves: (i) inspecting whether the structured provisional model-0 is still a good fit in light of fresh data; (ii) if not, learning what is new in the data that cannot be tackled by the current model; and, finally, (iii) repairing the current misspecified model so that it copes with the new reality. The question remains: how can we design an inference machine that offers these successively fine-grained insights? To address this question, we describe a new statistical model-building principle, called the 'density-sharpening principle.' We introduce a two-component model that accommodates the decision maker's concern for misspecification of the starting expert-guided model.

Definition 1 (Two-component model). Let X be a general (discrete, continuous, or mixed) random variable with true unknown density f(x) and cdf F(x). Let f_0(x) be a simple approximate model for X with cdf F_0(x), whose support includes the support of f(x). Then the following density decomposition formula holds:

  f(x) = f_0(x) d(F_0(x); F_0, F),   (2)

where d(u; F_0, F) is defined as

  d(u; F_0, F) = f(Q_0(u)) / f_0(Q_0(u)),

and Q_0(u) = inf{x : F_0(x) ≥ u}, for 0 < u < 1, is the quantile function of F_0.
The function d(u; F_0, F) is called the 'comparison density' because it compares the initial model-0 f_0(x) with the true f(x), and it integrates to one:

  ∫_0^1 d(u; F_0, F) du = 1.   (3)

However, we will interpret the d-function as the density-sharpening function (DSF), since it plays the role of 'sharpening' the initial model-0 to hedge against its potential misspecification. To simplify notation, d(F_0(x); F_0, F) of eq. (2) will be abbreviated as d_0(x). A few remarks on the density-sharpening law:

1. The model-building mechanism of Definition 1 provides a statistical process for transforming and refining a crude initial model into a useful one for better decision-making.

2. Note that if d(u; F_0, F) ≠ 1, i.e., if d(u; F_0, F) deviates from the uniform distribution, then a change of probability assignment is needed to embrace the current scenario. The density-sharpening mechanism of (2) prescribes how to revise the old probability assignments in light of new evidence.

3. Similar to Rao's uncertainty law (1), we can write down a simple logical equation that captures the essence of the density-sharpening-based model-building principle (Def. 1):

  Misspecified model-0 × Knowledge of misspecification = Upgraded model-1.   (4)

Interpretation of the components: the first component is the starting imprecise model f_0(x), coming from expert knowledge. The second component d_0(x) is the quality-assurer of the model, managing the risk of misspecification of the initial f_0(x). d_0(x) sharpens the decision-maker's initial mental model by extracting knowledge from data that was previously unknown, which justifies its name: density-sharpening function (DSF). Finally, model-0 is 'stretched' by d_0(x) following eq. (2) (only when the ideal scenario differs from the expected one) to incorporate the newly discovered information into the revised model. The class of d-sharp distributions turns the uncertain knowledge-distribution f_0(x) into a usable distribution by properly sharpening it using d_0(x).
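To make the two-component decomposition concrete, here is a minimal numerical sketch. The particular pair of densities (model-0 Exp(1), 'true' model Gamma(2)) is a hypothetical choice of mine, not from the paper; the code only checks the two identities stated above.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Hypothetical illustration (choices are mine, not the paper's): model-0 is
# Exp(1) and the "true" density is Gamma(shape=2); both live on (0, inf).
f0, f = stats.expon, stats.gamma(a=2)

def d(u):
    # comparison density d(u; F0, F) = f(Q0(u)) / f0(Q0(u))
    x = f0.ppf(u)                     # Q0(u) = quantile function of model-0
    return f.pdf(x) / f0.pdf(x)

total, _ = quad(d, 0, 1)              # eq. (3): d integrates to one
x = 1.7                               # eq. (2): f(x) = f0(x) * d(F0(x))
ok = np.isclose(f.pdf(x), f0.pdf(x) * d(f0.cdf(x)))
print(round(total, 3), ok)            # 1.0 True
```

For this pair, d(u) = -ln(1 - u), a non-uniform comparison density, signaling that model-0 would need sharpening.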
The density-sharpening law provides a mechanism for building a model f(x) for the data X_1, ..., X_n by comparing it with the assumed working model f_0(x). To apply formula (2), we need to estimate d_0(x) from data.⁶ We call this learning process 'comparison coding' because d_0(x) codes how surprising the current situation is in light of model-0, by contrasting expectations with reality.

Since the density-sharpening function d_0(x) := d(F_0(x); F_0, F) is a function of F_0(x), we can approximate it by a linear combination of polynomials that are functions of F_0(x) and orthonormal with respect to the base model f_0(x). One such orthonormal system is the LP family of polynomials (Mukhopadhyay and Parzen, 2020; Mukhopadhyay, 2021a), which can be constructed as follows. For an arbitrary continuous F_0, define the first-order LP-basis function as the standardized F_0(x):

  T_1(x; F_0) = √12 (F_0(x) − 1/2).

Note that E_0[T_1(X; F_0)] = 0 and Var_0[T_1(X; F_0)] = 1. Next, apply the Gram-Schmidt procedure to {T_1², T_1³, ...} to construct the higher-order LP orthogonal polynomials T_j(x; F_0), e.g.,

  T_4(x; F_0) = √9 (70 F_0^4(x) − 140 F_0^3(x) + 90 F_0^2(x) − 20 F_0(x) + 1),

and so on. These polynomials can be computed by performing the Gram-Schmidt process numerically, using readily available packages in R or Python.

Definition 2 (Comparison coding). Expand the comparison density in the LP-orthogonal series

  d(F_0(x); F_0, F) = 1 + Σ_j LP[j; F_0, F] T_j(x; F_0).   (9)

To estimate the unknown LP-Fourier coefficients, note that

  LP[j; F_0, F] = E_F[T_j(X; F_0)].

Replacing LP[j; F_0, F] with its plug-in estimator in (9), we get

  d̃(F_0(x); F_0, F) = 1 + Σ_j L̃P[j; F_0, F] T_j(x; F_0),   (11)

where

  L̃P[j; F_0, F] = (1/n) Σ_{i=1}^n T_j(X_i; F_0).   (12)

Although (11) provides a robust nonparametric comparison-coding procedure, it has one drawback: the estimated d̃ may be unsmooth due to the presence of a large number of small, noisy LP-coefficients.

⁶ To keep the theory of estimation simple, we mainly focus on the case of continuous X. A detailed account of the discrete case can be found in Mukhopadhyay (2021a).
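The numerical Gram-Schmidt construction mentioned above can be sketched in a few lines. This is my own illustrative implementation (not the authors' code): it orthonormalizes powers of T_1 in u = F_0(x) space, checks the result against the closed form quoted for T_4, and then computes the empirical coefficients of eq. (12) on hypothetical data.

```python
import numpy as np

# Sketch (not the authors' code): numerical Gram-Schmidt construction of
# the LP basis in u = F0(x) space, where the base measure is uniform(0,1).
N = 20000
u = (np.arange(N) + 0.5) / N          # midpoint grid on (0,1)
du = 1.0 / N
inner = lambda g, h: np.sum(g * h) * du

T1 = np.sqrt(12) * (u - 0.5)          # first-order basis: standardized F0(x)
basis = []
for g in [T1, T1**2, T1**3, T1**4]:   # orthonormalize {T1, T1^2, ...}
    g = g - inner(g, np.ones(N))      # subtract projection on the constant
    for T in basis:
        g = g - inner(g, T) * T       # subtract projections on earlier T_j
    basis.append(g / np.sqrt(inner(g, g)))

# Closed form quoted in the text: T4 = sqrt(9)(70u^4 - 140u^3 + 90u^2 - 20u + 1)
T4 = 3 * (70*u**4 - 140*u**3 + 90*u**2 - 20*u + 1)
print(np.allclose(basis[3], T4, atol=1e-4))   # True

# Empirical LP-coefficients, eq. (12): LP~[j] = (1/n) sum_i T_j(X_i; F0)
rng = np.random.default_rng(7)
X = rng.exponential(scale=25, size=1000)      # hypothetical data
U = 1 - np.exp(-X / 25)                       # F0(X) under model-0 Exp(25)
LP_tilde = [np.mean(np.interp(U, u, Tj)) for Tj in basis]
```

For continuous F_0 this construction reproduces the normalized shifted Legendre polynomials in F_0(x), which is why the closed form for T_4 matches.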
To avoid unnecessary ripples in d̃, we need to isolate the small number of non-zero LP-coefficients. Our denoising strategy goes as follows (Mukhopadhyay, 2021b): sort the empirical L̃P[j; F_0, F] in descending order of absolute value and compute the penalized ordered sum of squares. This Ordered PENalization scheme will be referred to as the OPEN model-selection method:

  OPEN(m) = (sum of squares of the top-m LP-coefficients) − (γ_n / n) m.

Throughout, we use the AIC penalty γ_n = 2. Find the m that maximizes OPEN(m), and store the selected indices j in the set J. The OPEN-smoothed LP-coefficients will be denoted L̂P_j. Finally, return the smoothed estimate:

  d̂(F_0(x); F_0, F) = 1 + Σ_{j∈J} L̂P_j T_j(x; F_0).   (14)

Remark 1 (The scientific value of a sparse d). A meaningful way to measure the simplicity of a model is the number of 'new' statistical parameters it contains beyond the given scientific parameters, that is, the parsimony (number of parameters) of d. A sparse d̂ provides an intelligent and parsimonious way to elaborate model-0 (not an indiscriminate, brute-force elaboration). Simplicity is vital for making the model usable and interpretable by decision-makers, who want to understand how to change the initial model to explain the data.

Understanding the deficiency of the current model is an essential part of iterative model building and refinement: Have we overlooked something? Where are our knowledge gaps? This section provides a comprehensive understanding and an exploratory tool for representing and assessing potential model misspecification.

Figure 1: 10,000 samples are generated from the true (unknown to the analyst) model 0.9 Exp(λ_0) + 0.1 N(25, 2.5²). The graph of d_0(x) acts as a 'magnifying glass' that forces us to examine what extra information the data are willing to reveal beyond the known model.

Example 1. Consider the following scenario: Fig. 1 displays the data that a physicist has just collected from an experiment.
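The OPEN rule above can be sketched directly; the function name and the toy coefficient vector are mine, not the paper's.

```python
import numpy as np

# Sketch of the OPEN (Ordered PENalization) selection rule with AIC
# penalty gamma_n = 2; names here are illustrative, not the paper's.
def open_select(lp, n, gamma_n=2.0):
    """Keep the top-m empirical LP-coefficients maximizing
    OPEN(m) = (sum of squares of top-m |LP|) - (gamma_n / n) * m."""
    order = np.argsort(-np.abs(lp))            # indices, largest |LP| first
    csum = np.cumsum(lp[order] ** 2)
    open_m = csum - gamma_n * np.arange(1, lp.size + 1) / n
    m_best = int(np.argmax(open_m)) + 1
    smoothed = np.zeros_like(lp)
    if open_m[m_best - 1] > 0:                 # else: keep the empty model
        keep = order[:m_best]
        smoothed[keep] = lp[keep]
    return smoothed

# Two real signals (0.31, 0.18) among noise of size ~ 1/sqrt(n)
n = 5000
lp = np.array([0.01, 0.31, -0.012, 0.18, 0.008, -0.009])
print(open_select(lp, n))   # keeps only the 0.31 and 0.18 entries
```

Because the penalty grows linearly in m while the ordered squared coefficients shrink, the maximizer stops exactly where noise-level coefficients begin.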
The blue curve is the physics-informed background distribution f_0(x), in this case an exponential distribution with λ_0 = 25, and the red curve is the true unknown probability distribution. The physicist is mainly interested in knowing whether any new physics is hidden in the data, i.e., anything in the data that was overlooked by existing theory. If so, what is it? This will help the physicist come up with scientific explanations and potential alternative theories.

The shape of uncertainty. The researcher ran the density-sharpening algorithm of the previous section with m = 10, and the resulting d̂_0(x) is displayed in the right panel of Fig. 1 as a function of F_0(x). A few conclusions: (i) Model appraisal: the non-uniformity of d̂ tells us that the 'shape of the data' is inconsistent with the presumed model-0. (ii) Model amendment: the shape of d̂ also informs the scientist about the nature of the deficiency of the old model, i.e., the most worrisome aspects of the presumed model. In this example, the most consequential unanticipated pattern is a prominent 'bump' (excess mass) around F_0^{-1}(0.63) ≈ 24.85, which might be indicative of new physics. This newly discovered pattern can now be used to improve the background exponential model.

Remark 2 (Visual explanatory decision-aiding tool). One of the unique abilities of our exploratory learning is that it generates explanations of why and how model-0 is incomplete. Thus, the graph of d̂(u; F_0, F) explicitly addresses the decision-maker's model-misspecification concerns. It digs into the observations to uncover the 'blind spots' of the current model, which can ultimately drive discovery (locating novel hypotheses) and better decisions.

A general measure of the degree of model misspecification is defined using the Csiszár information-divergence class.

Definition 3.
For ψ : [0, ∞) → R a convex function with ψ(1) = 0, define the Csiszár class of statistical divergence measures between F and F_0:

  D_ψ(F, F_0) = ∫ f_0(x) ψ( f(x) / f_0(x) ) dx.

We prefer to represent it in terms of the density-sharpening function:

  D_ψ(F, F_0) = ∫_0^1 ψ( d(u; F_0, F) ) du.

One can recover popular divergence measures by appropriately choosing the ψ-function; for example, ψ(x) = (x − 1)² yields the χ²-divergence. One can quickly estimate the χ²-model-misspecification index by expressing it in terms of LP-Fourier coefficients (applying Parseval's identity to equation 9):

  I_{χ²}(F, F_0) = ∫_0^1 ( d(u; F_0, F) − 1 )² du = Σ_j |LP[j; F_0, F]|².   (17)

I_{χ²}(F, F_0) quantifies the uncertainty of the preliminary model f_0(x) in light of the given data, i.e., whether f_0(x) is catastrophically wrong or only slightly wrong. Estimate it by plugging the empirical LP-coefficients (12) into (17). Under H_0 : F = F_0, the sample LP-coefficients √n L̃P[j; F_0, F] are asymptotically i.i.d. standard normal (see Theorem 2 of Mukhopadhyay, 2017), so n Ĩ_{χ²}(F, F_0) follows χ²_m under the null. One can use this to compute a p-value. Applying this measure to Example 1, we get a p-value of practically zero, indicating that the background exponential model is badly damaged and should be repaired before making a decision.

Definition 4. DS(F_0, m) stands for Density-Sharpening of f_0(x) using the m-term LP-series-approximated d_0(x):

  f(x) = f_0(x) [ 1 + Σ_{j=1}^m LP[j; F_0, F] T_j(x; F_0) ],   (18)

obtained by substituting (9) into (2). DS(F_0, m) generates a relevant class of plausible models in the neighbourhood of the postulated f_0(x) that are worthy of consideration. A few additional points on density-sharpening:

1. The DS(F_0, m)-based density-sharpening principle provides a mechanism for exploring data by exploiting the uncertain background-knowledge model. It starts with data and an approximate model f_0(x), and produces a more refined picture of reality following (18).

2. The process of density-sharpening suitably 'stretches' the theory-informed model to create a class of robust empirico-scientific models. Moreover, it shows how new models are born out of pre-existing ones by means of data-driven self-modification.

3.
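The χ²-index and its p-value are straightforward to compute. The sketch below (helper names are mine) uses the fact that, for continuous F_0, the LP basis reduces to normalized shifted Legendre polynomials in F_0(x), and applies the test to data generated as in Example 1.

```python
import numpy as np
from scipy.special import eval_legendre
from scipy.stats import chi2, expon

# Sketch (helper names are mine): chi-square misspecification index of
# eq. (17) with its chi2(m) p-value. For continuous F0 the LP basis
# T_j(x; F0) is the orthonormal shifted Legendre polynomial in F0(x).
def lp_coefficients(X, F0_cdf, m):
    U = F0_cdf(np.asarray(X))
    return np.array([np.sqrt(2*j + 1) * np.mean(eval_legendre(j, 2*U - 1))
                     for j in range(1, m + 1)])

def misspecification_pvalue(X, F0_cdf, m=10):
    lp = lp_coefficients(X, F0_cdf, m)
    I_chi2 = float(np.sum(lp ** 2))              # eq. (17) via Parseval
    return I_chi2, chi2.sf(len(X) * I_chi2, df=m)  # n*I ~ chi2_m under H0

# Data echoing Example 1: 0.9*Exp(25) + 0.1*N(25, 2.5^2); model-0 = Exp(25)
rng = np.random.default_rng(0)
n = 10000
bump = rng.random(n) < 0.1
X = np.where(bump, rng.normal(25, 2.5, n), rng.exponential(25, n))
I, p = misspecification_pvalue(X, expon(scale=25).cdf, m=10)
print(p < 1e-6)    # True: the exponential background is badly damaged
```

Under the null (data truly from model-0) the same statistic is calibrated by χ²_m, so the p-value would typically be large.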
The truncation point m indicates the radius of the neighborhood around the elicited f_0(x) within which permissible models are created. DS(F_0, m) models with higher m entertain alternative models of higher complexity. However, to maintain conceptual appeal and interpretability, it is advisable to stay in the vicinity of f_0 by choosing an m that is not too large. Substituting the smooth estimates L̂P[j; F_0, F] of eq. (14) into formula (18), we get the most economical model (among competing alternatives around f_0(x)) that best explains the empirical surprise.

4. It provides the architecture of an 'intelligent agent' that simultaneously possesses the ability to: learn (what new things can we learn from the data), reason (how to explain the surprising empirical findings), and plan (how to self-modify to adapt to new situations).

Example 2 (Glomerular filtration data). We are given glomerular filtration rates for 211 kidney patients. The experiment was done at Dr. Bryan Myers' Nephrology research lab at Stanford University, and the dataset was previously analyzed in Efron and Hastie (2016). The blue curve in the left plot of Fig. 2 shows the best-fitted lognormal (LN) distribution. We start our analysis by asking whether the parametric LN model needs to be refined to fit the data. The middle panel displays the density-sharpening function, which provides insight into the nature of misspecification of the LN model: the peak and the tails of the initial LN distribution need repair; LN underestimates the peak and neglects the presence of heavier tails. The repaired LN model (displayed on the right-hand side of Fig. 2) is

  f(x) = f_0(x) [ 1 + Σ_{j∈J} L̂P_j T_j(x; F_0) ],   (19)

where f_0(x) is LN(µ_0, σ_0) with µ_0 = 4 and σ_0 = 0.24. The part in the square brackets comes from d̂_0(x), which provides recommendations on how to suitably elaborate the LN model to capture the unexplained shape.
The point of this example is to show how the density-sharpening principle (DSP) allows an analyst to explicitly perform model formulation, fitting, checking, and repairing, all seamlessly combined into one workflow.

It is interesting to compare our d-sharp LN model (the red curve) with the seven-parameter exponential family fit shown in Fig. 5.7 of Efron and Hastie (2016). The most noticeable difference lies in the right tail. Efron's seven-parameter exponential family model shows eerie spikes in the extreme right tail. The main reason is that it is based on polynomials of raw x (x, x², ..., x⁷), which are not robust: these traditional bases are unbounded and highly sensitive to 'large' data points. In contrast, our LP polynomials are functions of F_0(x), not raw x, and are thus robust by design. The other operational difference between our approach and Efron's exponential-family approach is that we model the 'gap' between the lognormal and the data, which is often far easier to approximate nonparametrically (it required only one parameter; see eq. 19) than modeling the data from scratch.¹¹

'Not the smallest advance can be made in knowledge beyond the stage of vacant staring, without making an abduction at every step.' - C. S. Peirce (1901)

Modelplasticity: a model's ability to modify and adapt itself in response to new data. The density-sharpening principle enables the model to develop new shapes in the face of change.

¹¹ There is an easy way to see this: compare the shapes of the histograms in the left two plots of Fig. 2.

Density-sharpening and model evolution. Modeling is a continual process, not a one-time data-fitting exercise. The density-sharpening mechanism allows us to combine new observations with an a priori expected model to generate new insights, as depicted in Fig.
3 (Better theory = f_0(x) × d_0(x)). C. S. Peirce was the pioneer of abductive reasoning; see Stigler (1978) and Mukhopadhyay (2021b) for more details on the Peircean view of statistical modeling. The goal of an Abductive Inference Machine (AIM) is to provide a learning framework that endows a model with this ability to learn, grow, and change with new information.

Remark 3. The density-sharpening process plays an essential role in abductive inference: it provides the computational machinery for generating novel hypotheses with explanatory merit and for selecting specific ones for further examination.

Remark 4 (Abductive inference ≠ hypothesis testing). Any scientific inquiry begins with observations and some initial hypotheses. Classical statistical inference develops tools to test the validity of the null model in light of the data. Since all scientific theories are incomplete, accepting or rejecting a particular hypothesis is a pointless exercise. The real question is not whether the null hypothesis is true or false. The real questions are: how far is reality from the postulated model, and in which direction(s) should we search to find a better model? The density-sharpening law provides a process of progressive refinement of yesterday's hypothesis.

'We often neglect how we get rid of the things that are less important... And oftentimes, I think that's a more efficient way of dealing with information.'

Attention is a prerequisite for gaining new knowledge. Intelligent learners have the ability to quickly infer where to focus attention in order to gain knowledge. In our modeling framework, d_0(x) draws the analyst's attention quickly and efficiently to the new, informative part of the data by suppressing boring details; verify this from the graphs of d_0(x) in Figs. 1 and 2.
It acts as a 'gating mechanism' that filters out the new, interesting (surprising) aspects of the data and ignores the dull, unsurprising part, thereby sharpening the model's intelligence by guiding where to pay attention during information processing.

'The whole function of the brain is summed up in: error-correction.' - W. Ross Ashby, English psychiatrist and a pioneer of cybernetics

Remark 5. In the brain, a dedicated circuit (or system) performs information-filtering similar to what d_0(x) does for our two-component model. The existence of such a brain circuit was first hypothesized by Francis Crick (1984), who called it 'The Searchlight Hypothesis.' Since then, significant progress has been made in hunting down the brain region, now called the basal ganglia, that suppresses irrelevant inputs; for more details see Halassa and Kastner (2017) and Gu et al. (2021). The basal ganglia help us focus on what is important and tune out the rest. The mechanics of our model-building mimic the brain's cognitive process of using existing knowledge to sieve out the new information needed for correcting the error of (sharpening) the earlier mental model.

'How should a decision maker acknowledge model misspecification in a way that guides the use of purposefully simplified models sensibly?' - Cerreia-Vioglio et al. (2020)

This section demonstrates how practicing abductive inference based on the density-sharpening principle can enable better decision-making under uncertainty. Abduction is the process of generating and revising a model before choosing the optimal action; an abducer makes decisions while allowing for potential model misspecification.

A decision maker (DM) must choose an action from a menu A = {a_1, ..., a_q} based on observed outcomes X_1, ..., X_n from an unknown probability distribution. The DM selects the optimal action that minimizes expected loss (risk) under the assumed model-0:

  â_0 := argmin_{a∈A} ∫ L_a(x) dF_0(x),

where f_0(x) is the DM's posited probability distribution over outcomes and L_a(x) is the loss of taking action a when outcome x occurs.
However, as an abducer, the DM is fully aware that uncertainty about the outcomes may not be captured by a single, rigidly defined probability distribution f_0(x), and thus wants to choose the best decision while accommodating the uncertainty of model-0.

Decision-making based on the density-sharpening principle. To account for the imperfect nature of model-0, the most natural thing to do is to work with an enlarged class of plausible distributions around the vaguely acceptable f_0(x),

  Γ_M = { DS(F_0, m) : m = 0, 1, ..., M },

within a certain reasonable neighbourhood, say M = 10. We use this enlarged class of distributions Γ_M for robust decision-making. Two such strategies are discussed below.

Method 1. Choose the action that minimizes the worst-case expected loss over the class Γ_M. We call this an abductive-minimax procedure. Our proposal is partly inspired by the 'local-minimax' idea of Hansen and Sargent (2001a,b).

Method 2. We now describe another robust decision-making procedure that takes into account the uncertainty in the analyst's elicited probability model of future states. Two key concepts are bootstrap model averaging and the action-profile function.

Step 1. Use the bootstrap to explore f ∈ Γ_M in an intelligent way. Draw n samples with replacement from the original data and denote the bootstrap empirical cdf by F̃^(1). Run the density-sharpening algorithm DS(F_0, F̃^(1)) and denote the selected d-sharp model by f^(1).

Step 2. Use f^(1) to select the best action from the given set of q actions {a_1, ..., a_q}. Denote the selected action by a^(1).

Step 3. Repeat steps 1-2 B times (say B = 1000), and return:

• The bootstrap distribution Â of optimal actions {a^(1), ..., a^(B)}, which we call the action profile of the decision problem.

• The bootstrap systematically generates probable alternative models {f^(1)(x), ..., f^(B)(x)} that can explain the data. Compute the bootstrap model-averaged distribution

  f̄(x) = (1/B) Σ_{b=1}^B f^(b)(x).

This model averaging over all plausible alternatives makes the procedure robust to model uncertainty.
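Steps 1-3 can be sketched on a toy decision problem. Everything concrete below (the data, the three-action menu, the quadratic loss) is a hypothetical choice of mine; for brevity the bootstrap empirical distribution stands in for the fitted d-sharp model f^(b).

```python
import numpy as np

# Sketch of Method 2, steps 1-3, on a toy problem (choices are mine).
rng = np.random.default_rng(42)
X = rng.gamma(shape=2.0, scale=12.0, size=211)      # observed outcomes
actions = np.array([20.0, 25.0, 30.0])              # menu {a_1, a_2, a_3}
loss = lambda a, x: (x - a) ** 2                    # L_a(x), hypothetical

B = 1000
chosen = np.empty(B)
for b in range(B):
    Xb = rng.choice(X, size=X.size, replace=True)   # step 1: bootstrap draw
    risks = [loss(a, Xb).mean() for a in actions]   # step 2: expected losses
    chosen[b] = actions[int(np.argmin(risks))]      # best action under f^(b)

# step 3: action profile = bootstrap distribution of optimal actions
vals, counts = np.unique(chosen, return_counts=True)
pi = counts / B
profile = dict(zip(vals, pi))
entropy = -np.sum(pi * np.log(pi))    # robustness measure used later (step 5)
print(profile, round(entropy, 3))
```

A profile concentrated on one action (entropy near zero) signals a decision that is stable across the plausible models; a spread-out profile warns that the recommendation hinges on which model happens to be fitted.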
In this strategy, the policymaker does not have to put their complete faith in a single alternative distribution. The bootstrap density exploration weights different alternatives appropriately to create a realistic model. Fig. 4 shows the bootstrap-generated densities for the gfr data of Example 2: the light blue curves are the plausible alternative models, and the dark blue curve is the averaged density that takes all likely scenarios into account.

Step 4. Robust procedure: a pragmatic decision-maker chooses an action (or ranks the actions) that minimizes expected loss (or maximizes expected utility) with respect to the averaged distribution:

  â_robust := argmin_{a∈A} ∫ L_a(x) dF̄(x).

Our strategy prescribes an action that is robust across a wide range of plausible alternative models. It could be especially powerful for dealing with 'deep uncertainty' when making robust policies; for a comprehensive overview of this subject, see Marchau et al. (2019).

Figure 4: The light blue curves are the bootstrap-generated gfr densities, which present a landscape of plausible scenarios. The dark blue curve denotes the estimated model-averaged distribution f̄(x).

Step 5. Quantifying the 'robustness' of the action (or decision rule): the entropy of the action-profile distribution can be used to examine the robustness (or stability) of the inference to potential model misspecification:

  H(Â) = − Σ_i Pr(A = a_i) log Pr(A = a_i).

A uniform probability over the possible actions yields maximum uncertainty, indicating that the decision is highly non-robust (unstable) under the possibility of model misspecification.

Until now, we have assumed that experts can precisely formulate their opinion in probabilistic form f_0(x). However, for complex real-world problems, experts might have only incomplete information about the uncertainty distribution of the target variable.
Investigators often elicit their partial knowledge about an uncertain quantity as a set of quantile-probability (QP) pairs {x_i, F(x_i)}, for i = 1, ..., ℓ. The job of the analyst is to find a simple, flexible, and parameterizable density that honors the assessed percentiles.

Learning from incomplete information. Eliciting an expert's probability distribution from a small set of QP pairs is a vital yet nascent topic in decision analysis; see Powley (2013), Keelin and Powley (2011), and Hadlock (2017). In this section, we present an algorithm called Q2D (for 'quantile to distribution') that provides a systematic approach for deducing a reliable expert distribution from arbitrary QP specifications.

The main theoretical idea behind the Q2D algorithm: recall our DS(F_0, m) model

  f(x) = f_0(x) [ 1 + Σ_{j=1}^m LP[j; F_0, F] T_j(x; F_0) ].

Integrating both sides from −∞ to x, we have

  F(x) − F_0(x) = Σ_{j=1}^m LP[j; F_0, F] ∫_{−∞}^{x} T_j(t; F_0) f_0(t) dt.   (26)

Probability-gap approximation. Given a set of arbitrary quantile-probability data (x_i, F(x_i)), for i = 1, ..., ℓ, we can rewrite (26) compactly as the matrix equation v = S_0 β, where v_i = F(x_i) − F_0(x_i), S_0 ∈ R^{ℓ×m} with entries S_0[i, j] = ∫_{−∞}^{x_i} T_j(t; F_0) f_0(t) dt, and the unknown parameters are β = (β_1, ..., β_m), with β_j shorthand for LP[j; F_0, F]. For m ≤ ℓ, we can uniquely estimate β by least squares:

  β̃ = (S_0ᵀ S_0)^{−1} S_0ᵀ v.

For large ℓ (say, ℓ ≥ 5), a better, more stable estimate can be found through regularization:

  β̂ = argmin_β ‖v − S_0 β‖₂² + λ ‖β‖₁,

where ‖·‖_p is the ℓ_p norm and λ > 0 is the regularization parameter. The lasso-penalized (Tibshirani, 1996) β̂ yields a sparse estimate and counters over-fitting; this penalized estimate trades a little accuracy for interpretability. Finally, plug the estimated LP-Fourier coefficients β̂_j into the primary equation (18) to get the expert distribution.

Remark 6. The expert quantile specifications should not be viewed as a 'gold standard': they are nothing but a preliminary guess (prone to errors of judgment or hindsight bias) whose purpose is to steer the analyst in the right direction.¹⁷
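The probability-gap fit v = S_0 β can be sketched as follows. The helper names are mine; the QP pairs and the Exp(4.32) model-0 are the ones used in Example 4 below.

```python
import numpy as np
from scipy.special import eval_legendre
from scipy.stats import expon

# Sketch of the Q2D probability-gap fit (helper names are mine): solve
# v = S0 beta by least squares. For continuous F0, with u = F0(x),
#   S0[i, j] = int_{-inf}^{x_i} T_j f0 = int_0^{u_i} Leg_j(t) dt,
# where Leg_j is the orthonormal shifted Legendre polynomial.
def S0_entry(j, u, grid=4001):
    t = np.linspace(0.0, u, grid)
    Leg = np.sqrt(2*j + 1) * eval_legendre(j, 2*t - 1)
    return (0.5 * (Leg[0] + Leg[-1]) + Leg[1:-1].sum()) * (t[1] - t[0])

# QP pairs of Example 4 (U.S. Navy repair times); model-0 = Exp(mean 4.32)
x = np.array([0.12, 1.30, 3.00, 7.00, 26.17])
p = np.array([0.01, 0.20, 0.50, 0.80, 0.99])
u = expon(scale=4.32).cdf(x)
m = 4
S0 = np.array([[S0_entry(j, ui) for j in range(1, m + 1)] for ui in u])
v = p - u                                 # probability gaps F(x_i) - F0(x_i)
beta, *_ = np.linalg.lstsq(S0, v, rcond=None)
print(np.round(beta, 3))                  # estimated LP[j; F0, F]
```

With ℓ = 5 pairs and m = 4 this is an ordinary least-squares solve; the lasso-penalized variant of the text would replace `lstsq` with an ℓ₁-penalized solver.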
For that reason, we recommend the regularized β̂ over the naive β̃, since it makes little sense to fit the noisy QP data exactly.

Example 3 (Bimodal distribution). We are given a set of quantile judgments (quantile-probability pairs). In our Q2D algorithm, we choose F_0 (an initial approximate shape) to be a normal distribution. To estimate the parameters µ_0 and σ_0 of the normal distribution, note that the quantile function satisfies Q(u) ≈ µ_0 + σ_0 Φ^{−1}(u); thus one can quickly get a rough estimate by a simple regression of the elicited quantiles on Φ^{−1}(u).

¹⁷ Winkler (1967) emphasized that the expert does not have some 'true' density function waiting to be elicited, only a 'satisficing' initial distribution that the policymaker is 'content to live with at a particular moment of time.'

Example 4 (U.S. Navy data). Fig. 6 shows a histogram of 122 repair times (in hours) for a component of a U.S. Navy weapons system; the dataset was analyzed in Law (2011). Imagine that, for privacy and other reasons, we do not have access to the full data. The goal is to infer a probability distribution that faithfully represents the following quantiles:

  Quantile x_i:        0.12  1.30  3.00  7.00  26.17
  Probability F(x_i):  0.01  0.20  0.50  0.80  0.99

We start with the exponential distribution as our initial guess, since it is often taken as a 'default' model-0 in reliability analysis. For X ~ Exp(λ), we have Median(X) = λ ln(2), where λ = E(X). From the quantile table we get λ̂ = 3/ln(2) ≈ 4.32. Next, we apply the Q2D algorithm to derive the LP-parameters with f_0 = Exp(4.32).

Figure 6: Left: histogram of 122 repair times for a component of a U.S. Navy weapons system; the blue curve is f_0 = Exp(4.32). Our analysis used only the five QP pairs (not the full data); the outputs are shown in the middle and right-hand panels. The inferred density-sharpening function tells us that the peak and the tail of the exponential model need correction. The repaired exponential model is displayed in red.
The resulting density-sharpening function and the final d-sharp exponential are shown in Fig. 6. The red curve in the right plot shows an excellent fit to the data, derived by the Q2D algorithm using only the five quantile-probability pairs.

High-stakes decision-making (say, for the COVID-19 pandemic or climate change) is often based on multiple experts' opinions rather than putting all bets on a single rigidly defined probability model. The challenge is to aid data-driven decision-making by appropriately combining several experts' models. We describe one possible way to build a 'consensus committee model' that can be used as model-0 within an abductive decision-making framework.

Learning from multiple expert distributions. Given k expert probability models {f_01, ..., f_0k}, which may differ markedly in shape, define the following model weights:

  Relevance weight: w_ℓ = 1 / ( 1 + Σ_j |LP_{j|ℓ}|² ),  for ℓ = 1, ..., k,

where LP_{j|ℓ} is the j-th LP-Fourier coefficient of the ℓ-th model:

  LP_{j|ℓ} = E_F[ T_j(X; F_{0ℓ}) ].

Note that the relevance weight of the ℓ-th model always satisfies 0 < w_ℓ ≤ 1, and w_ℓ = 1 if and only if LP_{j|ℓ} = 0 for all j; LP_{j|ℓ} = 0 for all j when f_{0ℓ} fully explains the data and there is no need to sharpen it further (i.e., d_0 = 1). In that sense, the w_ℓ's are data-driven weights (which keep changing as more fresh data arrive), computed from the degree of agreement between the observed data and the expert model f_{0ℓ}. Define the mixture expert distribution as

  f_0^mix(x) = Σ_{ℓ=1}^k π_ℓ f_{0ℓ}(x),  where π_ℓ = w_ℓ / Σ_{ℓ'} w_{ℓ'}.

This model serves two purposes: it tries to resolve conflicting opinions based on data, and at the same time it encourages the inclusion of as much diverse information as possible. An analyst can use the combined expert model f_0^mix(x) as model-0 in the subsequent density-sharpening-based learning and decision-making process.

How should an analyst use imperfect models to learn from data? What should be the output of such an analysis so that it can ultimately aid informed decision-making?
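The relevance-weighting scheme above can be sketched directly. The function names and the two-expert example are mine; the data are drawn from the first expert's model, so that expert should receive the larger mixture weight.

```python
import numpy as np
from scipy.special import eval_legendre
from scipy import stats

# Sketch (function names are mine): data-driven relevance weights for a
# committee of expert models, and the resulting mixture weights pi_l.
def lp_coeffs(X, F0_cdf, m=6):
    U = F0_cdf(np.asarray(X))
    return np.array([np.sqrt(2*j + 1) * np.mean(eval_legendre(j, 2*U - 1))
                     for j in range(1, m + 1)])

def relevance_weights(X, expert_cdfs, m=6):
    """w_l = 1 / (1 + sum_j LP[j|l]^2); close to 1 when expert l fits."""
    w = np.array([1.0 / (1.0 + np.sum(lp_coeffs(X, F, m) ** 2))
                  for F in expert_cdfs])
    return w, w / w.sum()                 # (w_l, mixture weights pi_l)

# Two hypothetical experts; the data actually follow expert 1's model
rng = np.random.default_rng(3)
X = rng.normal(10, 2, size=3000)
experts = [stats.norm(10, 2).cdf, stats.norm(14, 2).cdf]
w, pi = relevance_weights(X, experts)
print(pi[0] > pi[1])     # True: the well-fitting expert gets more weight
```

The resulting mixture f_0^mix = Σ π_ℓ f_{0ℓ} can then be fed back into the density-sharpening machinery as the committee's model-0.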
We address these questions by introducing a general inferential framework for statistical learning and decision-making under uncertainty, built on two core ideas: abductive thinking and the density-sharpening principle. Some of the defining features of our approach to data analysis, scientific discovery, and decision-making are highlighted below.

‚ Data analysis and the science of model management: No model is perfect, no matter how cunningly it is designed. The central problem of the statistical model-development process is to understand how a relatively simple model can evolve into a more complex and mature one in the presence of a new data environment. The principle of density-sharpening assists this model-evolution process (thereby helping empirical scientists to abduct): it abductively generates explanations of why the presumed model-0 is unfit for the data (playing the role of a quality inspector), and it provides recommendations on how to fix the misspecification issues (serving as a policy adviser) in order to make better decisions in new circumstances.

‚ Discovery and creation of new knowledge: Abductive data analysts are less interested in testing a particular working model than in conceptual innovation: discovering new hypotheses based on surprising empirical evidence.²⁰ The density-sharpening function d(u; F_0, F) picks out 'what's new' in the data beyond the current scientific knowledge encoded in f_0(x), helping the scientist uncover unexpected knowledge from the data using graphical tools. The density-sharpening principle (DSP) provides a learning mechanism that separates the 'known' from the 'unknown' and allows us to focus on the newfound patterns in the data, which are the basis for knowledge creation.²¹
‚ Abductive inference and decision-making: The proposed theory of abductive decision-making tackles model uncertainty induced by imprecise, ambiguous, and incomplete knowledge of the underlying probabilistic structure. An abductive decision-support system automatically discovers and explicitly articulates the possible alternatives to analysts, forcing them to rethink their choices before taking impulsive action. This style of empirical reasoning and adaptive decision-making could be especially beneficial when investigators need to take quick action in the face of uncertainty, equipped with approximate subject-matter knowledge.

²⁰ A largely unexplored topic relative to the vast literature on hypothesis testing. As noted by George E. P. Box (2001): 'Much of what we have been doing is adequate for testing but not adequate for discovery.'

²¹ Curious readers are invited to read the paper 'Nobel Turing Challenge: creating the engine for scientific discovery' by Hiroaki Kitano, where he argues that the single most important mission of AI is to accelerate scientific discovery; see also Langley (2022).
References (titles as cited in the text)

Exploratory modeling for policy analysis
Statistics for discovery
Robustness in the strategy of scientific model building
Sampling and Bayes' inference in scientific modelling and robustness
Bagging predictors
Making decisions under model misspecification
Function of the thalamic reticular complex: the searchlight hypothesis
Estimation and accuracy after model selection
Computer Age Statistical Inference
Computational circuit mechanisms underlying thalamic control of attention
Quantile-parameterized methods for quantifying uncertainty in decision analysis
Thalamic functions in distributed cognitive control
Acknowledging misspecification in macroeconomic theory
Robust control and model uncertainty
Uncertainty within economic models
The inference to the best explanation
Abducting economics
Robust statistical procedures
Quantile-parameterized distributions
Nobel Turing challenge: creating the engine for scientific discovery
Agents of exploration and discovery
How to select simulation input probability distributions
Decision making under deep uncertainty: from theory to practice
Large-scale mode identification and data-driven sciences
Density sharpening: principles and applications to discrete data analysis
Revisiting C. S. Peirce's experiment: 150 years later (Applied Stochastic Models in Business and Industry, special issue)
Toward an examination of Hume's argument against miracles, in its logic and in its history
Quantile function methods for decision analysis
Uncertainty, statistics, and creation of new knowledge
Mathematical statistics in the early states
Regression shrinkage and selection via the lasso
Judgment under uncertainty: heuristics and biases
The quantification of judgment: some methodological suggestions