key: cord-0298501-ytqau9bf authors: Voulgaropoulou, Stella; Fauzani, Fasya; Pfirrmann, Janine; Vingerhoets, Claudia; van Amelsvoort, Thérèse; Hernaus, Dennis title: Asymmetric effects of acute stress on cost and benefit learning date: 2021-04-26 journal: bioRxiv DOI: 10.1101/2021.04.25.441347 sha: acb3ff0ea8cf02bd99b91e0642fd161118c44ca2 doc_id: 298501 cord_uid: ytqau9bf

Stressful events trigger a complex physiological reaction – the fight-or-flight response – that can hamper flexible decision-making. Inspired by key neural and peripheral characteristics of the fight-or-flight response, here we ask whether acute stress changes how humans learn about costs and benefits. Participants were randomly exposed to an acute stress or no-stress control condition, after which they completed a cost-benefit reinforcement learning task. Acute stress improved learning to maximize benefits (monetary rewards) relative to minimising energy expenditure (grip force). Using computational modelling, we demonstrate that costs and benefits can exert asymmetric effects on decisions when prediction errors that convey information about the reward value and cost of actions receive inappropriate importance, a process associated with distinct alterations in pupil size fluctuations. These results provide new insights into learning strategies under acute stress – which, depending on the context, may be maladaptive or beneficial – and into candidate neuromodulatory mechanisms that could underlie such behaviour.

Stress is ubiquitous in everyday life. From brief, recurrent events (a work meeting, moving to a new house) to major life events (armed combat, a pandemic, a financial crisis), humans are continuously exposed to challenges in their daily environment.
The immediate central and peripheral physiological cascade triggered by such events, collectively termed the fight-or-flight (or acute stress) response (Cannon, 1915), serves an allostatic role that enables organisms to respond adequately to environmental demands (de Kloet, Joëls, & Holsboer, 2005). Although beneficial for survival, this allostatic process comes at a cost: stress-induced redistributions of neural resources (e.g., towards vigilance or threat detection) may hamper the deployment of strategies that support adaptive and optimal decision-making (Hermans, Henckens, Joëls, & Fernández, 2014).

Optimal decisions essentially depend on the ability to rapidly learn from the positive and negative outcomes of previous actions, also known as reinforcement learning (Niv, 2009) […] putatively dopaminergic teaching signals that represent the mismatch between actual and expected outcomes, which are used to flexibly adjust behaviour (Niv, 2009; Rescorla, 1972).

Taken together, our model-free results indicate that acute stress leads to a reinforcement learning strategy that favours learning to maximise reward value over minimisation of action cost. Based on analyses of win-stay/lose-shift rates, this could be attributed to increased sensitivity to positive reinforcement (i.e., reward delivery) compared to negative reinforcement (i.e., avoidance of physical effort).

[…] separately. Means ± SD, individual data points, distribution and frequency of the data are displayed. In panel a, the top line indicates a significant Condition-by-Trial type interaction. Significant differences are denoted by asterisks (*: p < 0.05, **: p < 0.01, ***: p < 0.001). Source files of task performance data used for the analyses are available in the Figure 3 […] (Figure 4a for p(r|y) of all candidate models).
We note that 2LR_γ remained the most likely model when we considered additional models with greater redundancy and/or lesser biological plausibility (e.g., models with all combinations of reward value/action cost discounting and weight parameters).

The 2LR_γ model contains separate learning rates that weight the importance of RPEs and EPEs (αR, αE), an action cost discounting parameter (γ), and an inverse temperature parameter (β). In previous work, this architecture could account for performance on a conceptually similar cost-benefit learning task (Skvortsova et al., 2014). To demonstrate the effect of changes in parameter values on choice preferences within the 2LR_γ architecture, we first simulated choices from 50 artificial agents (averaged across 10 repetitions) performing the reward maximization/action cost minimization reinforcement learning task, using a range of parameter values. As expected, greater values of αR and αE primarily affected the speed at which RL and EL choice preferences developed. Low values of γ led to asymmetric choice preferences through discounting of action cost, and lower values of β led to non-selective increases in random sampling (Figure 4b).

In post hoc simulations (i.e., generating participant choices using the obtained parameters), we additionally observed moderate-to-high correlations between simulated and empirical RL/EL for the acute stress and no-stress control groups [ρRL_control = 0.55, p < 0.01; ρRL_stress = 0.84, p < 0.01; ρEL_control = 0.56, p < 0.01; ρEL_stress = 0.77, p < 0.01; see Figure Supplement 4]. However, the canonical performance difference in RL versus EL accuracy was not selective to the acute stress group [tcontrol(39) = -6.72, p < 0.001; tstress(39) = -6.01, p < 0.001].
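The parameter sweep described above can be illustrated with a small simulation. The Python sketch below is not the authors' simulation code. It rests on several assumptions: reward and effort values are updated by separate delta rules, net value is QR − γ·QE, choices follow a two-option softmax, values start at zero, and the parameter values are arbitrary. It also runs a single batch of 50 agents rather than averaging over 10 repetitions. If the learning rates are held equal and γ is lowered, EL accuracy should drift towards chance while RL accuracy is preserved. This is the qualitative asymmetry attributed to action cost discounting.

```python
import math
import random
from statistics import mean

# Outcome probabilities on the learned dimension for the (optimal, suboptimal)
# stimulus; "positive" means reward on RL pairs and effort avoidance on EL pairs.
EASY, HARD = (0.83, 0.17), (0.70, 0.30)
# The dimension held constant yields a positive outcome with p = 1/3 for both stimuli.
P_CONSTANT_POSITIVE = 1 / 3


def simulate_pair(kind, probs, alpha_r, alpha_e, gamma, beta, rng, n_trials=30):
    """Simulate one agent on one pair (stimulus 0 is optimal); return its accuracy."""
    q_r = [0.0, 0.0]  # learned reward values
    q_e = [0.0, 0.0]  # learned effort (action cost) values
    n_optimal = 0
    for _ in range(n_trials):
        # Assumed net value: reward value minus gamma-discounted action cost.
        v0 = q_r[0] - gamma * q_e[0]
        v1 = q_r[1] - gamma * q_e[1]
        p_choose_0 = 1.0 / (1.0 + math.exp(-beta * (v0 - v1)))  # two-option softmax
        s = 0 if rng.random() < p_choose_0 else 1
        n_optimal += int(s == 0)
        p_pos = probs[s]
        if kind == "RL":  # reward is informative, effort avoidance is constant
            reward = 1.0 if rng.random() < p_pos else 0.0
            effort = 0.0 if rng.random() < P_CONSTANT_POSITIVE else 1.0
        else:  # "EL": effort avoidance is informative, reward is constant
            effort = 0.0 if rng.random() < p_pos else 1.0
            reward = 1.0 if rng.random() < P_CONSTANT_POSITIVE else 0.0
        # Separate delta-rule updates driven by the RPE and the EPE.
        q_r[s] += alpha_r * (reward - q_r[s])
        q_e[s] += alpha_e * (effort - q_e[s])
    return n_optimal / n_trials


def simulate_group(alpha_r, alpha_e, gamma, beta, n_agents=50, seed=1):
    """Mean RL and EL accuracy across agents, pooling easy and hard pairs."""
    rng = random.Random(seed)
    rl, el = [], []
    for _ in range(n_agents):
        for probs in (EASY, HARD):
            rl.append(simulate_pair("RL", probs, alpha_r, alpha_e, gamma, beta, rng))
            el.append(simulate_pair("EL", probs, alpha_r, alpha_e, gamma, beta, rng))
    return mean(rl), mean(el)


if __name__ == "__main__":
    # Equal learning rates; only the action cost discounting parameter varies.
    for gamma in (1.0, 0.5, 0.0):
        rl, el = simulate_group(alpha_r=0.3, alpha_e=0.3, gamma=gamma, beta=5.0)
        print(f"gamma={gamma:.1f}  RL accuracy={rl:.2f}  EL accuracy={el:.2f}")
```

Under these assumptions, setting γ = 1 with equal learning rates makes RL and EL pairs statistically equivalent. Any asymmetry in that case must come from the learning rates, in line with 2LR being the γ = 1 special case. To mimic averaging across repetitions, `simulate_group` can be called with several `seed` values and the results averaged.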
However, after we fixed β and γ to group-level averages to better demonstrate the effect of group differences in the learning rate parameters, we recovered a small but significant […]

[…] group, see next section) and 2LR (where αR > αE for the acute stress group) can be explained by the absence of the discounting parameter γ. 2LR is a special case of 2LR_γ in which γ = 1, so asymmetric effects of acute stress on reward value maximization and action cost minimization can only be explained by dissimilarity in learning rates.

The effects of acute stress on reward value and action cost learning rates are opposite in the 2LR_γ versus 2LR architectures. Nevertheless, these results bolster our confidence in the overall model space, as well as in the interpretation that acute stress primarily affects reward value and action cost learning rates, and not discounting. Three observations motivated our choice to focus on the 2LR_γ model: I) 2LR_γ fit better in the entire group of participants, II) 2LR is fully contained within the 2LR_γ model, and III) 2LR_γ displayed good recoverability (see below).

In model recoverability analyses, i.e., re-fitting the simulated data from the model to […]

[…] who assigned greater importance to EPEs than RPEs, may have used a computationally costly learning strategy that provides a counterweight to a decision-making policy biased towards the reward value of actions (captured by the action cost discounting parameter γ). Paradoxically, when decisions are by default tilted towards reward value, similar reward and action cost learning rates will facilitate reward learning but hamper action cost learning.
Reduced learning rate asymmetry in the presence of action cost discounting may therefore represent a computational reformulation of a heuristic. This heuristic would be employed when cognitively demanding learning strategies are unavailable and the policy towards energy expenditure is more liberal, such as during acute stress.

Importantly, stress-induced changes in task performance may crucially depend on the release of catecholamines in neural circuits that support motivation and learning. Dopamine's actions at D1 and D2 receptors in the basal ganglia mediate approach and avoidance learning […]

[…] affect and RPE-pupil size slopes suggest that primarily moderately stressed participants displayed a preference for maximizing reward value. This might be consistent with an inverted U-shaped relationship between cognitive performance and DA transmission that is modulated by stress (Arnsten & Goldman-Rakic, 1998; Baik, 2020). Noradrenaline, however, mobilizes available energy to complete effortful actions, and locus coeruleus neurons track energy expenditure (Varazzani et al., 2015). Stress-induced sAA concentrations, increased heart rate, and group differences in the association between pupil size fluctuations and EPEs all point to the involvement of the noradrenaline system. Thus, our model-based pupillometry and stress-induction results hint at stress-sensitive dopaminergic and noradrenergic mechanisms that may regulate cost and benefit learning, which could be explored in future work using targeted pharmacological approaches.

The results presented here may improve understanding of stress-related psychopathology. Asymmetric cost-benefit learning during acute stress may help an individual reach a desired goal state (e.g., safety) despite high action cost, but such strategies could also be maladaptive. For example, stress exposure can lead to drug or smoking relapse (L. Schwabe, Dickinson, & Wolf, 2011), a context in which reward value and action cost may be misaligned. Cost-benefit reinforcement learning may provide a useful framework to test hypotheses regarding stress-related impairments in learning and decision-making.

Some study limitations need to be acknowledged. First, pupil dilation associated with effort expenditure greatly reduced our power to detect robust associations between EPE encoding and pupil size fluctuations. Future studies should, therefore, consider a temporal delay between effort outcome and effort expenditure phases. Second, while our computational model was able to recover overall task performance patterns in both groups, such effects were subtle and dependent on the contribution of other (non-learning) parameters, which may highlight the importance of interindividual differences in model […]

To summarize, we present evidence of asymmetric effects of acute stress on cost versus benefit reinforcement learning. Computational analyses of task behaviour explain this as a failure to assign appropriate importance to RPEs versus EPEs, and our model-based pupillometry tentatively links it to the activity of ascending midbrain neuromodulatory systems. These results highlight for the first time how learning under acute stress can be tilted in favour of acquiring good things and away from the avoidance of costly things.

Participants

Adult participants were recruited via paper and online advertisements. All participants were screened for a DSM-5 psychiatric and/or neurological disorder, substance use, endocrine and/or vascular disorder, abnormal BMI (>40 or <18), smoking and drinking (>10 cigarettes/units per week), psychotropic medication use (lifetime) and hormonal contraceptive use (current; female participants only).
All participants completed the ~2-hour experiment between 12:00h and 18:00h to minimize diurnal cortisol fluctuations (Bailey & Heitkemper, 2001). Participants were instructed to refrain from alcohol (starting the evening before the day of the experiment), as well as from smoking, food, caffeine intake, strenuous physical activity and brushing their teeth (all >2 hr prior to the experiment). Compliance was verified verbally at the start of the session. Four participants were excluded due to an equipment failure. Three participants quit during stress-induction (n=2) or task procedures (n=1). Chance-level performance on reinforcement learning tasks might indicate a successful manipulation, a lack of motivation, or a failure to comprehend the task instructions. We therefore excluded participants who performed at or below chance level (0.5) on both RL pairs and/or both EL pairs near the end of the experiment (final 10 presentations) (n=13; 6 acute stress, 7 no-stress control). Including these participants did not alter our key finding that acute stress was associated with asymmetric cost versus benefit learning (see Results). Pupillometry and neuroendocrine data were not processed further for these participants.

[…] showing either a €0.20 coin or a crossed-out coin, indicating no reward (3000ms).

Participants learned to choose the optimal (most rewarding or most effort-avoiding) stimulus for four distinct image pairs, 30 presentations each, with yoked reward and action cost contingencies. For 2/4 pairs, participants could regularly acquire rewards by selecting one (optimal) stimulus over the other (suboptimal) one (henceforth, "reward learning"/RL pairs), while the probability of having to exert effort was identical for both stimuli.
For the other two pairs, choices of one stimulus were more frequently followed by the avoidance of effort ("effort learning"/EL pairs), while the probability of reward was kept constant across both stimuli. For all pairs, the stimulus property that was kept constant (reward/effort) had a 33.3% chance of a positive outcome upon selection (reward/effort avoidance) and a 66.6% chance of a negative outcome (no reward/effort).

We wanted to assess whether any acute stress effects on learning reward maximization (measured using RL pairs) and effort cost minimization (measured using EL pairs) were potentially mediated by task difficulty. To this end, we employed different difficulty levels for each RL and EL pair. For one RL and one EL ("easy") pair, a choice for the optimal stimulus was followed by a positive outcome in 83% (vs. 17% negative outcome) of all trials (83% negative/17% positive outcome for the suboptimal stimulus). For the other RL and EL ("hard") pair, a choice for the optimal stimulus was followed by a positive outcome in 70% (vs. 30% negative outcome) of all trials (70% negative/30% positive outcome for the suboptimal stimulus). This approach allowed us to disentangle whether acute stress primarily affected domain-specific (RL vs. EL) or general (easy vs. hard) reinforcement learning. General learning might also involve other cognitive skills that benefit performance and are sensitive to change under stress, such as working memory (Schoofs, Wolf, & Smeets, 2009). The task contingencies described above were based on extensive pilot tests to identify a reinforcement schedule that would enable us to detect stress-induced improvements and decreases in task performance. We selected task contingencies based on pilot sessions involving a no-stress control condition and chose a reinforcement schedule associated with non-ceiling/floor performance on RL and EL trials.
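A "positive" outcome means reward on RL pairs but effort avoidance on EL pairs, so it can help to spell out the resulting outcome probabilities per stimulus. The Python sketch below encodes the contingencies stated above: 83/17 for easy pairs, 70/30 for hard pairs, and 1/3 for the dimension held constant. The names are hypothetical. The expected counts are simple products with the 30 presentations per pair and are not the exact trial schedule used in the study.

```python
# Probability of a positive outcome on the learned dimension for the
# (optimal, suboptimal) stimulus; positive = reward (RL) or effort avoidance (EL).
LEARNED = {"easy": (0.83, 0.17), "hard": (0.70, 0.30)}
# The dimension held constant yields a positive outcome with p = 1/3 for both stimuli.
CONSTANT_POSITIVE = 1 / 3
N_PRESENTATIONS = 30


def pair_contingencies(kind, difficulty):
    """Return {stimulus: (p_reward, p_effort)} for an "RL" or "EL" pair."""
    p_opt, p_sub = LEARNED[difficulty]
    if kind == "RL":
        # Reward is informative; effort follows with p = 2/3 for both stimuli.
        p_effort = 1 - CONSTANT_POSITIVE
        return {"optimal": (p_opt, p_effort), "suboptimal": (p_sub, p_effort)}
    if kind == "EL":
        # Effort avoidance is informative, so p(effort) = 1 - p(positive outcome);
        # reward follows with p = 1/3 for both stimuli.
        return {"optimal": (CONSTANT_POSITIVE, 1 - p_opt),
                "suboptimal": (CONSTANT_POSITIVE, 1 - p_sub)}
    raise ValueError(f"unknown pair type: {kind!r}")


if __name__ == "__main__":
    for kind in ("RL", "EL"):
        for difficulty in ("easy", "hard"):
            for stimulus, (p_r, p_e) in pair_contingencies(kind, difficulty).items():
                # Expected counts if this stimulus were chosen on all 30 presentations.
                print(f"{kind} {difficulty:<4} {stimulus:<10} "
                      f"p(reward)={p_r:.2f} p(effort)={p_e:.2f} "
                      f"expected rewards={p_r * N_PRESENTATIONS:.1f} "
                      f"expected efforts={p_e * N_PRESENTATIONS:.1f}")
```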
The four original pairs were presented four times each (total n=16). For these trials, we only asked participants to discriminate on the basis of the reward value (for RL) or action cost (for EL). For novel stimulus combinations, we only presented stimuli that differed in reward value when reward value discrimination was assessed, and in action cost when action cost discrimination was assessed (total n=48: n=4 presentations of 6 combinations for each of reward value and action cost discrimination).

For every participant, stimuli were randomly assigned to pairs, and optimal/suboptimal stimulus orientation was balanced (50% of all optimal stimulus presentations occurred on the left-hand side). Misleading outcomes (e.g., negative outcomes for optimal stimuli) were equally spaced out across the thirty presentations and balanced for left/right side. Trial presentation order was pseudo-randomized such that I) a given pair was never presented more than twice in a row and II) the gap between two presentations of a given pair was never greater than four trials.

Before performing the actual task, and before the acute stress/no-stress control procedures, participants received standard verbal instructions and completed a 16-trial practice round of the learning phase. Participants were not informed about stimulus-outcome contingencies; they were only advised to accrue as much money as possible and to avoid exerting unnecessary effort. A 60% accuracy threshold was used to confirm that participants understood the general task procedure, and the practice round was repeated if participants failed to reach it. To prevent learning from carrying over from the practice round to the main task, the practice round used deterministic stimulus-outcome probabilities and different stimuli.

Computational cost-benefit reinforcement learning model: model space

To uncover latent mechanisms by which acute stress affects reward maximization and/or action cost minimization, we turned to cognitive computational modelling.
We employed a modified reinforcement learning framework based on Rescorla and Wagner (Rescorla, 1972) […]

In various formulations of reinforcement learning, such as Q-learning (Watkins & Dayan, 1992) and the actor-critic framework (Niv, 2009; Rescorla, 1972), the degree to which prediction errors update choice preferences is represented by α, the learning rate […]

Here, pr is the probability of selecting an action, and β is the inverse temperature parameter that, among other things, captures the balance between exploration and exploitation (Nassar & […] values of both stimuli in the pair.

References
Raincloud plots: a multi-platform tool for robust data visualization
The role of cognitive effort in subjective reward devaluation and risky decision-making
Stress weakens prefrontal networks: molecular insults to higher cognition
Noise stress impairs prefrontal cortical cognitive function in monkeys: evidence for a hyperdopaminergic mechanism
Stress and the dopaminergic reward system
Circadian rhythmicity of cortisol and body temperature: morningness-eveningness effects
Acute stress selectively reduces reward sensitivity
Perturbations in Effort-Related Decision-Making Driven by Acute Stress and Corticotropin-Releasing Factor
Infant cognition: going full factorial with pupil dilation
Learning under stress: how does it work?
Relationships between Pupil Diameter and Neuronal Activity in the Locus Coeruleus, Colliculi, and Cingulate Cortex
Neural signatures of value comparison in human cingulate cortex during decisions requiring an effort-reward trade-off
Alpha Amylase as a Salivary Biomarker of Acute Stress of Venepuncture from Periodic Medical Examinations. Frontiers in Public
Preprocessing pupil size data: Guidelines and code
Doing Bayesian data analysis: A tutorial with
Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs
The Computational Pharmacological, and Physiological Determinants of Sensory Learning under
Distinct effects of apathy and dopamine on effort-based decision-making in
Behavioural and neural characterization of optimistic reinforcement learning
Stress modulates reinforcement learning in younger and older adults
A specific role for serotonin in overcoming effort cost. eLife, 5, e17282
Heightened reward learning under stress in generalized anxiety disorder: A predictor of depression resistance
Evaluation and comparison of computational models
Taming the beast: extracting generalizable knowledge from computational models of cognition. Current Opinion in Behavioral Sciences
Human salivary alpha-amylase reactivity in a psychosocial stress paradigm
Reinforcement learning in the brain
Working-memory capacity protects model-based learning from stress
Chapter 23 – Opponent Brain Systems for Reward and Punishment Learning: Causal Evidence From Drug and Lesion Studies in Decision Neuroscience
PsychoPy2: Experiments in behavior made easy
Why not try harder? Computational approach to motivation deficits in neuro-psychiatric diseases
Stress reduces use of negative feedback in a feedback-based learning task
Stress increases cue-triggered "wanting" for sweet reward in humans
Two formulas for computation of the area under the curve represent measures of total hormone concentration versus time-dependent change
Acute Psychological Stress Reduces Working Memory-Related Activity in the Dorsolateral
Stress attenuates the flexible updating of aversive value
A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Current research and theory
Bayesian model selection for group studies – revisited
The human stress response
Neural mechanisms underlying motivation of mental versus physical effort
Cold pressor stress impairs performance on working memory tasks requiring executive functions in healthy young men
Effects of stress on heart rate complexity – a comparison between short-term and chronic stress
A neural substrate of prediction and reward. Science
Stress, habits, and drug addiction: a psychoneuroendocrinological perspective
Stress-induced modulation of instrumental behavior: from goal-directed to habitual control of action
Acute stress induces selective alterations in cost/benefit decision-making
A selective role for dopamine in learning to maximize reward but not to minimize effort: evidence from patients with Parkinson's disease
Learning to minimize efforts versus maximizing rewards: computational principles and neural correlates
Introducing the Maastricht Acute Stress Test (MAST): a quick and non-invasive approach to elicit robust autonomic and glucocorticoid stress responses
A causal link between prediction errors
Acute Stress Enhances Associative Learning via Dopamine Signaling in the Ventral Lateral Striatum. The
R: A language and environment for statistical computing. R Foundation for Statistical Computing
Fear of pain and cortisol reactivity predict the strength of stress-induced hypoalgesia
Recommendations for Bayesian hierarchical model specifications for case-control studies in mental health
Noradrenaline and dopamine neurons in the reward/effort trade-off: a direct electrophysiological comparison in behaving monkeys
Q-learning
Development and validation of brief measures of positive and negative affect: the PANAS scales
Controlling low-level image properties: The SHINE toolbox
Ten simple rules for the computational modeling of behavioral data
Increased systolic blood pressure reactivity to acute stress is related with better self-reported health
Blockade of uptake for dopamine, but not norepinephrine or 5-HT, increases selection of high effort instrumental activity: Implications for treatment of effort-related motivational symptoms in psychopathology

Within the above-described model space, our predictions of acute stress effects on reward maximization and action cost minimization could thus be explained by changes in sensitivity to reward value and/or action cost (WR, WE), changes in how much weight RPEs and EPEs are afforded (i.e., learning rates, αR, αE), and/or changes in the discounting of reward value by action cost (γ). If acute stress leads to more random responses, such effects should be captured by β.
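The following minimal Python sketch shows where each parameter of the model space could enter a trial-by-trial likelihood for the fullest candidate model. The exact equations are not reproduced in this section, so several choices here are assumptions:

- values start at zero;
- WR and WE scale the reward and effort outcomes inside the prediction errors;
- net value is QR − γ·QE;
- choices follow a softmax over the net values of both stimuli in the pair.

The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Params:
    """Free parameters of the fullest candidate model (2LR_WR_WE_γ).

    Under the assumptions of this sketch, nested models fix some parameters:
    2LR_γ fixes w_r = w_e = 1, and 2LR additionally fixes gamma = 1.
    """
    alpha_r: float      # learning rate for reward prediction errors (RPEs)
    alpha_e: float      # learning rate for effort prediction errors (EPEs)
    beta: float         # inverse temperature
    gamma: float = 1.0  # action cost discounting
    w_r: float = 1.0    # reward sensitivity (assumed to scale the reward outcome)
    w_e: float = 1.0    # effort sensitivity (assumed to scale the effort outcome)


def choice_probs(q_r, q_e, p):
    """Softmax over the net values of both stimuli in a pair."""
    v = [q_r[s] - p.gamma * q_e[s] for s in range(2)]
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(p.beta * (x - m)) for x in v]
    total = sum(exps)
    return [x / total for x in exps]


def log_likelihood(trials, p):
    """Log-likelihood of one pair's choice sequence under parameters p.

    trials: iterable of (choice, reward, effort), with choice in {0, 1} and
    reward/effort coded 1 (occurred) or 0 (did not occur).
    """
    q_r, q_e = [0.0, 0.0], [0.0, 0.0]
    ll = 0.0
    for choice, reward, effort in trials:
        ll += math.log(choice_probs(q_r, q_e, p)[choice])
        rpe = p.w_r * reward - q_r[choice]  # reward prediction error
        epe = p.w_e * effort - q_e[choice]  # effort prediction error
        q_r[choice] += p.alpha_r * rpe
        q_e[choice] += p.alpha_e * epe
    return ll
```

A single-learning-rate variant such as LR_WR_WE_γ would tie `alpha_r` and `alpha_e` to the same value. Fitting would then minimise the negative log-likelihood summed over a participant's pairs, for example with a general-purpose optimiser such as `scipy.optimize.minimize`.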
Based on our predictions and the obtained pattern of results (most notably asymmetrical RL/EL performance in the acute stress condition), we considered six candidate models that could capture these various scenarios:

I) a model with 2 distinct learning rates for reward and effort (αR, αE) (2LR);
II) a model with 2 learning rates (αR, αE) and a discounting parameter (γ) (2LR_γ);
III) a model with 2 learning rates (αR, αE), a reward weight (WR) and an effort weight parameter (WE) (2LR_WR_WE);
IV) a model with a single learning rate (α), reward weight (WR), effort weight (WE), and a discounting (γ) parameter (LR_WR_WE_γ);
V) a model with 2 learning rates (αR, αE), a reward weight (WR), and a discounting (γ) parameter (2LR_WR_γ);
VI) a model with 2 learning rates (αR, αE), a reward weight (WR), effort weight (WE) and discounting (γ) parameter (2LR_WR_WE_γ).

Fluctuations in pupil diameter were continuously measured using an SR-Research Eyelink 1000 Tower Mount infrared eye tracker while participants performed the reward maximization/action cost minimization reinforcement learning task (1000Hz sampling rate, except for three participants, whose data were obtained at 500Hz). Participants placed their head on an adjustable chin rest and against a forehead bar to minimize motion. Eye-tracker calibration was performed at the start of the paradigm and subsequently every 10 min. Stimulus luminance was matched using the SHINE toolbox (Willenbockel et al., 2010) […]

Statistical analyses were conducted using R, version 3.6.2 (R Core Team, 2020), and, where applicable, results were visualised using Raincloud Plots (Allen, Poggiali, Whitaker, Marshall, & Kievit, 2019). Acute stress measurements were analysed using mixed ANOVAs involving Condition (between-subjects factor: no-stress control, acute stress induction) and Time (within-subjects factor: 2 levels, pre/post-MAST, or 6 levels for sCORT).
For the reward maximisation/action cost minimisation reinforcement learning task, an accuracy score was calculated by dividing the number of optimal stimulus choices by the total number of trials (n=30 per pair). Mixed ANOVAs involving Condition, Trial Type (RL, EL) and Difficulty (Easy, Hard pairs) were carried out. For analyses involving Time effects (i.e., repeated presentations of stimulus pairs), accuracy scores were averaged per bin of ten presentations (presentations 1-10, 11-20, and 21-30).

To better understand whether acute stress effects on task performance were primarily driven by changes in sensitivity to positive or negative outcomes, we calculated win-stay and lose-shift rates for RL and EL trials (den Ouden et al., 2013). Win-stay means repeating a choice following a positive outcome, and lose-shift means choosing the other stimulus following a negative outcome. For RL trials, we calculated win-stay/lose-shift rates using reward outcomes (yes/no reward); for EL trials, we used effort outcomes (yes/no effort). We refer to the 2-level factor representing win-stay/lose-shift rates as "Strategy".

For surprise test trials involving the original four pairs (n=4 presentations per pair), we investigated final choice tendencies using a one-sample t-test against chance level (0.5). Participants' ability to discriminate stimuli based on reward value and action cost in novel stimulus arrangements (n=48; 24 reward value and 24 action cost discrimination trials) was investigated using mixed ANOVAs involving Condition and Trial Type.

Group differences in model parameters from the non-hierarchically fit model were investigated using Condition-by-learning rate (αR, αE) mixed ANOVAs and independent-samples t-tests.
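The accuracy, binning, and win-stay/lose-shift measures described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis code. It assumes choices are stored per pair in presentation order. Win-stay is computed as the proportion of stays after positive outcomes, and lose-shift as the proportion of switches after negative outcomes. A positive outcome is reward on RL pairs and effort avoidance on EL pairs.

```python
def accuracy(optimal_choices):
    """Proportion of optimal choices (1 = optimal, 0 = suboptimal)."""
    return sum(optimal_choices) / len(optimal_choices)


def binned_accuracy(optimal_choices, bin_size=10):
    """Accuracy per consecutive bin of presentations (e.g., 1-10, 11-20, 21-30)."""
    return [accuracy(optimal_choices[i:i + bin_size])
            for i in range(0, len(optimal_choices), bin_size)]


def win_stay_lose_shift(choices, positive):
    """Return (win_stay, lose_shift) rates for one stimulus pair.

    choices: stimulus chosen on each presentation, in order (e.g., 0 or 1).
    positive: True if that presentation's outcome was positive (reward on RL
        pairs, effort avoidance on EL pairs), False otherwise.
    The last presentation has no successor and is ignored; a rate is None
    when its denominator is zero.
    """
    wins = stays_after_win = losses = shifts_after_loss = 0
    for current, following, was_positive in zip(choices, choices[1:], positive):
        if was_positive:
            wins += 1
            stays_after_win += int(following == current)
        else:
            losses += 1
            shifts_after_loss += int(following != current)
    win_stay = stays_after_win / wins if wins else None
    lose_shift = shifts_after_loss / losses if losses else None
    return win_stay, lose_shift


if __name__ == "__main__":
    choices = [0, 0, 1, 1, 0, 0]
    positive = [True, False, True, False, False, True]
    print(win_stay_lose_shift(choices, positive))
```

Computing the rates per pair and then averaging within Trial Type keeps RL and EL strategies separate, as required for the Strategy factor.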
Given that we used separate priors for the two groups, we report the Bayesian analogue of a t-test and mixed ANOVA (Kruschke, 2014), a more robust test of group differences, for posterior parameters obtained from the hierarchically fit model. For reference, we also report these analyses for the non-hierarchical data.

Post hoc (simple) main effect analyses for all ANOVAs were conducted using independent-samples (Condition), paired-samples (Time, Trial Type, Strategy), and one-sample t-tests (≠ 0 or 0.5). Greenhouse-Geisser-corrected statistics were reported when sphericity assumptions were violated. We report statistical significance as p<0.05 (two-sided). We note that most main and interaction effects involving Condition survived a more stringent threshold (p<.01); the exceptions are some strategy and surprise test phase effects, which should be interpreted with caution. For statistically significant results, generalized eta squared (ges; η²G) was reported, with η²G values of 0.02, 0.06, and 0.14 representing a small, medium, and large effect size, respectively (Lakens, 2013).

With respect to pupillometry, we conducted model-free and model-based analyses. In […]

[…] The observed mean difference from zero that falls outside the 95% HDI suggests that the difference between αE and αR was greater in no-stress controls than in acute stress subjects. Panel B. The groups did not differ in the magnitude of αR, as indicated by a 95% HDI that included 0. Panel C. Acute stress subjects exhibited a lower value of αE than no-stress control subjects, as indicated by a 95% HDI that falls well above zero.

Left: Model-free analyses of pupil size using all effort outcome trials. Middle: Pupil size differences during effort/effort avoidance outcomes in the entire sample; force exertion was associated with large effects on pupil size, and these trials were therefore excluded from analysis.
Right: Model-based action cost prediction error analyses using all effort outcome trials.
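The learning-rate figure caption above judges group differences by whether the 95% highest density interval (HDI) of a posterior difference excludes zero. The sketch below assumes the posterior is available as a list of MCMC samples and is unimodal. Under that assumption, the HDI can be computed as the narrowest interval that contains 95% of the sorted samples, as in the sample-based procedure described by Kruschke (2014). The function names `hdi` and `excludes_zero` are illustrative and not taken from the authors' code.

```python
import math


def hdi(samples, cred_mass=0.95):
    """Narrowest interval containing at least `cred_mass` of the samples.

    Assumes a unimodal posterior; for multimodal posteriors the narrowest
    single interval can be misleading.
    """
    s = sorted(samples)
    n = len(s)
    k = math.ceil(cred_mass * n)  # index offset spanned by the interval
    if not 0 < cred_mass < 1 or k >= n:
        raise ValueError("need 0 < cred_mass < 1 and more samples")
    widths = [s[i + k] - s[i] for i in range(n - k)]
    i = min(range(n - k), key=widths.__getitem__)  # start of narrowest interval
    return s[i], s[i + k]


def excludes_zero(samples, cred_mass=0.95):
    """True if the HDI of a posterior difference lies entirely above or below 0."""
    lo, hi = hdi(samples, cred_mass)
    return lo > 0 or hi < 0
```

In practice, `samples` would hold posterior draws of, for example, the control-minus-stress difference in αE.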