key: cord-0300794-aktqf9yv
authors: Balasubramani, Pragathi Priyadharsini; Diaz-Delgado, Juan; Grennan, Gillian; Alim, Fahad; Zafar-Khan, Mariam; Maric, Vojislav; Ramanathan, Dhakshin; Mishra, Jyoti
title: Rostral Anterior Cingulate Activations inversely relate to Reward Payoff Maximation & predict Depressed Mood
date: 2021-09-24
journal: bioRxiv
DOI: 10.1101/2021.06.11.447974
sha: 4dc83afdfc73e69df436d5f4e27611f87ec4166f
doc_id: 300794
cord_uid: aktqf9yv

Choice selection strategies and decision making are typically investigated using multiple-choice gambling paradigms that require participants to maximize reward payoff. However, research shows that performance in such paradigms suffers from individual biases towards the frequency of gains, which lead participants to choose smaller local gains over larger longer-term gains, also referred to as melioration. Here, we developed a simple two-choice reward task, implemented in 186 healthy human adult subjects across the adult lifespan, to understand the behavioral, computational, and neural bases of payoff maximization versus melioration. The observed reward choice behavior on this task was best explained by a reinforcement learning model of differential future reward prediction. Simultaneously recorded and source-localized electroencephalography (EEG) showed that diminished theta-band activations in the right rostral anterior cingulate cortex (rACC) correspond to greater reward payoff maximization, specifically during the presentation of cumulative reward information at the end of each task trial. Notably, these activations (greater rACC theta) predicted depressed mood symptoms, thereby showcasing a reward processing marker of potential clinical utility.

Significance Statement

This study presents cognitive, computational and neural (EEG-based) analyses of a rapid reward-based decision-making task. The research has the following three highlights.
1) It teases apart two core aspects of reward processing, i.e. long-term expected value maximization versus immediate gain frequency melioration based choice behavior.
2) It models reinforcement learning based behavioral differences between individuals, showing that observed performance is best explained by differential extents of reward prediction.
3) It investigates neural correlates in 186 healthy human subjects across the adult lifespan, revealing specific theta-band cortical source activations in the right rostral anterior cingulate as correlates of maximization that further predict depressed mood across subjects.

In this study, we uniquely separate immediate gain frequency bias driven decision-making from advantageous longer-term payoff-based decision making.
Specifically, we designed a two-choice paradigm with two distinct blocks: a Δ0payoff (baseline) block, where the two reward choice options have equal payoffs and reward variance, suitable for measuring the immediate gain frequency bias, and a Δpayoff (difference) block, where the two choice options have unequal payoffs, suitable for measuring payoff influences. We thereby tease apart measurements of immediate gain frequency biased responses from expected value or long-term payoff based responses, to understand the distinct cognitive and neural mechanisms underlying payoff decisions.

Second, we capitalize on computational reinforcement learning (RL) models to understand the basis of individual differences in reward and risk based learning across subjects (Balasubramani et

Figure 1A shows a schematic of the task stimulus sequence, and Supplementary Table 1 shows the reward distribution that was shuffled and updated after every 10 trials had been sampled from that set. The

We also calculated Win-Stay and Lose-Shift performance on both Δpayoff and Δ0payoff blocks. Win-Stay was computed as the proportion of times the subject repeated the same choice option in the next trial after obtaining a gain for choosing that option in the current trial. Lose-Shift was computed as the proportion of times the subject shifted away from the current choice option in the next trial after obtaining a loss in the current trial.

2) Model B optimized risk sensitivity (α) for every subject; and
3) Model C optimized both γ and α for every subject.

The time scale of reward prediction parameter, γ ∈ (0, 1), represents whether reward prediction is myopic or long-sighted: lower values (γ → 0) suggest myopic reward prediction leading to impulsive decisions, while higher values (γ → 1) suggest long-sighted integration of rewards for decisions.
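As a rough illustration of how γ sets the time scale of reward prediction, the sketch below weights a reward sequence by powers of γ for a myopic and a long-sighted agent. The function name `discounted_return`, the example reward sequence, and the use of standard exponential discounting are assumptions made for illustration; they are not taken from the fitted models.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma**k for a delay of k trials.

    Standard exponential discounting is assumed here for illustration.
    """
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


# Hypothetical sequence: a small immediate gain, then a larger gain 3 trials later.
rewards = [1.0, 0.0, 0.0, 5.0]

myopic = discounted_return(rewards, gamma=0.1)       # later gain is almost ignored
long_sighted = discounted_return(rewards, gamma=0.9)  # later gain still counts

print(f"gamma=0.1: {myopic:.3f}")
print(f"gamma=0.9: {long_sighted:.3f}")
```

With a low γ the delayed larger gain contributes almost nothing to the predicted value, which is one way to picture the impulsive, frequency-driven choices described above.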
The risk sensitivity parameter (α) measures the extent to which the expected uncertainty associated with a door influences the decision utility. With α ∈ (-1, 1), smaller values (α → -1) indicate greater risk seeking, while larger values (α → 1) indicate greater risk aversion.

The simulation agent had reward distributions as in the real experiment, but scaled down by a factor of 0.1 and varying with blocks (Δpayoff, Δ0payoff) that were randomly ordered. Each block ran for up to 50,000 trials to let model performance converge.

The agent has to choose between two doors, each of which (stimulus, s) was represented by a radial basis function (Φi) as below:

(2)

Here, μs and σ denote the mean (s ∈ [1 2]; door1 = 1; door2 = 2) and the standard deviation of the inverse attention parameter, respectively. σ is set to 1 in our models.

The door stimulus is multiplied with the weight matrix wv for computing its value function, Q, and with wr for constructing its risk function, √h.

Utility associated with any state at a trial, t, is the combination of value and risk function (Bell, 1995

The door choice selection is performed using the SoftMax principle defined as below. According to SoftMax, the probability for choosing a door at trial, t, is P(s,t):

Here, n is the total number of doors available, and β is the exploration index. β is set to 1 in our models.

After choice selection, the weight functions are updated using the below principles. The choice value function Q at trial t+1 for door, s, may be expressed as,

where ηQ is the learning rate of the value function (0 < ηQ < 1) for the stimulus variable, Φ(s).
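The SoftMax choice step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the exponential form with β multiplying the utility is the conventional SoftMax and is assumed here (with β = 1, as in the models, the placement of β does not change the result); the names `softmax_probs` and `choose_door` and the utility values are hypothetical.

```python
import math
import random


def softmax_probs(utilities, beta=1.0):
    """Choice probabilities P(s) = exp(beta * U_s) / sum_i exp(beta * U_i).

    Assumed conventional form; the maximum is subtracted for numerical stability.
    """
    u_max = max(utilities)
    exps = [math.exp(beta * (u - u_max)) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def choose_door(utilities, beta=1.0, rng=random):
    """Sample a door index (0-based) according to the SoftMax probabilities."""
    probs = softmax_probs(utilities, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical utilities for door 1 and door 2 on one trial.
probs = softmax_probs([0.2, 0.5], beta=1.0)
door = choose_door([0.2, 0.5], beta=1.0)
print(probs, door)
```

Because the rule is stochastic, the door with the lower utility is still sampled on some trials, which is what lets the agent keep exploring both options.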
δ is the temporal difference error represented as

where r is the reward associated with taking an action, a, for stimulus, s, at time, t, and γ is the

where ηh is the learning rate of the risk function (0 < ηh < 1), and ξ is the risk prediction error expressed by the below equation.

For simplicity, we set ηh = ηQ = 0.1, because an initial optimization of η for our subjects provided a median of 0.1. The weights wv and wr are set to small random numbers from the set [-0.0005 0.0005] at trial = 1. The weights are normalized by dividing by their norm.

The cost function optimizes the frequency of selections of the rare gain and rare loss options in the Δ0payoff and Δpayoff blocks for every subject, after running the simulation agent for 10 instances of 100,000 trials each and inferring the optimal parameters for every participant in our study using the fmincon function in MATLAB. Cost function = sum of squares of the difference between observed actual (Proportion# RareGexpt + Proportion# RareLexpt + Proportion# RareGbase + Proportion# RareLbase) and simulated actual (Proportion# RareGexpt + Proportion# RareLexpt + Proportion# RareGbase + Proportion# RareLbase). Optimization is carried out for either one (Models A, B) or two (Model C) parameters, using fmincon(). We ran fmincon() 100 times to choose the parameter set with the least cost for any subject.

The Akaike Information Criterion (AIC) for these models was built using the likelihood function, which was estimated as the average correlation coefficient between the simulated and the observed key task behaviors: 1) payoff performance or Perf measure (eqn. 1); 2) gain frequency Bias measure (eqn. 1); and 3) the block differences between Δpayoff and Δ0payoff blocks in Win-Stay results for the RareG door.

Neural data processing. We applied a uniform processing pipeline to all EEG data acquired simultaneously with the reward task.
This included: 1) data pre-processing, 2) computing event-related spectral perturbations (ERSP) for all channels, and 3) cortical source localization of the EEG data filtered within the relevant theta, alpha and beta frequency bands.

1) Data preprocessing was conducted using the EEGLAB toolbox in MATLAB (Delorme & Makeig, 2004). EEG data were resampled at 250 Hz and filtered in the 1-45 Hz range to exclude ultraslow DC drifts at <1 Hz and high-frequency noise produced by muscle movements and external electrical sources at >45 Hz. EEG data were average referenced and epoched to the chosen door presentation during the task, in the -0.5 sec to +1.5 sec time window (Figure 1). Any missing channel data (one channel each in 6 participants) were spherically interpolated to nearest neighbors. Epoched data were cleaned using the autorej function in EEGLAB to remove noisy trials (>5 SD outliers rejected over a maximum of 8 iterations; 0.91 ± 2.65% of trials rejected per participant). EEG data were further cleaned by excluding signals estimated to originate from non-brain sources, such as electrooculographic, electromyographic or unknown sources, using the Sparse Bayesian learning (SBL) algorithm (Ojeda et al., 2018, 2021; https://github.com/aojeda/PEB) explained below in the cortical source localization section.

2) For ERSP calculations, we performed time-frequency decomposition of the epoched data using the continuous wavelet transform (cwt) function in MATLAB's signal processing toolbox. Baseline time-frequency (TF) data in the -250 ms to -50 ms time window prior to chosen door presentation were subtracted from the epoched trials (at each frequency) to observe the event-related synchronization (ERS) and event-related desynchronization (ERD) modulations (Pfurtscheller, 1999

Figure 1B). Corresponding lose-shift behavior did not differ between blocks (p=0.33).
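For reference, the Win-Stay and Lose-Shift proportions compared across blocks here can be computed from a trial sequence roughly as sketched below. The function name `win_stay_lose_shift`, the input layout (one choice label and one signed outcome per trial), and the treatment of zero outcomes as neither gain nor loss are assumptions for illustration.

```python
def win_stay_lose_shift(choices, outcomes):
    """Return (win_stay, lose_shift) proportions for one block.

    choices:  option chosen on each trial (any hashable label).
    outcomes: signed reward on each trial (> 0 is a gain, < 0 is a loss).
    Zero outcomes are counted as neither gain nor loss (an assumption).
    """
    wins = stays = losses = shifts = 0
    for t in range(len(choices) - 1):
        repeated = choices[t + 1] == choices[t]
        if outcomes[t] > 0:
            wins += 1
            stays += repeated          # stayed after a gain
        elif outcomes[t] < 0:
            losses += 1
            shifts += not repeated     # shifted after a loss
    win_stay = stays / wins if wins else float("nan")
    lose_shift = shifts / losses if losses else float("nan")
    return win_stay, lose_shift


# Hypothetical six-trial block with two doors labeled 1 and 2.
choices = [1, 1, 2, 1, 1, 1]
outcomes = [10, -5, 10, -5, -5, 10]
win_stay, lose_shift = win_stay_lose_shift(choices, outcomes)
print(win_stay, lose_shift)
```

The last trial of a block has no following choice, so it contributes to neither proportion.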
Additionally, we found that the payoff-based responses, Perf, were significantly correlated with individual gain frequency Bias (r=0.65, p<0.0001, Figure 1C).

Next, we implemented multivariate regression to model the payoff-related performance, Perf, based on all self-reported demographic (age, gender, race, ethnicity, socio-economic status [SES]) and mental health (anxiety, depression, inattention and hyperactivity) predictors as per

Using this RL framework, we were interested in investigating how choice decisions are affected by 1) the extent of integration of rewards over time, i.e. the time scale of reward prediction, and 2) differential risk sensitivity to gains and losses affecting the choices, where risk is the variance in reward outcomes. To find whether the observed subjective behavioral differences are driven by one or both of the above decision-making measures, we built three separate reinforcement learning models (RL, Figure 2A

Right rostral anterior cingulate cortex encodes reward payoff maximization.

Participants performed the reward task with simultaneous EEG, which we analyzed in the theta (3-7 Hz), alpha (8-12 Hz), and beta (13-30 Hz) frequency bands in cortical source space parcellated as per the Desikan-Killiany regions of interest (Desikan et al., 2006). To identify the neural correlates underlying reward payoff maximization (Perf), we modeled the neural variables as predictors of Perf using robust multivariate linear regression, while accounting for gain frequency Bias, which was significantly related to Perf (Figure 1), and the optimal RL model parameter (γ).
We investigated neural activations from three relevant trial periods: immediately post-presentation of the selected choice but prior to reward (0-500 ms selected choice period), during presentation of the trial reward (0-500 ms reward period), and during presentation of the cumulative reward up to that trial in the trial sequence (0-500 ms cumulative reward period). Perf neural activations were computed as the relative difference in activity between Δpayoff and Δ0payoff block RareG trials. Taking the relative block difference allowed non-task-related individual EEG differences to cancel out. Relative responses to the RareG door were important for analysis because this door choice resulted in a larger long-term payoff than the other (RareL) door in the Δpayoff block. We applied family-wise error rate (FWER) corrections to the Perf source-space neural correlates to account for multiple comparisons across three frequency bands (theta, alpha, beta) and three time periods (choice, reward, cumulative reward). Figure 3A shows the Perf source-space neural correlates found to be significant in this analysis; all neural activations were inversely related to Perf.

We further accounted for the multiple independently significant cortical ROI predictors (Figure 3A

but differing only on the long-term outcome between options, allows measurement of long-term payoff maximization strategy. Therefore, our study design is able to distinguish reward maximization from melioration and further leverages these measures to inform mental health behaviors.

The behavioral outcomes of our experiment varied based on individual subject characteristics. We found that payoff-based performance was significantly related to individual bias for the observed frequency of gains; this is in line with prior studies of decision-making, in which, however, gain frequency decisions are often conflated with expected value (Bechara et al., 1997; Lin et al., 2009).
We further modeled subjective differences in reinforcement learning based decision making; using the RL modeling framework, we extracted subjective sensitivity to risk (α) and the time scale of reward prediction (γ) to explain each subject's behavior. The parsimonious RL models suggested that the observed behavior is best explained by differences between individuals in the extent of reward prediction over time (γ).

Uniquely, we then investigated the neural correlates of payoff performance while accounting for individual differences in gain frequency bias and extent of reward prediction.

More specifically, analyses showed that theta activity in right rACC was negatively associated with payoff-based performance. This finding is aligned with prior evidence for reward-based theta processing and its widely studied relationship to long-term risk or uncertainty (Cavanagh et Preuschoff et al., 2006). Notably, reward period bilateral rACC alpha also related to payoff maximization, but this was not a distinct correlate of payoff, as it also significantly explained gain frequency bias or melioration in our task.

Translational neuroscience studies show that reward-based decision processing deficits are found in depression and in attention disorders, leading to difficulty in reward integration and

Altogether, our study demonstrates the importance of controlling for melioration biases toward immediate reward frequency and for individual differences in learning when assessing advantageous, i.e. foresighted, decision-making ability in humans.

Dysfunctional reward processing in depression. Current Opinion in Psychology
Examination (MMSE) for the detection of Alzheimer's disease and other dementias in people with mild cognitive impairment (MCI).
Cochrane Database of Systematic
Converging Evidence for a Fronto-Basal-Ganglia Network for Inhibitory Control of
An extended reinforcement learning model of basal ganglia to understand the contributions of serotonin and dopamine in risk-based decision making, reward prediction, and punishment learning
A network model of basal ganglia for understanding the roles of dopamine and serotonin in reward-punishment-risk based decision making
Bipolar oscillations between positive and negative mood states in a computational model of Basal Ganglia. Cognitive Neurodynamics
Identifying the Basal Ganglia Network Model Markers for Medication-Induced Impulsivity in Parkinson's Disease Patients
Identifying the Basal Ganglia Network Model Markers for Medication Impulsivity in Parkinson's Disease Patients
Modeling Serotonin's Contributions to Basal Ganglia Dynamics
Overlapping neural processes for stopping and economic choice in orbitofrontal cortex
Using a Simple Neural Network to Delineate Some Principles of Distributed Economic Choice. Frontiers in
Mapping Cognitive Brain Functions at Scale
Deciding advantageously before knowing the advantageous strategy
Characterization of the decision-making deficit of patients with ventromedial prefrontal cortex lesions
Learning the value of information in an uncertain world
Risk, return, and utility
Differences in time course activation of dorsolateral prefrontal cortex associated with low or high risk choices in a gambling task
Rostral Anterior Cingulate Cortex Volume Correlates with Depressed Mood in Normal Healthy Children
Iowa Gambling Task (IGT): Twenty years after - gambling disorder and IGT
Frontal theta links prediction errors to behavioral adaptation in reinforcement learning
The many facets of dopamine: Toward an integrative theory of the role of dopamine in managing the body's energy resources
Is deck C an advantageous deck in the Iowa Gambling Task?
Immediate gain is long-term loss: Are there foresighted decision makers in the Iowa Gambling Task?
dorsolateral prefrontal cortices mediate adaptive decisions under ambiguity by integrating choice utility and outcome evaluation
Right frontal cortex generates reward-related theta-band oscillatory activity
Neural correlates of risk prediction error during reinforcement learning in humans
EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis
Humans incorporate attention-dependent uncertainty into perceptual decisions and confidence
An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest
Peril and pleasure: An RDOC-inspired examination of threat responses and reward processing in anxiety and depression
Sparse source imaging in electroencephalography with accurate field modeling
Metalearning and neuromodulation
Reward and punishment processing in depression
Individual differences in decision-making. Personality and Individual Differences
The influence of individual differences on the Iowa Gambling Task and real-world decision making
Comparing alternative metrics to assess performance on the Iowa Gambling Task
Decision Making in Children With ADHD ADHD-Anxious/Depressed, and Control Children Using a Child Version of the
Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate
Expected value and prediction error abnormalities in depression and schizophrenia
Risky behavior in gambling tasks in individuals with ADHD - a systematic literature review
Computational model of precision grip in Parkinson's disease: A Utility based approach
The amygdala and decision-making
The role of the right inferior frontal gyrus: Inhibition and attentional control
Individual differences in need for cognition and decision making in the
Enhancement of MR Images Using Registration for Signal Averaging
Double dissociation of value computations in orbitofrontal and anterior cingulate neurons
Lab Streaming Layer
Distinct neural mechanisms of risk and ambiguity: A meta-analysis of decision-making
The PHQ-9
Gain-loss frequency and final outcome in the Soochow Gambling Task: A Reassessment
Identifying the neurobiology of altered reinforcement sensitivity in ADHD: A review and research agenda
Depression and the perception of reinforcement
Neural correlates of cognitive control in gambling disorder: A systematic review of fMRI studies
Future-oriented decision-making in Generalized Anxiety Disorder is evident across different versions of the Iowa Gambling Task
A computational model of altered gait patterns in parkinson's disease patients negotiating narrow doorways
Preparing to stop action increases beta band power in contralateral sensorimotor cortex
The Iowa Gambling Task in depression - what have we learned about sub-optimal decision-making strategies
Rewards and punishments in iterated decision making: An explanation for the frequency of the contingent event effect
Revealing individual differences in the Iowa Gambling Task
Problem gamblers exhibit reward hypersensitivity in medial frontal cortex during gambling
Bridging M/EEG Source Imaging and Independent Component Analysis Frameworks Using Biologically Inspired Sparsity
Fast and robust Block-Sparse Bayesian learning for EEG source imaging
Low resolution electromagnetic tomography: A new method for localizing electrical activity in the brain
Anterior cingulate activity modulates nonlinear decision weight function of uncertain prospects
EEG event-related desynchronization (ERD) and event-related synchronization (ERS)
Prefrontal Control over Motor Cortex Cycles at Beta Frequency during Movement Inhibition
Frontal brain asymmetry and reward responsiveness: A source-localization study
Pretreatment Rostral Anterior Cingulate Cortex Theta Activity in Relation to Symptom Improvement in Depression: A Randomized Clinical Trial
Neural differentiation of expected reward and risk in human subcortical structures
Executive Functions in Pathologic Gamblers Selected in an Ecologic Setting
Attention-gated reinforcement learning of internal representations for classification
Multiple reward signals in the brain
Deficient reinforcement learning in medial frontal cortex as a model of dopamine-related motivational
Melioration as rational choice: Sequential decision making in uncertain environments
Decision making in the reward and punishment variants of the Iowa gambling task: Evidence of "foresight" or "framing"?
A Brief Measure for Assessing Generalized Anxiety Disorder: The GAD-7
Dissociable processes underlying decisions in the Iowa Gambling Task: A new integrative framework. Behavioral and Brain Functions
The smartphone brain scanner: A portable real-time neuroimaging system
Reinforcement Learning: An Introduction. Adaptive Computations and Machine Learning
Risk-dependent reward value signal in human prefrontal cortex
A potential role of the inferior frontal gyrus and anterior insula in cognitive control, brain rhythms, and event-related potentials
Do individual differences in Iowa Gambling Task performance predict adaptive decision making for risky gains and losses
Ten simple rules for the computational modeling of behavioral data. ELife, 8, e49547
Decision-making ability in psychosis: A systematic review and meta-analysis of the magnitude
Rostral anterior cingulate cortex activity mediates the relationship between the depressive symptoms and the medial prefrontal cortex activity
Cognitive control involves theta power within trials and beta power across trials in the prefrontal-subthalamic network
Modelling ADHD: A review of ADHD theories through their predictions for computational models of decision-making and reinforcement learning