key: cord-0073400-mibdhu6q authors: Cousins, Aidan; Nakano, Lucas; Schofield, Emma; Kabaila, Rasa title: A neural network approach to optimising treatments for depression using data from specialist and community psychiatric services in Australia, New Zealand and Japan date: 2022-01-13 journal: Neural Comput Appl DOI: 10.1007/s00521-021-06710-3 sha: a155926ddb9965cf1fc75d2b801f7ab8bdce0535 doc_id: 73400 cord_uid: mibdhu6q This study investigated the application of a recurrent neural network for optimising pharmacological treatment for depression. A clinical dataset of 458 participants from specialist and community psychiatric services in Australia, New Zealand and Japan was extracted from an existing custom-built, web-based tool called Psynary. This data, which included baseline and self-completed reviews, was used to train and refine a novel algorithm comprising a fully connected network feature extractor and a long short-term memory network. These components were first trained in isolation and then integrated and annealed using low learning rates due to the low dimensionality of the data. The accuracy of predicting depression remission before processing patient review data was 49.8%. After processing only 2 reviews, the accuracy was 76.5%. When considering a change in medication, the precision of changing medications was 97.4% and the recall was 71.4%. The medications with the best predicted results were antipsychotics (88%) and selective serotonin reuptake inhibitors (87.9%). This is the first study to create an all-in-one algorithm for optimising treatments for all subtypes of depression. Reducing treatment optimisation time for patients suffering with depression may lead to earlier remission and hence reduce the high levels of disability associated with the condition. 
Furthermore, in a setting where mental health conditions are increasing strain on mental health services, the utilisation of web-based tools for remote monitoring and machine/deep learning algorithms may assist clinicians in both specialist and primary care in extending specialist mental healthcare to a larger patient community.
According to the International Statistical Classification of Diseases, 10th Revision (ICD-10), depression is categorised under mood (affective) disorders, where the central disturbance is a change in mood to depression or elation [1]. Depressive episodes occur when the patient suffers from decreased mood, energy, activity, self-esteem and self-confidence [1]. Patients exhibit a reduced capacity for enjoyment and decreased interest in their avocations while also exhibiting ideas of guilt or worthlessness [1]. There are several subtypes of depression including major depressive disorder, bipolar disorder and melancholic depression [1]. Major depressive disorder (MDD) is described as a short- or long-term impairment causing significant reductions in quality of life and psychosocial functioning by affecting areas of life including mood, affect, motivation and cognition [2]. According to the ICD-10, bipolar disorder is characterised by two or more episodes in which the patient's mood and activity levels are significantly altered, resulting in occasions of elevated mood and increased energy, or of decreased energy and activity [1]. Hypomania is a period of persistent mild elevation of mood, increased energy and activity, while mania is an elevation of mood beyond what would be expected in the patient's circumstances [1]. Melancholic depression has been a contentious issue within psychiatric circles, with some viewing the disorder as a dimension of a severe expression of clinical depression ("melancholia"), while others believe it is a separate type of depressive disorder [3] [4] [5]. 
Regardless of its classification, melancholic depression is a severe form of depression with prominent neurovegetative symptoms [3] . Depression is a major public health issue and cause of disability [6] . An analysis of data gathered in the Global Burden of Disease study (held from 1990 to 2017) showed that the incidence of depression had increased worldwide from 172 to 258 million or by 49.86% [7] . The study further analysed age-standardised incidence rates (ASR) and estimated annual percentage changes (EAPC) of the 195 countries included in the population study. ASR was found to be significantly increased in 29 countries, slightly increased in 132 countries, slightly decreased in 9 countries and significantly decreased in 9 countries [7] . Interestingly, the number of people with depression increased in all five socio-demographic index (SDI) levels (low, low-middle, middle, high-middle and high); however, the ASR only increased in the high-SDI region [7] . Geographically, the number of people with depression increased in all geographical locations. 93.7% of patients with depression in 2017 were found to have MDD [7] . A systematic review conducted on the same Global Burden of Disease data by Ferrari et al. showed that depressive disorders were the second leading cause of years lived with a disability (YLDs) in 2010. MDD accounted for 8.2% of global YLDs [6] . Depressive disorders were also a leading cause of disability-adjusted life-years (DALYs) with MDD accounting for 2.5% of global DALYs. Furthermore, MDD was found to be the cause of 16 million suicide DALYs. In Australia, the National Survey of Mental Health and Wellbeing completed in 2007 estimated that 45% of the Australian population aged between 16 and 85 years would experience a mental disorder during their lifetime and 1 in 5 had experienced a mental disorder in the last year [8] . 
Since 1998, there have been many attempts to quantify the prevalence of depression, with a 2019 study showing depressive symptoms in 7.4% and 13.2% of Australian males and females, respectively [9] [10] [11] [12]. The economic impact of mental illness in Australia was estimated to be $10.6 billion AUD in the 2018-2019 financial year, with studies projecting that health service costs will increase by 45% between 2006 and 2026, and that the cumulative cost of mental illness over the next 30 years will exceed $2.63 trillion AUD [8, 13, 14, 15]. In New Zealand, the 2018 New Zealand Mental Health Monitor (NZMHM) provides one of the most recent reviews of the mental health of the New Zealand general population [16]. The NZMHM found that 32% of New Zealanders had an experience with mental distress and an additional 32% of New Zealanders lived with someone with a lifetime experience of mental distress [16]. Furthermore, 49% of the population were aware of a close friend who experiences mental distress [16]. During the recent COVID-19 pandemic in 2021, Gasteiger et al. completed a cross-sectional study with a cohort of 681 adults older than 18 in New Zealand [17]. Although the sample was 89% female and older than the general population (median ages of 40 and 37.4 years, respectively), they found that 64% of participants reported symptoms of depression, 53% reported symptoms of anxiety, 31% reported moderate-to-severe symptoms of depression, and 24% reported moderate-to-severe symptoms of anxiety [17]. Although outdated, the 2005 Depression Service Plan released by the Midcentral District Health Board in New Zealand estimated the cost of depression in New Zealand at $750 million per year [18]. More recent estimates are based on population-adjusted Australian figures. Similar to Australia and New Zealand, Japan has a high mental health burden [19, 20]. 
Community-based mental health surveys play an important role in estimating the prevalence of mental disorders because most people do not seek treatment even if they experience psychological distress equivalent to diagnosable mental disorders [20]. The World Mental Health Japan Survey Second was conducted from 2013 to 2015 with a sample of 2450 randomly selected residents between the ages of 20 and 75 [20, 21]. The survey found a lifetime prevalence of any mood disorder of 4.57% and 7.21% for men and women, respectively [20, 21]. It also found a 12-month prevalence of any mood disorder of 2.24% and 3.26% for men and women, respectively [20, 21]. Other studies have estimated the economic impact of depression to be between 1.29 trillion and 3 trillion yen (or $16.46 billion AUD to $38 billion AUD) yearly [22, 23]. A wide range of screening tests that vary in length, style, administration and psychometric evaluation are currently used [25]. The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) Report, released in 2004, was an important milestone in optimising management of patients with depression [26, 27]. Four thousand patients from 41 different primary and psychiatric care sites were evaluated using a novel 4-level treatment paradigm. The remission rate, measured using the Quick Inventory of Depressive Symptomatology, was 32.9%, 30.6%, 13.6% and 14.7% for levels 1, 2, 3 and 4, respectively [27]. Although a cumulative remission rate of 65.8% was achieved, crucially, 34.2% did not improve with medical interventions for depression [27]. For non-remission patients, it is critical to identify the earliest point at which to stop further medication trials because the longer the time to remission, the less chance a patient has of reaching remission [27]. Optimising an individual patient's treatment plan so that they have the highest likelihood of remission is a needed advancement in psychiatry. 
Psynary (www.psynary.com) is a web-based tool used to support clinicians, organisations and patients in the diagnosis and treatment of mood and anxiety disorders in accordance with the ICD-10 [28] . The Psynary system was designed to collect de-identified data in a format that could be represented numerically with no requirement for language processing. It is anonymous, which avoids privacy and data protection issues that can be associated with online platforms [29] . The Optimisation of Treatment for Mood and Anxiety Disorders Study 1 (OptiMA1) involved two parallel studies in New Zealand and Japan for the purpose of validating key outcome measures developed for the Psynary platform [28] . These outcome measures included the R8 Depression score and R8 Anxiety score [28] . The R8 Depression score was designed to fully capture the wide range of symptom domains seen in depression, across the full range of illness severity, and to be sensitive to treatment effects [28] . Participants (n=270) were recruited from patients registered on Psynary by the public community mental health clinic at Nelson Marlborough District Health Board in New Zealand (n=62) and by the private clinic serving the English-speaking population at the American Clinic Tokyo in Japan (n=208) [28] . Patients with probable mood or anxiety disorders who registered to Psynary between 24th March 2016 and 25th October 2018 were invited to complete either an online or written consent process prior to participating in the study [28] . Inclusion criteria included completing Psynary in the English language, being over 18 years of age for NZ, or 20 years of age for Japan, and having an ICD-10 diagnosis of a current depressive episode (unipolar or bipolar) or anxiety disorder (ICD-10 F31.3, F31.4, F31.81, F32.1, F32.2, F33.1, F33.2, F40-F43) confirmed by the treating clinician at their initial appointment [28] . 
An early analysis of the cohort (n=131) suggested a similar doubling of remission rates in response to treatment optimisation, but over a shorter 90-day time period compared to the STAR*D trial (Fig. 1) [29]. This validation study found that if patients had less than a 20% reduction in R8 Depression score after 6 days of treatment, the negative predictive value for non-remission was 98% [29] (Table 1). From OptiMA1, Psynary appears to be beneficial in monitoring response to treatment and guiding timing of medication optimisation [28]. These findings suggested that the Psynary database could potentially be utilised to develop predictive models of response to treatment and guide treatment selection [28]. Since then, OptiMA2 has been conducted, a qualitative study to establish a Nurse Practitioner-Psynary-assisted care pathway in Port Macquarie, New South Wales, Australia. This is now being followed up by OptiMA3, which will collect naturalistic clinical outcomes from that pathway. Both these studies also include ethics approval to analyse the naturalistic clinical outcomes from all participants. These have been included in this current study to examine the feasibility of applying machine learning approaches to the Psynary database to develop predictive algorithms that guide treatment selection.
Fig. 1 (caption fragment) The OptiMA1 study results are shown in colour, while the STAR*D trial is shown in grey [29].
Unlike other medical specialties that heavily rely on quantitative biomarkers to assist in the diagnosis of diseases, treatment planning and measurement of outcomes, mental health still predominantly relies on clinical measures. Mental health team members use patient interviews, questionnaires and patient reports to evaluate signs and symptoms [30, 31]. 
The experience and subjectivity of the clinician are heavily leveraged to make inferences about symptomatology from this data, which is difficult to imitate using supervised deep learning (DL) models. Supervised DL models require a training set containing "true" labels to optimise model parameters before the model can be used to predict the diagnostic outcome of new subjects. Therefore, the quality of expert-provided diagnostic labels used for training sets the upper-bound for the predictive performance of the model [31] [32] [33] . Despite the difficulties, there has been a large amount of research in the usage of ML/DL techniques in mental health. A number of studies have focussed on scraping social media posts to predict depression for example, using multinomial naive Bayes (with an accuracy of 76.69%) and event-driven tendency warning models (best recall rate of 0.668 and F-measure of 0.624) [34, 35] . Other studies have focussed on analysing electronic health records. Nemesure et al. used a sample of 4184 students who underwent a general health and psychiatric assessment for the diagnosis of MDD and Generalised Anxiety Disorder. A high-level XGBoost model was used to produce an area under the receiver operating characteristic (AUROC) of 0.67, a sensitivity of 0.55 and specificity of 0.7 for major depression [36] . Further studies have used ML/DL techniques to classify, diagnose and grade depression using clinical data, externally validated screening tests and biomarkers [37] [38] [39] [40] [41] [42] [43] [44] . In 2018, Gao, Calhoun and Sui completed a review of 66 studies focussed on MDD that have used magnetic resonance imaging to either classify MDD from controls or other mood disorders or investigated treatment outcome predictors for individual patients [45] . 
The 66 studies investigated by Gao, Calhoun and Sui could be further classified into 9 groups of studies that were focussed on diagnosis/classification of only MDD , only bipolar disorder [67] , diagnosis/classification of MDD compared to bipolar disorder [68] [69] [70] [71] [72] [73] [74] [75] [76] [77] [78] [79] [80] [81] [82] , diagnosis/classification of MDD compared to Generalised Anxiety Disorder [83] , diagnosis/classification of Schizophrenia and mood disorders (including MDD and Generalised Anxiety Disorder) [84, 85] , brain abnormalities in patients who have been diagnosed with a particular mood disorder [86] [87] [88] [89] [90] [91] [92] [93] [94] , analysing therapeutic responses to MDD [95] [96] [97] [98] [99] [100] , using neurobiological markers/neuroimaging for diagnosis/classification [101] [102] [103] [104] [105] [106] [107] [108] [109] and predicting responses to electroconvulsive therapy [110, 111] . In 2021, de Nijs et al. were able to create individualised models to predict 3-and 6-year symptomatic and global outcomes of patients with schizophrenia-spectrum disorders (mainly with established illness, but variable illness duration) based on patient-reportable data with a study size of 523 schizophrenia-spectrum patients [112] . Also in 2021, Taliaz et al. used genetic, clinical and demographic data from patients with solely MDD in the STAR*D Report to create an ML algorithm that generated an accurate predictor of response to three antidepressant medications with an average balanced accuracy of 72.3% and 70.1% across the medications in validation and test sets, respectively [113] . They then obtained data from the Pharmacogenomic Research Network Antidepressant Medication Pharmacogenomic Study (PGRN-AMPS) of patients treated with citalopram (a selective serotonin reuptake inhibitor) [113] . This external validation yielded accuracy of 60.5% and 61.3% for the STAR*D and PGRN-AMPS, respectively [113] . 
The aim of this study is to demonstrate how the clinical data collected by Psynary can be used to optimise the treatment of depression in the clinical setting. We hypothesised that ML/DL techniques could be used on the data collected in the Psynary database to support the optimisation of the treatment and management of depression. Although previous studies have used clinical data to build ML/DL algorithms to diagnose depression, and a recent study predicted the response to three medications in treating MDD, we are unaware of any published study that has used clinical data to build a single, all-inclusive ML/DL algorithm to optimise the treatment of all subtypes of depression. The Psynary database included details from patients being treated for depression (in any of its forms) at associated community and specialist psychiatric facilities in Australia, New Zealand or Japan. Patients who had alcohol-related comorbidities with a primary diagnosis of a mood or anxiety disorder were eligible. Exclusion criteria were psychotic symptoms, significant comorbid alcohol and drug misuse where these were the primary diagnoses, terminal or life-threatening physical comorbidity, and cognitive impairment or intellectual disability. These were assessed on the basis of past medical history, liaison with the GP and, with permission, collateral history from a relative or carer. All patients consented to their de-identified data being included in the Psynary database for the purposes of ongoing research, and the OptiMA1, 2 and 3 studies were approved by the relevant local ethics committees. OptiMA patients were instructed to create an online Psynary account and complete a baseline evaluation with the in-person support of a psychiatrist or nurse practitioner, and then to complete weekly reviews either by themselves remotely or with the help of a nurse practitioner with expertise in mental health. 
The Psynary database then logged this baseline information, which included: past history such as family history, past episodes of depression and other disorders, age of onset, hospital admissions, past deliberate self-harm and past attempted suicide; deliberate self-harm questions including intensity of suicidal thoughts, suicidal planning and suicide attempts; and other domains such as alcohol intake, current psychiatric medication (name and dose), previous treatment changes and whether the patient had received electroconvulsive therapy sessions. The Psynary system incorporates the Hypomania Checklist 16 item (HCL-16) [114], Generalised Anxiety Disorder-7 (GAD-7) [115] and Patient Health Questionnaire-9 (PHQ-9) [25] scores and generates ICD-10 diagnoses for all major mood and anxiety disorders. It also includes the proprietary main outcome measures of this study, the R8 Depression and R8 Anxiety scores. All of these metrics were used in building the algorithm reported in this article except for the HCL-16. When a patient completed a review, the Psynary database logged the current psychiatric medication list, current alcohol intake, current deliberate self-harm responses, recent electroconvulsive therapy treatment, and the current R8 Depression, R8 Anxiety, PHQ-9 and GAD-7 scores. All of these metrics were used in the review aspect of the model except for the HCL-16. On the 1st of June 2021, anonymised clinical data from Psynary was exported and preprocessing undertaken. All categorical patient demographic data was one-hot encoded, and empty data values were handled by assigning 0 to "No" and -1 to "No Answer", depending on the question field. All data preprocessing and algorithm development was carried out in JupyterLab version 3.0.14 with Python version 3.8.5, on an AMD Ryzen Threadripper 2950X 16-Core Processor (3.50 GHz) with 32.0 GB of installed RAM and an NVIDIA RTX 2080 GPU. 
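The preprocessing described above can be illustrated with a minimal sketch. The field names and category levels below (`employment`, `past_self_harm`, `age_of_onset`, `EMPLOYMENT_LEVELS`) are hypothetical, since the Psynary schema is not given in the text; only the one-hot encoding and the 0 / -1 convention for empty values come from the description.

```python
from typing import Dict, List, Optional

# Hypothetical category levels; the real Psynary schema is not given in the text.
EMPLOYMENT_LEVELS = ["full_time", "part_time", "unemployed", "student"]


def one_hot(value: Optional[str], levels: List[str]) -> List[float]:
    """Return a one-hot vector over `levels`; a missing or unknown value gives all zeros."""
    return [1.0 if value == level else 0.0 for level in levels]


def fill_empty(value: Optional[float], empty_means_no: bool) -> float:
    """Replace an empty answer with 0 ("No") or -1 ("No Answer"), depending on the field."""
    if value is None:
        return 0.0 if empty_means_no else -1.0
    return float(value)


def encode_baseline(record: Dict[str, object]) -> List[float]:
    """Turn one (hypothetical) baseline record into a flat numeric feature vector."""
    features = one_hot(record.get("employment"), EMPLOYMENT_LEVELS)
    # Assumed: an empty yes/no history item is read as "No".
    features.append(fill_empty(record.get("past_self_harm"), empty_means_no=True))
    # Assumed: an empty numeric item is flagged as "No Answer".
    features.append(fill_empty(record.get("age_of_onset"), empty_means_no=False))
    return features
```

Which fields map an empty value to 0 and which to -1 is a per-question decision in the original study; the two choices shown here are placeholders.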
The review data for each patient was stored in chronological order, then filtered and preprocessed according to the medications taken during the period corresponding to each review. Medications were categorised into one of the following drug classes: analgesics, antidepressants, antihistamines, antipsychotics, anxiolytics, benzodiazepines, hypnotics, mood stabilisers, opioid antagonists and stimulants. Antidepressants were further divided by mechanism of action: mono-amine oxidase inhibitors (MAOIs), noradrenaline reuptake inhibitors (NaRIs), selective serotonin reuptake inhibitors (SSRIs), serotonin-norepinephrine reuptake inhibitors (SNRIs), tricyclic antidepressants (TCAs) and atypical antidepressant medications. Groups with low representation in reviews were discarded, as were patients without reviews for better-represented medications. If, for a given review, a respondent had taken medications from more than one group, that review was duplicated in the array for each corresponding group. As is expected from this type of data, the frequency of review sequence lengths decreased exponentially (Fig. 2). The average number of reviews per patient was 11.5, with 95% of patients having 39 reviews or fewer. The objective of our analysis was to propose treatment by predicting the effectiveness of psychiatric medications for each individual patient. To better fit the available data, we formulated our problem as a regression of the best R8 Depression score the patient would achieve while taking any given medication [28]. While training the model, the sequence of reviews provided to the model was truncated randomly for each medication. This truncation used a geometric distribution to allow for early prediction of remission. In addition to the regression of the R8 Depression score, the model was trained to predict which medications would be prescribed by clinicians. 
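The random truncation step can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the success probability `p`, the function names, and the choice to allow a truncated length of zero (so that the model also sees examples with no reviews) are all assumptions.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_geometric(p: float, rng: random.Random) -> int:
    """Sample k >= 1 from a geometric distribution with success probability 0 < p < 1."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


def truncate_reviews(reviews: Sequence[T], p: float, rng: random.Random) -> List[T]:
    """Keep a random-length prefix of a chronological review sequence.

    Short prefixes are drawn more often than long ones, which exposes the
    model to the early-prediction setting during training.
    """
    keep = min(len(reviews), sample_geometric(p, rng) - 1)  # assumed: zero reviews allowed
    return list(reviews[:keep])


if __name__ == "__main__":
    rng = random.Random(0)
    reviews = ["week1", "week2", "week3", "week4", "week5"]
    for _ in range(3):
        print(truncate_reviews(reviews, p=0.3, rng=rng))
```

Because a fresh prefix is drawn each time a sequence is presented, the same patient contributes training examples at several treatment stages.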
We found that this additional objective both helped prevent overfitting, by increasing the ratio of training data to model capacity, and improved performance of the model when used as a recommendation system. To do this, the medications prescribed by clinicians in all reviews are given as a multi-hot vector. For medication optimisation, predicted R8 Depression scores S are corrected by coefficients calculated from the prescription probabilities P scaled with a learned factor β, the purpose of which is to control the relative importance of the prescription probabilities. The algorithm then proposes the medication with the lowest scaled score (Eq. 1).
2.2 Implementation of long short-term memory model
Neural network models have become the de facto standard in most machine learning applications. There are several choices of architecture for sequence modelling, most notably recurrent neural networks (RNN), 1-dimensional convolutional networks and attention networks [116]. The advantages of the latter two have largely to do with better gradient propagation and performance with long sequence lengths [116]. The Psynary review data, having short sequence lengths and a unidirectional chronological structure, appears to be best suited to the simple-to-implement RNN structure. For this particular application, a long short-term memory (LSTM) network was chosen due to its ready availability in deep learning libraries and proven effectiveness [117]. The architecture of the model in this study separately processes each sequence of reviews corresponding to each medication using the same LSTM network. We suspect that the patterns learned by networks trained on each medication separately would be largely similar, and training an individual network for each medication group would severely reduce sample sizes due to subdivision of the dataset. Using a larger, single recurrent network was empirically found to produce better and more consistent results. 
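The weight-sharing idea in the preceding paragraph can be sketched in plain Python. This is a toy illustration with random, untrained weights, not the study's model: the class `TinyLSTM`, the helper `encode_per_medication`, the layer sizes and the initialisation range are all hypothetical. It implements a standard single-layer LSTM cell and applies the same cell to each medication's review sequence.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute w @ x + b for list-based matrices and vectors."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class TinyLSTM:
    """Single-layer LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f), candidate (g), output (o).
        self.w: Dict[str, Matrix] = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b: Dict[str, Vector] = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        xh = x + h  # concatenate the current input with the previous hidden state
        i = [sigmoid(v) for v in affine(self.w["i"], xh, self.b["i"])]
        f = [sigmoid(v) for v in affine(self.w["f"], xh, self.b["f"])]
        g = [math.tanh(v) for v in affine(self.w["g"], xh, self.b["g"])]
        o = [sigmoid(v) for v in affine(self.w["o"], xh, self.b["o"])]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def encode(self, sequence: List[Vector]) -> Vector:
        """Run the cell over a sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def encode_per_medication(lstm: TinyLSTM, sequences: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Apply the same LSTM (shared weights) to every medication's review sequence."""
    return {medication: lstm.encode(seq) for medication, seq in sequences.items()}
```

Because one set of weights serves every medication class, each class's reviews contribute gradient signal to the same parameters, which is the sample-size argument made above. In practice a deep learning library's LSTM layer would replace this hand-written cell.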
Fully connected networks made up the rest of the architecture: they were used for feature extraction from both the baseline questionnaire and reviews, and at the end of the network for the final R8 Depression score and doctor prescription predictions. Training was performed in three steps. The fully connected network (FCN) review feature extractor and LSTM modules were first trained in isolation on the R8 Depression score prediction task. This recurrent model was then integrated, with frozen weights, into the baseline questionnaire feature extractor and final FCN layers, and the remainder of the model was trained. Finally, the entire model was annealed using low learning rates. Due to the low dimensionality of the data, small size of the model and its recurrent structure, GPU acceleration was not used. To further validate the significance of the model, an optimisation test was performed. The model's outputs were taken and tested by posing the questions "After seeing the patient review data, should the medication regimen change or should it continue?" and "If the medication should change, what should the medication be changed to?". Because of the extra data dimensionality of patient reviews, instead of calculating sensitivity, specificity, positive predictive value and negative predictive value, we calculated the sensitivity, positive predictive value, false positive rate and correct medication change accuracy (Eqs. 2, 3, 4 and 5). This study utilised data collected from the OptiMA1, 2 and 3 [28, 118] studies, contained in the Psynary database. In total, data from 458 participants was included in this study (Table 2). The large majority of participants were from Japan (85.6%), followed by New Zealand (9.6%) and then Australia (4.8%). The sex distribution slightly favoured females (57%) and the median age and age range of the participants were 32.5 and 18 to 73 years, respectively (Fig. 4). 
Most of the participants were employed full-time (52.4%), spoke English (98.9%) and did not have the support of a carer (85.2%). The age distribution showed two anomalous data points (ages 2 and 17), which were deleted and treated as missing data. The participants' psychiatric characteristics were also examined (Table 3). The average R8 Depression raw score was 34.28 out of a maximum of 84, with depression remission defined as a raw score less than 14 [28]. The PHQ-9 score is grouped based on severity into none (0-4), mild (5-9), moderate (10-14), moderately severe (15-19) and severe (20-27) [25]. The participants' average PHQ-9 score was 16.54 (within the moderately severe range), with 59% of the participants considered either moderately severe or severe. The GAD-7 score is grouped into mild (0-4), moderate (5-9) or severe (10-15) anxiety [115]. A total of 354 participants (77.3%) recorded scores reflecting severe anxiety. Furthermore, 326 participants (71%) were considered to have a subtype of unipolar depression and 84 participants (18.3%) were considered to have a diagnosis of a subtype of bipolar disorder. The remaining 3 participants with a provided diagnosis (0.7%) were considered to have hypomania (Table 4). The participants in this study were treated by two psychiatrists. An analysis of the frequency of medication classes prescribed showed SSRIs to be the most prescribed medication class, followed by antipsychotics and mood stabilisers (Fig. 5). SSRIs are likely to be the most prescribed medication class because the existing treatment guidelines in Japan [119], New Zealand and Australia [120] recommend them as first-line medications. The model in this study, trained with the Psynary dataset, without reviews, had an accuracy of predicting R8 Depression-defined remission of 49.8% (seen in Fig. 6). 
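Since the model regresses an R8 Depression score while the headline result is a remission accuracy, the conversion between the two can be sketched as below. The threshold (raw score less than 14) comes from the text; the function names and the assumption that accuracy is the plain fraction of agreeing remission flags are illustrative only.

```python
from typing import Sequence

REMISSION_THRESHOLD = 14.0  # from the text: remission is a raw R8 Depression score below 14


def is_remission(r8_score: float) -> bool:
    """Map an R8 Depression raw score to a remission flag."""
    return r8_score < REMISSION_THRESHOLD


def remission_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Fraction of patients whose predicted and observed remission flags agree."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(is_remission(p) == is_remission(o) for p, o in zip(predicted, observed))
    return hits / len(predicted)


if __name__ == "__main__":
    # Toy numbers, not study data: two of four remission flags agree.
    print(remission_accuracy([10.0, 20.0, 13.9, 30.0], [12.0, 13.0, 20.0, 40.0]))
```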
After the model had processed 2 reviews, the model had a prediction accuracy of 76.5%. The accuracy of predicting the individual medication classes that led to the best R8 Depression scores were then examined to understand the strengths and weaknesses of the algorithm. Without including reviews, it was found that the most accurate group was SSRIs (78.9%), followed by antipsychotics (53.6%), mood stabilisers (36.8%), benzodiazepines (35.7%), atypical antidepressants (29.4%), NaRIs (16.7%) and finally SNRIs (10.5%). However, when the model was trained with the reviews, antipsychotics were predicted to be the medication with the best R8 Depression scores (88%) followed by SSRIs (87.9%), mood stabilisers (69.1%), benzodiazepines (53.5%), atypical antidepressants (39.2%), SNRIs (27.7%) and NaRIs (25.9%) (Fig. 7) . These findings are not surprising as they mirror the frequency of patients prescribed a particular medication class (Fig. 6) . Analysis of training and test loss per epoch graph showed no significant changes between training and test loss (Fig. 8) . We also optimised the β coefficient (learned factor) to balance the trade-off between R8 Depression scores and prescribed medication prediction. The best β coefficient was found to be 0.6465 (Fig. 9) . We then completed an optimisation assessment of our model. Due to the extra dimension of patient reviews, an error matrix was not developed. Instead, we evaluated the model by assessing if asked to change medication, whether the model continued with the current medication or changed and then if it changed whether the model selected the correct medication (Fig. 10) . After just 2 reviews, the precision of changing medications (Change Recommendation Precision) was 97.4%, the recall (Change Recommendation Recall) was 71.4%, the accuracy of changing medications to the correct medication (Correct Change Recommendation) was 54.3%, and the false positive rate (Incorrect Change Recommendation) was 2.6% (Fig. 10) . 
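Eqs. 2 to 5 are not reproduced in this text, so the following is only a sketch of how the four reported quantities could be computed from per-review decisions. The `Decision` record, the use of a reference decision as ground truth, and the denominators are assumptions. One observation supports the denominator chosen for the incorrect-change rate: the reported precision (97.4%) and incorrect-change rate (2.6%) sum to 100%, which is consistent with both being computed over all change recommendations made by the model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Decision:
    model_change: bool                     # model recommends changing medication
    reference_change: bool                 # reference decision (assumed ground truth)
    model_class: Optional[str] = None      # class proposed by the model, if any
    reference_class: Optional[str] = None  # class chosen in the reference decision, if any


def _ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else float("nan")


def change_metrics(decisions: List[Decision]) -> Dict[str, float]:
    tp = sum(1 for d in decisions if d.model_change and d.reference_change)
    fp = sum(1 for d in decisions if d.model_change and not d.reference_change)
    fn = sum(1 for d in decisions if not d.model_change and d.reference_change)
    correct = sum(
        1 for d in decisions
        if d.model_change and d.reference_change and d.model_class == d.reference_class
    )
    return {
        "change_precision": _ratio(tp, tp + fp),  # positive predictive value
        "change_recall": _ratio(tp, tp + fn),     # sensitivity
        "incorrect_change": _ratio(fp, tp + fp),  # assumed: 1 - precision
        "correct_change": _ratio(correct, tp),    # assumed denominator: true-positive changes
    }
```

The denominator used for the correct-change figure (54.3% in the text) is not stated, so the choice of true-positive changes here is a guess.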
Finally, we reviewed the mean absolute error by number of reviews. Our model can predict R8 Depression scores before reviews with a mean absolute error of 14.6±9.5. As the number of reviews increases, accuracy increases and deviation decreases significantly, though with some instability, likely due to the decreasing sample count (as shown in Fig. 11 ). In this study, we describe the development of a DL algorithm that predicts reaching remission from depression in response to psychiatric medications. The capabilities of the algorithm are shown by its ability to initially predict remission at an accuracy of 49.8% before processing reviews (Fig. 6) . After only 2 reviews, the model had an accuracy of 76.5%. The medications predicted to have the best R8 Depression score results were antipsychotics (88%), followed by SSRIs (87.9%) and mood stabilisers (69.1%) (Fig. 7) . This is of significant clinical importance as current treatment protocols for all depression subtypes are largely reliant on trial-and-error of treatment guidelines. Using the model created in this study as an adjunct to normal care could reduce the iterations of trial-and-error. Our model was designed around the restrictions imposed by our available data. Several design decisions were made that took into account the specific biases imposed by the naturalistic dataset. An analysis of a single medication was made difficult by the large variety of prescribed medication, which subdivided the data into very small subsets. To ameliorate this and to reduce dataset entropy, the data was categorised by medication class, with antidepressants divided into groups based on mechanism of action. Furthermore, many patients were given more than one medication at a time, which made unitary classification incompatible with the desired application of the model. 
Because the best indicator of remission was the R8 Depression score, regression of this value is a very similar task to binary prediction of remission, with the added benefit of giving a measure of expected improvement. The biggest performance improvement came from the addition of the auxiliary prediction of prescribed medications. This served as a heuristic technique that directed medication selection towards the medication a doctor is likely to prescribe during the treatment process. Put simply, it acts as a tiebreaker, allowing the model to accurately select between medication types it expects to perform well, without adding penalties to the R8 Depression regression.
Training a model on a naturalistic dataset like Psynary comes with several drawbacks. The most notable of these are bias caused by the skewed distributions of the data, information gaps caused by the selectiveness of prescribed medications and several layers of survivorship bias. Our dataset comes from patients treated in Australia, New Zealand and Japan, and from clinics with different population profiles. The clinic in Tokyo focuses on primary care of mostly foreign residents and contributes the majority of the dataset (85.6%), while patients from Oceania are largely secondary care patients. There is also asymmetric representation of medication types. As there were only two psychiatrists prescribing medications for the participants in this study, the dataset is heavily reliant on the two clinicians' experience, which could be a potential source of survivorship bias. Our choices for model design also came with drawbacks. Most notably, the aggregation of medication types, the use of regression over explicit recommendation and the implementation of a recurrent model were decisions that compromise the model in specific ways.
The prediction of a medication class is less useful for a clinician than the prediction of a specific medication, as patient response to medications of the same action type can vary significantly. Throughout a treatment, a patient may be prescribed several different medications of the same class, which, as of now, our model cannot distinguish in any meaningful way. A possible solution for future work may be to use a finer category system, possibly derived from learned clusters. As our model is not an explicit recommender system, multi-medication recommendation becomes difficult, with significantly lower accuracy. The main reason for choosing R8 Depression regression was the strong biases in the data. A binary recommender system would mostly replicate the biases of the prescribing clinicians, rather than learning from the performance of medications on patients. This was another reason that medications were aggregated by class, eliminating the bias coming from preference for a specific medication.
The use of recurrent models in neural networks may be on the decline. Many sequence modelling objectives are now better solved using convolutional or attention networks, such as in the state-of-the-art audio generator WaveNet [121] and for translation tasks in the transformer architecture [117]. These options have several advantages over recurrent networks, such as shorter gradient propagation distances, better performance with long sequences and easier training. Our choice of an LSTM caused particular difficulty in training the network, as the parallel fully connected and recurrent portions did not train at equal rates. Using a different sequence modelling architecture would, however, have cost added complexity and development time that we felt was best allocated to other parts of the model. Finally, the training target is the best R8 Depression score reached during treatment in the presented model.
Because of this, the model does not handle a relapse of depression once a participant has reached R8 Depression-defined remission. In clinical practice, it is common for patients to relapse into depression, as is evident in this study, where 177 participants (38.6%) have been diagnosed with a form of recurrent depression (Table 2). Future studies will include updating the current model to reflect this clinical situation.
To evaluate the performance of the model under the intended conditions of treatment optimisation, the model was evaluated in the binary decision problem of continuing the current medication or changing medication. We evaluated the model's accuracy both in the choice of changing medication and in the precision in recommending a new medication. The results are presented as a function of the number of reviews in Fig. 11. After just 2 patient reviews, the model shows high recall (71.4%) in determining if a change in medication is needed. In the scenario of such a recommendation, the precision of the medication choice is similarly high (97.4%). To further validate the model, we then asked the model to select a different medication. The accuracy of changing to the correct medication was 54.3%.
Taliaz et al. is the only published study similar to this one. Using genetic, clinical and demographic data from 1679 STAR*D participants, the team generated a hybrid, multi-step DL binary-response algorithm to predict whether a participant was a "responder" or "non-responder" for citalopram (SSRI), sertraline (SSRI) and venlafaxine (SNRI). This yielded an average balanced accuracy of 70.1% for the final test set, compared to a 46.8% initial response rate for participants with MDD.
While they used a wide variety of data, arguably creating a clearer clinical picture of the participant, our model was able to perform with an initial accuracy of 49.8% and an accuracy of 76.5% after 2 patient reviews while accommodating several different medication classes, subclasses and subtypes of depression. Taliaz et al. also completed additional statistical calculations of the model's performance. The sensitivity, specificity, positive predictive value and negative predictive value were 68.7%, 71.4%, 71.7% and 69%, respectively [113]. In comparison, after only 2 reviews, our recall rate (also known as sensitivity) was 71.4%, similar to Taliaz et al.'s results, and our precision rate (also known as positive predictive value) was 97.4%, which far exceeded Taliaz et al.'s results.
Of the 66 studies analysed by Gao, Calhoun and Sui, only 5 focussed on predicting the therapeutic response, and all 5 focussed solely on MDD instead of all subtypes of depression [45]. In 2015, Korgaonkar et al. investigated objective brain volumetric measures of patients with MDD to reliably predict symptomatic remission with their initial antidepressant medication [96]. Their study found two decision trees that had high probability prediction scores of non-remission and were replicated [96]. These were 1) left middle frontal volume less than 14.8 mL and right angular gyrus volume greater than 6.3 mL, which discerned 55% of non-remitters with 85% accuracy; and 2) fractional anisotropy values in the left cingulum bundle less than 0.63 and in the right superior fronto-occipital fasciculus less than 0.5, which discerned 15% of non-remitters with 84% accuracy [96]. Also in 2015, Williams et al. used MRI to investigate whether amygdala activation stimulated by emotion was a general or differential predictor of response to escitalopram (SSRI), sertraline (SSRI) and venlafaxine (SNRI) [97].
Their model classified responders vs non-responders with an overall accuracy, cross-validation accuracy, sensitivity and specificity of 75%, 75%, 77% and 72%, respectively. They found that when adding age to a model that looked at the stria terminalis fractional anisotropy and the cingulate fractional anisotropy, their model had an overall accuracy of 74%, sensitivity of 74% and specificity of 75% [98]. Gong et al. investigated the diagnostic and prognostic potential of pre-treatment structural neuroanatomy using a support vector machine (SVM) in patients with non-refractory depressive disorder or refractory depressive disorder (two subtypes of MDD) [99]. Sixty-one patients were prescribed either an SSRI, TCA or SNRI. The diagnostic accuracy, sensitivity and specificity when applying SVM to both grey and white matter images were 65.22% [99]. The prognostic accuracy based on both grey and white matter images resulted in an accuracy, sensitivity and specificity of 69.57% [99]. Finally, Costafreda et al. investigated the functional neuroanatomy of patients with acute MDD who were shown sad faces of different intensities before cognitive behavioural therapy, in order to predict clinical response [100]. They found that prediction of remission from MDD was significant at the lowest and highest intensities of sadness. Both situations had a sensitivity of 71% and specificity of 86% [100].
Our optimisation assessment illustrates that our model is highly comparable to these 5 studies in terms of sensitivity (71.4%). Importantly, our precision rate of 97.4%, false positive rate of 2.6% and correct change accuracy of 54.3% have no counterpart in these 5 studies and so cannot be compared. Additionally, when comparing model accuracies alone, our model has an accuracy of 76.5% after 2 patient reviews and performs better than all studies apart from Korgaonkar et al.
Fig. 9 Optimisation of the developed model's β coefficient by accuracy with and without reviews
With depression prevalence rates increasing and placing pressure on existing services, new management techniques need to be considered to ensure all patients are treated, and in remission, as soon as possible [122]. Currently, the majority of patients with depression present to primary care [123] and secondary mental health services predominantly accommodate severe and/or high risk presentations, leaving the majority of people with depression unable to access specialist mental health services [123] [124] [125] [126]. Introduction of telemedicine as an aspect of management for mental disorders covers two important factors of care: it improves patient access in areas where specialists are limited (including regional, rural and remote areas), and it improves disease control and relapse prevention [127]. Self-completed web-based systems ensure consistency of data collection and some patients are more likely to disclose relevant information on a self-completed assessment compared to clinician-interviewed settings [128, 129]. Furthermore, online systems provide the opportunity to collect results and track progress [128]. This study has illustrated the benefits of using Psynary, a web-based tool, as an adjunct to normal care for depression. Unlike typical mental health care settings, the data is recorded in a quantitative format that lends itself to analysis. This study has also shown the clinical importance of the application of ML/DL algorithms to clinically collected data for the optimisation of depression treatment.
To further validate the use of Psynary and the DL algorithm created in this study, further work will involve diversifying the demographic of participants and clinicians, the locations of the clinics and implementing genomic data (single nucleotide polymorphisms) with a genomics team to create a multi-dimensional, robust model. Depression has a major health and socio-economic impact, is continuing to increase in prevalence and mental health services are struggling with the increased demand. Using a combination of web-based tools and DL algorithms in a clinical setting, as outlined in this study, could lead to an increase in clinician accessibility and reduce time taken to reach optimal treatment protocols thereby reducing the prevalence of depression and its socio-economic impact on societies.
All clinical data used in this article is fully anonymised from the point of capture. It is available on application to Dr Andrew Kissane for the purpose of validation, regulation and further research. The data and the web-based Psynary system remain the intellectual property of International Medical K.K.
Fig. 11 The developed model's mean absolute error per number of reviews
ICD-10 Version:2019. In: World Health Organisation
Cognitive dysfunction in depression - pathophysiology and novel targets
Identifying and differentiating melancholic depression in a non-clinical sample
Defining melancholia: a core mood disorder
Diagnosing melancholic depression: some personal observations
Burden of depressive disorders by country, sex, age, and year: findings from the global burden of disease study 2010
Changes in the global burden of depression from 1990 to 2017: findings from the Global Burden of Disease study
Australian Institute of Health and Welfare (2021) Mental Health Services in Australia
Prevalence of depressive symptoms and its associated factors among healthy community-dwelling older adults living in Australia and the United States
Changes in the prevalence of major depression in an Australian community sample between
Psychosis, depression and behavioural disturbances in Sydney nursing home residents: prevalence and predictors
The community prevalence of depression in older Australians
A review of the economic impact of mental illness
The Life and Economic Impact of Major Mental Illnesses in Canada. Mental Health Commission of Canada
Paying the price: the cost of mental health care in England to 2026
Depression, anxiety and stress during the COVID-19 pandemic: results from a New Zealand cohort study on mental well-being
MidCentral District Health Board (2005) Depression Service Plan. MidCentral District Health Board
Alternative projections of mortality and disability by cause 1990-2020: global burden of disease study
Prevalence of mental disorders and mental health service use in Japan
Prevalence, treatment, and the correlates of common mental disorders in the mid 2010's in Japan: the results of the world mental health Japan 2nd survey
Cost of depression among adults in Japan in 2005
Cost of depression among adults in Japan
The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study
The psychometric properties of depression screening tools in primary healthcare settings: a systematic review
Sequenced treatment alternatives to relieve depression (STAR*D): rationale and design
Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report
Validation of a novel online depression symptom severity rating scale: the R8 Depression
Measuring Early Treatment Response and Accelerating Treatment Optimization in the Treatment of Depression: Naturalistic Outcomes from OptiMA1. In: EPA European Congress of Psychiatry
Artificial intelligence for mental health and mental illnesses: an overview
Deep learning in mental health outcome research: a scoping review
Research domain criteria (RDoC): toward a new classification framework for research on mental disorders
Moving from static to dynamic models of the onset of mental disorder: a review
Analyzing depression tendency of web posts using an event-driven depression tendency warning model
Predicting depression levels using social media posts
Predictive modeling of depression and anxiety using electronic health records and a novel machine learning approach with artificial intelligence
Predicting the naturalistic course of depression from a wide range of clinical, psychological, and biological data: a machine learning approach
Mobile sensing and support for people with depression: a pilot trial in the wild
Cross-trial prediction of treatment outcome in depression: a machine learning approach
Artificial intelligence approach to classify unipolar and bipolar depressive disorders
A neuro-fuzzy approach for the diagnosis of depression
Predicting differential diagnosis between bipolar and unipolar depression with multiple kernel learning on multimodal structural neuroimaging
Using machine learning-based analysis for behavioral differentiation between anxiety and depression
Improving diagnosis of depression with XGBOOST machine learning model and a large biomarkers Dutch dataset (n = 11,081)
Machine learning in major depression: from classification to treatment outcome prediction
Machine learning approaches for integrating clinical and imaging features in late-life depression classification and response prediction
Identifying current and remitted major depressive disorder with the Hurst exponent: a comparative study on two automated anatomical labeling atlases
Prediction of clinical depression scores and detection of changes in whole-brain using resting-state functional MRI data with partial least squares regression
Whole-brain resting-state functional connectivity identified major depressive disorder: a multivariate pattern analysis in two independent samples
Depression disorder classification of fMRI data using sparse low-rank functional brain network and graph-based features
Evaluating the diagnostic utility of applying a machine learning algorithm to diffusion tensor MRI measures in individuals with major depressive disorder
Diagnostic classification of unipolar depression based on resting-state functional connectivity MRI: effects of generalization to a diverse sample
Multivariate pattern analysis strategies in detection of remitted major depressive disorder using resting state functional connectivity
Accuracy of automated classification of major depressive disorder as a function of symptom severity
Support vector machine classification of major depressive disorder using diffusion-weighted neuroimaging and graph theory. Front Psychiatry 6:21
Machine learning algorithm accurately detects fMRI signature of vulnerability to major depression
Structural MRI-Based predictions in patients with treatment-refractory depression (TRD)
Toward probabilistic diagnosis and understanding of depression based on functional MRI data analysis with logistic group LASSO
Sparse network-based models for patient classification using fMRI
Aberrant functional connectivity for diagnosis of major depressive disorder: a discriminant analysis
Unsupervised classification of major depression using functional connectivity MRI
Pattern classification of valence in depression
Identifying major depressive disorder using Hurst exponent of resting-state brain networks
Multicentre diagnostic classification of individual structural neuroimaging scans from patients with major depressive disorder
Patient classification as an outlier detection problem: an application of the one-class support vector machine
Predicting the naturalistic course of major depressive disorder using clinical and multimodal neuroimaging information: a multivariate pattern recognition study
Towards person-centered neuroimaging markers for resilience and vulnerability in bipolar disorder
Discriminating bipolar disorder from major depression based on kernel SVM using functional independent components
Discriminating bipolar disorder from major depression based on SVM-FoBa: efficient feature selection with multimodal brain imaging data
Pattern recognition of magnetic resonance imaging-based gray matter volume measurements classifies bipolar disorder and major depressive disorder
Abnormal segments of right uncinate fasciculus and left anterior thalamic radiation in major and bipolar depression
Clinical utility of a short resting-state MRI scan in differentiating bipolar from unipolar depression
Co-altered functional networks and brain structure in unmedicated patients with bipolar and major depressive disorders
Differential abnormal pattern of anterior cingulate gyrus activation in unipolar and bipolar depression: an fMRI and pattern classification approach
Distinguishing medication-free subjects with unipolar disorder from subjects with bipolar disorder: state matters. Bipolar Disord
Subcortical volumes differentiate major depressive disorder, bipolar disorder, and remitted major depressive disorder
Distinguishing bipolar and major depressive disorders by brain structural morphometry: a pilot study
Brain morphometric biomarkers distinguishing unipolar and bipolar depression: a voxel-based morphometry-pattern classification approach
Neuroanatomical classification in a population-based sample of psychotic major depression and bipolar I disorder with 1 year of diagnostic stability
Amygdala excitability to subliminally presented emotional faces distinguishes unipolar and bipolar depression: an fMRI and pattern classification study
Identifying major depression using whole-brain functional connectivity: a multivariate pattern analysis
Separating generalized anxiety disorder from major depression using clinical, hormonal, and structural MRI data: a multimodal machine learning study
Individualized differential diagnosis of schizophrenia and mood disorders using neuroanatomical biomarkers
Convergent and divergent functional connectivity patterns in schizophrenia and depression
Abnormal brain activation during directed forgetting of negative memory in depressed patients
Failure of hippocampal deactivation during loss events in treatment-resistant depression
Resting-state functional connectivity abnormalities in first-onset unmedicated depression
Pattern classification of brain activation during emotional processing in subclinical depression: psychosis proneness as potential confounding factor
Alterations in regional homogeneity of spontaneous brain activity in late-life subthreshold depression
Discriminating unipolar and bipolar depression by means of fMRI and pattern classification: a pilot study
Increased cortical-limbic anatomical network connectivity in major depression revealed by diffusion tensor imaging
Changes in community structure of resting state functional connectivity in unipolar depression
Self-blame-selective hyperconnectivity between anterior temporal and subgenual cortices and prediction of recurrent depressive episodes
Classification of different therapeutic responses of major depressive disorder with multivariate pattern analysis method based on structural MR scans
Magnetic resonance imaging measures of brain structure to predict antidepressant treatment outcome in major depressive disorder
Amygdala reactivity to emotional faces in the prediction of general and medication-specific responses to antidepressant treatment in the randomized iSPOT-D trial
Diffusion tensor imaging predictors of treatment outcomes in major depressive disorder
Prognostic prediction of therapeutic response in depression using high-field MR imaging
Neural correlates of sad faces predict clinical remission to cognitive behavioural therapy in depression
Resting-state connectivity biomarkers define neurophysiological subtypes of depression
Diagnostic potential of structural neuroimaging for depression from a multi-ethnic community sample
Cortical thickness predicts the first onset of major depression in adolescence
SCoRS - A Method Based on Stability for Feature Selection and Mapping in Neuroimaging
Integrating neurobiological markers of depression
Machine learning classification with confidence: application of transductive conformal predictors to MRI-based diagnostic and prognostic markers in depression
Prognostic and diagnostic potential of the structural neuroanatomy of depression
Pattern classification of sad facial processing: toward the development of neurobiological markers in depression
A functional MRI marker may predict the outcome of electroconvulsive therapy in severe and treatment-resistant depression
SMRI biomarkers predict electroconvulsive treatment outcomes: accuracy with independent data sets
Prediction of individual response to electroconvulsive therapy via machine learning on structural magnetic resonance imaging data
Individualized prediction of three- and six-year outcomes of psychosis in a longitudinal multicenter study: a machine learning approach
Optimizing prediction of response to antidepressant medications using machine learning and integrated genetic, clinical, and demographic data
Unrecognised bipolar disorder in primary care patients with depression
A brief measure for assessing generalized anxiety disorder: the GAD-7
Attention is All you Need
Long short-term memory
Implementation of an innovative nurse led service to support treatment for depression in primary care (OptiMA2)
Major depressive disorder treatment guidelines in Japan
The 2020 Royal Australian and New Zealand college of psychiatrists clinical practice guidelines for mood disorders
WaveNet: a generative model for raw audio
Barriers and facilitators to using a web-based tool for diagnosis and monitoring of patients with depression: a qualitative study among Danish general practitioners
Psychological treatment of depression in primary care: recent developments
Mental illness in general health care: an international study
Psychiatric morbidity, service use, and need for care in the general population: results of The Netherlands mental health survey and incidence study
Psychological interventions for major depression in primary care: a meta-analytic review of randomized controlled trials
Telemedicine for depression: a systematic review
Use of the internet to assist in the treatment of depression and anxiety: a systematic review
Internet use and stigmatized illness
Acknowledgements The authors would like to thank Dr. Andrew Kissane and Dr. Richard Tranter for allowing the use of the Psynary data, their ongoing support and professional input into the research work. Aidan Cousins is a 4th year medical student with University of New South Wales undertaking his Honours research project.
Consent to participate Written consent was obtained from all participants for use of de-identified data from Psynary for the purpose of analysing the naturalistic clinical outcomes.
Consent for publication All authors approve the final version of the manuscript and this submission for possible publication in Neural Computing and Applications.
Author's contribution Aidan Cousins and Lucas Nakano were involved in study design, implementation and analysis and drafting of the manuscript. Rasa Kabaila and Emma Schofield were involved in study design and expert review.
Funding This research did not receive any specific grant from funding agencies in the public, commercial, private or not-for-profit sectors.
Code availability For review purposes, the code used is available on application to Dr Andrew Kissane.
Conflict of interests The authors declare that they have no conflicts of interest.
Ethics approval This research was approved by the clinical research ethics committee of University of Otago (New Zealand, approval #: H16/014), Asai Hifuka Institutional Review Board (Japan, approval #: 承認番号: 20170724-1) and the North Coast NSW Human Research Ethics Committee (Australia, approval #:HREA271 2019/ ETH13489).