key: cord-1041886-6ao48hh1
authors: Roberts, Anna E.; Davenport, Tracey A.; Wong, Toby; Moon, Hyei-Won; Hickie, Ian B.; LaMonica, Haley M.
title: Evaluating the quality and safety of health-related apps and e-tools: Adapting the Mobile App Rating Scale and developing a quality assurance protocol
date: 2021-03-17
journal: Internet Interv
DOI: 10.1016/j.invent.2021.100379
sha: 2c6f337f427316212bc7f6eaee775fe1429902dc
doc_id: 1041886
cord_uid: 6ao48hh1

BACKGROUND: Whilst apps and e-tools have tremendous potential as low-cost, scalable mental health intervention and prevention tools, it is essential that consumers and health professionals have a means by which to evaluate their quality and safety.

OBJECTIVE: This study aimed to: 1) adapt the original Mobile App Rating Scale (MARS) to be appropriate for the evaluation of both mobile phone applications and e-tools; 2) test the reliability of the revised scale; and 3) develop a quality assurance protocol for identifying and rating new apps and e-tools to determine appropriateness for use in clinical practice.

METHODS: The MARS was adapted to include items specific to health-related apps and e-tools, such as the availability of resources, strategies for self-management, and quality information. The 41 apps and e-tools in the standard youth configuration of the InnoWell Platform, a digital tool designed to support or enhance mental health service delivery, were independently rated by two expert raters using the A-MARS. Cronbach's alpha was used to calculate internal consistency, and intraclass correlation coefficients were used to calculate interrater reliability.

RESULTS: The A-MARS was shown to be a reliable scale with acceptable to excellent internal consistency and moderate to excellent interrater reliability across the subscales. Given the ever-increasing number of health information technologies on the market, a protocol to identify and rate new apps and e-tools for potential clinical use is presented.

CONCLUSIONS: Whilst the A-MARS is a useful tool to guide health professionals as they explore available apps and e-tools for potential clinical use, the training, time, and skill required to use it effectively may be prohibitive. As such, health professionals and services are likely to benefit from including a digital navigator as part of the care team to assist in selecting and rating apps and e-tools, increasing the usability of the data, and troubleshooting technology. When selecting, evaluating and/or recommending apps and e-tools to consumers, it is important to consider: 1) the availability of explicit strategies to set, monitor and review SMART goals; 2) the accessibility of credible, user-friendly information and resources from reputable sources; 3) evidence of effectiveness; and 4) interoperability with other health information technologies.

With traditional in-clinic and online mental health care services in high demand, there is increasing evidence that health information technologies (HITs) will play a vital role in health care delivery (O'Connor et al., 2016). Furthermore, the disruption caused by the COVID-19 global pandemic has resulted in a greater need for and reliance on digital health models of care for screening, treatment and ongoing maintenance of health (Wind et al., 2020). To that end, health and wellbeing apps and e-tools (e.g. websites, web-based courses) have enormous potential for empowering self-management of chronic conditions (Yardley et al., 2016).
Additionally, they offer an alternative for those who prefer or are required to use information and communications technologies (e.g. during the COVID-19 pandemic), those with geographical or physical constraints (Burns et al., 2010; Rowe et al., 2020), or those who may lack awareness of available services (Burns and Rapee, 2006). Specific to mental health and wellbeing, apps and e-tools have the potential to provide low-cost intervention and prevention tools that are designed specifically for mental health disorders, such as anxiety and depression, and for problematic health behaviours (e.g. alcohol, gambling and smoking). The use of self-directed apps and e-tools for the purposes of symptom monitoring and management may be sufficient for individuals with lower levels of clinical need (i.e. prevention, early intervention, ongoing symptom maintenance), thus improving the availability of service-based care for those who need it the most and reducing the overall burden on the mental health system (Burns et al., 2014).

In 2019, there were over 204 billion apps downloaded, reflecting an increase of approximately 5% compared to 2018 (App Annie, 2020). A similar pattern of growth is evident for health-related apps, with 318,000 apps available as of March 2020 and an additional 200 being added to the market daily (IQVIA, 2017). Given the constant development of new Web-based content, the quantification of e-tools is not possible.

The clinical utility of apps and e-tools has great potential. For example, evidence shows that Headspace, a mindfulness app, is associated with improvements in several aspects of psychosocial wellbeing, including irritability, affect and stress (Economides et al., 2018). The MindSpot Clinic has been shown to be an effective e-tool, delivering health professional- and technician-assisted, Web-based cognitive behavioural therapy (iCBT) programs that have resulted in clinically significant improvements as measured on self-report symptom scales for individuals with depression (Titov et al., 2010), anxiety disorders (Newby et al., 2020) and social phobia (Titov et al., 2009).

However, due to the unregulated free market that exists within the digital landscape, apps and e-tools are often of uncertain quality and efficacy (Byambasuren et al., 2018). Beyond the star ratings presented in commercial app stores, there is little information about the quality and accuracy of apps, and for e-tools there are no such ratings available. Whilst star ratings may be useful as an indicator of user satisfaction and sustained engagement, they do not necessarily equate to the safety and quality of an app. This is corroborated by Singh et al. (2016), who reported that star ratings correlated poorly with the clinical utility and usability of health-related apps. Similarly, it has been found that the number of app installs and active minutes of use are not associated with long-term usage of more popular apps (Baumel et al., 2019). Additionally, Australians, for example, are becoming less trusting of app stores and technology companies for their recommendations and have a desire for a credible and regulated rating system for health-related apps (Consumer Health Forum of Australia, 2018). Research from other countries demonstrates that this issue is global.
For example, on their respective national health system websites, the United Kingdom (NHS Innovations South East, 2014) and France (Haute Autorité de Santé, 2016) are providing more information outside of commercial app stores to guide individuals towards safe health-related app choices. As the uptake of apps and e-tools increases, both by individuals for the purposes of self-management and by health professionals as a means to complement clinical services, it is essential for all potential users to have access to measures by which they can evaluate quality and safety. Failing to evaluate the accuracy and appropriateness of health-related apps and e-tools could compromise user health and safety (Lewis and Wyatt, 2014).

Several studies have highlighted inaccuracies in, and a lack of evidence base for, health-related apps. For example, apps designed to help with opioid conversion calculations (Haffey et al., 2013) or melanoma detection (Wolf et al., 2013) do not consistently follow evidence-based guidelines and may provide inaccurate information with potentially hazardous repercussions (e.g. drug overdose, incorrect diagnosis). Specific to apps supporting mental health and wellbeing, a recent review found that apps for bipolar disorder were cost effective and convenient, but that the majority failed to provide information on all core psychoeducation principles and did not adhere to best practice guidelines (Nicholas et al., 2015). Similarly, a review of suicide prevention apps found that many were not supported by an evidence base and, perhaps more strikingly, identified some apps as being more harmful than helpful (Larsen et al., 2016). In relation to the latter, Larsen et al. (2016) found some apps to include potentially extremely damaging content describing or facilitating access to lethal means, encouraging people to end their life, and portraying suicide in a fashionable manner.

Given the potential risk associated with the use of health-related apps, there is growing interest in evaluating the quality and safety of these digital tools. The Mental Health Commission of Canada (MHCC) and the American Psychological Association (APA) have introduced health-related app assessment frameworks. The MHCC framework was uniquely developed for the Canadian context and includes criteria specific to the available evidence base, gender responsiveness and cultural appropriateness of apps; however, it does not assign ratings to these criteria (MHCC, 2018). Additionally, the MHCC is yet to develop an empirical assessment tool by which to assess the criteria. The APA's framework is a step-based model designed to inform decision making about health-related apps (APA, 2020). More specifically, the framework guides users through a hierarchical review of four key features: 1) safety and privacy; 2) evidence and benefit; 3) user engagement; and 4) interoperability. Only if a criterion is satisfied should the user move on to the next step in the evaluation framework. Whilst it is a useful rubric to assist users in choosing a high-quality and safe app that suits their individual needs and preferences, it does not provide an explicit rating and relies on the individual to apply the logic to each app under consideration.

In 2018, Nouri and colleagues conducted a review of criteria for assessing the quality of health-related apps. Over 23 evaluation scales were identified, of which ten were developed for general purposes with no specific subject category.
The authors consolidated the evaluation criteria into seven categories: design, information/content, usability, functionality, ethical issues, security and privacy, and user-perceived value. The scales included in this review varied in the criteria they used. For example, Scott et al. (2015) focussed solely on security and safety measures, whereas others looked solely at app usability (Zapata et al., 2015; Schnall et al., 2016; Brown 3rd et al., 2013).

Included in this review is the Mobile App Rating Scale (MARS) introduced by Stoyanov et al. (2015). The MARS is a reliable, simple, multidimensional scale that requires little training to be implemented. The 23-item scale has four subscales, each with multiple items: engagement (5 items), functionality (4 items), aesthetics (3 items), and information quality (7 items); and one subjective quality scale (4 items). Each feature is rated on a scale from 1 ("inadequate") to 5 ("excellent"), with more specific descriptors for the response options for each question. Upon completion, seven scores are calculated, including the mean score for each subscale, a total mean score, a mean subjective quality score, and a score on an app-specific subscale that assesses the perceived impact of an app on the user's knowledge, attitudes, intentions to change, and the likelihood of actually changing the targeted health behaviours. The MARS has been used globally, including for the evaluation of apps to support symptom monitoring and self-care management in diverse fields of medicine, such as cardiology (Creber et al., 2016), rheumatology (Knitza et al., 2019) and obstetrics (Tency et al., 2019). It has also been adapted into different languages, including German and Spanish (Payo et al., 2019).

At the present time, the MARS is one of the most widely used and internationally recognised app rating tools; however, it remains limited in its utility as it was designed only to assess apps, and modifications are needed in order to inform the quality rating of e-tools (Stoyanov et al., 2015). Furthermore, Stoyanov et al. (2015) highlight that research is needed to evaluate the safety of health-related apps specifically, both in terms of accuracy of information and privacy and security of user information (Lewis and Wyatt, 2014). Therefore, in order to evaluate not only mobile phone applications but e-tools as well, the objective of this study was to adapt the MARS to consider features and functionality that are of particular importance for the quality and safety of health-related apps and e-tools, henceforth referred to as the Adapted MARS (A-MARS), and then to test the reliability of the revised scale. Finally, this paper presents a quality assurance protocol for identifying and rating new apps and e-tools to determine appropriateness for potential use in clinical practice.

As described above, the original MARS was adapted to be appropriate for health-related apps and e-tools. As such, all questions and responses were reworded to refer to both apps and e-tools (i.e. "Do you feel engaged enough to complete the e-tool program or use the app on multiple occasions?"). The 'Engagement' section of the original MARS was also expanded specifically for e-tools, taking into account e-tool program completion, return use, and engagement in strategies from the program.
In relation to more specific changes, 'Entertainment (Q1)' was relabelled as 'Engagement' in order to better capture user engagement, including the likelihood of completing the e-tool program or using the app repeatedly, noting that health-related apps and e-tools are not necessarily designed to be fun or entertaining. The description of 'Customisation (Q3)' was broadened to assess whether customising the app or e-tool improves the ease of use. Modifications also included the evaluation of 'Interoperability (Q4)', or the ability to exchange data with other apps, e-tools, or wearables. 'Performance (Q6)' was expanded to specifically enquire about program errors or glitches experienced by users. Details related to the login process, the utility of the help function, and frequently asked questions were added to 'Ease of Use (Q7)'. 'Gestural design (Q9)' was relabelled as 'Design' in order to better capture design elements of both apps and e-tools, such as popup windows and flash images, as well as to assess the consistency of the theme throughout the tool. The 'Accuracy' question from the original MARS, which assessed the accuracy of the description in the app store, was removed as it was deemed irrelevant for this tool.

A new section was also added to the A-MARS with questions of particular relevance for health-related apps and e-tools, including 'Additional resources available (Q23)', which evaluates whether the app or e-tool provides current and relevant resources. 'Strategies (Q24)' was added to determine if the app or e-tool recommends strategies linked to the target area of concern, and 'Solutions (Q25)' was added to assess if the app or e-tool provides one or more solutions to address the identified symptom(s). To evaluate the scope of the app or e-tool, 'Multiple health issues/symptoms (Q26)' was included to determine how many symptoms or health issues are addressed. The ability to use the app or e-tool in real time (i.e. real-time data tracking) was included as 'Real time tracking (Q27)'. 'Access to help (Q28)' was also added to assess the ease with which help or support can be accessed via the app or e-tool. Finally, a 'Not applicable' option was included for all items in the health-related subscale. No other substantive changes were made to the remainder of the questions from the original MARS.

As with the original MARS, each feature in the A-MARS is rated on a scale from 1 ("inadequate") to 5 ("excellent"), with more specific descriptors for the response options for each question. Upon completion, eight scores are calculated, including the mean score for each subscale (i.e. engagement, functionality, aesthetics, information, subjective quality, health-related quality), a mean quality score based on the engagement, functionality, aesthetics and information subscales, and a mean total score. The A-MARS is provided as Appendix A.
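To make the scoring rules above concrete, the following is a minimal sketch of how one rater's A-MARS scores might be computed. The item-to-subscale mapping is inferred from the question numbers cited in the text (and is therefore an assumption for any item not explicitly named), 'Not applicable' responses are treated as missing, and the mean total score is taken here as the average of the six subscale means, which is also an assumption rather than the scale authors' published rule.

    # Minimal sketch of A-MARS scoring for one rater; mapping and total-score rule are assumptions.
    from statistics import mean

    # Hypothetical responses: item number -> rating (1-5) or None for "Not applicable"
    responses = {1: 4, 2: 3, 3: 4, 4: None, 5: 5,              # engagement (Q1-5)
                 6: 4, 7: 5, 8: 4, 9: 3,                       # functionality (Q6-9)
                 10: 4, 11: 4, 12: 5,                          # aesthetics (Q10-12)
                 13: 3, 14: 4, 15: 4, 16: None, 17: 3, 18: 2,  # information (Q13-18)
                 19: 4, 20: 3, 21: 4, 22: 4,                   # subjective quality (assumed Q19-22)
                 23: 4, 24: 5, 25: 4, 26: 3, 27: None, 28: 4}  # health-related (Q23-28)

    subscales = {"engagement": range(1, 6), "functionality": range(6, 10),
                 "aesthetics": range(10, 13), "information": range(13, 19),
                 "subjective_quality": range(19, 23), "health_related": range(23, 29)}

    def subscale_mean(items):
        # Exclude questions rated as "N/A" (None) from the mean score calculation.
        rated = [responses[i] for i in items if responses[i] is not None]
        return round(mean(rated), 2) if rated else None

    scores = {name: subscale_mean(items) for name, items in subscales.items()}
    # Mean quality score averages the four objective subscales (as described in the text).
    scores["mean_quality"] = round(mean(scores[s] for s in
                                   ("engagement", "functionality", "aesthetics", "information")), 2)
    # Assumed definition of the mean total score: average of all six subscale means.
    scores["mean_total"] = round(mean(v for k, v in scores.items()
                                      if k != "mean_quality" and v is not None), 2)
    print(scores)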
Apps and e-tools to be rated were chosen based on their inclusion in the youth configuration of the InnoWell Platform (Hickie et al., 2019a). As described by LaMonica et al. (2019), the InnoWell Platform is a co-designed digital tool that is embedded within traditional in-clinic and Web-based mental health services. The InnoWell Platform was developed through Project Synergy (a $30 M Australian Government-funded initiative delivered by InnoWell Pty Ltd., a joint venture between the University of Sydney and PwC [Australia]; Hickie et al., 2019b; InnoWell, 2018) to collect, store, score and report clinical data back to a consumer and their health professional. Within the InnoWell Platform, a range of different care options exist, commonly known as interventions, to help the participant manage areas of health (i.e. psychological distress, sleep, physical activity). Care options are divided into two types: clinical and non-clinical. Clinical care options require a health professional's involvement, such as individual therapy and group therapy. In contrast, a participant can immediately access and begin using non-clinical care options (see Fig. 1), such as apps and e-tools, without the support of a health professional. During the co-design process, care options are tailored to the consumer population, in this case young people receiving care through primary youth mental health services.

There were 41 apps and e-tools included in the youth configuration of the InnoWell Platform at the time of writing this paper. These apps and e-tools are iteratively suggested by young people and their supportive others through participatory design workshops as well as word of mouth (Hickie et al., 2019a), and/or recommended by health professionals for use in clinical practice. As such, the apps and e-tools included within the InnoWell Platform are continuously reviewed and updated to ensure quality and safety.

All apps and e-tools were rated using the A-MARS. There were four expert raters: 1) a senior research fellow with a PhD in Clinical Psychology and three years' experience in the design and evaluation of HITs; 2) a senior research assistant with Master's degrees in Exercise Physiology and Brain and Mind Sciences and two years' experience in the design and evaluation of HITs; 3) a research affiliate with a Bachelor's degree in Psychology and two years' experience working in mental health support services in a university accommodation setting, focused on engagement with culturally and linguistically diverse individuals; and 4) a research affiliate with a Bachelor's degree in Psychology and experience in consulting for a nonprofit organisation specialising in providing technological support for people with disabilities.

The first four apps and e-tools were used for training purposes. After independently rating these apps and e-tools, the raters met to compare and review the results of the pilot test and to resolve discrepancies in ratings. To reach consensus, all raters reviewed the scale in depth in order to improve the alignment of app ratings. Additionally, the meaning, purpose and descriptors of goals, quality of information, quantity of information, credibility of source, and evidence base were reviewed in detail to address disagreement between raters on these items. The remaining 37 apps and e-tools from the InnoWell Platform standard youth configuration were then independently rated by two raters. Using the methodology previously described by Stoyanov et al. (2015), each rater trialled the apps and e-tools for a minimum of 10 min and then independently rated their quality using the revised A-MARS.
Cronbach's alpha was used to calculate the internal consistency of the A-MARS, including the mean scores for the engagement, functionality, aesthetics and information subscales, the mean quality score reflecting the average of these subscales, the mean subjective quality and health-related quality scores, and finally the mean total score. These scores reflect the internal consistency of the scale, or the degree to which the questions are measuring the same construct. Alphas were interpreted as excellent (≥0.90), good (0.80-0.89), acceptable (0.70-0.79), questionable (0.60-0.69), poor (0.50-0.59) and unacceptable (<0.50) (George and Mallery, 2003). Interrater reliability of the A-MARS subscales, the mean quality score and the mean total score was evaluated using the intraclass correlation coefficient (ICC). This descriptive statistic evaluates the similarity between ratings. A two-way mixed effects, average measures model with absolute agreement was used. ICCs were interpreted as excellent (≥0.90), good (0.76-0.89), moderate (0.51-0.75) and poor (≤0.50) (Portney and Watkins, 2009). All analyses were conducted using IBM SPSS (Version 26).

Independent A-MARS ratings on the total score for the 37 apps and e-tools showed the scale to have an excellent level of internal consistency (Cronbach α = 0.938) and interrater reliability (2-way mixed ICC = 0.920, 95% CI 0.797-0.987). Similarly, the independent A-MARS ratings showed the mean quality score, excluding the subjective quality and health-related quality subscales, to have an excellent level of internal consistency (Cronbach α = 0.908) and good interrater reliability (2-way mixed ICC = 0.895, 95% CI 0.731-0.983). Internal consistencies of the A-MARS subscales were also high, ranging from acceptable to excellent (Cronbach α = 0.721-0.920, median = 0.824), and their interrater reliabilities were moderate to excellent (ICC = 0.687-0.910, median = 0.711), with the engagement and information subscales having the highest and lowest interrater reliability, respectively. Examination of the corrected item-total correlations indicates that items 15 (quantity of information, r = −0.163), 16 (visual information, r = 0.240), and 17 (credibility of source, r = 0.225) did not correlate well with the overall information subscale; however, removal of these items did not markedly improve the reliability of the subscale. Similarly, item 27 (real-time tracking, r = 0.198) was noted to have a weak correlation with the health-related subscale; however, again, removal of the item did not notably impact the reliability of the subscale (Cronbach α if item deleted = 0.796). Detailed item and subscale statistics are presented in Table 1. Additionally, a full list of the apps and e-tools rated as part of this study, including their mean scores on the A-MARS, is provided in Appendix B.
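The analyses above were run in IBM SPSS. Purely as an illustration, with placeholder numbers rather than the study data, the same two statistics can be computed from standard formulas: Cronbach's alpha across items, and McGraw and Wong's ICC(A,k) for a two-way, absolute-agreement, average-measures design (which uses the same mean squares as the SPSS "two-way mixed, absolute agreement, average measures" output).

    import numpy as np

    # Placeholder data only (not the study ratings):
    # item_scores: rows = apps/e-tools, columns = items within one subscale (for alpha)
    item_scores = np.array([[4, 5, 4, 3],
                            [3, 3, 2, 3],
                            [5, 4, 5, 5],
                            [2, 3, 3, 2],
                            [4, 4, 3, 4]], dtype=float)
    # rater_scores: rows = apps/e-tools, columns = the two raters' scores (for ICC)
    rater_scores = np.array([[4.2, 4.0],
                             [3.1, 3.4],
                             [4.8, 4.6],
                             [2.9, 3.0],
                             [3.7, 3.9]])

    def cronbach_alpha(x):
        """Internal consistency of the columns (items) of x."""
        k = x.shape[1]
        return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

    def icc_a_k(x):
        """ICC(A,k): two-way model, absolute agreement, average of k raters."""
        n, k = x.shape
        grand = x.mean()
        ms_rows = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between apps/e-tools
        ms_cols = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between raters
        ms_err = ((x - x.mean(axis=1, keepdims=True)
                   - x.mean(axis=0, keepdims=True) + grand) ** 2).sum() / ((n - 1) * (k - 1))
        return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

    print(f"alpha = {cronbach_alpha(item_scores):.3f}")
    print(f"ICC(A,k) = {icc_a_k(rater_scores):.3f}")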
Whilst our project relied on information about apps and e-tools collected via participatory design workshops, a more real-world approach is likely to be appropriate for most health professionals and services. This approach should include: 1) a broad exploration for appropriate apps and e-tools; 2) shortlisting of apps and e-tools based on consumer, health professional or service needs; 3) evaluation using the A-MARS; and 4) review of A-MARS scores relative to established service-specific criteria (e.g. minimum A-MARS total score, requirement of a University or Government-based source) to determine appropriateness for recommendation. This approach can be seen in more detail in Fig. 2.

The field of HITs is evolving rapidly in order to meet the needs of consumers and health professionals as well as health services and systems of care, with the aim of driving more efficient use of resources, promoting coordination rather than fragmentation of care, and facilitating information sharing to improve shared and informed decision making. Based on a recent community consultation process conducted by the Australian Digital Health Agency, 45% of 3100 participants noted that they had difficulty accessing health care due to cost, travel distance, or the unavailability of appointments (Australian Digital Health Agency, 2018). Due to such limitations in the current health care system, consumers and health professionals alike are turning to HITs, including apps and e-tools, and for good reason. A recent meta-analysis found that apps were superior to control conditions in improving stress levels and quality of life as well as depressive and generalised anxiety symptoms, with no marked difference relative to active interventions, including in-clinic treatment (Linardon et al., 2019). Furthermore, our group has found that technology is an important tool for mental health promotion and prevention activities for young people (Burns et al., 2010).

Whilst apps and e-tools hold great promise as a way of delivering self-management strategies and clinical interventions for mental ill health and maintenance of wellbeing, it is essential that consumers and health professionals have the appropriate tools by which to evaluate the quality and safety of such technologies. This study sought to adapt the MARS to be appropriate for e-tools as well as to include items specific to health-related apps and e-tools, including details related to the availability of resources, strategies for self-management of mental ill health and wellbeing, and quality information as well as contact details to access help. Furthermore, item 4 (interactivity) was expanded to include details related to the interoperability of apps and e-tools as part of the rating. Consistent with the original validation study of the MARS, our analyses found the A-MARS total score to have excellent internal consistency and interrater reliability (Stoyanov et al., 2015). Furthermore, the median internal consistency of all subscales was good, aligning with previous studies (Domnich et al., 2016; Stoyanov et al., 2015). Our results confirm that the A-MARS is a reliable measure of apps and e-tools, including those designed specifically for health-related purposes, and is suitable for use by any relevant stakeholder, including health professionals and developers of HITs.

Interrater reliability ranged from moderate to excellent across the subscales. Whilst these levels are acceptable and consistent with those of the original validation study (Stoyanov et al., 2015), it is recommended that all raters attend a training session with an expert rater to thoroughly review the response options for each item, clarifying any ambiguities. The A-MARS should be pilot tested on three to five apps and e-tools, then reviewed until an appropriate level of interrater reliability or consensus is reached. When conducting the ratings, all apps or e-tools should be used for a suitable period so as to gain a complete understanding of the informational content, functionalities and features; a minimum of 10 min of use is recommended.
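As a minimal sketch of step 4 of the protocol described above, the following screens rated tools against hypothetical service-specific criteria; the cut-score and the set of 'credible' source types are placeholders that a service would define for itself rather than values recommended by this study.

    # Hypothetical screening step: compare A-MARS results against service-specific criteria.
    MIN_TOTAL_SCORE = 3.5                              # placeholder cut-score, set by the service
    CREDIBLE_SOURCES = {"university", "government"}    # placeholder credibility requirement

    candidates = [
        {"name": "Example mindfulness app", "mean_total": 4.1, "source": "university"},
        {"name": "Example sleep e-tool",    "mean_total": 3.2, "source": "commercial"},
    ]

    def meets_service_criteria(tool):
        """Return True if the rated tool satisfies the service's inclusion criteria."""
        return tool["mean_total"] >= MIN_TOTAL_SCORE and tool["source"] in CREDIBLE_SOURCES

    recommended = [t["name"] for t in candidates if meets_service_criteria(t)]
    print("Appropriate for recommendation:", recommended)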
As referenced above, as a reliable tool, the A-MARS is appropriate for use by relevant stakeholders, which is important as consumers, health professionals and services become increasingly reliant on HITs to deliver, support, or enhance care. Individual items and subscales may be particularly relevant when determining whether an app or e-tool is effective, engaging and appropriate for prospective consumer use.

Our data demonstrated that the majority (62.3%, 23/37) of health-related apps and e-tools rated as part of this study implicitly addressed goal setting by providing opportunities for the user to track outcomes through regular assessment coupled with practical strategies to improve aspects of health and wellbeing. However, five (13.5%) apps and e-tools did not provide an opportunity for the consumer to establish tangible goals, consistently indicating that this criterion was 'not applicable'. For an additional nine (24.2%) apps and e-tools, there was disagreement between raters as to whether consumers were able to set goals, indicating that the process may not be readily apparent and, in turn, is unlikely to be used or effective in driving behaviour change. Whilst technology is commonly used to support health and wellbeing goals, apps in particular often fail to address key components of SMART (specific, measurable, achievable, relevant, time-based) goal setting. For example, a recent review and content analysis found that 95% (38/40) of a selection of physical activity apps had functionality for setting specific and measurable goals, but lacked the other features of SMART goals and did not allow for the re-evaluation of goals, all of which are considered key to an effective goal setting strategy (Baretta et al., 2019). Importantly, the process behind planning and setting customisable goals within an app or e-tool may contribute to greater sustained engagement as well as more robust clinical outcomes. These findings align with previous research supporting the use of consumer-centred and goal-directed design approaches (Vaghefi and Tulu, 2019; Williams, 2009). Additionally, they are consistent with the outcomes from traditional in-clinic services, where goal setting has been shown to be associated with higher engagement with the service whereas the absence of goals was correlated with service disengagement (Cairns et al., 2019). More robust research is now required to determine if SMART goals can be effectively set, monitored and achieved using HITs.
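As a small illustration of the SMART components discussed above, a goal record in an app or e-tool might carry a field for each criterion along with simple re-evaluation logic; the field names and rule below are illustrative only and are not drawn from any particular app.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class SmartGoal:
        """Illustrative SMART goal record: specific, measurable, achievable, relevant, time-based."""
        description: str        # specific
        target_value: float     # measurable
        current_value: float
        achievable: bool        # e.g. confirmed as realistic with a health professional
        relevant_to: str        # the area of concern the goal is linked to
        review_by: date         # time-based

        def needs_reevaluation(self, today: date) -> bool:
            # Goals should be reviewed and re-set once the review date passes or the target is met.
            return today >= self.review_by or self.current_value >= self.target_value

    goal = SmartGoal("Walk 30 minutes on most days", target_value=5, current_value=3,
                     achievable=True, relevant_to="physical activity", review_by=date(2021, 6, 1))
    print(goal.needs_reevaluation(date(2021, 5, 1)))  # False: still within the review period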
Table 1. Interrater reliability and internal consistency of the A-MARS items and subscale scores, and corrected item-total correlations and descriptive statistics of items, based on independent ratings.

Health care quality improvement projects have highlighted the factors that are most important to consumers in relation to their experience of care, including access to up-to-date, user-friendly information and resources from reputable sources. For example, the National College of Physicians (UK) (2012) recommends that health professionals provide consumers with information in an accessible format, taking into account their preferences for the level and type of information, and advise them as to where to find additional, reliable, and credible information to support their care. Furthermore, the results of a consumer survey undertaken at four Australian hospitals indicated that the provision of high-quality information at both the point of admission and discharge was valued by consumers (Rapport et al., 2019). From our group's participatory design work, we know that the determination of the quality of available health information is directly tied to the credibility of the source, with health services and Universities being viewed as reputable (LaMonica et al., 2021). Importantly, providing details about such trustworthy sources is likely to increase uptake of and engagement with HITs and impact on health-related decision making.

Whilst quality information is an important part of a positive consumer experience, our results highlight considerable variability in the availability of quality information, ranging from 'not applicable: no available information' and '2: poor; barely relevant, appropriate, coherent, or may be incorrect' through to '5: highly relevant, appropriate, coherent, and correct'. Though the majority (59.4%) of apps and e-tools were rated as a 4 or 5 (mean = 4.17, standard deviation = 0.88), this remains an area for improvement for some apps and e-tools as well as an important consideration in the design and development of new HITs. Furthermore, ratings of credibility of source ranged from '1: source identified but legitimacy/trustworthiness of source is questionable' to '5: developed using nationally competitive government or research funding', with a mean rating of 3.32 (standard deviation = 1.26). As apps and e-tools are critical mediums by which to deliver information about health risks and associated risk reduction strategies as well as interventions for symptoms of mental ill health, clear documentation of the source is an essential design element so that consumers and health professionals can accurately evaluate the trustworthiness of the information. Whilst commercially developed apps and e-tools are not inherently flawed or ineffective, it is important to recognise that the credibility of the source is important to consumers as well as health professionals. As such, commercial companies may seek to partner with academic research teams with the aim of demonstrating a commitment to positive outcomes for consumers as well as to develop the evidence base by which to support their product.

Despite the potential of HITs to support and maintain mental health and wellbeing, the majority continue to have no scientific evidence to support their use. Based on the A-MARS, an app or e-tool is deemed to have an emerging or established evidence base if it has been trialled and found to be effective in one or more randomised controlled trials (RCTs), the gold standard for effectiveness research (Hariton and Locascio, 2018). Consistent with previous research (Alyami et al., 2017; Larsen et al., 2019; Sucala et al., 2017; Van Ameringen et al., 2017), the majority of apps and e-tools rated in this study did not meet this standard (mean = 3.43, standard deviation = 0.87). In fact, 64.9% (24/37) have never been evaluated through a research trial. Of the remaining apps and e-tools, 21.6% (8/37) had been found to be effective in at least one RCT and 13.5% (5/37) were found to have positive or partially positive outcomes in studies of acceptability, usability and satisfaction.
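The evidence-base item of the A-MARS (item 18; see Appendix A) reads as a simple ordinal rubric; the function below is a rough sketch of that rubric for illustration only, not an official scoring aid (note that the published descriptors do not address the case of exactly three RCTs).

    from typing import Optional

    def evidence_base_rating(n_positive_rcts: int, non_rct_outcome: Optional[str],
                             evidence_negative: bool) -> Optional[int]:
        """Rough sketch of the A-MARS item 18 rubric; returns None for 'N/A: not trialled/tested'."""
        if evidence_negative:
            return 1   # the evidence suggests the app/e-tool does not work
        if n_positive_rcts > 3:
            return 5   # outcome tested in more than 3 high-quality RCTs with positive results
        if n_positive_rcts >= 1:
            return 4   # outcome tested in 1-2 RCTs with positive results
        if non_rct_outcome == "positive":
            return 3   # positive outcomes in non-RCT studies, no contradictory evidence
        if non_rct_outcome == "partially positive":
            return 2   # partially positive outcomes in non-RCT studies
        return None    # not trialled/tested

    print(evidence_base_rating(0, "partially positive", evidence_negative=False))  # -> 2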
The dearth of evidence of the effectiveness of apps and e-tools may relate to the iterative nature of technology development. Traditional clinical science approaches to the development and implementation of interventions rely on a linear approach, including basic science, intervention creation or adaptation, efficacy testing in both research and clinical settings, effectiveness research in community settings, and finally dissemination (Onken et al., 2014). Whilst the outcomes of each step in this process are indeed valuable, this progressive staged model can result in delays of up to 17 years for research translation into clinical practice (Balas and Boren, 2000). In light of these extended timelines as well as the standardisation requirements of RCTs, there is a high likelihood that an app or e-tool would be obsolete by the time results were published (Kumar et al., 2013). As such, developers may be inclined to move more quickly from a pilot study to dissemination (Kumar et al., 2013). Consideration of a new model for the evaluation of HITs may be necessary to streamline the identification of effective apps and e-tools. Based on a recent review, depression and anxiety apps without an evidence base were viewed as less beneficial by consumers and had lower consumer ratings compared to evidence-based apps (Baumel et al., 2020). As such, evidence of effectiveness has the potential to promote uptake and engagement, thus leading to enhanced outcomes.

Fig. 2. Identifying and rating new apps and e-tools for potential clinical use: shortlist apps and e-tools; establish criteria for inclusion into the service (e.g. the features of apps and e-tools that are important for your consumers/service, credibility of source (University or Government-based), and a minimum cut-score such as the MARS total score or mean quality score).

Interoperability is the ability of HITs to exchange information with and use information from other technologies, such as apps and wearables. Interoperability is considered a fundamental requirement of HIT innovation (Lehne et al., 2019), underpinning the potential of artificial intelligence and big data analytics to improve diagnostic precision, personalised interventions and disease prevention (Insel, 2017). Furthermore, the exchange of information between electronic medical records and data from personal health apps and e-tools has the potential both to reduce documentation burden for health professionals, allowing them to spend more time focusing on care, and to empower and inform consumers so they can actively manage their own health and wellbeing (Lehne et al., 2019). In other words, interoperability can enhance data-driven care, including better monitoring of health and wellbeing, delivery of effective and personalised clinical care, and personalised feedback to consumers (Burns et al., 2014). There was considerable variability in the interactivity/interoperability ratings of the apps and e-tools in this study, ranging from 1 to 5 (see Appendix A for a full description of the ratings), with a mean score of 3.36 (standard deviation = 1.06). Evidence from focus groups indicates that consumers support data sharing both between health professionals and with consumers to facilitate care, noting the importance of data privacy and security (Pew Charitable Trusts, 2020; LaMonica et al., 2021). Increasingly, consumers are being provided with access to their own data through personal health records; however, further consumer-driven integration between health systems and services as well as with HITs is now required to realise the full potential of interoperability, including improved transparency, efficiency and coordination of data-driven care.
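Interoperability ultimately rests on exchanging structured data between tools. As a purely illustrative sketch, with hypothetical field names not drawn from any specific standard or from the InnoWell Platform, a self-monitoring app might expose an observation such as the following for another HIT (e.g. an e-tool or an electronic medical record) to consume.

    import json
    from datetime import datetime, timezone

    # Hypothetical payload a self-monitoring app could export for another HIT to import;
    # the field names are illustrative only.
    observation = {
        "measure": "psychological_distress",
        "value": 22,
        "recorded_at": datetime(2021, 3, 1, 9, 30, tzinfo=timezone.utc).isoformat(),
        "source_app": "example-wellbeing-app",
        "consent": {"share_with_health_professional": True},
    }
    print(json.dumps(observation, indent=2))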
Despite recognising the potential benefits of using technology as part of their work, health professionals often do not have the time or appropriate resources to explore available apps and e-tools to determine their appropriateness for their consumer base (LaMonica et al., 2020). The A-MARS offers a streamlined solution to guide health professionals in the evaluation of HITs, keeping their own clinical context in mind. Whilst some health professionals may see this as an opportunity to upskill and develop digital health literacy and competence, others simply may not have the time or skill to do so. Given the complexities of evaluating such tools as well as the time required to keep up to date with what is available, including technology requirements and the clinical utility of data, it may not be practical for this responsibility to fall to health professionals. Additionally, the reliability of the A-MARS as used independently by health professionals is yet to be investigated in order to determine what training requirements or materials might be required.

Health services are encouraged to consider a digital navigator as an integral team member, serving to bridge the gap between HITs and in-clinic care (Wisniewski and Torous, 2020). The digital navigator would review and rate apps and e-tools, ensuring only those that are safe and effective are recommended, help with technology troubleshooting for consumers and health professionals, and summarise digital data to facilitate the delivery of clinical care (Wisniewski and Torous, 2020). The digital navigator could also provide training, guidance and instruction for health professionals who are interested in developing their own skills in evaluating apps and e-tools using tools such as the A-MARS. The integration of a digital navigator within a traditional care team will serve to increase confidence and trust in the use of HITs by both health professionals and consumers, thus promoting engagement. This may, in turn, have broader implications for promoting the uptake of self-management strategies and decreasing burden on the health system in general, which is particularly relevant given the increased reliance on HITs.

The A-MARS was shown to be a reliable scale for the purposes of evaluating the quality of health-related apps and e-tools, with moderate to excellent interrater reliability across the subscales. Specific items and subscales may be particularly important to consider when selecting, evaluating and/or recommending apps and e-tools to consumers, including: 1) the availability of explicit strategies to set, monitor and review SMART goals; 2) the accessibility of credible, user-friendly information and resources from reputable sources; 3) documentation of evidence of effectiveness; and 4) interoperability of the app or e-tool with other HITs, including personal health records and electronic medical records. Although the A-MARS is a useful tool to guide health professionals as they explore available apps and e-tools for potential clinical use, the training required to be able to use the scale effectively may be prohibitive. Additionally, health professionals may not have the time or skill set to engage in the evaluation process.
The inclusion of a digital navigator as part of the care team may mitigate this barrier to identifying and using HITs in clinical practice; however, further research is required to evaluate the impact of this role on the uptake of and engagement with HITs by consumers and health professionals as well as the associated clinical outcomes. Additionally, it will be important to further evaluate how A-MARS scores impact the selection of apps and e-tools by the digital navigator and how these scores are associated with systematically measured consumer feedback (as opposed to star ratings). It will also be important to evaluate the cost-effectiveness and return on investment of this new team member.

Finally, utilising strategies to enhance community and consumer uptake of and sustained engagement with HITs, such as apps and e-tools, is a priority in the health, medical and research sectors internationally (Australian Government Department of Health and Ageing, 2012; UK NHS, 2014). To that end, co-design methodologies, including participatory design and user testing, are widely recognised as key to ensuring the quality, usability, and acceptability of HITs. It is likely that the A-MARS can inform this co-design process by highlighting key areas to be explored with potential end users, both in the co-creation and testing phases of product development.

Funding
This research was conducted on behalf of the Australian Government Department of Health (DOH) as part of Project Synergy. InnoWell has been formed by the University of Sydney and PwC (Australia) to deliver the $30 M Australian Government-funded Project Synergy.

Declaration of competing interest
Professor Ian Hickie was an inaugural Commissioner on Australia's National Mental Health Commission (2012-18). He is the Co-Director, Health and Policy at the Brain and Mind Centre (BMC), University of Sydney. The BMC operates an early-intervention youth service at Camperdown under contract to headspace. He is the Chief Scientific Advisor to, and a 5% equity shareholder in, InnoWell Pty Ltd. InnoWell was formed by the University of Sydney (45% equity) and PwC (Australia; 45% equity) to deliver the $30 M Australian Government-funded Project Synergy (2017-20; a three-year program for the transformation of mental health services) and to lead transformation of mental health services internationally through the use of innovative technologies. Tracey Davenport is now Director (Research and Evaluation), Design and Strategy Division, Australian Digital Health Agency. The other authors have nothing to disclose. The source of funding does not entail any potential conflict of interest for the other members of the Project Synergy Research and Development Team.
Acknowledgements
We would like to acknowledge and thank all of the participants from our co-design workshops who helped to identify apps and e-tools for use in clinical practice by and with young people.

Appendix A. Adapted Mobile App Rating Scale (A-MARS): recovered items and response descriptors

SECTION B Functionality
7. Ease of use:
1 No/limited instructions; menu labels, icons are confusing; complicated; sign-up process is complicated with no help buttons/FAQs
2 Takes a lot of time or effort; sign-up process is somewhat complicated and/or asks for too much information and/or offers little help
3 Takes some time or effort
4 Easy to learn (or has clear instructions); sign-up process relatively simple; some help/FAQs
5 Able to use app/e-tool immediately; intuitive; simple (no instructions needed); relevant support is obvious and helpful

8. Navigation: Does moving between screens make sense? Is it easy to move from one section of the app/e-tool to another? Does the app/e-tool provide all necessary links between screens?
1 No logical connection between screens at all/navigation is difficult
2 Understandable after a lot of time/effort
3 Understandable after some time/effort
4 Easy to understand/navigate
5 Perfectly logical, easy, clear and intuitive screen flow throughout, and/or has shortcuts

9. Design: Are there intuitive popup boxes, videos, animations, audio clips, flash images etc. within the e-tool, or consistent taps/swipes, pinches/scrolls within the app/e-tool? Are these relevant, accurate, sensible and in theme with the rest of the app/e-tool?
1 Completely confusing/inconsistent; information lacks relevance or is inaccurate/unnecessary
2 Often confusing/inconsistent; information of little relevance or contains some unnecessary/incorrect content
3 Okay; some confusing and/or unnecessary information or some inconsistencies
4 Mostly intuitive, with negligible problems; the majority of information is accurate/necessary
5 Perfectly consistent and intuitive; information is accurate/necessary

B. Functionality mean score = ___________

SECTION C Aesthetics: graphic design, overall visual appeal, colour scheme, and stylistic consistency
10. Layout: Is the arrangement and size of buttons, icons, menus and content on the screen appropriate?
1 Very bad design, cluttered, some options impossible to select, locate, see or read
2 Bad design, random, unclear, some options difficult to select/locate/see/read
3 Satisfactory, few problems with selecting/locating/seeing/reading items
4 Mostly clear, able to select/locate/see/read items
5 Professional, simple, clear, orderly, logically organised

11. Graphics: How high is the quality/resolution of graphics used for buttons, icons, menus and content?
1 Graphics appear amateur; very poor visual design, disproportionate, stylistically inconsistent
2 Low quality/low resolution graphics; low quality visual design, disproportionate
3 Moderate quality graphics and visual design (generally consistent in style)
4 High quality/resolution graphics and visual design; mostly proportionate, consistent in style
5 Very high quality/resolution graphics and visual design; proportionate, consistent in style throughout

12. Visual appeal: How good does the app/e-tool look?
1 Ugly, unpleasant to look at, poorly designed, clashing, mismatched colours
2 Bad: poorly designed, bad use of colour, visually boring
3 OK: average, neither pleasant nor unpleasant
4 Pleasant: seamless graphics, consistent and professionally designed
5 Beautiful: very attractive, memorable, stands out; use of colour enhances app/e-tool features/menus

C. Aesthetics mean score = ___________

SECTION D Information: contains high quality information (e.g. text, feedback, measures, references) from a credible source
13. Goals: Does the app/e-tool have specific, measurable and achievable goals (are these goals specified/obvious within the app/e-tool)?
N/A Description does not list goals, or app/e-tool goals are irrelevant to the research goal (e.g. using a game for educational purposes)
1 App/e-tool has no chance of achieving its stated goals
2 Description lists some goals, but app/e-tool has very little chance of achieving them
3 OK; app/e-tool has clear goals, which may be achievable
4 App/e-tool has clearly specified goals, which are measurable and achievable
5 App/e-tool has specific and measurable goals, which are highly likely to be achieved

14. Quality of information: Is the content within the app/e-tool correct (including the description in the app store, if an app)? Is the app/e-tool up to date with current research, well written, and relevant to the goal/topic of the app/e-tool?
N/A There is no information within the app/e-tool
1 Irrelevant/inappropriate/incoherent/incorrect
2 Poor; barely relevant/appropriate/coherent/may be incorrect
3 Moderately relevant/appropriate/coherent and appears correct
4 Relevant/appropriate/coherent/correct
5 Highly relevant, appropriate, coherent, and correct

15. Quantity of information: Is the information within the app/e-tool comprehensive and/or relevant but concise?
N/A There is no information within the app/e-tool
1 Minimal or overwhelming
2 Insufficient or possibly overwhelming
3 OK but not comprehensive or concise
4 Offers a broad range of information, has some gaps or unnecessary detail; or has no links to more information and resources
5 Comprehensive and concise; contains links to more information and resources

16. Visual information: Is visual explanation of concepts (through charts/graphs/images/videos, etc.) clear, logical, correct?
N/A There is no visual information within the app/e-tool (e.g. it only contains audio, or text)
1 Completely unclear/confusing/wrong, or necessary but missing
2 Mostly unclear/confusing/wrong
3 OK but often unclear/confusing/wrong
4 Mostly clear/logical/correct with negligible issues
5 Perfectly clear/logical/correct

17. Credibility of source: Does the information within the app/e-tool seem to come from a credible source?
1 Source identified but legitimacy/trustworthiness of source is questionable (e.g. commercial business with vested interest)
2 Appears to come from a legitimate source, but it cannot be verified (e.g. has no webpage)
3 Developed by small NGO/institution (hospital/centre, etc.)/specialised commercial business, funding body
4 Developed by government, university or as above but larger in scale
5 Developed using nationally competitive government or research funding (e.g. Australian Research Council, NHMRC)

18. Evidence base: Has the app/e-tool been trialled/tested (verified by evidence in the published scientific literature)?
N/A It has not been trialled/tested
1 The evidence suggests the app/e-tool does not work
2 App/e-tool has been trialled (e.g. acceptability, usability, satisfaction ratings) and has partially positive outcomes in studies that are not randomised controlled trials (RCTs), or there is little or no contradictory evidence
3 App/e-tool has been trialled (e.g. acceptability, usability, satisfaction ratings) and has positive outcomes in studies that are not RCTs, and there is no contradictory evidence
4 App/e-tool has been trialled and outcome tested in 1-2 RCTs indicating positive results
5 App/e-tool has been trialled and outcome tested in > 3 high quality RCTs indicating positive results

D. Information mean score = ___________*
* Exclude questions rated as "N/A" from the mean score calculation

SECTION E App/e-tool subjective quality
19. Would you recommend this app/e-tool to people who might benefit from it?
1 Not at all; I would not recommend this app/e-tool to anyone
2 There are very few people I would recommend this app/e-tool to
3 Maybe; there are several people whom I would recommend it to
4 There are many people I would recommend this app/e-tool to
5 Definitely; I would recommend this app/e-tool to everyone

20. How many times do you think you would use this app/e-tool in the next 12 months if it was relevant to you?

SECTION F Health-related quality
26. Multiple health issues/symptoms:
3 Some: addresses some symptoms/health issues, or considers many but only partly addresses them
4
5 Yes: considers multiple symptoms/health issues and related ones, and sufficiently addresses them

27. Real time tracking: Can you use the app/e-tool in real time, as you're experiencing a health issue?
1 No: the app/e-tool is mainly useful for prevention or recovery
2
3 The app/e-tool is useful for prevention, management and/or recovery of the health issue(s)
4
5 Yes: the app/e-tool is useful for prevention, management and recovery of the health issue(s)

28. Access to help: Is it easy/obvious to access health-related help when needed?
1 No; difficult to navigate or find related health information when needed
2 Can find needed information after a lot of time/effort
3 Can find needed information after some time/effort
4 Easy to understand/navigate needed information
5 Perfectly logical, easy, clear and intuitive screen flow throughout, and/or has shortcuts to needed health information; offline options are available

F. Health-related information mean score = ___________

Scoring
App/e-tool quality scores:

References
Social anxiety apps: a systematic review and assessment of app descriptors across mobile store platforms
The State of Mobile in 2020: The Key Stats You Need to Know
Australia's National Digital Health Strategy
Australian Government Department of Health and Ageing
Managing clinical knowledge for healthcare improvements
Implementation of the goal-setting components in popular physical activity apps: review and content analysis
Objective user engagement with mental health apps: systematic search and panel-based usage analysis
There is a non-evidence-based app for that: a systematic review and mixed methods analysis of depression- and anxiety-related apps that incorporate unrecognized techniques
Assessment of the health IT usability evaluation model (Health-ITUEM) for evaluating mobile health (mHealth) technology
Adolescent mental health literacy: young people's knowledge of depression and help seeking
The internet as a setting for mental health service utilisation by young people
Strategies for Adopting and Strengthening e-Mental Health: A Review of the Evidence
Prescribable mHealth apps identified from an overview of systematic reviews
Goal setting improves retention in youth mental health: a cross-sectional analysis
Results of Australia's Health Panel Survey on Recommendations and Regulations of Smartphone Apps for Health and Wellness
Review and analysis of existing mobile phone apps to support heart failure symptom monitoring and self-care management using the Mobile Application Rating Scale (MARS)
Validation of the InnoWell Platform: protocol for a clinical trial
Development and validation of the Italian version of the Mobile Application Rating Scale and its generalisability to apps targeting primary prevention
Improvements in stress, affect, and irritability following brief use of a mindfulness-based smartphone app: a randomized controlled trial
SPSS for Windows Step by Step: A Simple Guide and Reference, 11.0 update
A comparison of the reliability of smartphone apps for opioid conversion
Randomised controlled trials: the gold standard for effectiveness research
Good Practice Guidelines on Health Apps and Smart Devices (Mobile Health or mHealth)
Right care, first time: a highly personalised and measurement-based care model to manage youth mental health
Project Synergy: co-designing technology-enabled solutions for Australian mental health services reform
What is Project Synergy?
Digital phenotyping: technology for a new science of behaviour
The Growing Value of Digital Health: Evidence and Impact on Human Health and the Healthcare System. IQVIA Institute Report
German mobile apps in rheumatology: review and analysis using the Mobile Application Rating Scale (MARS)
Mobile health technology evaluation: the mHealth evidence workshop
Technology-enabled person-centred mental health service reform: strategy for implementation science
Technology-enabled solutions for Australian mental health services reform: impact evaluation
Understanding technology preferences and requirements for health information technologies designed to improve mental health and maintain wellbeing for older persons: a participatory design study
A systematic assessment of smartphone tools for suicide prevention
Using science to sell apps: evaluation of mental health app store quality claims
Why digital medicine depends on interoperability
mHealth and mobile medical apps: a framework to assess risk and promote safer use
The efficacy of app-supported smartphone interventions for mental health problems: a meta-analysis of randomized controlled trials
Mental Health Apps: How to Make an Informed Choice
National College of Physicians (UK). Patient Experience in Adult NHS Services: Improving the Experience of Care for People Using Adult NHS Services: Patient Experience in Generic Terms
The effectiveness of internet-delivered cognitive behavioural therapy for health anxiety in routine care
Mobile apps for bipolar disorder: a systematic review of features and content quality
Criteria for assessing the quality of mHealth apps: a systematic review
Understanding factors affecting patient and public engagement and recruitment to digital health interventions: a systematic review of qualitative studies
Re-envisioning clinical science: unifying the discipline to improve the public health
Spanish adaptation and validation of the Mobile Application Rating Scale questionnaire
Patients seek better exchange of health data among their care providers
Foundations of Clinical Research: Applications to Practice
What do patients really want? An in-depth examination of patient experience in four Australian hospitals
Co-designing the InnoWell Platform to deliver the right mental health care first time to regional youth
A user-centered model for designing consumer mobile health (mHealth) applications (apps)
A review and comparative analysis of security risks and safety measures of mobile health apps
Many mobile health apps target high-need, high-cost populations, but gaps remain
Mobile app rating scale: a new tool for assessing the quality of health mobile apps
Anxiety: there is an app for that. A systematic review of anxiety apps
Assessing the Quality of Pregnancy Apps through Development and Validation of the Dutch Version of the Mobile Application Rating Scale (MARS). Presentation at the European Congress of Intrapartum Care, Turin
Randomized controlled trial of web-based treatment of social phobia without clinician guidance
Internet treatment for depression: a randomized controlled trial comparing clinician vs. technician assistance
United Kingdom National Health Service
The continued use of mobile health apps: insights from a longitudinal study
There is an app for that! The current state of mobile applications (apps) for DSM-5 obsessive-compulsive disorder, posttraumatic stress disorder, anxiety and mood disorders
User-centered design, activity-centered design, and goal-directed design: a review of three methods for designing web applications
The COVID-19 pandemic: the 'black swan' for mental health care and a turning point for e-health
Digital navigators to implement smartphone and digital tools in care
Diagnostic inaccuracy of smartphone applications for melanoma detection
Current issues and future directions for research into digital behavior change interventions
Empirical studies on usability of mHealth apps: a systematic literature review