key: cord-0734038-4aj3tmnm
authors: Pickles, Tim; Macefield, Rhiannon; Aiyegbusi, Olalekan Lee; Beecher, Claire; Horton, Mike; Christensen, Karl Bang; Phillips, Rhiannon; Gillespie, David; Choy, Ernest
title: Patient Reported Outcome Measures for Rheumatoid Arthritis Disease Activity: a systematic review following COSMIN guidelines
date: 2022-03-29
journal: RMD Open
DOI: 10.1136/rmdopen-2021-002093
sha: f159f90b7a1706979c2ea0fbefc96c683b5e9fb6
doc_id: 734038
cord_uid: 4aj3tmnm

BACKGROUND: The current standard of care in rheumatoid arthritis (RA) requires regular assessment of disease activity (DA). All standard RA DA measurement instruments require joint counts to be undertaken by a healthcare professional with/without a blood test. Few healthcare providers have the capacity to assess patients as frequently as stipulated by guidelines. Patient Reported Outcome Measures (PROMs) could be an efficient and informative way to assess RA DA, which is highlighted by the SARS-COV-2 pandemic, as most consultations are remote rather than face-to-face. We aimed to assess all PROMs for RA DA against the internationally recognised COSMIN guidelines to provide evidence‐based recommendations to select the most suitable PROMs. METHODS: Review registered on PROSPERO as CRD42020176176. The search strategy was based on a previous similar systematic review and expanded to include all articles up to January 2019. All identified articles were rated by two independent assessors following the COSMIN guidelines. RESULTS: 668 abstracts were identified, with 10 articles included. A further 21 were identified from a previous review. Ten PROMs were identified. There was insufficient evidence to place any of the identified PROMs into recommendation for use category A due to lack of evidence for content validity, as stipulated by the COSMIN guidelines. CONCLUSION: Lack of evidence of content validity limits suitable PROM selection, therefore none can be recommended for use. It is acknowledged that all included PROMs were developed before the COSMIN guidelines were published. Future research on PROMs for RA DA must provide evidence of content validity.

The standard measurement instrument for assessing disease activity (DA) for patients with rheumatoid arthritis (RA) has, for many years, been the Disease Activity Score (DAS) with 28-joint count (DAS28), 1 and more recently the Simplified Disease Activity Index (SDAI) and Clinical Disease Activity Index (CDAI). 2 DAS28 has four variants 3 but all require a laboratory test of either erythrocyte sedimentation rate or C-reactive protein (CRP), and a formal tender and swollen joint count assessment (of shoulders, elbows, wrists, hands and knees) undertaken by a healthcare professional. Some of the DAS28 variants also factor in a patient global assessment on a 10 cm visual analogue scale, which adds a level of patient involvement. In common with DAS28, SDAI and CDAI require tender and swollen joint counts and a patient global assessment. In addition, CRP and a physician global assessment are also required for SDAI

What is already known about this subject? ► Ten Patient Reported Outcome Measures (PROMs) have been developed to measure rheumatoid arthritis (RA) disease activity (DA). Previous reviews have suggested the use of RADAI, RADAI5, PAS-II and RAPID3.

► This is the first systematic review of PROMs for RA DA that follows the recent COSMIN guidelines. ► There was insufficient evidence to recommend any of the identified PROMs for use due to lack of evidence for content validity.

How might this impact on clinical practice or further developments?

and CDAI, respectively. Between joint counts and laboratory tests, these assessments are time-consuming and resource-intensive and can only be undertaken when a patient attends a scheduled consultation. The current standard of care in RA is 'Treat-to-Target' (T2T), which aims for sustained remission or, failing this, a low DA score. 4 5 Regular assessment of DA and adjustment of treatment accordingly is an integral part of T2T. The National Institute for Health and Care Excellence (NICE) 6 and the European Alliance of Associations for Rheumatology (EULAR) 5 recommend that DA is monitored every 1-3 months when disease is uncontrolled, and every 6-12 months when the treatment target has been reached. Few healthcare providers have the capacity to assess patients as frequently as stipulated by NICE or EULAR guidelines: every 6 months is typically the best that is currently managed. 7 The SARS-COV-2 pandemic has made the problem more conspicuous, with remote rather than face-to-face consultations. With infrequent monitoring, treatment is not adjusted sufficiently to keep pace with fluctuation in DA. It can also be the case that those in RA remission are seen more often than necessary while opportunities to treat RA flares are often provided too late. 8 Alongside this, and with the advent of patient-centred care and value-based healthcare, Patient Reported Outcome Measures (PROMs) have become pertinent options for monitoring disease progression and quality of life in numerous fields. 9 Unlike paper-based PROMs, electronic PROMs and computer adaptive test (CAT) platforms provide efficient, patient-friendly and location-independent methods of collecting such data, which can also satisfy the necessary properties required to enable useful measurement. 
9 These CAT platforms are developed under item response theory or Rasch measurement theory methodologies, and allow patients to respond to a minimal set of items while still yielding an accurate estimate. 10 Such examples are seen in the Patient-Reported Outcomes Measurement Information System initiative. 11 The research around RA DA has suggested PROMs might prove preferable to measures requiring biomarkers. 12 13 Further, electronic versions of PROMs could be the future of measurement in rheumatology. 7 14 15 To best understand the currently available psychometric evidence for these PROMs, a first step is to undertake a systematic review and assess the identified PROMs. Our review builds on the work of a 2016 systematic review in the same area, 16 which concluded that three PROMs: Rheumatoid Arthritis Disease Activity Index (RADAI), Rheumatoid Arthritis Disease Activity Index-Five (RADAI5) and Routine Assessment of Patient Index Data 3 (RAPID3), plus another measurement instrument called Patient-derived Disease Activity Score with 28-joint counts (Pt-DAS28, which is DAS28 but with the 28-joint count completed by the patient) had the strongest and most extensive validation. That systematic review also identified articles describing, and assessed the properties of, measurement instruments that are not PROMs. Here though, a tighter lens is applied to focus solely on PROMs in the justification for the inclusion of articles in this review. Furthermore, the accepted guidelines concerning these systematic reviews for assessing PROMs from COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) have been updated since the 2016 systematic review 16 and now have a major focus on content validity. [17] [18] [19] This inevitably influences the assessment of legacy PROMs like those for RA DA, which were developed prior to these guidelines. 
The recommendations for use, which are the endpoint of the application of the COSMIN guidelines, are based largely on content validity as well.

Given this clear and definitive gap, our objective was to systematically review all PROMs for RA DA against the internationally recognised 2018 COSMIN guidelines [17] [18] [19] to provide evidence-based recommendations for use of the most suitable PROMs in research and clinical practice.

This systematic review of all PROMs for RA DA was registered with the International Prospective Register of Systematic Reviews (PROSPERO) 20 as CRD42020176176, where a protocol is available, 21 and written in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines, 22 with the PRISMA checklist provided (online supplemental appendix 1).

COSMIN guidelines were applied throughout this systematic review. [17] [18] [19] The review of Hendrikx et al 16 used the original COSMIN guidelines. [23] [24] [25] [26] [27] These have since been updated and it is these guidelines [17] [18] [19] and the methodology within them that were implemented here.

A search strategy was required to identify relevant articles (online supplemental appendix 2). Hendrikx et al published their search strategy, 16 which was based on a COSMIN guideline. 28 This search strategy was tested and refined to ensure certain articles were identified. An adapted version of that strategy was used to search the PubMed and EMBASE databases up to January 2019. The search was undertaken by one reviewer (TP).

Following the implementation of the search strategy, a single assessor (TP) reviewed the articles for relevance, from screening of titles and abstracts through to assessing eligibility for the review. The following inclusion and exclusion criteria were used:

Inclusion criteria: 1. The study population described in the article is of adult patients with RA.

2. The article describes details on a PROM specifically for DA in RA that can be reviewed against the COSMIN guidelines. 17-19 3. The article is in the English language. 4. The article is published in a peer-reviewed journal.

Exclusion criteria: 1. The study population described in the article includes diseases other than RA (unless the details pertaining to the patients with RA are presented separately). 2. The study population described in the article includes children (unless the details pertaining to the adult patients are presented separately). 3. The article describes a measurement instrument that requires healthcare professional assessment. 4. The article describes a measurement instrument that requires a biomarker level determined through a laboratory test.

Data extraction: study population characteristics Characteristics of the study population, including number of participants, age, gender, rheumatoid factor (per cent positive), disease duration and DA (at baseline if reported at multiple timepoints) were extracted by one assessor (TP). Where multiple study populations were described within a single article, these were pooled together and described as one population if the statistics presented allowed for this. Where study population characteristics were described in a separate article to that reviewed here, that separate article was sought out and characteristics extracted as necessary.

Data extraction: risk of bias, content validity and quality of measurement properties Data on the relevant measurement properties were extracted and summarised in the 'COSMIN checklist' Microsoft Excel (2018) spreadsheet (online supplemental appendix 3) available for download from the COSMIN website. 29 The spreadsheet contains the necessary risk of bias questions, which require the assessor to complete with categories: very good (V), adequate (A), doubtful (D), inadequate (I) or N/A (N). It also requires the assessor to rate the content validity and quality of measurement properties with ratings: sufficient (+), insufficient (-), inconsistent (±) or indeterminate (?). Decisions were made across the COSMIN domains of PROM development, content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypothesis testing for construct validity and responsiveness. Two assessors (TP, and one of RM, OLA or CB) independently assessed the articles for risk of bias and quality of measurement property. Where ratings differed, a consensus was agreed by the two assessors. For the purpose of the COSMIN criterion validity and responsiveness domains, a gold standard measurement instrument needed to be identified. DAS28, 1 SDAI and CDAI 2 are the most widely used and are accepted measurement instruments for this purpose and were therefore used as gold standard measurement instruments for this review.

For hypothesis testing for the COSMIN construct validity and responsiveness domains, subgroups within a rheumatoid arthritis population were assessed.

Where statistics, such as effect sizes, to be reviewed for COSMIN domains were not provided in an article, but could be readily calculated from other values given in the article, such as means and SD, the relevant statistics were calculated.
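As an illustration of the kind of calculation described above, the following minimal Python sketch derives a standardised effect size from reported group means and SDs. The article does not state which effect size formula was applied, so Cohen's d with a pooled SD is an assumption here, and the function name and the input values are hypothetical.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d from summary statistics, using a pooled SD (assumed formula)."""
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_variance)


# Hypothetical summary statistics for two subgroups of equal size.
d = cohens_d(mean1=5.0, sd1=2.0, n1=50, mean2=4.0, sd2=2.0, n2=50)
print(round(d, 2))
```

Other effect sizes, such as a standardised response mean for change scores, would need the SD of the change rather than the group SDs.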

For all correlations, Spearman's ρ and Kendall's τ were accepted as appropriate statistical methods. Pearson's r could also be considered as an appropriate statistical method if some form of distributional information of the PROM was provided, such as mean and SD or a histogram.

In the case where multiple statistics were used to assess for quality of a PROM's measurement property (that is, one COSMIN domain) within an article, the lowest rating was used.

The COSMIN team were consulted for advice on the following two methodological points to confirm the best course of action.

Where risk of bias items required disease-specific knowledge, the independent assessors consulted with EC who, as a rheumatologist, is a clinical expert in RA to ensure consensus.

Assessment of some measurement properties according to the COSMIN guidelines requires statistical tests that can only be performed with specific software. In the case that the article stated that an analysis had been undertaken but the software stated and/or the outputs reported in the article were not feasible for that analysis, then the assessors followed the rating for risk of bias and quality of measurement properties for the actual analysis, rather than what it was stated as.

Determination of overall rating, quality of evidence and recommendation for use The quality of the evidence for each summarised measurement property of each PROM was determined by assessors using the modified GRADE 30 approach defined in the COSMIN guidelines. [17] [18] [19] Quality of evidence has categories: high (H), moderate (M), low (L) and very low (VL). These were determined for each COSMIN domain for each PROM on the basis of risk of bias, inconsistency, imprecision and indirectness as specified in the COSMIN guidelines. [17] [18] [19] The rules for downgrading levels are well set for risk of bias, imprecision and indirectness, but require some formulation for inconsistency. We decided the following: ► If there was a ≥75% majority for a quality of measurement property rating across the studies, then there was no inconsistency. ► If there was a majority towards one such quality of measurement property rating but that majority was 60% to <75%, then we noted the inconsistency as serious.

► If there was no majority (50%) or if there was a majority towards one such quality of measurement property rating but that majority was 50% to <60%, then we noted the inconsistency as very serious. The overall rating was the majority content validity or quality of measurement property rating for each domain for each PROM, so we similarly used categories sufficient (+), insufficient (-), inconsistent (±) or indeterminate (?). In the case of no majority, the lower quality of measurement property rating was used.
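The inconsistency thresholds set out above can be sketched in Python as follows. This is a minimal illustration only: the function name and the example ratings are hypothetical, and integer comparisons are used so that the 60% and 75% boundaries are evaluated exactly.

```python
from collections import Counter


def inconsistency_level(ratings):
    """Classify inconsistency from a list of per-study ratings ('+', '-', '±', '?')."""
    if not ratings:
        raise ValueError("at least one rating is required")
    top = max(Counter(ratings).values())  # size of the largest rating group
    n = len(ratings)
    if 4 * top >= 3 * n:   # largest group is >= 75% of studies
        return "none"
    if 5 * top >= 3 * n:   # largest group is 60% to < 75% of studies
        return "serious"
    return "very serious"  # below 60%, including no majority at all


print(inconsistency_level(["+", "+", "+", "-"]))  # 75% majority
print(inconsistency_level(["+", "+", "-"]))       # about 67% majority
print(inconsistency_level(["+", "-"]))            # no majority
```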

Once all summarised ratings for each COSMIN domain of each PROM were decided, recommendations for use of the reviewed PROMs for RA DA were applied according to the categories stipulated in the COSMIN guidelines. [17] [18] [19] These are: A. PROM has evidence for sufficient content validity (any level) and at least low-quality evidence for sufficient internal consistency. Therefore, the PROM can be recommended for use and results obtained with these PROMs can be trusted. B. PROM cannot be categorised into A or C. Therefore, the PROM has potential to be recommended for use, but requires further research to assess its quality. C. PROM has high-quality evidence for an insufficient measurement property. Therefore, the PROM should not be recommended for use.
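The category rules above can be sketched as a small Python function. This is an illustrative reading, not the COSMIN team's implementation: the function name, the dictionary layout and the example entries are hypothetical, and checking the condition for category C before the condition for category A is an assumption, since the text does not state which takes precedence if both were met.

```python
def recommendation_category(summary):
    """Return 'A', 'B' or 'C' from a mapping of domain -> (overall rating, quality).

    Ratings are '+', '-', '±' or '?'; quality is 'H', 'M', 'L' or 'VL'.
    """
    # Category C: high-quality evidence for an insufficient property
    # (checked first here; this precedence is an assumption).
    if any(rating == "-" and quality == "H" for rating, quality in summary.values()):
        return "C"
    content = summary.get("content validity")
    consistency = summary.get("internal consistency")
    # Category A: sufficient content validity at any level of evidence ...
    has_content = content is not None and content[0] == "+"
    # ... and at least low-quality evidence for sufficient internal consistency.
    has_consistency = (
        consistency is not None
        and consistency[0] == "+"
        and consistency[1] in {"L", "M", "H"}
    )
    return "A" if has_content and has_consistency else "B"


# Hypothetical summaries, for illustration only.
print(recommendation_category({"responsiveness": ("-", "M")}))
print(recommendation_category({"reliability": ("-", "H")}))
```

Under this reading, a PROM with no content validity evidence can never reach category A, however strong its other measurement properties are.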

Percentage agreement was calculated for all content validity and related risk of bias, plus all quality of measurement properties and related risk of bias combined. While there are three individuals acting as independent assessors, these are all combined here.

A PRISMA Flow Diagram 22 31 presents the results of the search strategy and the subsequent review process undertaken to reach the 31 articles in this review (figure 1). Of the 34 articles from the Hendrikx et al review 16 (left side of figure 1), 13 articles 32-44 were excluded. The 21 remaining articles 45-65 from the Hendrikx et al review 16 were included in this review.

After the deletion of 8 duplicates (identified twice by the same source), 668 articles were identified (right side of figure 1 ). Ten 66-75 of these were included in the review, so a total of 31 articles were included. These 31 articles described ten PROMs: RADAI, Rheumatoid Arthritis Disease Activity Index-Short Form (RADAI-SF), RADAI5, RAPID3, Routine Assessment of Patient Index Data 4 (RAPID4), Patient-based Disease Activity Score 2 (PDAS2), Patient Activity Score (PAS), Patient Activity Score-II (PAS-II), Patient Reported Outcome CLinical ARthritis Activity (PRO-CLARA) and Global Arthritis Score (GAS). Of the 10 articles published since the Hendrikx et al review, 16 

The characteristics of the study populations of the 31 articles are given in table 1. The majority of these articles (n=19) described a single PROM, nine described two PROMs, two described three PROMs and one article described four PROMs.

Only 1 clinical trial article 68 of the 51 clinical trial articles sought for retrieval was included in this review. It was notable that many excluded trials described PROMs but not for RA DA, and where PROMs for RA DA were described, statistical detail that could be assessed against the COSMIN guidelines was only available in this single article.

Additional methodological requirements defined after articles were identified The majority of PROMs described in the articles found in this review have a scoring system involving precalculation of a variable before summing up to create a total score, rather than just summing up the items they include. This was problematic for the assessment of the COSMIN structural validity and internal consistency domains, as, to be undertaken correctly and to provide relevant meaningful results, these require the individual items to be inputted into the analysis, rather than a combination of precalculated variable and individual items. Therefore, in the case where results that would be assessed under the COSMIN structural validity and internal consistency domains were given but the PROM had such a scoring structure, the result was ignored and a note added to the relevant cell or cells of the 'COSMIN checklist' Microsoft Excel (2018) spreadsheet stating this. All other COSMIN domains were still assessed, as those numeric COSMIN domains focus on analyses requiring the total score, rather than that of the individual items. A list of PROMs, the reasons why there was a problem and whether any structural validity or internal consistency analyses were undertaken are provided (online supplemental appendix 4). For the assessment of the COSMIN measurement error domain, the minimally important change (MIC) must be defined for comparison against smallest detectable change (SDC) or limits of agreement (LoA). SDCs were calculated for RADAI and RAPID3 in one reviewed article 63 and these were compared against values (labelled as minimally important difference (MID)) from an article not reviewed here. 76 It was notable that RAPID3 is correctly scored 0-30, which was the case in the article providing the MIC, 76 but in the reviewed article 63 RAPID3 was scored 0-10, so for comparison the MIC in Pope was divided by 3; 3.6 became 1.2.
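The rescaling of the RAPID3 MIC between the two scoring ranges can be sketched as a simple proportional conversion. The values 3.6, 0-30 and 0-10 come from the text; the function name is hypothetical and the sketch assumes both scales start at zero.

```python
def rescale_threshold(value, from_max, to_max):
    """Convert a threshold from a 0..from_max scale to a 0..to_max scale."""
    if from_max <= 0 or to_max <= 0:
        raise ValueError("scale maxima must be positive")
    return value * to_max / from_max


# MIC reported on the 0-30 RAPID3 scale, expressed on the 0-10 scale.
mic_0_to_10 = rescale_threshold(3.6, from_max=30, to_max=10)
print(round(mic_0_to_10, 1))  # 1.2
```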

Within the COSMIN hypothesis testing for construct validity domain, and specifically the COSMIN comparison with other outcome measurement instruments (convergent validity) subdomain, the second risk of bias item asked: 'Were the measurement properties of the comparator instrument(s) adequate?' and to answer this, there was a need to determine whether sufficient measurement properties of the comparator instrument were available and which study population they applied to. In all cases but one, there were 'sufficient measurement properties of the comparator instrument(s) in a population similar to the study population'. The exception was pulp-to-palm distance, which only had measurement properties described in an orthopaedics population. 77 For this reason, the risk of bias rating for RAPID3 in one of the reviewed articles 72 was doubtful (D), regardless of the other comparator instruments.

Relevant specifically to quality of measurement properties ratings of the COSMIN hypothesis testing for construct validity and responsiveness domains, COSMIN recommend that the review team formulate a set of hypotheses. 78 Therefore, hypotheses (online supplemental appendix 5) were agreed in consultation between all assessors and with clinical expertise from EC.

Only two articles described content validity, both under the heading of Cognitive interview study or other pilot test for the COSMIN comprehensibility domain (table 2, Content validity-PROM Development columns). These covered the PROMs PDAS2 47 and PRO-CLARA. 59 For PDAS2, very little detail about the process was available other than the number of participants interviewed, which was 20, so a D risk of bias rating was given under Cognitive interview study or other pilot test Comprehensibility study. For the content validity comprehensibility, patients were not asked about the comprehensibility of item instructions and it was not clear if patients were asked about the comprehensibility of all of the items and their response options (some items were mentioned but others were not), so the overall content validity rating was -. The rating of reviewers was + because it was clear that all items and response options were appropriately worded and response options matched the items. For PRO-CLARA, 72 patients were surveyed but a qualitative method should be used to assess content validity, so a D risk of bias rating was given under Cognitive interview study or other pilot test Comprehensibility study. There was no detail about patients being asked about the comprehensibility of item instructions and of the items and their response options, so the overall content validity rating was ?. The rating of reviewers was + because it was clear that all items and response options were appropriately worded and response options matched the items.

Content validity, downgrading, overall rating and quality of evidence There were no Relevance or Comprehensiveness ratings, so there were no overall content validity ratings; therefore, the quality of evidence rating for both of these was Very low. As there was no majority for content validity overall rating, the lowest was used and was thus insufficient (-) for PDAS2 and indeterminate (?) for PRO-CLARA (table 3, Content validity Comprehensibility row). Table 2 columns to the right of content validity summarise the quality of measurement properties and related risk of bias ratings.

Quality of measurement properties, downgrading, overall rating and quality of evidence The evidence in table 2 allowed the overall rating and quality of evidence to be determined, as presented in table 3.

Using the overall rating and quality of evidence for each COSMIN domain within each PROM, recommendations for use in research and clinical practice, the main result of this systematic review, were attributed (table 3, final row) as follows: ► Category B: RADAI-SF, PDAS2, PAS, PRO-CLARA and RAPID3. ► Category C: RADAI, RADAI5, PAS-II, RAPID4 and GAS. There were no PROMs attributed to Category A, as none had sufficient evidence of content validity. All Category C PROMs had at least one COSMIN domain with High quality evidence for an insufficient (-) measurement property and all Category B PROMs had at least one COSMIN domain with High quality evidence for a sufficient (+) measurement property, except RAPID3, which, at best, had Moderate quality evidence for an insufficient (-) measurement property. Despite this, it fitted into neither Category A nor C and was therefore attributed to Category B.

From a total of 435 ratings, 399 were in agreement, giving an overall agreement of 91.7%.
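The agreement figure can be checked with a short calculation. The counts 399 and 435 are taken from the text; the function name is hypothetical.

```python
def percentage_agreement(agreed, total):
    """Simple percentage agreement between assessors."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * agreed / total


print(round(percentage_agreement(399, 435), 1))  # 91.7
```

Simple percentage agreement does not correct for agreement expected by chance, so it is best read alongside the breakdown of which ratings dominated the agreements.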

The lack of sufficient evidence for content validity means that no PROMs identified in this review can be recommended for use (attributed to Category A) in research and clinical practice. PROMs RADAI-SF, PDAS2, PAS, PRO-CLARA and RAPID3 are attributed to Category B and therefore have potential to be recommended for use, but require further research to assess their quality. PROMs RADAI, RADAI5, PAS-II, RAPID4 and GAS are attributed to Category C and therefore should not be recommended for use.

RAPID3 is attributed to Category B despite, at best, having Moderate quality evidence for an insufficient (-) measurement property for the COSMIN responsiveness domain. This is a lower level of evidence than all PROMs in Category C, which have at least one COSMIN domain with High quality evidence for an insufficient (-) measurement property. It would appear as a limitation of the COSMIN guidelines that there is no Category D for PROMs like RAPID3.

While it is not possible to excuse the research community for not having undertaken the necessary research, it is notable that all identified PROMs were first described before or in the same year as the first set of COSMIN guidelines in 2010, [23] [24] [25] [26] [27] and therefore all before the updated COSMIN guidelines [17] [18] [19] in which content validity was prioritised. This research must be done before any of these PROMs can be recommended, and the same is true of any new PROMs that are developed for the measurement of RA DA.

It is also important to note that many of the PROMs identified here have a scoring system involving precalculation of a variable before summing up to create a total score, rather than just summing up the items they include, and that this causes an issue with the assessment of the COSMIN structural validity domain and the COSMIN internal consistency domain. That this is the case for 8 of the 10 PROMs (RADAI, PDAS2, PAS, PAS-II, RAPID3, RAPID4, PRO-CLARA and GAS) identified here suggests that there is a systematic reason behind this for PROMs for RA DA. Five of these precalculate a joint count variation and six precalculate a functional ability variation. Joint count variations are used in all three gold standard measurement instruments defined here (DAS28, CDAI and SDAI), 1-3 while joint count and functional ability variations are key within the American College of Rheumatology (ACR) criteria. 79 A desire to continue the use of known instruments with PROMs may have contributed to this issue and can be moved away from, as is seen in RADAI-SF and RADAI5.

Assessor agreement was considerably lower for content validity related risk of bias and content validity ratings. This is due to the paucity of information available in the two articles, 47 59 and difficulty in interpreting what the authors actually undertook. Assessor agreement was much higher for quality of measurement property related risk of bias and quality of measurement property, and the overall agreement is also high. This is largely dominated by 231 quality of measurement property related risk of bias agreements on a Very good rating. The ACR Rheumatoid Arthritis Disease Activity Measure Workgroup have published a systematic review of all RA DA measurement instruments 80 and recommend the following two PROMs identified here as preferred measures for regular use: RAPID3 and PAS-II. DAS28, CDAI and SDAI are also recommended as preferred measures for regular use. Additionally, in this ACR review, two PROMs identified here reached the minimum standard for regular use: RADAI and RADAI5, and this was also the case for DAS, Patient Derived DAS28, Hospital Universitario La Princesa Index, Multi-Biomarker Disease Activity Score and Routine Assessment of Patient Index Data 5 (RAPID5).

The previous systematic review undertaken by Hendrikx et al in 2016 16 stated that, of the PROMs identified here, RADAI, RADAI5 and RAPID3 had the most extensive validation and the strongest level of evidence. It also stated the same of a measurement instrument labelled as Pt-DAS28, which is not a PROM.

There are therefore recommendations for PROMs RADAI, RADAI5 and RAPID3 from both sources 16 80 and PROM PAS-II from the ACR review, 80 while here we cannot recommend any identified PROMs.

The ACR review, 80 an update from 2012, 81 includes all possible measurement instruments for assessing RA DA. It is therefore difficult to fully implement the COSMIN guidelines [17] [18] [19] for this review as these relate solely to PROMs. Furthermore, their methods included a Delphi survey to aid with determining whether a measurement instrument should be recommended.

In the Hendrikx et al review, 16 a previous set of COSMIN guidelines was employed 23-27 that did not prioritise content validity. Also, measurement instruments reviewed included biomarkers and/or healthcare professional assessments, included articles contained information on the evolution of PROMs not yet in their finalised state, and other included articles described measurement instruments as PROMs when they fulfilled a different role. This review applies a tighter lens focusing solely on PROMs and makes use of the most recent COSMIN guidelines, [17] [18] [19] which provides some reasoning behind the discrepancies noted above.

The set of assessors were not experts in RA and therefore did not have the knowledge to complete with certainty some risk of bias items. As mentioned, where this was the case, TP discussed the matter with EC, and then with the independent assessors. This is a limitation of the independence of the assessors on these few risk of bias items, as there was essentially only one opinion. Further limitations are that only one assessor (TP) undertook the search strategy, article selection and data extraction of study population characteristics.

We state the necessary hypotheses required for this review in the Additional methodological requirements defined after articles were identified section. These were written in consultation with the review team and EC. For comparison, we searched PROSPERO for the term 'COSMIN' and limited the search to reviews registered in the Musculoskeletal health area of review. A total of 184 records were returned, but we found only one published article that defines hypotheses. 82 The hypotheses in this article relate solely to correlations. As we have stated, this article also sets 0.5 as a lower bound for convergent correlations, but then uses 0.3-0.5 for semiconvergent correlations, where we use 0.4 as a lower bound, and also sets 0.3 as an upper bound for divergent correlations, where we use 0.3-0.5. There is of course no set guideline on where to place these bounds, and we see here that only correlations are set, where we also defined hypotheses for effect sizes and areas under the curve. The article found through PROSPERO 82 makes reference to some of the updated 2018 COSMIN articles 17 18 and also the online user manual, which does provide very similarly written generic hypotheses, which are reproduced from Measurement in Medicine. 78 There is a case to attempt more detailed research into this area to provide guidelines on how review teams, or indeed researchers attempting the original research, could define these hypotheses.

None of the identified articles made use of item response theory or Rasch measurement theory to evidence the psychometric properties of these PROMs. These approaches are defined for use in the COSMIN structural validity domain and can also provide evidence for the COSMIN cross-cultural validity/measurement invariance domain, for which there was no evidence. All evidence in the COSMIN structural validity domain was provided through confirmatory or exploratory factor analyses. 52 53 64

In conclusion, no PROMs identified in this review can be recommended for use according to COSMIN guidelines 17-19 due to lack of sufficient evidence for content validity. This is despite previous reviews 16 80 suggesting the use of RADAI, RADAI5, PAS-II and RAPID3. All PROMs identified here were first described before the initial COSMIN guidelines were published, and thus also before the updated guidelines that prioritised content validity. The majority of identified PROMs have scoring systems that preclude evidence in the COSMIN structural validity and internal consistency domains. Care should be taken when making use of, or interpreting the results of, any of the PROMs for RA DA identified in this review. Future research on the PROMs identified here, or on any PROMs for RA DA developed in future, must look to evidence content validity. PROMs developed in future should implement scoring systems without precalculation of variations entered into the scoring system with other items. These could also look to item response theory or Rasch measurement theory to evidence their psychometric properties.

Contributors TP and EC drafted the initial research idea. TP undertook the search strategy, reviewed all titles and then abstracts for inclusion, reviewed all full-text articles, extracted results and assessed them against COSMIN guideline criteria. RM, OLA and CB extracted results and assessed them against COSMIN guideline criteria for the articles assigned by TP.
All authors declare having read and made a substantial contribution to the final manuscript. TP is the guarantor.

Disclaimer The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Data availability statement Data sharing not applicable as no datasets generated and/or analysed for this study.

Open access This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Modified disease activity scores that include twenty-eight-joint counts. Development and validation in a prospective longitudinal study of patients with rheumatoid arthritis

The simplified disease activity index (SDAI) and the clinical disease activity index (CDAI): a review of their usefulness and validity in rheumatoid arthritis

Comparison of the disease activity score using erythrocyte sedimentation rate and C-reactive protein in African Americans with rheumatoid arthritis

Progression of radiologic damage in patients with rheumatoid arthritis in clinical remission

EULAR recommendations for the management of rheumatoid arthritis with synthetic and biological disease-modifying antirheumatic drugs: 2016 update

Rheumatoid arthritis in adults: management

Providing 'the bigger picture': benefits and feasibility of integrating remote monitoring from smartphones into the electronic health record

The role of remote monitoring in the future of the NHS

Patient reported outcome measures could help transform healthcare

Computerised adaptive testing accurately predicts CLEFT-Q scores by selecting fewer, more patient-focused questions

The use of PROMIS and assessment center to deliver patient-reported outcome measures in clinical research

AB0325 Biomarkers and Patient Tailored Approach in Rheumatoid Arthritis: Can Proms be the Missing Biomarker?

Call for action: how to improve use of patient-reported outcomes to guide clinical decision making in rheumatoid arthritis

Patient reported outcome measures in rheumatic diseases

Capturing remote disease activity - results of a 12-month clinical pilot of a smartphone app in NHS rheumatology clinics in Bristol

Systematic review of patient-reported outcome measures (PROMs) for assessing disease activity in rheumatoid arthritis

COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures

COSMIN guideline for systematic reviews of patient-reported outcome measures

COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study

Registration of systematic reviews in PROSPERO: 30,000 records and counting

Patient-reported outcome measures for rheumatoid arthritis disease activity: a systematic review

The PRISMA 2020 statement: an updated guideline for reporting systematic reviews

Inter-rater agreement and reliability of the COSMIN (consensus-based standards for the selection of health status measurement instruments) checklist

The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: a clarification of its content

The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes

The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study

Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist

Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments

Guideline for Systematic Reviews of Outcome Measurement Instruments

GRADE Handbook - Handbook for grading the quality of evidence and the strength of recommendations using the GRADE approach

Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement

Weekly home self-assessment of RAPID-4/3 scores in rheumatoid arthritis: a 6-month study in 26 patients

Can remission in rheumatoid arthritis be assessed without laboratory tests or a formal joint count? Possible remission criteria based on a self-report RAPID3 score and careful joint examination in the ESPOIR cohort

The uses of disease activity scoring and the physician global assessment of disease activity for managing rheumatoid arthritis in rheumatology practice

Test-retest reliability of the disease activity score 28 CRP (DAS28-CRP), the simplified disease activity index (SDAI) and the clinical disease activity index (CDAI) in rheumatoid arthritis when based on patient self-assessment of tender and swollen joints

Patient self-assessment and physician's assessment of rheumatoid arthritis activity: which is more realistic in remission status? A comparison with ultrasonography

Patient-derived joint counts are a potential alternative for determining disease activity score

An index of only 3 patient self-report core data set measures, but not ESR, recognizes incomplete responses to methotrexate in usual care of patients with rheumatoid arthritis

An index of only patient-reported outcome measures, routine assessment of patient index data 3 (RAPID3), in two abatacept clinical trials: similar results to disease activity score (DAS28) and other RAPID indices that include physician-reported measures

An index of patient reported outcomes (PRO-Index) discriminates effectively between active and control treatment in 4 clinical trials of adalimumab in rheumatoid arthritis

Proposed severity and response criteria for routine assessment of patient index data (RAPID3): results for categories of disease activity and response criteria in abatacept clinical trials

An index of the three core data set patient questionnaire measures distinguishes efficacy of active treatment from that of placebo as effectively as the American College of Rheumatology 20% response criteria (ACR20) or the Disease Activity Score (DAS) in a rheumatoid arthritis clinical trial

Patient-reported 28 swollen and tender joint counts accurately represent RA disease activity and can be used to assess therapy responses at the group level

The rheumatoid arthritis disease activity index-5 in daily use. Proposal for disease activity categories

Evaluation of self-report questionnaires for assessing rheumatoid arthritis activity: a cross-sectional study of RAPID3 and RADAI5 and flare detection in 200 patients

MDHAQ/RAPID3 to recognize improvement over 2 months in usual care of patients with osteoarthritis, systemic lupus erythematosus, spondyloarthropathy, and gout, as well as rheumatoid arthritis

Development and validation of a patient-based disease activity score in rheumatoid arthritis that can be used in clinical trials and routine practice

Responsiveness of the self-assessed rheumatoid arthritis disease activity index to a flare of disease activity

Feasibility and validity of the RADAI, a self-administered rheumatoid arthritis disease activity index

Reexamination of the assessment criteria for rheumatoid arthritis disease activity based on comparison of the disease activity score 28 with other simpler assessment methods

A patient-derived disease activity score can substitute for a physician-derived disease activity score in clinical research

Patient-centered rheumatoid arthritis disease activity assessment by a modified RADAI

A comparison of patient questionnaires and composite indexes in routine care of rheumatoid arthritis patients

RAPID3 (routine assessment of patient index data 3) severity categories and response criteria: similar results to DAS28 (disease activity score) and CDAI (clinical disease activity index) in the RAPID 1 (rheumatoid arthritis prevention of structural damage) clinical trial of certolizumab pegol

RAPID3 (routine assessment of patient index data 3), a rheumatoid arthritis index without formal joint counts for routine care: proposed severity categories compared to disease activity score and clinical disease activity index categories

RAPID3 (routine assessment of patient index data) on an MDHAQ (multidimensional health assessment questionnaire): agreement with DAS28 (disease activity score) and CDAI (clinical disease activity index) activity categories, scored in five versus more than ninety seconds

Remission in rheumatoid arthritis: a comparison of the 2 newly proposed ACR/EULAR remission criteria with the rheumatoid arthritis disease activity index-5, a patient self-report disease activity index

The comparative responsiveness of the patient self-report questionnaires and composite disease indices for assessing rheumatoid arthritis activity in routine care

Psychometric properties of an index of three patient reported outcome (PRO) measures, termed the clinical arthritis activity (PRO-CLARA) in patients with rheumatoid arthritis. The NEW INDICES study

Evaluation of disease activity in rheumatoid arthritis by routine assessment of patient index data 3 (RAPID3) and its correlation to disease activity score 28 (DAS28) and clinical disease activity index (CDAI): an Indian experience

A self-administered rheumatoid arthritis disease activity index (RADAI) for epidemiologic research. Psychometric properties and correlation with parameters of disease activity

Evaluation of selected rheumatoid arthritis activity scores for office-based assessment

Test-retest reliability of disease activity core set measures and indices in rheumatoid arthritis

Psychometric properties of the rheumatoid arthritis disease activity index (RADAI) in a cohort of consecutive Dutch patients with RA starting anti-tumour necrosis factor treatment

A composite disease activity scale for clinical practice, observational studies, and clinical trials: the patient activity scale (PAS/PAS-II)

Evaluating patient reported outcomes in routine practice of patients with rheumatoid arthritis treated with biological disease modifying anti rheumatic drugs (b-DMARDs)

Performance of patient-reported outcomes in the assessment of rheumatoid arthritis disease activity: the experience of the ESPOIR cohort

Physical activity to reduce fatigue in rheumatoid arthritis: a randomized controlled trial

Performance of routine assessment of patient index data 3 (RAPID3) for assessment of rheumatoid arthritis in clinical practice: differential agreement of RAPID3 according to disease activity categories

Clinical Disease Activity Index (CDAI), Health Assessment Questionnaire Disability Index (HAQ-DI) & Routine Assessment of Patient Index Data with 3 measures (RAPID3) for assessing disease activity in patients with rheumatoid arthritis at initial presentation

Correlation between RAPID-3, DAS28, CDAI and SDAI as a measure of disease activity in a cohort of Colombian patients with rheumatoid arthritis

RAPID3 scores and hand outcome measurements in RA patients: a preliminary study

Patient acceptable symptom state in self-report questionnaires and composite clinical disease index for assessing rheumatoid arthritis activity: identification of cutoff points for routine care

Disease activity dynamics in rheumatoid arthritis: patients' self-assessment of disease activity via WebApp

Validation of RAPID3 using a Japanese version of multidimensional health assessment questionnaire with Japanese rheumatoid arthritis patients: characteristics of RAPID3 compared to DAS28 and CDAI

Impact of certolizumab pegol on patient-reported outcomes in rheumatoid arthritis and correlation with clinical measures of disease activity

Validity of pulp-to-palm distance as a measure of finger flexion

Measurement in medicine: a practical guide

American College of Rheumatology Committee to Reevaluate Improvement Criteria. A proposed revision to the ACR20: the hybrid measure of American College of Rheumatology response

2019 update of the American College of Rheumatology recommended rheumatoid arthritis disease activity measures

Rheumatoid arthritis disease activity measures: American College of Rheumatology recommendations for use in clinical practice

Systematic review and meta-analysis of measurement properties of the Hip disability and Osteoarthritis Outcome Score - Physical Function Shortform (HOOS-PS) and the Knee Injury and Osteoarthritis Outcome Score - Physical Function Shortform (KOOS-PS)

Certolizumab pegol plus methotrexate is significantly more effective than placebo plus methotrexate in active rheumatoid arthritis: findings of a fifty-two-week, phase III, multicenter, randomized, double-blind, placebo-controlled, parallel-group study

Validity of single variables and composite indices for measuring disease activity in rheumatoid arthritis