key: cord-0785047-pvl8jvxy authors: Bradley, Kendall E.; Cook, Chad; Reinke, Emily K.; Mather, Richard C.; Riboh, Jonathan; Lassiter, Tally; Wittstein, Jocelyn R. title: Comparison of the accuracy of telehealth examination versus clinical examination in the detection of shoulder pathology date: 2020-08-29 journal: J Shoulder Elbow Surg DOI: 10.1016/j.jse.2020.08.016 sha: a329d847b2f3662dc2206431db6794208633ff85 doc_id: 785047 cord_uid: pvl8jvxy

Hypothesis/Background: In 2017, the American Orthopaedic Association advocated for increased use of telehealth as an assessment and treatment platform, and demand has increased significantly during the COVID-19 pandemic. The diagnostic effectiveness (also called overall diagnostic accuracy) and reliability of a telehealth clinical examination versus a traditional shoulder clinical examination (SCE) have not been established. Our objective is to compare the diagnostic effectiveness of a telehealth shoulder examination against an SCE for rotator cuff tear (RCT), using magnetic resonance imaging (MRI) as a reference standard; secondary objectives included assessing agreement between test platforms and the validity of individual tests. We hypothesize that tests provided on a telehealth platform would not have inferior diagnostic effectiveness to an SCE.

Methods: The study is a case-based, case-control design. Two clinicians selected movement, strength, and special tests for the SCE that are associated with the diagnosis of RCT and identified similar tests to replicate in a simulated telehealth-based examination (STE). Consecutive patients with no prior shoulder surgery or advanced imaging underwent both the SCE and the STE during the same visit, performed by two separate assessors. The order of the SCE and STE was randomized. A blinded reader assessed each MRI, which served as the reference standard.
We calculated diagnostic effectiveness (expressed as a value from 0% to 100%), agreement statistics (kappa) between tests by assessment platform, and sensitivity, specificity, and likelihood ratios for individual tests in both the SCE and the STE. We compared the overall diagnostic effectiveness of the SCE and STE with a Mann-Whitney U test.

Results: We included 62 consecutive patients aged 40 years or older with shoulder pain; 50 (81%) received an MRI as a reference standard. The diagnostic effectiveness of stand-alone tests was poor in both groups, with the exception of a few tests with high specificity. None had greater than 70% accuracy. There was no significant difference between the overall diagnostic effectiveness of the STE and the SCE (p=0.98). Overall agreement between the STE tests and the SCE tests ranged from poor to moderate (kappa 0.07-0.87).

Conclusion: This study identified initial feasibility and noninferiority of the physician-guided, patient-performed STE when compared with an SCE in the detection of rotator cuff tears. Although these results are promising, larger studies are needed to further validate an STE assessment platform.

Shoulder pain is a common cause of disability in the adult population, with rotator cuff tears (RCTs) increasing in frequency each decade after the age of 40 years and affecting approximately 25% of adults over the age of 50.
40; 44 Differentiating shoulder pain requires a careful assessment of patient report, physical examination, and imaging. Meta-analyses have demonstrated that the stand-alone diagnostic utility of movement, physical testing, and special tests of the shoulder is poor. 15; 16 Imaging fares better, improving the ability to identify full-thickness RCTs in particular, as well as other shoulder pathology. 27 These challenges suggest that a shoulder specialist is required to distinguish among the prevalent shoulder pathologies and to interpret variable test and imaging findings. 6; 17

Unfortunately, access to subspecialized orthopedic care, including shoulder specialists, is more limited in rural settings than in urban settings. 21; 23; 26; 28; 33 Access issues will worsen with a projected shortfall of 20,000-30,000 surgical specialists by 2030. 18 To address this shortfall, some shoulder surgeons have integrated telehealth as an alternative to conventional clinical care. 22 Telehealth evaluations are cost effective and provide access to specialized care for a variety of orthopedic conditions in the US and abroad. 11; 19; 24; 29; 38; 42; 43 In Finland, teleconsultations allowed general practitioners to examine and diagnose 25% of patients who otherwise would have required referral to an outside provider. 11 The United States Army began using telemedicine for orthopedic surgery in July 2007 for soldiers deployed overseas. 2 The use of telehealth as an assessment and treatment platform has been advocated by the American Orthopaedic Association. 41

Public demand for orthopedic telehealth services surged with the arrival of the COVID-19 pandemic in early 2020. To date, most telehealth-based studies are survey oriented or involve physician and patient perceptions of care.
There are no studies that compare the diagnostic effectiveness (also known as overall diagnostic accuracy) of a telehealth examination platform with a standard clinical examination of the shoulder. One study examined the accuracy of a self-administered hip examination for femoroacetabular impingement (FAI) and found that the accuracy of the telehealth-based assessment was slightly higher. 32 Our objective is to compare the diagnostic effectiveness of a simulated telehealth shoulder examination (STE) against a standard clinical examination (SCE) for RCT, using magnetic resonance imaging (MRI) as a reference standard. Secondary objectives included assessing agreement between test platforms and the diagnostic validity of individual tests (e.g., sensitivity, specificity, likelihood ratios). We hypothesize that the STE would be non-inferior to an SCE in accurately diagnosing RCTs.

Journal Pre-proof

Study Design
The study is a case-based, case-control design. We used the Standards for Reporting Diagnostic Accuracy Studies (STARD) reporting standards to guide this study. 3 All care was provided in outpatient orthopedic specialty practices by three orthopedic surgeons (JW, TL, JR). Patients provided electronic informed consent and then underwent both the SCE and the STE during the same visit, performed by two different providers to avoid confirmation bias. We randomized the order in which the two examinations were performed.

This study involved two sets of index tests for two assessment platforms (SCE and STE). We selected SCE tests that are commonly used in the literature and in practice and that achieved reasonable diagnostic accuracy in pooled (meta-analytic) data. 16 We chose tests intended to identify all types of RCTs (i.e., supraspinatus, infraspinatus, subscapularis), favoring tests with high sensitivity, high specificity, or, when available, both.
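The validity statistics listed among the secondary objectives (sensitivity, specificity, likelihood ratios), the overall diagnostic effectiveness (true positives plus true negatives divided by all patients), and the kappa statistic used for agreement between platforms can all be computed from simple counts. The Python sketch below is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes each index test and each MRI read is dichotomized as positive or negative, and the names `TwoByTwo` and `cohens_kappa` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one dichotomized index test against the MRI reference standard."""
    tp: int  # test positive, tear on MRI
    fp: int  # test positive, no tear on MRI
    fn: int  # test negative, tear on MRI
    tn: int  # test negative, no tear on MRI

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        """Overall diagnostic accuracy: (TP + TN) / all patients."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def lr_positive(self) -> float:
        """LR+ = sensitivity / (1 - specificity); infinite when specificity is 1."""
        spec = self.specificity()
        return math.inf if spec == 1.0 else self.sensitivity() / (1.0 - spec)

    def lr_negative(self) -> float:
        """LR- = (1 - sensitivity) / specificity; infinite when specificity is 0."""
        spec = self.specificity()
        return math.inf if spec == 0.0 else (1.0 - self.sensitivity()) / spec


def cohens_kappa(first: list[bool], second: list[bool]) -> float:
    """Cohen's kappa for paired binary results, e.g. the SCE and STE versions of one test."""
    if len(first) != len(second) or not first:
        raise ValueError("expected two non-empty lists of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    p1 = sum(first) / n   # proportion positive on the first platform
    p2 = sum(second) / n  # proportion positive on the second platform
    expected = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when both platforms give one constant result")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not study data.
    test = TwoByTwo(tp=30, fp=20, fn=10, tn=40)
    print(f"sensitivity={test.sensitivity():.2f} specificity={test.specificity():.2f}")
    print(f"accuracy={test.accuracy():.2f} LR+={test.lr_positive():.2f} LR-={test.lr_negative():.2f}")
    sce = [True, True, False, False]  # hypothetical SCE results for one test
    ste = [True, False, False, False]  # hypothetical STE results for the same patients
    print(f"kappa={cohens_kappa(sce, ste):.2f}")
```

The counts in the `__main__` block are invented. The sketch also shows why test selection favored sensitivity or specificity: a highly specific test yields a large LR+, while a highly sensitive test yields a small LR-.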
The senior author, an orthopedic surgeon with 10 years of experience, and a PhD-trained physical therapist who has specialized in diagnostic accuracy research for 15 years created the STE tests. The goal of both clinicians was to identify tests that reflected a clinical examination; each SCE test had an analogous "sister" STE test, created to reflect its purpose in clinical practice. We included tests that detected other shoulder pathologies in order to provide patients with a standard-of-care shoulder examination. The goal was also to create tests that were transferable to any telehealth setting with a video feed. A description of these testing procedures is available in Table 1.

senior resident and fellows participated in the telehealth version of the examination (45% by 4 fellows, 32% by 2 PGY5 residents, and 23% by 3 PGY4 residents). These were the residents and fellows on service with the attending. Both devices had video capability so that the patient and examiner could see, hear, and observe each other. For the STE, the senior author developed a script of standardized questions regarding the quality of the shoulder pain in order to minimize differences between trainees. The resident then directed the patient through a series of self-examination maneuvers using the script provided. The examinations were meant to mimic traditional clinic testing.

We collected and managed all study data, with the exception of the MRI images, using REDCap electronic data capture tools hosted at Duke University.
REDCap (Research Electronic Data Capture; Vanderbilt University, Nashville, TN, USA) is a secure, web-based software platform designed to support data capture for research studies, providing: 1) an interface for validated data capture; 2) audit trails for tracking data manipulation and export procedures; 3) export procedures for data downloads to statistical packages; and 4) procedures for data integration and interoperability with external sources. 12; 13

We compared the STE of shoulder pain, using patient self-examination, with the SCE, using MRI as the gold standard, 16 to determine the accuracy of detecting RCTs. Fifty (81%) patients had shoulder imaging obtained using magnetic resonance imaging with a dedicated shoulder coil.

When determining the sample size of a non-inferiority trial, the aim is to show that a new testing platform (STE) is not unacceptably worse than an older one (SCE). 34 Doing so requires selecting a non-inferiority margin, calculating the confidence interval around the difference between the platforms, and then determining the size of difference one is willing to accept. We estimated sample size based on intention-to-treat analyses, and these forms of analyses typically lead to non-inferiority between groups. 34 We assumed a 20% difference in overall diagnostic effectiveness between groups to be unacceptable, and at 95% power, our projected use of

(perfect lack of agreement) to +1.0 (perfect agreement). It is possible for the statistic to be negative, which suggests the agreement is worse than random. Although arbitrary, Landis

were considered for enrollment (Figure 1). Of the 98, 34 patients declined to participate in the study. Of the remaining 62, two were unable to tolerate an enclosed MRI or had a contraindication to MRI that was discovered by the MRI technician.
As previously stated, we included 62 patients in this manuscript for the analysis of agreement (secondary objective) and 50 (81%) for the analysis of diagnostic effectiveness (primary objective).

The mean age was 57.9 years (±11.2), and 31 (51.7%) of these patients were women. The 50 patients who received an MRI exhibited a similar demographic distribution: 52% women, with a mean age of 58.2 years. The final diagnosis, per the official radiology read, indicated that 22% of patients had a full-thickness supraspinatus tear, whereas 62% of patients had partial tearing of one of their rotator cuff tendons. All of the full-thickness supraspinatus tears were accompanied by other complete or partial tears, most commonly of the infraspinatus tendon. There were no isolated full-thickness tears of the subscapularis or infraspinatus (Figure 2).

This study endeavored to compare the diagnostic effectiveness (overall accuracy) of a simulated telehealth examination with a standard clinical examination. In this case-based study, we identified commonly used clinical tests and created tests that would complement the clinical tests but could be used in a simulated telehealth setting. Further, 50 of the 62 subjects received an MRI as a reference standard, for which the rater was blinded to the clinical findings of the patient. Our goal was to determine whether overall diagnostic effectiveness, which reflects the combined proportion of true positives and true negatives, was better in one group than in the other. We think our findings are promising and timely, especially during a global pandemic, when virtual appointments will become increasingly important. We feel several areas are worth discussing.

Agreement
There was fair to good agreement between the SCE and STE. Night pain, as expected, had almost perfect agreement.
This test was a question to the patient, and it is interesting that the answer changed in the minutes between the SCE and STE for a few patients. Some patients gave a conditional response, such as "I have pain at night, but it doesn't keep me awake," or their answer changed while they were talking. Intuitively, tests that required minimal intervention by an examiner also had higher agreement. For example, the painful arc test, shoulder shrug with active abduction, and active internal rotation limitation all had a moderate amount of agreement.

We found that neither the SCE nor the STE was accurate in identifying an RCT. The tests were usually either sensitive or specific, and in some cases, they were neither. While the accuracy of both the SCE and STE was low, these findings are consistent with past meta-analytic literature. 15; 16 This is particularly well represented in studies with smaller sample sizes that did not differentiate tendon tear type (we did not) or that used a case-based design, as we did in our study. Worth noting, emerging data suggest that differentiating tendon type may also lack accuracy. 35; 36 Combining tests or examination findings may improve accuracy for identification of shoulder pathology, although this was not the purpose of our study, nor did we have the power to combine tests to look at conditions. 14;

This study had a relatively small sample size, and although the results are promising, larger studies will be needed for validation. Our study was also limited by only being able to complete 50 (81%) of 62 MRIs before restrictions on research were imposed due to COVID-19. We calculated our power analysis using a 20% difference in overall diagnostic effectiveness.
While this may seem high, diagnostic tests are typically either sensitive or specific, with the majority of tests ranging between 50% and 70% accuracy; thus, a 20% difference is plausible. That said, we are comfortable in reporting that there are no differences among groups. If we used the current data, the same statistical analyses, and the observed differences between groups, our post-hoc power analysis suggests we would need over 60,000 subjects in each group to show a statistically significant difference.

While we describe the locations of the specific tendon tears, we did not analyze overall accuracy for full-thickness tears by individual tendon or for other shoulder pathology. Emerging data question the utility of doing so with clinical tests. 35; 36 Out of concern for asymptomatic cuff tears, we intentionally included only full-thickness tears, and not partial-thickness tears, in our analysis of accuracy for detecting cuff tears. We felt that new shoulder pain in patients over the age of 40 with full-thickness rotator cuff tears would likely be symptomatic, possibly with other pathology as well. Future research on this data set will address alternative diagnoses. 4; 9; 16; 35; 39 We pooled accuracy rather than dividing specific tests by tear type, as it is well known that clinical tests do not effectively distinguish tears across all rotator cuff tendon types. Indeed, crossover positive findings are very common, with some diagnostic clinical tests being used for multiple conditions. 10 For example, a drop arm test will be positive for a supraspinatus tendon injury, but it is also positive for infraspinatus problems as well as impingement.

We felt that, although we were unable to complete all 62 MRI studies, the utility of this study in revealing noninferiority of a telehealth examination was important in the wake of the current global pandemic.
Future studies should expand the sample size, consider analyzing tests specifically for each RCT lesion, consider accuracy for other shoulder pathology, and determine the cost effectiveness and management consequences of examination using the STE. In addition, assessing patient satisfaction is important as telehealth continues to increase in use. It will also be important to note the impacts of misdiagnosis and malpractice as telehealth use increases. 8 We also feel that the data gleaned from this study will be useful in future studies of clinical

Part 1: simple definition and calculation of accuracy, sensitivity and specificity
Early analysis of the United States Army's telemedicine orthopaedic consultation program
an updated list of essential items for reporting diagnostic accuracy studies
Limited diagnostic accuracy of magnetic resonance imaging and clinical tests for detecting partial-thickness tears of the rotator cuff
Quality of care for remote orthopaedic consultations using telemedicine: a randomised controlled trial. BMC Health Services Research
The effectiveness of diagnostic tests for the assessment of shoulder pain due to soft tissue disorders: a systematic review
Create a blocked randomisation list
Reported Cases of Medical Malpractice in Direct-to-Consumer
Physical tests for shoulder impingements and local lesions of bursa, tendon or labrum that may accompany impingement
Clinical effectiveness and cost analysis of patient referral by videoconferencing in orthopaedics
The REDCap consortium: Building an international community of software platform partners
Research electronic data capture (REDCap)-a metadata-driven methodology and workflow process for providing translational research informatics support
Combining orthopedic special tests to improve diagnosis of shoulder pathology
Which physical examination tests provide clinicians with the most value when examining the shoulder? Update of a systematic review with meta-analysis of individual tests
Accuracy of office-based ultrasonography of the shoulder for the diagnosis of rotator cuff tears
The Complexities of Physician Supply and Demand 2017 Update: Projections from 2015 to 2030
Virtual Outreach Project Group. Virtual outreach: economic evaluation of joint teleconsultations for patients referred by their general practitioner for a specialist opinion
Users' Guides to the Medical Literature: III. How to Use an Article About a Diagnostic Test B. What Are the Results and Will They Help Me in Caring for My Patients?
The role of telehealth as a platform for postoperative visits following rotator cuff repair: a prospective, randomized controlled trial
Socioeconomic impact of e-health services in major joint replacement: a scoping review
The measurement of observer agreement for categorical data
Telehealth as a means of health care delivery for physical therapist practice
Magnetic resonance imaging, magnetic resonance arthrography and ultrasonography for assessing rotator cuff tears in people with shoulder pain for whom surgery is being considered
Access to primary health care among persons with disabilities in rural areas: a summary of the literature
A comparison of elbow range of motion measurements: smartphone-based digital photography versus goniometric measurements
Suppl 1: M3: risk factors, pathobiomechanics and physical examination of rotator cuff tears. The Open Orthopaedics Journal
Concurrent validity of a patient self-administered examination and a clinical examination for femoroacetabular impingement syndrome
Patient perspectives on primary health care in rural communities: effects of geography on access
Through the looking glass: understanding non-inferiority
Diagnostic value of clinical tests for supraspinatus tendon tears
Measures of diagnostic accuracy: basic definitions
Terwee CB et al. The diagnostic value of the combination of patient characteristics, history, and clinical shoulder tests for the diagnosis of rotator cuff tear
Global, regional, and national incidence, prevalence, and years lived with disability for 328 diseases and injuries for 195 countries
The opportunity awaits to lead orthopaedic telehealth innovation: AOA critical issues
An electronic clinic for arthroplasty follow-up: a pilot study