key: cord-0847152-tmocmvse
authors: Chorba, John S.; Shapiro, Avi M.; Le, Le; Maidens, John; Prince, John; Pham, Steve; Kanzawa, Mia M.; Barbosa, Daniel N.; Currie, Caroline; Brooks, Catherine; White, Brent E.; Huskin, Anna; Paek, Jason; Geocaris, Jack; Elnathan, Dinatu; Ronquillo, Ria; Kim, Roy; Alam, Zenith H.; Mahadevan, Vaikom S.; Fuller, Sophie G.; Stalker, Grant W.; Bravo, Sara A.; Jean, Dina; Lee, John J.; Gjergjindreaj, Medeona; Mihos, Christos G.; Forman, Steven T.; Venkatraman, Subramaniam; McCarthy, Patrick M.; Thomas, James D.
title: Deep Learning Algorithm for Automated Cardiac Murmur Detection via a Digital Stethoscope Platform
date: 2021-04-26
journal: J Am Heart Assoc
DOI: 10.1161/jaha.120.019905
sha: 69c94f00c60dad68f0f887d5e4fc90522c12630a
doc_id: 847152
cord_uid: tmocmvse

BACKGROUND: Clinicians vary markedly in their ability to detect murmurs during cardiac auscultation and identify the underlying pathological features. Deep learning approaches have shown promise in medicine by transforming collected data into clinically significant information. The objective of this research is to assess the performance of a deep learning algorithm to detect murmurs and clinically significant valvular heart disease using recordings from a commercial digital stethoscope platform. METHODS AND RESULTS: Using >34 hours of previously acquired and annotated heart sound recordings, we trained a deep neural network to detect murmurs. To test the algorithm, we enrolled 962 patients in a clinical study and collected recordings at the 4 primary auscultation locations. Ground truth was established using patient echocardiograms and annotations by 3 expert cardiologists. For detecting murmurs, the algorithm had a sensitivity and specificity of 76.3% and 91.4%, respectively. When softer murmurs (those with grade 1 intensity) were omitted, sensitivity increased to 90.0%. Application of the algorithm at the appropriate anatomic auscultation location detected moderate‐to‐severe or greater aortic stenosis, with sensitivity of 93.2% and specificity of 86.0%, and moderate‐to‐severe or greater mitral regurgitation, with sensitivity of 66.2% and specificity of 94.6%. CONCLUSIONS: The deep learning algorithm’s ability to detect murmurs and clinically significant aortic stenosis and mitral regurgitation is comparable to that of expert cardiologists based on the annotated subset of our database. The findings suggest that such algorithms would have utility as front‐line clinical support tools to aid clinicians in screening for cardiac murmurs caused by valvular heart disease. REGISTRATION: URL: https://clinicaltrials.gov; Unique Identifier: NCT03458806.

and correctly interprets the diagnostic sounds of the patient. Although nearly all providers can perform the act of auscultation with minimal training, interpretation of the heart sounds is difficult, even for specialists. Interrater reliability in detecting a classic finding, the "murmur," is fair at best (κ=0.3-0.48), 1,2 and the ability to identify the underlying pathological feature is even worse. 3 These challenges are further exacerbated by a noisy and rushed environment, which is the norm in modern practice. Despite the paucity of data in this area, anecdotally, these conclusions ring true for a wide spectrum of medical providers. Because cardiac auscultation remains a cornerstone of the physical examination, diagnostic assistance with its interpretation could be of great use.

Classic teaching of auscultation, and murmurs in particular, focuses on valvular heart disease (VHD). VHD is a major cause of mortality and reduced quality of life for tens of millions of patients worldwide. 4-8 As life expectancies increase, so does the prevalence of VHD in elderly patients. VHD fatalities in the United States have increased by 2.8% per year since 1979 and are projected to double over the next 25 years. 5, 9 VHD can also manifest with a prolonged asymptomatic period, which can be dangerous if not identified. For example, patients with asymptomatic severe aortic stenosis (AS) who do not undergo aortic valve replacement have an annual rate of sudden death of 3% to 13%. 10 Echocardiography remains the gold standard for diagnosis of VHD, given its minimal physical risk and excellent test characteristics. 11 Yet, echocardiography requires both highly trained sonographers to acquire the data and cardiologists to interpret the images. Accordingly, echocardiography is expensive, with a total annual cost of $1.2 billion for Medicare enrollees alone. 12 In addition, echocardiography requires a preexisting suspicion from the referring provider and may not be locally available for patients in medically underserved areas. Because VHD is associated with textbook auscultatory findings, 13 cardiac auscultation can be a fast, familiar, and inexpensive tool to improve access to VHD screening, facilitate earlier detection of VHD, and reduce the need for echocardiography. We therefore investigated the use of an electronic stethoscope platform to develop a deep learning algorithm to identify cardiac murmurs.

Deep learning approaches have shown great promise in medicine, using radiologic studies 14 and echocardiograms 15 to develop interpretative algorithms, and can even translate auxiliary data, not intended to be part of the original data set, into useful information. 16 Stethoscope sound analysis has recently led to applications in lung 17 and heart 18 sound classification. For example, an independently developed algorithm, focused on the binary distinction between pathologic and normal heart sounds, has tested favorably in a pediatric cohort. 19 The 2016 PhysioNet/Computing in Cardiology Challenge inspired a wide range of solutions by curating the largest public data set of normal and abnormal heart sounds. 20 However, deep learning approaches often require a large amount of labeled ground truth data. Most prior research, and the top-performing submissions for this challenge, used traditional machine learning or shallow neural networks requiring hand-engineered features and manual tuning that made assumptions about the temporal and spectral characteristics of heart sounds. 21, 22 These assumptions potentially limit their generalizability to widespread clinical use. A central contribution of the current work is an algorithm that learns the important features directly from the raw audio instead of having them prescribed. Because cardiologists are traditionally trained to identify VHD by auscultation, we hypothesized that a deep learning approach could perform similarly to, if not better than, these specialty providers and assist in the diagnosis of VHD.

Additional supporting data are available on request from the corresponding authors. Programming code related to data processing and not subject to intellectual property or confidentiality obligations will be made available on request. All requests for raw and analyzed data and related materials will be reviewed by the corresponding authors and the Eko legal department to verify whether the request is subject to intellectual property or confidentiality obligations. Any data and materials that can be shared will be released via a Material Transfer Agreement. Patient-related data not included in the article were generated as part of a prospective clinical study (NCT03458806) and may be subject to patient confidentiality and institutional review board review.

CLINICAL PERSPECTIVE

What Is New?

• A deep learning algorithm applied to digital heart sounds detects murmurs with accuracy similar to that of expert cardiologists.
• Applying the algorithm to heart sounds captured at the appropriate anatomic location identifies severe forms of aortic stenosis or mitral regurgitation.

What Are the Clinical Implications?

• Our results suggest that a murmur detection algorithm used with a digital stethoscope can serve as a clinical decision-support tool for the diagnosis of murmurs and valvular heart disease.
• By offloading the burden of "auscultation interpretation" from the provider, the algorithm could improve the utility of auscultation in screening for severe forms of valvular heart disease.

Abbreviations: AS, aortic stenosis; MR, mitral regurgitation; VHD, valvular heart disease.

Eko's heart murmur detection algorithm has received US Food and Drug Administration 510(k) clearance 23 and is integrated with the Eko digital stethoscope and ECG software platform to assess heart sound recordings. The algorithm was trained on recordings from a Health Insurance Portability and Accountability Act (HIPAA)-compliant collection of 400 000 audio recordings from Eko CORE and DUO electronic stethoscopes. The training set consisted of 5878 deidentified audio recordings, totaling >34 hours, from 5318 unique patients. Recordings were initially randomly selected from the first 60 000 collected in the cloud-based Eko database and then subselected to ensure sufficient murmur examples to train the model. The training data quality and patient population are thus representative of what we expect in actual clinical use. A fraction of the database was set aside for internal testing and for tuning the model's hyperparameters. The validation set recordings used to measure classification performance of the trained algorithm were entirely separate and were collected specifically for this purpose.

To complete the training set for a supervised learning problem, audio recordings and phonocardiograms were reviewed and labeled by one physician as 1 of 3 classes: heart murmur, no heart murmur, or inadequate signal. Recordings of lung sounds, noise, and human speech were examples of data labeled inadequate signal. The phonocardiogram classifier uses a ResNet 24 deep convolutional neural network architecture (Figure S1). Before being sent to the model, input recordings are filtered using an eighth-order Butterworth high-pass filter at 30 Hz and downsampled to 2000 Hz. The model consists of 34 layers, each made of a 1-dimensional convolution, rectified linear unit nonlinearity, batch normalization, dropout for regularization, and maximum pooling. Layers are linked by residual connections, which facilitate training by allowing gradients to propagate through the deep network. The network ends in a fully connected layer with 3 outputs, which are normalized to a probability distribution via a softmax function. The network was initialized with random weights and optimized using the ADAM optimizer. 25 The end-to-end algorithm makes a sequence of binary decisions to produce 1 of 3 possible outputs. First, it determines whether the recording is of sufficient signal quality for murmur classification, using the network output corresponding to "inadequate signal" as a measure of signal quality. If the signal quality is below a prespecified threshold, the recording is classified as "inadequate signal." Otherwise, the classifier provides either a "heart murmur" or a "no heart murmur" output based on a second prespecified threshold. All model parameters and thresholds are fixed at training time.
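The preprocessing and the two-step decision described above can be sketched in Python as follows. This is a minimal sketch, not Eko's implementation, and it rests on several assumptions: NumPy and SciPy are available; the filter is applied causally in second-order sections (the text does not say whether zero-phase filtering was used); the softmax outputs are ordered (murmur, no murmur, inadequate signal); signal quality is taken as 1 minus the "inadequate signal" probability; and the thresholds of 0.5 are placeholders, because the prespecified thresholds are not reported. The ResNet itself is omitted.

```python
import numpy as np
from scipy.signal import butter, resample_poly, sosfilt

RAW_FS = 4000    # phonocardiogram sampling rate of the recordings (Hz)
MODEL_FS = 2000  # rate the model input is downsampled to (Hz)


def preprocess(pcm, fs=RAW_FS):
    """Eighth-order Butterworth high-pass at 30 Hz, then downsample to 2000 Hz."""
    if fs % MODEL_FS != 0:
        raise ValueError("sketch assumes an integer decimation factor")
    sos = butter(8, 30, btype="highpass", fs=fs, output="sos")
    filtered = sosfilt(sos, np.asarray(pcm, dtype=np.float64))  # assumed causal
    # resample_poly applies an anti-aliasing low-pass before decimating
    return resample_poly(filtered, up=1, down=fs // MODEL_FS)


def softmax(logits):
    z = np.asarray(logits, dtype=np.float64)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def classify(probs, quality_threshold=0.5, murmur_threshold=0.5):
    """Two binary decisions: a signal-quality gate, then murmur vs no murmur.

    probs order is assumed to be (murmur, no murmur, inadequate signal);
    both thresholds are hypothetical placeholders.
    """
    p_murmur, p_no_murmur, p_inadequate = probs
    signal_quality = 1.0 - p_inadequate  # hypothetical quality measure
    if signal_quality < quality_threshold:
        return "inadequate signal"
    # assumed: renormalize over the two remaining classes
    murmur_score = p_murmur / (p_murmur + p_no_murmur)
    return "heart murmur" if murmur_score >= murmur_threshold else "no heart murmur"
```

A 15-second recording at 4000 Hz (60 000 samples) becomes 30 000 samples at the model's input rate. The gate-then-classify structure means a recording never receives a murmur call unless it first passes the quality check.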

To validate the end-to-end algorithm, separate expert clinicians annotated a test subset of 1774 recordings from 373 patients collected through the multisite clinical study. For the binary signal quality screening step, the algorithm output was compared with the annotations of signal quality qualitatively. For the final murmur detection step, algorithm output was compared with annotations of murmur presence using the measures of sensitivity and specificity.

To show that the algorithm detects murmurs associated with clinically significant VHD, such as AS and mitral regurgitation (MR), we compared murmur predictions for the clinical study participants with echocardiographic assessment of VHD. For AS, we compared recordings at either the aortic or the pulmonic positions, with aortic recordings preferred. For MR, we compared recordings at the mitral position only. A single recording was used for each subject, with all recordings required to have an algorithm output of "murmur" or "no murmur" (ie, adequate signal quality). For greater consistency, CORE recordings were preferred over DUO recordings because the former were more numerous. Recordings used for annotator metrics naturally required annotation, and for AS, this at times resulted in a pulmonic recording being assessed by the annotators while an aortic recording was assessed by the algorithm.
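The per-subject recording selection can be expressed as a small preference search. The sketch below is hypothetical: the `Recording` record and its field names are illustrative, and ranking anatomic position ahead of device is an assumption, because the text states both preferences but not how they interact.

```python
from dataclasses import dataclass
from typing import Optional

# Position preferences per lesion, as stated in the text; device preference is CORE first.
POSITION_PREFERENCE = {"AS": ("aortic", "pulmonic"), "MR": ("mitral",)}
DEVICE_PREFERENCE = ("CORE", "DUO")


@dataclass
class Recording:  # hypothetical record layout
    position: str  # "aortic", "pulmonic", "tricuspid", or "mitral"
    device: str    # "CORE" or "DUO"
    output: str    # "murmur", "no murmur", or "inadequate signal"


def select_recording(recordings, lesion) -> Optional[Recording]:
    """Return the single recording used for a subject, or None if none qualifies."""
    for position in POSITION_PREFERENCE[lesion]:   # assumed: position ranked first
        for device in DEVICE_PREFERENCE:
            for rec in recordings:
                if (rec.position == position and rec.device == device
                        and rec.output != "inadequate signal"):
                    return rec
    return None
```

For example, a subject with an inadequate aortic CORE recording but an adequate aortic DUO recording would be assessed for AS on the aortic DUO recording, and a subject with no adequate mitral recording would drop out of the MR analysis.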

We undertook a cross-sectional, multisite study of subjects presenting to the echocardiography laboratories and structural heart disease clinics at the Northwestern Memorial Hospital (Chicago, IL), University of California San Francisco Medical Center (San Francisco, CA), Los Alamitos Cardiology Clinic (Los Alamitos, CA), and Mount Sinai Medical Center (Miami, FL) to obtain paired electronic stethoscope recordings with clinical echocardiography results. Inclusion criteria included age >18 years, a complete (ie, not limited) echocardiogram, and provision of informed consent. Because of the lower prevalence of severe VHD compared with normal hearts in subjects presenting to the echocardiography laboratories, we also prescreened potential subjects by chart review to preferentially enroll expected cases. The primary outcome measures were defined as the ability of the algorithm to differentiate either clinically significant AS or clinically significant MR from normal hearts, reported through a receiver operating characteristic (ROC) curve. The protocol was approved as a minimal risk study by each of the institutional review boards of the participating sites and registered on clinicaltrials.gov (NCT03458806). All patients gave written informed consent.

For our validation set, we assumed an algorithm sensitivity and specificity of 0.9 for both the detection of clinically significant AS and MR, which were unknown at the time the study began. We estimated that a sample size of 110 subjects in each group (AS cases, MR cases, or structurally normal controls) would exceed a minimum threshold likelihood ratio of 5 with 95% confidence. 26 Because final echocardiography results were not known before auscultation, and enrollment of controls would likely exceed that of cases, we estimated that auscultation of 900 subjects would be required.
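One way to check this sample-size reasoning is the common log-scale confidence interval for the positive likelihood ratio, LR+ = sensitivity / (1 - specificity), with a delta-method standard error. The sketch below assumes that formulation and a two-sided 95% interval (z = 1.96); the method in the cited reference may differ in detail. With the assumed sensitivity and specificity of 0.9, LR+ is 9, and the search finds the smallest equal group size whose lower bound exceeds 5.

```python
import math


def lr_positive_ci(sens, spec, n_cases, n_controls, z=1.96):
    """Point estimate and log-scale CI for LR+ = sens / (1 - spec)."""
    lr = sens / (1 - spec)
    # delta-method standard error of log(LR+)
    se = math.sqrt((1 - sens) / (sens * n_cases) + spec / ((1 - spec) * n_controls))
    return lr, lr * math.exp(-z * se), lr * math.exp(z * se)


def min_group_size(sens, spec, lr_floor, z=1.96, n_max=10_000):
    """Smallest equal group size whose LR+ lower confidence bound exceeds lr_floor."""
    for n in range(2, n_max + 1):
        if lr_positive_ci(sens, spec, n, n, z)[1] > lr_floor:
            return n
    return None


lr, lower, upper = lr_positive_ci(0.9, 0.9, 110, 110)
print(f"LR+ = {lr:.1f}, 95% CI lower bound = {lower:.2f}")
print(min_group_size(0.9, 0.9, 5.0))
```

Under these assumptions, 110 subjects per group gives a lower bound of about 5.1, and the minimum group size is 102, so the planned 110 leaves a small margin.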

Recordings of the phonocardiogram were performed by trained study personnel in a standardized manner in each subject's clinic or laboratory examination room at the study site. Each subject underwent 15-second recordings, while seated, at the 4 standard auscultation positions (aortic: second intercostal space, right sternal border; pulmonic: second intercostal space, left sternal border; tricuspid: fifth intercostal space, left sternal border; and mitral: fifth intercostal space, midclavicular line). A second attempt at recording was encouraged if real-time auscultation quality at a given position was poor. Subjects remained in the study database even if recordings were not obtained from all 4 positions. These recordings were obtained with the standard, clinically available Eko mobile application wirelessly connected first to the Eko CORE Stethoscope, then to the Eko DUO, which also records a single-lead ECG. Recorded phonocardiogram and ECG data were saved as 16-bit, 4000- and 500-Hz sampled WAV files, respectively, and were synced in real time to a Health Insurance Portability and Accountability Act-compliant cloud storage location and sent to the algorithms for analysis. Auscultatory recordings were reviewed by the study investigators for quality control. At the time of recording, study personnel performing auscultation were unaware of the final echocardiography reports.

Using a custom-made web platform, expert annotators listened to heart sound recordings from a subset of the overall clinical study with headphones while viewing a plot of the phonocardiogram, blinded to the results of the algorithm and echocardiogram. Expert annotators were cardiologists who had completed fellowship training in cardiology and had at least 10 years of clinical cardiology practice; each received modest financial compensation. Annotations were performed on existing recordings while the phonocardiogram database was still expanding, but because not all annotators were available after the entire database was collected, only a subset of the final database underwent complete annotation. Annotators assessed signal quality (on a 1-5 scale with a defined rubric), murmur presence (true or false), and murmur grade (Levine scale 1-6 27). Because murmur grade was determined from the recording alone, and grades 4 through 6 depend on bedside findings such as a palpable thrill, murmur grades 4 through 6 were not used. To establish a single set of ground truth labels for a recording, we aggregated the responses of the 3 cardiologists. For murmur detection, we used a majority vote. For signal quality and murmur grade, which were encouraged but not required for annotation, we used the median of the responses if there were 3 and the lesser if there were only 2, which occurred for a small number of subjects.
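The aggregation rules above can be written as two small functions. This is a sketch; the text does not say how a recording with 0 or 1 ordinal responses was handled, so returning `None` in that case is an assumption.

```python
from statistics import median


def aggregate_murmur_presence(votes):
    """Majority vote over the cardiologists' true/false murmur calls."""
    return sum(bool(v) for v in votes) > len(votes) / 2


def aggregate_ordinal(responses):
    """Signal quality (1-5) or Levine grade: median of 3 responses, lesser of 2.

    None marks a missing (optional) response.
    """
    given = [r for r in responses if r is not None]
    if len(given) == 3:
        return median(given)
    if len(given) == 2:
        return min(given)
    return None  # assumption: handling of 0 or 1 responses is not stated
```

Taking the lesser of 2 grades is the conservative choice: with no third rating to break the tie, the softer murmur grade (or lower quality score) is kept.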

Comprehensive transthoracic echocardiograms, including 2-dimensional, M-mode, and color and spectral Doppler imaging, were obtained as part of routine clinical care. Clinical echocardiograms and their reports followed American Society of Echocardiography guidelines. 28, 29 Reports therefore graded VHD as none, mild, moderate, severe, or critical, with additional borderline categories (eg, moderate to severe) also allowed. Cardiologists reading the echocardiograms were unaware of study participation and thus blinded to any auscultatory results. Because the reports directed the clinical care of the patients, we considered them the "gold standard" for our study. The reports were deidentified, and a single report was associated with each subject at the study site. Most echocardiograms and phonocardiograms were captured on the same day, although echocardiograms within 1 month of recording were permitted, which occurred only for a small number of subjects presenting through the structural heart disease clinics. We defined "clinically important" or "significant" VHD cases as those graded moderate to severe or worse, because this level of disease would typically require evaluation by a cardiologist for possible procedural intervention. We allowed mixed VHD as cases provided that the disease severity at any other valve did not exceed that of the index valve. We defined controls as subjects free of valvular, structural, or congenital heart disease, with no valvular regurgitation or stenosis beyond trivial or physiologic severity.
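The case and control definitions can be sketched as a labeling function over echocardiographic report grades. The numeric ranking of grades, the lesion keys (eg, "AS", "MR"), the `other_structural_disease` flag, and treating "trivial" and "physiologic" as equivalent are all assumptions made for illustration.

```python
# Assumed ordering of report grades; "trivial" and "physiologic" treated as equal.
GRADE_RANK = {
    "none": 0, "trivial": 1, "physiologic": 1, "mild": 2, "mild to moderate": 3,
    "moderate": 4, "moderate to severe": 5, "severe": 6, "critical": 7,
}
CASE_MIN = GRADE_RANK["moderate to severe"]
CONTROL_MAX = GRADE_RANK["trivial"]


def label_subject(grades, index_lesion, other_structural_disease=False):
    """Return "case", "control", or "excluded" for one index lesion.

    grades maps lesion names (eg "AS", "MR") to echocardiographic report grades.
    """
    ranks = {lesion: GRADE_RANK[g] for lesion, g in grades.items()}
    index_rank = ranks.get(index_lesion, 0)
    if index_rank >= CASE_MIN:
        # mixed disease is allowed if no other lesion is more severe than the index lesion
        others = [r for lesion, r in ranks.items() if lesion != index_lesion]
        return "case" if all(r <= index_rank for r in others) else "excluded"
    if not other_structural_disease and all(r <= CONTROL_MAX for r in ranks.values()):
        return "control"
    return "excluded"
```

Subjects with intermediate disease (eg, moderate AS) fall into neither group, which is what makes the comparison a severe-versus-normal one.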

Data analysis and visualization were performed in Python using the standard packages numpy, pandas, seaborn, matplotlib, keras, and tensorflow. CIs were computed by bootstrapping rather than by analytical approximations, which require assumptions about data distributions. To compare means, the Welch t-test was used. To compare proportions, such as sensitivity, across different data samples, the "N-1" χ² test was used. To assess interrater reliability, Fleiss' κ was used.
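The three less familiar statistics can be sketched with only the Python standard library. These are illustrative implementations, not the authors' analysis code: a percentile bootstrap for a proportion (the text does not say which bootstrap interval was used), the "N-1" χ² test for two independent proportions (the Pearson 2×2 statistic with N replaced by N-1), and Fleiss' κ for a fixed number of raters per item.

```python
import math
import random


def bootstrap_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a proportion such as sensitivity.

    outcomes: one 0/1 entry per subject (1 = correct call, eg a true positive).
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # resample subjects with replacement and recompute the proportion each time
    props = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lo = props[int(alpha / 2 * n_boot)]
    hi = props[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def n_minus_1_chi2(x1, n1, x2, n2):
    """'N-1' chi-squared test for two independent proportions x1/n1 and x2/n2."""
    a, b, c, d = x1, n1 - x1, x2, n2 - x2
    big_n = n1 + n2
    # Pearson 2x2 statistic with N replaced by N - 1
    chi2 = (big_n - 1) * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p_value


def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] = number of raters assigning item i to category j;
    every item must have the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # observed agreement per item, then averaged over items
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_items) / n_items
    # chance agreement from the overall category proportions
    p_cats = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cats)
    return (p_bar - p_e) / (1 - p_e)
```

For murmur presence rated by 3 cardiologists, each row of `counts` would be [murmur votes, no-murmur votes] for one recording.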

Of the 962 subjects, 954 had sufficient phonocardiographic information for inclusion in the final analysis (Table 1 and Table S1 ). The patient population tended to be elderly, predominantly White, and nearly equally split in sex, consistent with cases of VHD seen mainly at academic medical centers. As expected, both AS and MR cases were older than their respective controls (P<0.0001 for each). Male sex was also more prevalent in the MR cases than in the MR controls (P=0.0055).

We first compared algorithm output on the test subset of 1774 recordings with their annotated ground truth. Of this subset, the algorithm's signal quality filter excluded 226 recordings from analysis. These "inadequate signal" recordings also had low annotator signal quality scores (Figure 1), suggesting that the filter does not prevent the analysis of potential murmurs in clinically adequate recordings. The remaining 1548 recordings, which constituted 87% of this test subset, received either a "murmur" or a "no murmur" output from the algorithm. Further murmur detection performance analysis of the algorithm was based on these 1548 recordings.

We then directly compared the algorithm's murmur predictions with the annotator-defined ground truth (Table 2). Algorithm performance had a sensitivity and specificity for detecting murmurs of 76.3% (95% CI, 72.9%-79.3%) and 91.4% (95% CI, 89.6%-93.1%), respectively, and a positive predictive value of 86.6% (95% CI, 84.0%-89.3%) using the murmur prevalence (42.2%) from this test subset. The positive and negative likelihood ratios were 8.89 (95% CI, 7.35-11.08) and 0.259 (95% CI, 0.225-0.297), respectively. Individual annotators showed modest interrater agreement (Fleiss' κ=0.478), consistent with prior studies 1 and mirroring what would be expected in clinical practice.

We then evaluated whether certain patient, examination, or device characteristics affected algorithm performance. We looked first at murmur intensity, because in certain clinical contexts, softer murmurs can be less likely to indicate meaningful disease. 2, 30, 31 When excluding murmurs of grade 1 intensity (annotator aggregated), sensitivity significantly increased to 90.0% (P<0.0001; Table 2 ).

Next, we evaluated whether algorithm performance differed on the basis of auscultatory position. Overall, these performances were similar, as evidenced by the overlapping CIs (Table 2) . Notably, recordings at the pulmonic position were more numerous in this subset because fewer recordings at that position were removed by the algorithm as "inadequate signal."

Because the recordings were made with 2 devices, the Eko CORE and the Eko DUO, we next evaluated whether algorithm performance differed by device. The sensitivity on CORE recordings was slightly higher than on DUO recordings (P<0.05; Table 2), but the difference was not statistically significant after controlling for signal quality (Table S2), an example of the well-known Simpson paradox.

Allowing the algorithm's positive-negative decision boundary to vary, we then generated an ROC curve to illustrate the tradeoff between sensitivity and specificity (Figure 2). The US Food and Drug Administration-cleared murmur detection algorithm, however, operates at a single point on this curve (Figure 2, orange circle), with the performance described above. Stratification of the ROC curve by annotator-aggregated murmur grade shows the improved characteristics of the algorithm with higher-grade murmurs (Figure 2, green line).
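Letting the decision boundary vary amounts to sweeping a threshold over a continuous murmur score. The sketch below assumes a per-recording score where higher means more likely murmur, and it computes the area under the curve with the trapezoidal rule; a cleared device with one fixed threshold corresponds to a single point on this curve.

```python
def roc_points(scores, labels):
    """ROC points from sweeping the decision threshold over every distinct score.

    scores: continuous murmur scores (higher = more likely murmur)
    labels: 1 for reference-positive recordings, 0 otherwise
    Returns the (FPR, TPR) points from (0, 0) to (1, 1) and the trapezoidal AUC.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # a recording is called positive when its score reaches the threshold
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc
```

The trapezoidal AUC equals the probability that a randomly chosen positive recording scores higher than a randomly chosen negative one (ties counted as half), which is why it summarizes the whole curve rather than the single operating point.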

We then measured algorithm performance as a screening tool for VHD by comparing murmur predictions at the appropriate anatomic locations with a different gold standard: the clinical echocardiogram. First, we considered AS. Of the 954 eligible patients, we grouped 81 with AS graded moderate to severe or greater as cases and 185 without structural heart disease as controls (Figure 3). As previously mentioned, this severity threshold was chosen to include cases that would typically require further evaluation for possible mechanical intervention. We further removed 8 cases and 13 controls with "inadequate signal" classifications at both the aortic and pulmonic positions, giving a total of 73 cases and 172 controls (with 122 of these subjects annotated). To maximize our sensitivity for detection, aortic position recordings were used for algorithm performance when they had adequate signal (and an annotation, for annotator performance); if not, pulmonic recordings were considered. We defined a positive test for AS as a "murmur" detected from the recording at either the aortic or pulmonic position as described, and a negative test as one in which no murmur was detected. For the detection of "clinically significant" AS, the algorithm operates with a sensitivity of 93.2% (95% CI, 86.9%-98.5%) and a specificity of 86.0% (95% CI, 80.9%-91.0%) (Table 3). Although the commercially available algorithm operates at a fixed point (Figure 4, orange circle), the ROC curve illustrates the theoretical potential to tune these test characteristics to the appropriate clinical scenario. Overall, the murmur detection algorithm (Figure 4; area under the curve=0.952) compares favorably with the expert clinicians (Figure 4, green, red, and purple circles), whose performances on the annotated subset of 122 subjects fell just under the algorithm's ROC curve.

We also screened for MR with the same murmur detection algorithm. Using the same overall patient cohort and the same inclusion and exclusion criteria as for AS, except testing at the mitral location, we had 68 cases and 130 controls (with 29 cases and 62 controls annotated). At the mitral position, there was a greater number of "inadequate signal" recordings, which were significantly more common in controls than in cases (P=0.0184). For the detection of "clinically significant" MR, the algorithm operates with a sensitivity of 66.2% (95% CI, 54.7%-77.4%) and a specificity of 94.6% (95% CI, 90.4%-98.4%) (Table 3). The algorithm (Figure 5; area under the curve=0.865) performs similarly to the annotators, whose performances on the annotated subset of 91 subjects fall either along or below the algorithm's ROC curve.

[Figure 1 caption (fragment): The plot on the right shows that the recordings predicted as "inadequate signal" by the algorithm have low signal quality, as assessed by the cardiologist annotators. Legend: predicted quality = adequate signal; predicted quality = inadequate signal.]

We also explored whether the signal quality classifier would bias our results by preferentially excluding lower-grade murmurs associated with cases of VHD. Importantly, of the few recordings anatomically corresponding to severe VHD but labeled as a grade 1 murmur by any annotator (9 in total: 4 for AS, and 5 for MR), the algorithm identified adequate signal and produced a correct "murmur" output for all. This suggests that the algorithm can still detect the softer murmurs indicative of clinically significant disease.

Our results suggest that the algorithm would be a useful decision support tool in detecting murmurs attributed to "clinically significant" VHD. To put this in perspective, in the elderly population, where the prevalence of surgically intervenable AS reaches 5%, 32 a negative test, carrying a negative likelihood ratio of 0.08, will nearly rule out the diagnosis, reducing its probability to <0.5%. Conversely, a positive test, with a positive likelihood ratio of 6.68, will increase disease probability to 26%. Because we compared cases with severe disease with disease-free controls in our study, a positive AS screening result in a separate population could also indicate nonsurgical AS (ie, AS of moderate severity or less). Even with this caveat, the test outcome is likely to influence clinical management in this common clinical scenario. Moreover, because the algorithm results are intended to be combined with a provider's clinical interpretation, the overall accuracy of a clinical VHD diagnosis may be even higher. 33 In addition, to our knowledge, our validation set represents the world's largest adult echocardiogram-paired heart sound recording database. Looking ahead, this database may facilitate the development of future algorithms to differentiate between innocent and pathologic murmurs, identify specific types of VHD, or correlate other cardiac pathological features to a patient's auscultatory signature.
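The pre-test to post-test calculation above uses the odds form of Bayes' theorem: post-test odds = pre-test odds × likelihood ratio. A minimal sketch reproducing the numbers in the text:

```python
def post_test_probability(pretest, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prevalence = 0.05  # surgically intervenable AS in the elderly, as cited in the text
print(f"negative test: {post_test_probability(prevalence, 0.08):.2%}")  # 0.42%
print(f"positive test: {post_test_probability(prevalence, 6.68):.1%}")  # 26.0%
```

Working in odds rather than probabilities is what lets a single multiplication apply the likelihood ratio.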

[Table 2 caption: Confusion matrix listed at top, with test characteristics stratified by annotated murmur grade, auscultation position, and auscultation device listed below. Under the heading of recordings, "murmur" indicates algorithm-identified murmurs, "total" indicates algorithm-analyzed recordings after removing inadequate signals, and "inadequate signal" indicates recordings labeled as inadequate signal by the signal quality classifier. Test characteristics are computed after excluding inadequate signals from analysis. LR indicates likelihood ratio. *Represent inadequate signal recordings.]

Our study has several limitations when applying the results to the clinic. First, we did not evaluate algorithm implementation in direct clinical care. The Eko platform can be integrated into a delivery system's electronic medical record, but 15-second recordings are often longer than those in typical clinical practice. Thus, the ultimate effect on the length of the clinical encounter, and whether this translates to higher efficiency, lower costs, or improved outcomes, all remain unknown. We plan subsequent studies to evaluate the effects of this technology on care delivery, because this was not tested here. Second, our algorithm purposely does not interpret poor-quality heart sounds. An "inadequate signal" output does not rule out severe VHD. This does, however, mirror clinical practice, where examination findings, and auscultatory findings in particular, are often inconclusive. For any given patient, the test characteristics of the algorithm are best represented by excluding these nonevaluable results, because the provider sees the output of "inadequate signal," rather than a test outcome. However, this could overestimate the true test characteristics when applied on a population level. Although extreme, applying an "intention-to-diagnose" approach, which groups all nondiagnostic results as incorrect outcomes, can identify the potential limits of such bias. 34 When doing so, the test characteristics are unsurprisingly worse, with sensitivity and specificity of 84% and 80%, respectively, for AS, and 57% and 69%, respectively, for MR. This analysis is not truly representative of the test, because the algorithm is extremely unlikely to misclassify all nondiagnostic results, even if forced to make a decision. Nonetheless, this "intention-to-diagnose" analysis underscores the need to identify the predictors of nondiagnostic auscultation, which we hope to clarify in future studies. Noise cancellation software, for example, may help to reduce the number of these nonevaluable results, although this hypothesis requires further testing.
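The intention-to-diagnose figures for AS can be reproduced from numbers reported in the Results, with one assumption: the counts of correct results (68 of 73 cases and 148 of 172 controls) are back-calculated from the reported 93.2% sensitivity and 86.0% specificity. The 8 cases and 13 controls with inadequate signal are those removed before analysis. The MR figures cannot be reproduced this way, because the numbers of MR cases and controls before signal-quality filtering are not reported in this section.

```python
def conventional_and_itd(correct, analyzed, nondiagnostic):
    """Return (conventional proportion, intention-to-diagnose proportion).

    Intention-to-diagnose counts every nondiagnostic result as incorrect,
    so nondiagnostic subjects are added to the denominator only.
    """
    return correct / analyzed, correct / (analyzed + nondiagnostic)


# AS cases: 68/73 detected (back-calculated from 93.2%), 8 with inadequate signal
sens, itd_sens = conventional_and_itd(68, 73, 8)
# AS controls: 148/172 called "no murmur" (back-calculated from 86.0%), 13 inadequate
spec, itd_spec = conventional_and_itd(148, 172, 13)
print(f"{sens:.1%} -> {itd_sens:.0%}, {spec:.1%} -> {itd_spec:.0%}")  # 93.2% -> 84%, 86.0% -> 80%
```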

Third, we effectively performed a case-control study, and therefore the test characteristics we report may be influenced by the spectrum effect. 35 The high prevalence of disease in our cohort is likely to enrich for murmurs when compared with a general screening population, where the test characteristics could be different. We also compared a severe form of disease with healthy, normal controls. As a result, the specificity we report is likely higher than in a general population, because a case of mild AS or MR may well have a murmur detected by the algorithm and be labeled as disease. These events would be "false positives" for disease requiring surgical intervention. However, they should not affect the sensitivity, because neither the number of true positives nor false negatives would change. Because we anticipate the algorithm to be used primarily for screening purposes, where sensitivity is paramount, we suspect this particular bias to be of minimal clinical consequence. Ultimately, the results of the algorithm should be placed in the appropriate clinical context. Although a false positive from "mild" VHD, for example, would generally prompt further diagnostic testing, obtaining an echocardiogram to confirm the diagnosis and initiate disease surveillance may not be appropriate in every clinical situation.

Last, our validation set consisted primarily of patients presenting to tertiary care centers and a community cardiology clinic, with severe VHD diagnosed via standard transthoracic echocardiography. Thus, subjects requiring other diagnostics to confirm disease severity, such as dobutamine stress echocardiography for low-flow, low-gradient AS, or 3-dimensional transesophageal echocardiography for MR, were not captured. Similarly, our algorithm was developed on a training set from subjects seen in actual clinical practice. Although these subjects may well represent the US population, they may not reflect populations in developing countries, where the prevalence and causes of VHD differ. Because these populations would likely benefit from an accessible and low-cost decision-support tool like the one tested herein, further investigations are warranted.

Our results also illustrate several physiologic principles of cardiovascular disease. Although our algorithm evaluates only for murmurs, auscultation conveys much richer information. AS is an excellent example: the intensity of the A2 component of the second heart sound and the timing of the peak of the systolic murmur are considered indicators of disease severity. 13 An extended algorithm that incorporates other predictors of VHD, beyond the presence of a murmur, may improve screening performance. In addition, both the cardiologists and the algorithm performed better at detecting AS than MR. This may be because AS has a more discernible auscultatory signature. Physiologically, MR produces a load-dependent murmur, because the severity of regurgitation depends on minute-to-minute hemodynamics. Moreover, MR can be directional and therefore may not produce a murmur at a predefined auscultatory location. Additional recording positions or physiologic maneuvers might also improve screening performance.

Figure legend: We defined valvular heart disease cases as those graded moderate-to-severe or worse, to encompass all levels of disease that could require timely intervention beyond serial monitoring. We defined controls as subjects free of valvular, structural, or congenital heart disease, with no valvular regurgitation or stenosis beyond trivial or physiologic severity. Potential participants included all enrolled subjects (ie, those with recordings). Eligible participants included only those with the appropriate data for analysis. Aortic stenosis (AS) was assessed by a single recording at either the aortic (preferred) or the pulmonic position, and mitral regurgitation (MR) was assessed by a single recording at the mitral position. Actual cases and controls were then filtered from potential cases and controls by removing subjects whose recordings at the corresponding anatomic locations were rated "inadequate signal" by the signal quality classifier. Numbers listed in italics represent the subset of annotated recordings.

The algorithm tested herein addresses the need for an effective and accessible method to screen for murmurs and, ultimately, to detect VHD. It is accurate and reliable, and its performance is comparable to that of expert cardiologists, at least in the annotated subset of the overall data set presented here. We anticipate that it would be particularly useful in time-pressured situations, such as rapid diagnosis in the emergency department or risk stratification before urgent noncardiac surgery. In these settings, minimizing both the time to diagnostic results and the strain on providers is important. In this vein, we purposefully captured heart sounds in a real-world clinical setting, rather than an artificial research environment, to enhance the generalizability of our findings. The considerable variability we observed among highly trained clinicians also reflects real-world practice, and the algorithm is well suited to address this problem. Further potential benefits include enabling clinicians to detect VHD earlier and more consistently, and reducing morbidity and mortality through earlier clinical intervention. 36 The algorithm operates at the point of care, requiring only the digital stethoscope, the mobile platform, and cellular or Wi-Fi connectivity. It could therefore serve as an affordable alternative to traditional echocardiography, which remains limited by cost, time, and access. 37 Although handheld echocardiography can also fill this role, it requires more advanced training than the simple capture of heart sounds with a stethoscope. 38 To the extent that the algorithm accurately excludes severe VHD, it could make some echocardiograms unnecessary, particularly those ordered to search for VHD that reveal a normal heart. Assume that this indication and this result each account for 10% of echocardiograms, 39 and that 5 million studies are performed yearly in the United States at a cost of $1000 each. 40 Under these assumptions, national cost savings could reach $28 million annually, even when applying the lower specificities from the intention-to-diagnose analyses for AS and MR. Moreover, any such savings would be specific to this technology, because these echocardiograms are appropriate in the absence of another reliable way to exclude the pathological feature in question. In light of the ongoing COVID-19 pandemic, 41 this technology could also provide expert-level diagnostics through telemedicine, thereby limiting transmission of a highly contagious disease. Furthermore, the digital stethoscope platform used herein could be extended to other auscultation findings, such as lung sounds.
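The cost-savings estimate above rests on a chain of multiplications. The Python sketch below assumes one reading of the text: the 10% indication share and the 10% normal-result share are applied multiplicatively. Under that reading it reproduces the savings ceiling and back-calculates the combined scaling factor that the $28 million figure would imply. The function names are hypothetical. The individual intention-to-diagnose specificities are not given in this section, so the sketch does not attempt to reproduce them.

```python
from fractions import Fraction


def avoidable_studies(n_studies: int, pct_indication: int, pct_normal: int) -> int:
    """Echocardiograms ordered to search for VHD that show a normal heart.

    Assumption: the two percentages are applied multiplicatively
    (indication share x normal-result share). Integer math avoids rounding.
    """
    return n_studies * pct_indication * pct_normal // (100 * 100)


def savings_ceiling(n_studies: int, pct_indication: int, pct_normal: int,
                    cost_usd: int) -> int:
    """Upper bound on annual savings if every such study were avoided."""
    return avoidable_studies(n_studies, pct_indication, pct_normal) * cost_usd


def implied_factor(reported_savings_usd: int, ceiling_usd: int) -> Fraction:
    """Combined specificity-type scaling implied by a reported estimate."""
    return Fraction(reported_savings_usd, ceiling_usd)


ceiling = savings_ceiling(5_000_000, 10, 10, 1_000)
factor = implied_factor(28_000_000, ceiling)

print(avoidable_studies(5_000_000, 10, 10))  # 50000
print(ceiling)                                # 50000000
print(factor, float(factor))                  # 14/25 0.56
```

Under this reading, about 50,000 studies a year would be candidates, for a $50 million ceiling. The reported $28 million then corresponds to retaining 56% of that ceiling after the specificity adjustment.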
Overall, our study shows the promise of this tool as an adjunct to clinical care and illustrates its potential to expand beyond murmur detection.

Does this patient have an abnormal systolic murmur?

A study of physician variation in heart-sound interpretation

Cardiac auscultatory skills of internal medicine and family practice trainees: a comparison of diagnostic proficiency

Valvular heart disease: the next cardiac epidemic

Burden of valvular heart diseases: a population-based study

Aortic stenosis in the elderly: disease prevalence and number of candidates for transcatheter aortic valve replacement: a meta-analysis and modeling study

Incidence and characteristics of newly diagnosed rheumatic heart disease in urban African adults: insights from the Heart of Soweto Study

Is primary prevention of rheumatic fever the missing link in the control of rheumatic heart disease in Africa?

Lack of progress in valvular heart disease in the pre-transcatheter aortic valve replacement era: increasing deaths and minimal change in mortality rate over the past three decades

Risk stratification in asymptomatic severe aortic stenosis: a critical appraisal

AHA/ACC guideline for the management of patients with valvular heart disease

Data Points #20. Data Points Publication Series

Braunwald's Heart Disease: A Textbook of Cardiovascular Medicine. Philadelphia, PA: Saunders

Current applications and future impact of machine learning in radiology

Deep echocardiography: data-efficient supervised and semi-supervised deep learning towards automated diagnosis of cardiac disease

Reconstructing faces from fMRI patterns using deep generative neural networks

Application of semi-supervised deep learning to lung sound analysis

Classification of normal/abnormal heart sound recordings: the Physionet/computing in cardiology challenge 2016

Artificial intelligence-assisted auscultation of heart murmurs: validation by virtual clinical trial

Recent advances in heart sound analysis

Algorithms for automatic analysis and classification of heart sounds-a systematic review

Ensemble of feature-based and deep learning-based classifiers for detection of abnormal heart sounds

Deep residual learning for image recognition

Adam: a method for stochastic optimization

Likelihood ratios with confidence: sample size estimation for diagnostic test studies

The systolic murmur

Recommendations on the echocardiographic assessment of aortic valve stenosis: a focused update from the European Association of Cardiovascular Imaging and the American Society of Echocardiography

Recommendations for noninvasive evaluation of native valvular regurgitation

ACC/AHA 2006 guidelines for the management of patients with valvular heart disease: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines

Echocardiography in the evaluation of systolic murmurs of unknown cause

The evolving epidemiology of valvular aortic stenosis: the Tromsø study

Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study

Use of 3×2 tables with an intention to diagnose approach to assess clinical performance of diagnostic tests: meta-analytical evaluation of coronary CT angiography studies

Diagnostic test accuracy may vary with prevalence: implications for evidence-based diagnosis

Competency in cardiac examination skills in medical students, trainees, physicians, and faculty

Tackling the rural access crisis: cardiologists will need an array of tools to meet patients' needs. Cardiovasc Bus

Handheld echocardiography: current state and future perspectives

Prospective evaluation of the clinical application of the American College of Cardiology Foundation/American Society of Echocardiography appropriateness criteria for transthoracic echocardiography

hospital use of echocardiography: insights from the nationwide inpatient sample

A novel coronavirus from patients with pneumonia in China

Tables S1–S2 and Figure S1