Artificial intelligence for ultrasonography: unique opportunities and challenges
Seong Ho Park
Ultrasonography 40(1), January 2021. DOI: 10.14366/usg.20078

in the acquisition of imaging data. These factors could exacerbate the limited generalizability of current AI systems built with deep learning [16].

The ultrasonographic images that are ultimately obtained are determined by how the examiner captures them. Thus, the results of AI depend on how the target structure is represented and defined by the examiner in the captured image [17] and, furthermore, on whether the target object is correctly identified and captured at all, unless an entire 3-dimensional volume scan is used, such as those obtained with automated breast ultrasound systems. For the same reason, considerable discrepancies may exist between the dataset collected to train an AI algorithm and the imaging data generated in real-world practice to be fed into the AI system. Therefore, even for a highly sophisticated AI system to work correctly, a degree of competency on the part of the human examiner, at least sufficient to scan the patient properly, still matters [17]. Moreover, standardization of scanning and image acquisition, tailored to the diagnostic task, would be critical for the successful application of AI to ultrasonography, and achieving such standardization itself requires human expertise. In a sense, the successful application of AI to ultrasonography creates an impetus for standardizing and ensuring the quality of examinations performed by humans.

Second, the more widespread use of ultrasonography in clinical practice and its relatively easy accessibility require extra caution when interpreting the results of AI used with ultrasonography. The results given by AI, which capitalizes on associations between input features and outcome states, are probabilistic. Therefore, unlike the results provided by tests based on cause-effect relationships, the results of AI algorithms should generally not be regarded as fixed. A positive result from a test that finds a clear causal determinant of the diagnosis can be accepted as a fixed result regardless of other factors. An illustrative example is the reverse transcription polymerase chain reaction (RT-PCR) test for severe acute respiratory syndrome coronavirus 2: a positive RT-PCR result is immutable proof of the presence of the virus, since the test detects viral RNA, as long as extraordinary cases of residual RNA detected in convalescent patients are excluded. In contrast, the interpretation of AI results is affected substantially by the pretest probability and the relevant spectrum of disease manifestation [18]. An AI algorithm typically applies a threshold to a probability-like internal raw output to generate the final categorical result shown to the user (e.g., cancer vs. benign), or may present the raw output in the form of a probability (e.g., a 65% probability of cancer). Both the accuracy of the probability scale and the optimal threshold are profoundly affected by the pretest probability and the disease manifestation spectrum, which are, in turn, determined by the baseline characteristics of the patient and the clinical setting.
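To make this dependence concrete, the short sketch below, a minimal illustration in Python rather than any particular commercial system, applies a fixed threshold to a raw probability-like output and then recalibrates that probability for a clinic whose pretest probability differs from that of the training population, using the standard prior-shift adjustment on the odds scale. All numbers (the 0.5 threshold, the 30% and 5% prevalences) are hypothetical.

```python
def recalibrate(p_model: float, prev_train: float, prev_target: float) -> float:
    """Adjust a model's predicted probability for a new disease prevalence.

    Standard prior-shift correction: convert the predicted probability to
    odds, rescale by the ratio of target to training prior odds, and convert
    back. Assumes the appearance of disease and non-disease cases is
    unchanged between populations; only the prevalence differs.
    """
    odds_model = p_model / (1.0 - p_model)
    prior_train = prev_train / (1.0 - prev_train)
    prior_target = prev_target / (1.0 - prev_target)
    odds_adjusted = odds_model * (prior_target / prior_train)
    return odds_adjusted / (1.0 + odds_adjusted)


THRESHOLD = 0.5  # hypothetical operating point chosen on the training data

# Raw output of 0.65 ("65% probability of cancer") from a model trained
# where 30% of examined lesions were malignant (hypothetical figures).
p_raw = 0.65
label = "cancer" if p_raw >= THRESHOLD else "benign"

# The same raw output re-expressed for a screening clinic where only 5%
# of lesions are malignant: the post-test probability drops sharply.
p_screening = recalibrate(p_raw, prev_train=0.30, prev_target=0.05)
print(f"categorical result: {label}")                  # cancer
print(f"recalibrated probability: {p_screening:.2f}")  # ~0.19
```

At the screening clinic's lower pretest probability, the recalibrated probability falls below the same 0.5 threshold, so the identical raw output would warrant the opposite categorical call at the same operating point.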
It is critical for AI users to understand that the same AI result could be correct for one patient but not for another, right in one hospital but not in another, and so on, depending on patients' baseline characteristics and the clinical setting. The limited generalizability of AI algorithms for medical diagnosis and prediction (i.e., the substantial variability in AI accuracy across patients and hospitals) is a well-known phenomenon, often described as "overfitting" in a broad sense [2, 18-23]. This problem is primarily due to epidemiological factors, as mentioned above (pretest probability and the disease manifestation spectrum), or, more simply, a disparity between training data and real-world data, rather than to technical/mathematical overfitting [2, 18-20].

This pitfall may be especially pronounced for AI algorithms for ultrasonography, as ultrasonography examinations are used in a wide range of clinical settings and patients, and are performed by a diverse range of medical professionals with varying expertise. Ultrasonography systems are also more diverse, with more vendors and versions, than CT or MRI systems. While one might expect AI to be most helpful for less-experienced examiners, ironically, less-experienced examiners may be more likely to have difficulty appraising AI results and more vulnerable to a complacent attitude of merely accepting AI results without the necessary appraisal. Such complacency would ultimately compromise the accuracy of ultrasonography examinations. The fact that ultrasonography is typically performed and interpreted on the fly by a single practitioner may further increase this risk. Consequently, the human expertise of the examiner, including adequate knowledge of and experience in ultrasonography examinations, sound clinical and epidemiological knowledge, and ideally some knowledge about AI as well, would be crucial for maximizing the benefits that AI may provide.

The issue of overfitting underscores the importance of adequate external validation of an AI algorithm in the various real-world clinical settings where it is intended to be used [16, 18, 24-34]. For all the reasons explained above, the importance of sufficient external validation should be emphasized even more strongly for AI applications to ultrasonography. A recent systematic review of studies that evaluated AI algorithms for the diagnostic analysis of medical imaging found that only 6% of such studies published in peer-reviewed journals performed some form of external validation (whether or not they were otherwise methodologically adequate) [35]. Future research on AI for ultrasonography should emphasize the external validation of developed algorithms, in addition to the development of novel algorithms. Rigorous external validation helps to clarify the boundaries within which an AI algorithm maintains its anticipated accuracy and where it fails, and can thus help assure users of the conditions under which the AI system can be used safely and effectively. Furthermore, establishing a mechanism to deliver such information to the end-users of AI more effectively and explicitly would be an important next step [36].
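The gap between internal and external performance can be illustrated schematically. The sketch below uses synthetic data and scikit-learn; the simulated "hospitals" and all parameters are hypothetical assumptions for illustration, not any published ultrasonography model. A classifier is trained at one simulated site and evaluated both on a held-out internal split and on an external population whose measurements are systematically distorted, standing in for different scanners, examiners, and case mix.

```python
# Minimal sketch of internal vs. external validation on synthetic data;
# assumes NumPy and scikit-learn are installed. The feature distortion at
# the "external" site stands in for differences in equipment, examiners,
# and patient case mix between hospitals.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def simulate(n, scales=(1.0, 1.0)):
    """Toy two-feature diagnostic problem.

    `scales` distorts how the underlying patient state is rendered into
    the measured features, mimicking an external site whose images (and
    hence extracted features) differ systematically from the training site.
    """
    latent = rng.normal(size=(n, 2))             # true patient state
    logits = 2.0 * latent[:, 0] - latent[:, 1]   # disease mechanism
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    X = latent * np.asarray(scales)              # measured features
    return X, y

# Development hospital: train, then validate internally on a held-out split.
X, y = simulate(5000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

# External hospital: same disease mechanism, systematically distorted
# measurements. No retraining: the frozen model is simply applied.
X_ext, y_ext = simulate(5000, scales=(0.3, 3.0))

print(f"internal AUC: {roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]):.3f}")
print(f"external AUC: {roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]):.3f}")
```

The external AUC comes out visibly lower even though the underlying disease mechanism is identical at both sites, which is why reporting performance only on an internal split can substantially overstate real-world accuracy.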
Third, the operator-dependency of ultrasonography makes prospective studies to validate AI even more essential. The effect of a computerized decision support system such as AI depends not only on its technical analytic capability, but also on how the computerized results are presented to, and acted upon by, human practitioners. Given the expected operator-dependency and variability both in generating the ultrasonography image data and in acting upon AI results during on-the-fly decision-making in real-time examinations, there could be meaningful differences between an analysis of retrospectively collected images and actual clinical practice. Studies on AI for ultrasonography have so far been mostly retrospective. More prospective studies that involve actual interactions between human examiners and AI systems should be performed.

AI research in healthcare is accelerating rapidly, with numerous potential applications being demonstrated. However, there are currently few examples of such techniques being successfully deployed in clinical practice [1, 16]. The introduction of AI into medicine is just beginning, and a multitude of challenges remain to be overcome, including difficulties in obtaining sufficiently large, curated, high-quality, representative datasets, deficiencies in robust clinical validation, and technical limitations such as the "black box" nature of AI algorithms, to name just a few [1, 16, 37]. These challenges are all relevant to AI for ultrasonography. This article has highlighted a few additional points that are unique to AI as applied to ultrasonography and that need to be addressed for its successful development and clinical implementation.

In summary, the nature of how ultrasonography examinations are performed and utilized demands extra attention to the following issues regarding AI for ultrasonography. It is crucial to maintain the human expertise of examiners, in terms of both ultrasonography itself and the related clinical and epidemiological knowledge. Standardization of scanning and image acquisition, according to the diagnostic tasks that AI is used to perform, is also critical. Sufficient external validation of AI algorithms is especially important for AI used with ultrasonography. Finally, prospective studies that involve actual interactions between human examiners and AI systems, rather than analyses of retrospectively collected images, should be conducted.

ORCID: Seong Ho Park: https://orcid.org/0000-0002-1257-8315

No potential conflict of interest relevant to this article was reported.

References

1. High-performance medicine: the convergence of human and artificial intelligence
2. Convolutional neural networks for radiologic images: a radiologist's guide
3. Basics of deep learning: a radiologist's guide to understanding published radiology articles on deep learning
4. Deep learning in medical imaging: general overview
5. Application of machine learning and deep learning to thyroid imaging: where do we stand?
6. Artificial intelligence in musculoskeletal ultrasound imaging
7. Artificial intelligence in breast ultrasound
8. Current status of deep learning applications in abdominal ultrasonography
9. Clinical implementation of deep learning in thoracic radiology: potential applications and challenges
10. Technology trends and applications of deep learning in ultrasonography: image quality enhancement, diagnostic support, and improving workflow efficiency
11. Effect of a deep learning framework-based computer-aided diagnosis system on the diagnostic performance of radiologists in differentiating between malignant and benign masses on breast ultrasonography
12. Application of computer-aided diagnosis to the sonographic evaluation of cervical lymph nodes
13. Computer-aided diagnosis of thyroid nodules via ultrasonography: initial clinical experience
14. Deep learning-based decision support system for the diagnosis of neoplastic gallbladder polyps on ultrasonography: preliminary results
15. Breast cancer classification in automated breast ultrasound using multiview convolutional neural network with transfer learning
16. Key challenges for delivering clinical impact with artificial intelligence
17. Computer-aided diagnosis system for thyroid nodules on ultrasonography: diagnostic performance and reproducibility based on the experience level of operators
18. Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction
19. Ethical challenges regarding artificial intelligence in medicine from the perspective of scientific editing and peer review
20. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study
21. Deep learning with ultrasonography: automated classification of liver fibrosis using a deep convolutional neural network
22. Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: a retrospective, multicohort, diagnostic study
23. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes
24. Robustness and explainability of artificial intelligence: from technical to policy solutions
25. Transforming global health with AI
26. Understanding artificial intelligence based radiology studies: what is overfitting?
27. Assessing radiology research on artificial intelligence: a brief guide for authors, reviewers, and readers, from the Radiology Editorial Board
28. Regulation of predictive analytics in medicine
29. Framing the challenges of artificial intelligence in medicine
30. Predictive analytics in health care: how can we know it works?
31. Canadian Association of Radiologists white paper on artificial intelligence in radiology
32. Principles for evaluating the clinical implementation of novel digital healthcare devices
33. Advancing the beneficial use of machine learning in health care and medicine: toward a community understanding
34. Evaluating artificial intelligence applications in clinical settings
35. Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers
36. Presenting machine learning model information to clinical end users with model facts labels
37. Preparing medical imaging data for machine learning