Artificial intelligence for ultrasonography: unique opportunities and challenges
Seong Ho Park
Ultrasonography 40(1), January 2021. DOI: 10.14366/usg.20078

in the acquisition of imaging data. These factors could exacerbate the limited generalizability of current AI systems built with deep learning [16].

The ultrasonographic images that are ultimately obtained are determined by how the examiner captures them. Thus, the results of AI depend on how the target structure is represented and defined by the examiner in the captured image [17] and, furthermore, on whether the target object is correctly identified and captured at all, unless an entire 3-dimensional volume scan is used, such as those obtained with automated breast ultrasound systems. For the same reason, considerable discrepancies may exist between the dataset collected to train an AI algorithm and the imaging data generated in real-world practice to be fed into the AI system. Therefore, even for a highly sophisticated AI system to work correctly, a degree of competency on the part of the human examiner, at least sufficient to scan the patient properly, still matters [17]. Moreover, standardization of scanning and image acquisition, tailored to the diagnostic task, would be critical for the successful application of AI to ultrasonography, and achieving such standardization itself requires human expertise. In a sense, the successful application of AI to ultrasonography creates an impetus for standardizing and ensuring the quality of examinations performed by humans.

Second, the more widespread use of ultrasonography in clinical practice and its relatively easy accessibility require extra caution when interpreting the results of AI used with ultrasonography. The results given by AI, which capitalizes on associations between input features and outcome states, are probabilistic. Therefore, unlike the results provided by tests based on cause-effect relationships, the results of AI algorithms should generally not be regarded as fixed. A positive result from a test that finds a clear causal determinant of the diagnosis can be accepted as a fixed result regardless of other factors. An illustrative example is the reverse transcription polymerase chain reaction (RT-PCR) test for severe acute respiratory syndrome coronavirus 2: a positive RT-PCR result is immutable proof of the presence of the virus, since the test detects viral RNA, as long as extraordinary cases of residual RNA detected in convalescent patients are excluded. In contrast, the interpretation of AI results is affected substantially by the pretest probability and the relevant spectrum of disease manifestation [18]. An AI algorithm typically applies a threshold to a probability-like internal raw output to generate the final categorical result shown to the user (e.g., cancer vs. benign), or may present the raw output in the form of a probability (e.g., a 65% probability of cancer). Both the accuracy of the probability scale and the optimal threshold are profoundly affected by the pretest probability and the disease manifestation spectrum, which are, in turn, determined by the baseline characteristics of the patient and the clinical setting.
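To make this dependence concrete, the short sketch below, a minimal illustration in Python rather than any particular commercial system, applies a fixed threshold to a raw probability-like output and then recalibrates that probability for a clinic whose pretest probability differs from that of the training population, using the standard prior-shift adjustment on the odds scale. All numbers (the 0.5 threshold, the 30% and 5% prevalences) are hypothetical.

```python
def recalibrate(p_model: float, prev_train: float, prev_target: float) -> float:
    """Adjust a model's predicted probability for a new disease prevalence.

    Standard prior-shift correction: convert the predicted probability to
    odds, rescale by the ratio of target to training prior odds, and convert
    back. Assumes the appearance of disease and non-disease cases is
    unchanged between populations; only the prevalence differs.
    """
    odds_model = p_model / (1.0 - p_model)
    prior_train = prev_train / (1.0 - prev_train)
    prior_target = prev_target / (1.0 - prev_target)
    odds_adjusted = odds_model * (prior_target / prior_train)
    return odds_adjusted / (1.0 + odds_adjusted)


THRESHOLD = 0.5  # hypothetical operating point chosen on the training data

# Raw output of 0.65 ("65% probability of cancer") from a model trained
# where 30% of examined lesions were malignant (hypothetical figures).
p_raw = 0.65
label = "cancer" if p_raw >= THRESHOLD else "benign"

# The same raw output re-expressed for a screening clinic where only 5%
# of lesions are malignant: the post-test probability drops sharply.
p_screening = recalibrate(p_raw, prev_train=0.30, prev_target=0.05)
print(f"categorical result: {label}")                  # cancer
print(f"recalibrated probability: {p_screening:.2f}")  # ~0.19
```

At the screening clinic's lower pretest probability, the recalibrated probability falls below the same 0.5 threshold, so the identical raw output would warrant the opposite categorical call at the same operating point.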
It is critical for AI users to understand that the same AI result could be correct for one patient but not for another, right in one hospital but not in another, and so on, depending on patients' baseline characteristics and the clinical setting. The limited generalizability of AI algorithms for medical diagnosis and prediction (i.e., the substantial variability in AI accuracy across patients and hospitals) is a well-known phenomenon, often described as "overfitting" in a broad sense [2, 18-23]. This problem is primarily due to epidemiological factors, as mentioned above (pretest probability and the disease manifestation spectrum), or, more simply, a disparity between training data and real-world data, rather than to technical/mathematical overfitting [2, 18-20].

This pitfall may be especially pronounced for AI algorithms for ultrasonography, as ultrasonography examinations are used in a wide range of clinical settings and patients, and are performed by a diverse range of medical professionals with varying expertise. Ultrasonography systems are also more diverse, with more vendors and versions, than CT or MRI systems. While one might expect AI to be most helpful for less-experienced examiners, ironically, less-experienced examiners may be more likely to have difficulty appraising AI results and more vulnerable to a complacent attitude of merely accepting AI results without the necessary appraisal. Such complacency would ultimately compromise the accuracy of ultrasonography examinations. The fact that ultrasonography is typically performed and interpreted on the fly by a single practitioner may further increase this risk. Consequently, the human expertise of the examiner, including adequate knowledge of and experience in ultrasonography examinations, sound clinical and epidemiological knowledge, and ideally some knowledge about AI as well, would be crucial for maximizing the benefits that AI may provide.

The issue of overfitting underscores the importance of adequate external validation of an AI algorithm in the various real-world clinical settings where it is intended to be used [16, 18, 24-34]. For all the reasons explained above, the importance of sufficient external validation should be emphasized even more strongly for AI applications to ultrasonography. A recent systematic review of studies that evaluated AI algorithms for the diagnostic analysis of medical imaging found that only 6% of such studies published in peer-reviewed journals performed some form of external validation (whether or not they were otherwise methodologically adequate) [35]. Future research on AI for ultrasonography should emphasize the external validation of developed algorithms, in addition to the development of novel algorithms. Rigorous external validation helps to clarify the boundaries within which an AI algorithm maintains its anticipated accuracy and where it fails, and can thus help assure users of the conditions under which the AI system can be used safely and effectively. Furthermore, establishing a mechanism to deliver such information to the end-users of AI more effectively and explicitly would be an important next step [36].
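The gap between internal and external performance can be illustrated schematically. The sketch below uses synthetic data and scikit-learn; the simulated "hospitals" and all parameters are hypothetical assumptions for illustration, not any published ultrasonography model. A classifier is trained at one simulated site and evaluated both on a held-out internal split and on an external population whose measurements are systematically distorted, standing in for different scanners, examiners, and case mix.

```python
# Minimal sketch of internal vs. external validation on synthetic data;
# assumes NumPy and scikit-learn are installed. The feature distortion at
# the "external" site stands in for differences in equipment, examiners,
# and patient case mix between hospitals.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def simulate(n, scales=(1.0, 1.0)):
    """Toy two-feature diagnostic problem.

    `scales` distorts how the underlying patient state is rendered into
    the measured features, mimicking an external site whose images (and
    hence extracted features) differ systematically from the training site.
    """
    latent = rng.normal(size=(n, 2))             # true patient state
    logits = 2.0 * latent[:, 0] - latent[:, 1]   # disease mechanism
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    X = latent * np.asarray(scales)              # measured features
    return X, y

# Development hospital: train, then validate internally on a held-out split.
X, y = simulate(5000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

# External hospital: same disease mechanism, systematically distorted
# measurements. No retraining: the frozen model is simply applied.
X_ext, y_ext = simulate(5000, scales=(0.3, 3.0))

print(f"internal AUC: {roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]):.3f}")
print(f"external AUC: {roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]):.3f}")
```

The external AUC comes out visibly lower even though the underlying disease mechanism is identical at both sites, which is why reporting performance only on an internal split can substantially overstate real-world accuracy.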
Third, the operator-dependency of ultrasonography makes prospective studies to validate AI even more essential. The effect of a computerized decision support system such as AI depends not only on its technical analytic capability, but also on how the computerized results are presented to, and acted upon by, human practitioners. Given the expected operator-dependency and variability both in generating the ultrasonography image data and in acting upon AI results during on-the-fly decision-making in real-time examinations, there could be meaningful differences between an analysis of retrospectively collected images and actual clinical practice. Studies on AI for ultrasonography have so far been mostly retrospective. More prospective studies that involve actual interactions between human examiners and AI systems should be performed.

AI research in healthcare is accelerating rapidly, with numerous potential applications being demonstrated. However, there are currently few examples of such techniques being successfully deployed in clinical practice [1, 16]. The introduction of AI into medicine is just beginning, and a multitude of challenges remain to be overcome, including difficulties in obtaining sufficiently large, curated, high-quality, representative datasets, deficiencies in robust clinical validation, and technical limitations such as the "black box" nature of AI algorithms, to name just a few [1, 16, 37]. These challenges are all relevant to AI for ultrasonography. This article has highlighted a few additional points that are unique to AI as applied to ultrasonography and that need to be addressed for its successful development and clinical implementation.

In summary, the nature of how ultrasonography examinations are performed and utilized demands extra attention to the following issues regarding AI for ultrasonography. It is crucial to maintain the human expertise of examiners, in terms of both ultrasonography itself and the related clinical and epidemiological knowledge. Standardization of scanning and image acquisition, according to the diagnostic tasks that AI is used to perform, is also critical. Sufficient external validation of AI algorithms is especially important for AI used with ultrasonography. Finally, prospective studies that involve actual interactions between human examiners and AI systems, rather than analyses of retrospectively collected images, should be conducted.

ORCID: Seong Ho Park: https://orcid.org/0000-0002-1257-8315

No potential conflict of interest relevant to this article was reported.

References

1. High-performance medicine: the convergence of human and artificial intelligence
2. Convolutional neural networks for radiologic images: a radiologist's guide
3. Basics of deep learning: a radiologist's guide to understanding published radiology articles on deep learning
4. Deep learning in medical imaging: general overview
5. Application of machine learning and deep learning to thyroid imaging: where do we stand?
6. Artificial intelligence in musculoskeletal ultrasound imaging
7. Artificial intelligence in breast ultrasound
8. Current status of deep learning applications in abdominal ultrasonography
9. Clinical implementation of deep learning in thoracic radiology: potential applications and challenges
10. Technology trends and applications of deep learning in ultrasonography: image quality enhancement, diagnostic support, and improving workflow efficiency
11. Effect of a deep learning framework-based computer-aided diagnosis system on the diagnostic performance of radiologists in differentiating between malignant and benign masses on breast ultrasonography
12. Application of computer-aided diagnosis to the sonographic evaluation of cervical lymph nodes
13. Computer-aided diagnosis of thyroid nodules via ultrasonography: initial clinical experience
14. Deep learning-based decision support system for the diagnosis of neoplastic gallbladder polyps on ultrasonography: preliminary results
15. Breast cancer classification in automated breast ultrasound using multiview convolutional neural network with transfer learning
16. Key challenges for delivering clinical impact with artificial intelligence
17. Computer-aided diagnosis system for thyroid nodules on ultrasonography: diagnostic performance and reproducibility based on the experience level of operators
18. Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction
19. Ethical challenges regarding artificial intelligence in medicine from the perspective of scientific editing and peer review
20. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study
21. Deep learning with ultrasonography: automated classification of liver fibrosis using a deep convolutional neural network
22. Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: a retrospective, multicohort, diagnostic study
23. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes
24. Robustness and explainability of artificial intelligence: from technical to policy solutions
25. Transforming global health with AI
26. Understanding artificial intelligence based radiology studies: what is overfitting?
27. Assessing radiology research on artificial intelligence: a brief guide for authors, reviewers, and readers, from the Radiology Editorial Board
28. Regulation of predictive analytics in medicine
29. Framing the challenges of artificial intelligence in medicine
30. Predictive analytics in health care: how can we know it works?
31. Canadian Association of Radiologists white paper on artificial intelligence in radiology
32. Principles for evaluating the clinical implementation of novel digital healthcare devices
33. Advancing the beneficial use of machine learning in health care and medicine: toward a community understanding
34. Evaluating artificial intelligence applications in clinical settings
35. Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers
36. Presenting machine learning model information to clinical end users with model facts labels
37. Preparing medical imaging data for machine learning