key: cord-0044198-95hxlqre
authors: Han, Changhee; Rundo, Leonardo; Murao, Kohei; Nemoto, Takafumi; Nakayama, Hideki
title: Bridging the Gap Between AI and Healthcare Sides: Towards Developing Clinically Relevant AI-Powered Diagnosis Systems
date: 2020-05-06
journal: Artificial Intelligence Applications and Innovations
DOI: 10.1007/978-3-030-49186-4_27
sha: 5aaf7fd83f006d0da4f0515139fbc7e0c2a09029
doc_id: 44198
cord_uid: 95hxlqre

Despite the success of Convolutional Neural Network-based Computer-Aided Diagnosis research, its clinical applications remain challenging. Accordingly, developing medical Artificial Intelligence (AI) fitting into a clinical environment requires identifying/bridging the gap between AI and Healthcare sides. Since the biggest problem in Medical Imaging lies in data paucity, confirming the clinical relevance for diagnosis of research-proven image augmentation techniques is essential. Therefore, we hold a clinically valuable AI-envisioning workshop among Japanese Medical Imaging experts, physicians, and generalists in Healthcare/Informatics. Then, a questionnaire survey for physicians evaluates our pathology-aware Generative Adversarial Network (GAN)-based image augmentation projects in terms of Data Augmentation and physician training. The workshop reveals the intrinsic gap between AI/Healthcare sides and solutions on Why (i.e., clinical significance/interpretation) and How (i.e., data acquisition, commercial deployment, and safety/feeling safe). This analysis confirms our pathology-aware GANs’ clinical relevance as a clinical decision support system and non-expert physician training tool. Our findings would play a key role in connecting inter-disciplinary research and clinical applications, not limited to the Japanese medical context and pathology-aware GANs.

Convolutional Neural Networks (CNNs) have enabled accurate/reliable Computer-Aided Diagnosis (CAD), occasionally outperforming expert physicians [1] [2] [3]. However, such research results cannot be easily transferred to a clinical environment. Artificial Intelligence (AI) and Healthcare sides have a huge gap around technology, funding, and people [4]. In Japan, the biggest challenge lies in medical data sharing because each hospital has different ethical codes and tends to enclose collected data without annotating them for AI research. This differs from the US, where the National Cancer Institute provides annotated medical images [5].

Therefore, a Research Center for Medical Big Data was launched in November 2017: collaborating with 6 Japanese medical societies and 6 institutes of informatics, we collected large-scale annotated medical images for CAD research. Using over 60 million available images, we achieved prominent research results, presented at major Computer Vision [6] and Medical Imaging conferences [7] . Moreover, we published 6 papers [8] [9] [10] [11] [12] [13] on Generative Adversarial Network (GAN)-based medical image augmentation [14] . Since the GANs can generate realistic samples with desired pathological features via many-to-many mappings, they could mitigate the medical data paucity via Data Augmentation (DA) and physician training.

Aiming to further identify/bridge the gap between AI and Healthcare sides in Japan towards developing medical AI fitting into a clinical environment in five years, we hold a workshop for 7 Japanese people with various AI and/or Healthcare backgrounds. Moreover, to confirm the clinical relevance for diagnosis of the pathology-aware GAN methods, we conduct a questionnaire survey for 9 Japanese physicians who interpret Computed Tomography (CT) and Magnetic Resonance (MR) images in daily practice. Figure 1 outlines our investigation.

Our main contributions are as follows:

-AI and Healthcare Workshop: We firstly hold a clinically valuable AI-envisioning workshop among Medical Imaging experts, physicians, and Healthcare/Informatics generalists to bridge the gap between AI/Healthcare sides.
-Questionnaire Survey for Physicians: We firstly present both qualitative/quantitative questionnaire evaluation results for many physicians about research-proven medical AI.
-Information Conversion: Clinical relevance discussions imply that our pathology-aware GAN-based interpolation and extrapolation could overcome medical data paucity via DA and physician training.

In terms of interpolation, GAN-based medical image augmentation is reliable because acquisition modalities (e.g., X-ray, CT, MR) can display the human body's strong anatomical consistency at fixed position while clearly reflecting inter-subject variability [15, 16]. This is different from natural images, where various objects can appear at any position; accordingly, to tackle large inter-subject/pathology/modality variability, we proposed to use noise-to-image GANs (e.g., random noise samples to diverse pathological images) for (i) medical DA and (ii) physician training [8]. While the noise-to-image GAN training is much more difficult than training image-to-image GANs [17] (e.g., a benign image to a malignant one), it can increase image diversity for further performance boost. Regarding the DA, the GAN-generated images can improve CAD based on supervised learning [18] [19] [20]. For the physician training, the GANs can display novel desired pathological images and help train medical trainees despite infrastructural/legal constraints [21]. However, we have to devise effective loss functions and training schemes for such applications. Diversity matters more for the DA to sufficiently fill the real image distribution, whereas realism matters more for the physician training not to confuse the trainees.

So, how can we perform GAN-based DA/physician training with only limited annotated images? Always in collaboration with physicians, for improving 2D classification, we combined the noise-to-image and image-to-image GANs [9, 10] .

Nevertheless, further DA applications require pathology localization for detection and advanced physician training needs the generation of images with abnormalities, respectively. To meet both clinical demands, we proposed 2D/3D bounding box-based GANs conditioned on pathology position/size/appearance. Indeed, the bounding box-based detection requires much less physicians' annotation effort than segmentation [22] .

In terms of extrapolation, the pathology-aware GANs are promising because common and/or desired medical priors can play a key role in the conditioning: theoretically, infinite conditioning instances, external to the training data, exist, and enforcing such constraints has an extrapolation effect via model reduction [23]. For improving 2D detection, we proposed Conditional Progressive Growing of GANs that incorporates rough bounding box conditions incrementally into a noise-to-image GAN (i.e., Progressive Growing of GANs [24]) to place realistic/diverse brain metastases at desired positions/sizes on 256 × 256 MR images [11]. Since the human body is 3D, for improving 3D detection, we proposed 3D Multi-Conditional GAN that translates noise boxes into realistic/diverse 32 × 32 × 32 lung nodules [25] placed at desired position/size/attenuation on CT scans [12]. Interestingly, inputting the noise box with the surrounding tissues has the effect of combining the noise-to-image and image-to-image GANs.
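To make the idea of rough bounding box conditioning more concrete, the following is a minimal sketch, not the implementation used in [11]. It builds a binary condition map for a 256 × 256 image and downsamples it into a coarse-to-fine pyramid, as one might do to feed conditions incrementally to a progressively growing generator. The function names (`bbox_condition_map`, `condition_pyramid`), the box format, the 2 × 2 max pooling, and the 4-to-256 resolution schedule are all illustrative assumptions.

```python
import numpy as np


def bbox_condition_map(size, boxes):
    """Return a (size, size) binary map with 1s inside each bounding box.

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixels, max exclusive.
    This box format is a hypothetical choice for illustration.
    """
    cond = np.zeros((size, size), dtype=np.float32)
    for x_min, y_min, x_max, y_max in boxes:
        cond[y_min:y_max, x_min:x_max] = 1.0
    return cond


def condition_pyramid(cond, min_size=4):
    """Downsample a condition map by repeated 2x2 max pooling.

    Returns the maps ordered from coarsest (min_size) to finest (original).
    Max pooling is an assumption; it keeps small boxes visible at low resolution.
    """
    pyramid = [cond]
    while pyramid[-1].shape[0] > min_size:
        c = pyramid[-1]
        h, w = c.shape
        pyramid.append(c.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3)))
    return pyramid[::-1]


# Hypothetical rough box around a desired metastasis position/size.
cond = bbox_condition_map(256, [(100, 120, 130, 150)])
pyramid = condition_pyramid(cond)
for level in pyramid:
    print(level.shape, int(level.sum()))
```

Each pyramid level could be concatenated with the generator's feature maps at the matching resolution; how the conditions are actually injected is left open here.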

We succeeded in (i) generating images realistic even for physicians and (ii) improving detection using synthetic training images, respectively; they require different loss functions and training schemes. However, to exploit our pathology-aware GANs as a (i) non-expert physician training tool and (ii) clinical decision support system, we need to confirm their clinical relevance for diagnosis: such information conversion [26] techniques to overcome the data paucity, not limited to our pathology-aware GANs, would become a clinical breakthrough.

-Subjects: 2 Medical Imaging experts (i.e., a Medical Imaging researcher and a medical AI startup entrepreneur), 2 physicians (i.e., a radiologist and a psychiatrist), and 3 Healthcare/Informatics generalists (i.e., a nurse and researcher in medical information standardization, a general practitioner and researcher in medical communication, and a medical technology manufacturer's owner and researcher in health disparities).
-Experiments: As its program shows (Table 1), during the workshop, we conduct 2 activities: (Learning) Know the overview of Medical Image Analysis, including state-of-the-art research, well-known challenges/solutions, and the summary of our pathology-aware GAN projects; (Thinking) Find the intrinsic gap and its solutions between AI researchers and Healthcare workers after sharing their common and different thinking/working styles. This workshop was held on March 17th, 2019 at Nakayama Future Factory, Open Studio, The University of Tokyo, Tokyo, Japan.

-Subjects: 3 physicians (i.e., a radiologist, a psychiatrist, and a physiatrist) committed to (at least one of) our pathology-aware GAN projects and 6 project non-related radiologists without much AI background. The authors of this paper are not included.
-Experiments: Physicians are asked to answer the following questionnaire within 2 weeks from December 6th, 2019 after reading 10 summary slides written in Japanese about general Medical Image Analysis and our pathology-aware GAN projects along with example synthesized images. We conduct both qualitative (i.e., free comments) and quantitative (i.e., five-point Likert scale [27]) evaluation: Likert scale 1 = very negative, 2 = negative, 3 = neutral, 4 = positive, 5 = very positive.

-Question 1: Are you keen to exploit medical AI in general when it achieves accurate and reliable performance in the near future? (five-point Likert scale) Please tell us your expectations, wishes, and worries (free comments). 

We show the clinically-relevant findings from this Japanese workshop.

Gap Between AI and Healthcare Sides

Gap 1: AI, including Deep Learning, provides unclear decision criteria; does this make physicians reluctant to use it for diagnosis in a clinical environment?

-Healthcare side: We rather expect applications other than diagnosis. If we use AI for diagnosis, instead of replacing physicians, we expect it to serve as a reliable second opinion, such as an alert to avoid misdiagnosis, based on various clinical data not limited to images; every single diagnostician is anxious about their diagnosis. AI only provides minimum explanation, such as a heatmap showing attention, which makes persuading not only the physicians but also patients difficult. Thus, the physicians' intervention is essential for intuitive explanation. Methodological safety and feeling safe are different. In this sense, pursuing explainable AI generally decreases AI's diagnostic accuracy [28], so physicians should still serve as mediators by engaging in high-level conversation or interaction with patients. Moreover, according to the medical law in most countries including Japan, only doctors can make the final decision. The first autonomous AI-based diagnosis without a physician was cleared by the Food and Drug Administration in 2018 [29], but such a case is exceptional.
-AI side: Compared with other systems or physicians, Deep Learning's explanation is not particularly poor, so the standards we require of AI are too severe; the word AI excessively promotes both anxiety and expectations of perfection. If we could thoroughly verify the reliability of its diagnosis against physicians by exploring uncertainty measures [30], such intuitive explanation would be optional.

Gap 2: Are there any benefits to actually introducing medical AI?

-Healthcare side: After all, even if AI can achieve high accuracy and convenient operation, hospitals would not introduce it without any commercial benefits. Moreover, small clinics, where physicians are desperately needed, often do not have CT or MR scanners [31].
-AI side: The commercial deployment of medical AI is strongly tied to diagnostic accuracy; so, if it can achieve significantly outstanding accuracy at various tasks in the near future, patients would not visit hospitals/clinics without AI. Accordingly, introducing medical AI would become profitable in five years.

-Healthcare side: To evaluate AI's diagnostic performance, we should consider many metrics, such as sensitivity and specificity. Moreover, its generalization ability for medical data highly relies on inter-scanner/inter-individual variability [32]. How can we evaluate whether it is suitable as a clinically applicable system?
-AI side: Generally, alleviating the risk of overlooking a disease is the most important, so sensitivity matters more than specificity unless their balance is highly disturbed. Research on medical AI that is robust across different datasets has recently become active [33].
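The trade-off between the two metrics named above can be illustrated with a short sketch. The labels and scores below are hypothetical values invented for illustration, not data from this study, and the helper name `sensitivity_specificity` is likewise an assumption. Lowering the decision threshold raises sensitivity (fewer overlooked positives) at the cost of specificity.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Compute (sensitivity, specificity) for binary labels at a score threshold.

    y_true: 1 = disease present, 0 = disease absent.
    scores: model outputs in [0, 1]; a case is called positive if score >= threshold.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical ground truth and model scores (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.2, 0.1]

strict = sensitivity_specificity(y_true, scores, threshold=0.5)
lenient = sensitivity_specificity(y_true, scores, threshold=0.3)
print("threshold 0.5:", strict)
print("threshold 0.3:", lenient)
```

Reporting both metrics at the chosen operating point, rather than accuracy alone, makes it visible when the balance between them becomes "highly disturbed".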

Why: Clinical significance/interpretation
-Challenges: We need to clarify which clinical situations actually require AI introduction. Moreover, AI's early diagnosis might not always be beneficial for patients.
-Solutions: Due to nearly endless disease types and frequent misdiagnosis coming from physicians' fatigue, we should use it as an alert to avoid misdiagnosis [34] (e.g., a reliable second opinion), instead of replacing physicians. It should help prevent oversight in diagnostic tests not only with CT and MR, but also with blood data, chest X-ray, and mammography before taking CT and MR [35]. It could also be applied to segmentation for radiation therapy [36], neurosurgery navigation [37], and pressure ulcers' echo evaluation. Along with improving the diagnosis, it would also make the physicians' workflow easier, such as by denoising [38]. Patients should decide whether they accept AI-based diagnosis under informed consent.

How: Data acquisition
-Challenges: Ethical screening in Japan is exceptionally strict, so acquiring and sharing large-scale medical data/annotation is challenging; this also applies to Europe due to the General Data Protection Regulation [39]. Considering the speed of technological advances in AI, adopting it for medical devices is difficult in Japan, unlike in medical AI-ready countries, such as the US, where the ethical screening is relatively loose in return for the responsibility of monitoring system stability. Moreover, whenever diagnostic criteria change, we need further reviews and software modifications. For example, the Tumor-lymph Node-Metastasis (TNM) classification [40] criteria changed for oropharyngeal cancer in 2018 and for lung cancer in 2017. Changes in diagnostic equipment/target also require acquiring large-scale data/annotation again.
-Solutions: For Japan to keep pace, its ethical screening should be comparable to that of the other leading countries. Currently, overseas research and clinical trials are proceeding much faster, so it seems better to collaborate with overseas companies than to do it in Japan alone. Moreover, the complete medical checkup, which is extremely costly, is unique to East Asia, so Japan could have an advantage in multiple medical data per individual: Japan is the only country where most workers aged 40 or over are required to have medical checkups once a year regardless of their health conditions under the Industrial Safety and Health Act [41]. To handle changes in diagnostic criteria/equipment and overcome dataset/task dependency, it is necessary to establish a common database creation workflow [42] by regularly entering electronic medical records into the database. For reducing data acquisition/annotation cost, AI techniques, such as GAN-based DA [12] and domain adaptation [43], would be effective.

How: Commercial deployment
-Challenges: Hospitals currently gain no commercial benefit from actually introducing medical AI.
-Solutions: For example, it would be possible to build AI-powered hospitals [44] operated with fewer staff. Medical manufacturers could also standardize data formats [45], such as for X-ray, and provide some AI services. Many IT giants like Google are now working on medical AI to collect massive biomedical datasets [46], so they could help rural areas and developing countries, where the physician shortage is severe [31], at relatively low cost.

How: Safety and feeling safe
-Challenges: Considering multiple metrics, such as sensitivity and specificity [47], and dataset/task dependency [48], accuracy could be unreliable, so ensuring safety is challenging. Moreover, reassuring physicians and patients is important to actually use AI in a clinical environment [49].
-Solutions: We should integrate various clinical data, such as blood test biomarkers and multiomics, with images [35]. Moreover, developing bias-robust technology is important since confounding factors are inevitable [50]. To prevent oversight, prioritizing sensitivity over specificity is essential while maintaining a balance [51]. We should also devise education for medical AI users, such as result interpretation, to reassure patients [52].

We show the questions and Japanese physicians' response summaries. Concerning the following Questions 1, 2, 3, Fig. 2 visually summarizes the expectation scores on medical AI (i.e., general medical AI, GANs for DA, and GANs for physician training) from both the 3 project-related physicians and the 6 project non-related radiologists.

Are you keen to exploit medical AI in general when it achieves accurate and reliable performance in the near future?

-Response summary: As expected, the project-related physicians are AI-enthusiastic while the project non-related radiologists are also generally very positive about the medical AI. Many of them stress the necessity of AI-based diagnosis for more reliable diagnosis because of the lack of physicians. Meanwhile, other physicians worry about its cost and reliability. We may be able to persuade them by showing expected profitability (e.g., currently CT scanners have an earning rate of 16% and CT scans require 2-20 min for interpretation in Japan). Similarly, we can explain how experts annotate medical images and how AI diagnoses disease based on them (e.g., multiple physicians, not a single one, can annotate the images via discussion).

Fig. 2. Bar chart of the expectations on medical AI, expressed by five-point Likert scale scores, from 9 Japanese physicians: 3 project-related physicians and 6 project non-related radiologists, respectively. The vertical rectangles and error bars denote the average scores with 95% confidence intervals.
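As a rough illustration of how such average scores and 95% confidence intervals could be obtained for groups this small, the sketch below uses a Student's t interval. The paper does not state which interval method was used, so the t-based approach is an assumption, as are the helper name `mean_with_ci95` and the example scores, which are hypothetical and not the survey responses.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t distribution, keyed by
# degrees of freedom (n - 1); only the group sizes 3, 6, and 9 are covered.
T_CRIT_95 = {2: 4.303, 5: 2.571, 8: 2.306}


def mean_with_ci95(scores):
    """Return (mean, lower, upper) of a t-based 95% confidence interval."""
    n = len(scores)
    mean = statistics.mean(scores)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(n)
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical five-point Likert scores (illustrative only).
related_scores = [5, 5, 4]               # 3 project-related physicians
non_related_scores = [4, 5, 4, 3, 5, 4]  # 6 project non-related radiologists

for name, scores in [("related", related_scores),
                     ("non-related", non_related_scores)]:
    mean, low, high = mean_with_ci95(scores)
    print(f"{name}: mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With only three respondents the interval is wide and can extend beyond the 1 to 5 scale, which is one reason to read error bars on such small groups cautiously.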

What do you think about using GAN-generated images for DA?

-Response summary: As expected, the project-related physicians are very positive about the GAN-based DA while the project non-related radiologists are also positive. Many of them are satisfied with its achieved accuracy/sensitivity improvement when available annotated images are limited. However, similarly to their opinions on general Medical Image Analysis, some physicians question its reliability.

What do you think about using GAN-generated images for physician training?

-Response summary: We generally receive neutral feedback because we do not provide a concrete physician training tool, but instead general pathology-aware generation ideas with example synthesized images; thus, some physicians are positive, and some are not. A physician provides a key idea about a pathology-coverage rate for medical student/expert physician training, respectively. For extensive physician training by GAN-generated atypical images, along with pathology-aware GAN-based extrapolation, further GAN-based extrapolation would be valuable.

Any comments/suggestions about our projects towards developing clinically relevant AI-powered systems based on your experience?

-Response summary: Most physicians look excited about our pathology-aware GAN-based image augmentation projects and express their clinically relevant requests. The next steps lie in performing further GAN-based extrapolation, developing reliable and clinician-friendly systems with new practice guidelines, and overcoming legal/financial constraints.

Our first clinically valuable AI-envisioning workshop between people with various AI and/or Healthcare backgrounds reveals the intrinsic gap between both sides and its preliminary solutions. Regarding clinical significance/interpretation, medical AI could play a key role in supporting physicians with diagnosis, therapy, and surgery. For data acquisition, countries should utilize their unique medical environment, such as complete medical checkup for Japan. Commercial deployment could come as AI-powered hospitals and medical manufacturers' AI service. To assure safety and feeling safe, we should integrate various clinical data and devise education for medical AI users. We believe that such solutions on Why and How would play a crucial role in connecting inter-disciplinary research and clinical applications.

Through a questionnaire survey for physicians, we confirm our pathology-aware GANs' clinical relevance for diagnosis as a clinical decision support system and non-expert physician training tool: many physicians admit the urgent necessity of general AI-based diagnosis while welcoming our GAN-based DA to handle the lack of medical images. Thus, GAN-powered physician training is promising only under careful tool design.

We find that better DA and expert physician training require further generation of images with abnormalities. Therefore, for better GAN-based extrapolation, we plan to conduct (i) generation by parts with coordinate conditions [53], (ii) generation with both image and radiogenomic conditions [54], and (iii) transfer learning among different body parts and disease types. Whereas this paper only explores the Japanese medical context and pathology-aware GANs, our findings are more generally applicable and can provide insights into the clinical practice in other countries.

[1] Development and validation of a deep learning-based automatic detection algorithm for active pulmonary tuberculosis on chest radiographs
[2] Deep neural networks improve radiologists' performance in breast cancer screening
[3] International evaluation of an AI system for breast cancer screening
[4] A road map for translational research on artificial intelligence in medical imaging: from the 2018 National Institutes of Health/RSNA/ACR/The Academy Workshop
[5] The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository
[6] Adaptive weighting multi-field-of-view CNN for semantic segmentation in pathology
[7] Gastric cancer detection from endoscopic images using synthesis by GAN
[8] GAN-based synthetic brain MR image generation
[9] Infinite brain MR images: PGGAN-based data augmentation for tumor detection
[10] Combining noise-to-image and image-to-image GANs: brain MR image augmentation for tumor detection
[11] Learning more with less: conditional PGGAN-based data augmentation for brain metastases detection using highly-rough annotation on MR images
[12] Synthesizing diverse lung nodules wherever massively: 3D multi-conditional GAN-based CT image augmentation for object detection
[13] Learning more with less: GAN-based medical image augmentation
[14] Generative adversarial nets
[15] Computed Tomography: Principles, Design, Artifacts, and Recent Advances. SPIE
[16] Magnetic Resonance Imaging: Physical Principles and Sequence Design
[17] CycleGAN for style transfer in X-ray angiography
[18] GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification
[19] Chest X-ray generation and data augmentation for cardiovascular abnormality classification
[20] Generative adversarial networks as an advanced data augmentation technique for MRI data
[21] Towards generative adversarial networks as a new paradigm for radiology education
[22] GTVcut for neuro-radiosurgery treatment planning: an MRI brain cancer seeded image segmentation method based on a cellular automata model
[23] Enforcing constraints for interpolation and extrapolation in generative adversarial networks
[24] Progressive growing of GANs for improved quality, stability, and variation
[25] Lung nodule classification using deep local-global networks
[26] Multi-aspect mining of complex sensor sequences
[27] Likert scales and data analyses
[28] Peeking inside the black-box: a survey on Explainable Artificial Intelligence (XAI)
[29] Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices
[30] Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation
[31] Commentary-radiology in India: the next decade
[32] The healthy brain network serial scanning initiative: a resource for evaluating inter-individual differences and their reliabilities across scan conditions and sessions
[33] USE-Net: incorporating squeeze-and-excitation blocks into U-Net for prostate zonal segmentation of multi-institutional MRI datasets
[34] Relevance of deep learning to facilitate the diagnosis of HER2 status in breast cancer
[35] Medical data stream distribution pattern association rule mining algorithm based on density estimation
[36] A generative model for segmentation of tumor and organs-at-risk for radiation therapy planning of glioblastoma patients
[37] Machine learning as a potential solution for shift during stereotactic brain surgery
[38] Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss
[39] The effect of the general data protection regulation on medical research
[40] TNM Classification of Malignant Tumours, 7th edn
[41] Evaluation of the distribution and factors affecting blood pressure using medical checkup data in Japan
[42] Visual charting method for creating electronic medical documents
[43] Transfer learning for domain adaptation in MRI: application in brain lesion segmentation
[44] Feasibility study for implementation of the AI-powered Internet+ Primary Care Model (AiPCM) across hospitals and clinics in Gongcheng county
[45] Hearing device manufacturers call for interoperability and standardization of internet and audiology
[46] Google Health and the NHS: overcoming the trust deficit
[47] Diagnostic accuracy and measurement sensitivity of digital models for orthodontic purposes: a systematic review
[48] Medical knowledge constrained semantic breast ultrasound image segmentation
[49] The rise of artificial intelligence and the uncertain future for physicians
[50] Fully convolutional network ensembles for white matter hyperintensities segmentation in MR images
[51] Addressing class imbalance problem in medical diagnosis: a genetic algorithm approach
[52] Reimagining medical education in the age of AI
[53] COCO-GAN: generation by parts via conditional coordinating
[54] Correlation via synthesis: end-to-end nodule image generation and radiogenomic map learning based on generative adversarial network