key: cord-0217379-fwe6efw5
authors: Han, Changhee; Okamoto, Takayuki; Takeuchi, Koichi; Katsios, Dimitris; Grushnikov, Andrey; Kobayashi, Masaaki; Choppin, Antoine; Kurashina, Yutaka; Shimahara, Yuki
title: Tips and Tricks to Improve CNN-based Chest X-ray Diagnosis: A Survey
date: 2021-06-02
journal: nan
DOI: nan
sha: 92e8773499562f05e14385b7d1faf0f1894418c6
doc_id: 217379
cord_uid: fwe6efw5

Convolutional Neural Networks (CNNs) intrinsically require large-scale data, whereas Chest X-Ray (CXR) images tend to be data/annotation-scarce, leading to over-fitting. Therefore, based on our development experience and related work, this paper thoroughly introduces tricks to improve generalization in CXR diagnosis: how to (i) leverage additional data, (ii) augment/distill data, (iii) regularize training, and (iv) conduct efficient segmentation. As a development example based on such optimization techniques, we also feature LPIXEL's CNN-based CXR solution, EIRL Chest Nodule, which improved radiologists'/non-radiologists' nodule detection sensitivity by 0.100/0.131, respectively, while maintaining specificity.

Since many findings on Chest X-Ray (CXR), the world's most performed medical imaging test [1], are subtle or doubtful, CXR reading suffers from high inter-observer variability even among expert radiologists [2]. In this context, Convolutional Neural Networks (CNNs) have revolutionized CXR diagnosis (i.e., classification, regression, object detection, segmentation) [3][4][5]. However, CNNs intrinsically require ample data, whereas CXR images tend to be data/annotation-scarce, leading to over-fitting [6]. Therefore, as Fig. 1 shows, this paper thoroughly introduces tricks to improve generalization in CXR diagnosis: how to (i) leverage additional data, (ii) augment/distill data, (iii) regularize training, and (iv) conduct efficient segmentation.
A discussion of which specific CNN architectures to choose for Medical Imaging is beyond the scope of this paper; refer to other surveys [7][8][9]. As a development example, we also feature LPIXEL's CNN-based CXR solution, EIRL Chest Nodule, which partially applies the introduced optimization techniques to empower doctors for more rapid and reliable nodule diagnosis. With the EIRL Chest Nodule assistance, radiologists/non-radiologists improved sensitivity by 0.100/0.131, respectively, while maintaining specificity.

Since CNNs are data-hungry and obtaining large-scale CXR images is often unfeasible, it is essential to exploit publicly available CXR images for pre-training, supervised learning, or semi-supervised learning. Currently, four large labeled open datasets exist: ChestXray-NIHCC [10] (~112,000 images); CheXpert [11] (~224,000 images); MIMIC-CXR [12] (~372,000 images); PadChest [13] (~160,000 images). For more detailed information on 20 public CXR datasets, refer to this review paper [14]. It should be noted that those datasets do not always provide definitive diagnosis information from Computed Tomography (CT) scans, which is associated with better prediction performance.

The most dominant pre-training methods are (i) transfer learning, which uses labeled natural/medical images to obtain good initial weights, and (ii) self-supervised learning, which uses unlabeled medical images for initialization by solving auxiliary tasks based on input samples [6]. Generally, transfer learning on medical images (e.g., public CXR datasets) is ideal since the learned representation is strongly associated with the target medical task. However, due to the difficulty of data collection/annotation and the unavailability of pre-trained models, transfer learning on natural images (e.g., ImageNet [15]/COCO [16]-based transfer learning) and self-supervised learning on medical images (e.g., Models Genesis [17], MoCo-CXR [18]) have prevailed.
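The transfer-learning recipe above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's implementation: the tiny backbone below stands in for an ImageNet-pretrained network (in practice, e.g., a torchvision ResNet), the checkpoint path is hypothetical, and the 14-finding head is an illustrative choice.

```python
import torch
import torch.nn as nn

# Toy backbone standing in for an ImageNet-pretrained feature extractor
# (in practice, e.g., torchvision.models.resnet50 with ImageNet weights).
backbone = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# Pre-trained weights would be loaded here (hypothetical path):
# backbone.load_state_dict(torch.load("pretrained_backbone.pt"))

# Replace the source-task head with a new multi-label CXR head
# (illustrative: 14 findings, one sigmoid/BCE output each).
head = nn.Linear(8, 14)
model = nn.Sequential(backbone, head)

# Freeze the backbone at first and fine-tune only the new head,
# so the transferred representation is adapted rather than overwritten.
for p in backbone.parameters():
    p.requires_grad = False

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(2, 1, 64, 64)   # dummy grayscale CXR batch
logits = model(x)               # shape: (2, 14)
loss = criterion(logits, torch.zeros(2, 14))
```

After the head converges, the backbone is typically unfrozen and the whole network fine-tuned with a smaller learning rate.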
Truncating the final blocks of pre-trained models may significantly decrease parameters while statistically maintaining prediction performance on CXR images [19].

Semi-supervised learning refers to training a model on both limited labeled and large-scale unlabeled (or pseudo-labeled) medical images to cut annotation cost (especially for segmentation) [20]. Usually, effective semi-supervised learning requires at least thousands of labeled CXR images.

Unsupervised anomaly detection can discover various unseen abnormalities (e.g., rare diseases, bleeding) without specifying disease types, relying on large-scale unannotated healthy medical images. To this end, generative autoencoders reconstruct medical images to detect outliers either in the learned feature space or from high reconstruction loss [21].

Along with CXR images, concatenating patient data (e.g., age, gender, X-ray view position) to the (flattened) feature layer could improve prediction [22]. Data Augmentation (DA), such as RandAugment [23] and mixup [24], has shown promising performance, especially in medical image segmentation [25]. Conditional GAN-based DA also plays a big role in Medical Imaging, offering both interpolation/extrapolation effects [26, 27]. In Medical Imaging, test-time DA and model ensembling assure robust model prediction [28], similar to dropout [29], which provides robustness in training network parameters. Training a model on gray-scale and color images, respectively, and combining their results might also improve prediction [22].
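As a concrete example of interpolation-style DA, here is a minimal NumPy sketch of mixup, which blends random pairs of images and their labels by a Beta-distributed coefficient. The shapes and the alpha value are illustrative assumptions, not the exact recipe used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x, y, alpha=0.4, rng=rng):
    """Blend random pairs of samples and labels: convex combination
    with a Beta(alpha, alpha)-distributed mixing coefficient."""
    lam = rng.beta(alpha, alpha)          # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))        # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix

# Dummy batch: four grayscale "CXR" images with one-hot disease labels.
x = rng.random((4, 1, 32, 32))
y = np.eye(2)[np.array([0, 1, 0, 1])]
x_mix, y_mix = mixup(x, y)
# Each mixed label is a soft distribution over the two classes.
```

Because the mixed labels remain valid probability distributions, the same cross-entropy loss can be used unchanged on the augmented batch.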
Reducing the parameter space to a suitable subspace via regularized image representation helps avoid over-fitting on CXR images: multi-scale patch-based prediction [30] and resizing to a smaller image size simply reduce the search space; Conditional GAN-based denoising removes prediction-irrelevant noise while preserving image structure and details [31]; similarly, Conditional GAN-based bone suppression increases the visibility of soft tissues by suppressing bones [32]; lung field detection isolates the lung region (i.e., the region of interest) [4].

As a form of training regularization, multi-task learning performs multiple tasks (e.g., classification, object detection, segmentation) using a single learned representation. In Medical Imaging, it typically refers to training a segmentation model with auxiliary heads, each for an individual classification task; urging the model to represent a classification-relevant (i.e., diagnosis-relevant) feature space tends to improve segmentation [33].

Since the prevalence of positive cases significantly differs across diseases, we need to address data imbalance; widespread solutions are (i) under-sampling the normal class, (ii) over-sampling the rare class, (iii) the Synthetic Minority Oversampling Technique [34], and (iv) weighted loss. In addition, combining multi-scale receptive fields helps capture diverse diseases varying in size [35]. Normalization choices also affect training stability: Batch Normalization [36] accelerates training, while normalizer-free networks rely on gradient clipping instead (e.g., Adaptive Gradient Clipping [37]).

CXR reading suffers from high inter-observer variability. Therefore, computational ground-truth estimation that models annotator confusion leads to robust annotation [38]. Moreover, cost-effective annotation requires (i) active learning [39], which provides the annotators with samples whose annotation may improve generalization, and (ii) interactive segmentation [40], which supports the annotators by propagating their modifications through the whole segmentation mask.
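The weighted-loss option (iv) can be sketched with PyTorch's built-in `pos_weight` argument, which up-weights the positive term of the binary cross-entropy. The 2% prevalence and the neg/pos weighting heuristic below are illustrative assumptions, not numbers from the paper.

```python
import torch
import torch.nn as nn

# Suppose only 2% of training CXRs are positive for a rare finding.
n_pos, n_neg = 200, 9800

# One common heuristic: weight the positive class by neg/pos so both
# classes contribute comparably to the expected loss.
pos_weight = torch.tensor([n_neg / n_pos])            # = 49.0
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Dummy predictions: one positive and one negative example.
logits = torch.tensor([[2.0], [-1.5]])
targets = torch.tensor([[1.0], [0.0]])
loss = criterion(logits, targets)

# For comparison, the unweighted loss on the same batch:
unweighted = nn.BCEWithLogitsLoss()(logits, targets)
```

Mistakes on the rare positive class now dominate the gradient, which counteracts the model's tendency to predict "normal" everywhere.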
Post-segmentation refinement removes false positives and produces a smoother CXR segmentation: Kuzin et al. heuristically averaged cross-fold predictions using an optimized binarization threshold and a dilation technique [5]; Larrazabal et al. used a denoising autoencoder to obtain an anatomically plausible segmentation from the initial prediction [41].

The Japanese medical AI startup LPIXEL provides a variety of intelligent AI diagnostic solutions, called the EIRL Series, to empower doctors for more rapid and reliable diagnosis. Specifically, EIRL Chest Nodule reliably detects nodules (between 5 and 30 mm) on CXR images by partially applying the optimization techniques introduced in Section 2. Its version 1.8 adopts various DA methods (e.g., intensity/geometric augmentation) and post-processing methods (e.g., thresholding-based segmentation, isolated small-area exclusion, lung field-based false-positive reduction).

Based on our development experience and related work, this paper thoroughly introduced tricks to improve generalization in CXR diagnosis: leveraging additional data, augmenting/distilling data, regularizing training, and conducting efficient segmentation.

References

[4] An automatic method for lung segmentation and reconstruction in chest X-ray using deep neural networks.
[5] Pneumothorax segmentation with effective conditioned post-processing in chest X-ray.
[6] Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation.
[7] Deep convolutional neural network in medical image processing.
[8] Medical image classification using deep learning.
[9] Medical image segmentation based on U-Net: A review.
[10] ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases.
[11] CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison.
[12] MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports.
[13] PadChest: A large chest X-ray image dataset with multi-label annotated reports.
[14] Deep learning for chest X-ray analysis: A survey.
[15] ImageNet classification with deep convolutional neural networks. In: Proc. Advances in Neural Information Processing Systems (NIPS).
[16] Microsoft COCO: Common objects in context. In: Proc. European Conference on Computer Vision (ECCV).
[17] Models Genesis: Generic autodidact models for 3D medical image analysis.
[18] MoCo-CXR: MoCo pretraining improves representation and transferability of chest X-ray models.
[19] CheXtransfer: Performance and parameter efficiency of ImageNet models for chest X-ray interpretation.
[20] Semi-supervised multi-task learning with chest X-ray images.
[21] MADGAN: Unsupervised Medical Anomaly Detection GAN using multiple adjacent brain MRI slice reconstruction.
[22] Hybrid deep learning for detecting lung diseases from X-ray images.
[23] RandAugment: Practical automated data augmentation with a reduced search space.
[24] mixup: Beyond empirical risk minimization. In: Proc. International Conference on Learning Representations (ICLR).
[25] Prostate cancer segmentation using manifold mixup U-Net.
[26] Synthesizing diverse lung nodules wherever massively: 3D multi-conditional GAN-based CT image augmentation for object detection.
[27] Learning more with less: Conditional PGGAN-based data augmentation for brain metastases detection using highly-rough annotation on MR images.
[28] A two-stream mutual attention network for semi-supervised biomedical segmentation with noisy labels.
[29] Dropout: A simple way to prevent neural networks from overfitting.
[30] Accurate segmentation of lung fields on chest radiographs using deep convolutional networks.
[31] Image denoising with Conditional Generative Adversarial Networks (CGAN) in low dose chest images.
[32] Image-to-images translation for multi-task organ segmentation and bone suppression in chest X-ray radiography.
[33] Y-Net: Joint segmentation and classification for diagnosis of breast biopsy images.
[34] SMOTE: Synthetic minority over-sampling technique.
[35] Camouflaged object detection.
[36] Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proc. International Conference on Machine Learning (ICML).
[37] High-performance large-scale image recognition without normalization.
[38] Disentangling human error from the ground truth in segmentation of medical images.
[39] GOAL: Gist-set Online Active Learning for efficient chest X-ray image annotation.
[40] Interactive segmentation of medical images through fully convolutional neural networks.
[41] Post-DAE: Anatomically plausible segmentation via post-processing with denoising autoencoders.