key: cord-0500706-qgxo1lcm authors: Tang, Sheyang; Hosseini, Mahdi S.; Chen, Lina; Varma, Sonal; Rowsell, Corwyn; Damaskinos, Savvas; Plataniotis, Konstantinos N.; Wang, Zhou title: Probeable DARTS with Application to Computational Pathology date: 2021-08-16 journal: nan DOI: nan sha: 000cad8dfc9c3aa602688f1baafbd3a6f2e504ec doc_id: 500706 cord_uid: qgxo1lcm

AI technology has made remarkable achievements in computational pathology (CPath), especially with the help of deep neural networks. However, network performance depends heavily on architecture design, which commonly requires human experts with domain knowledge. In this paper, we address this challenge using recent advances in neural architecture search (NAS) to find an optimal network for CPath applications. In particular, we use differentiable architecture search (DARTS) for its efficiency. We first adopt a probing metric to show that the original DARTS lacks proper hyperparameter tuning on the CIFAR dataset, and show how the generalization issue can be addressed using an adaptive optimization strategy. We then apply our searching framework to CPath applications by searching for the optimal network architecture on a histological tissue type dataset (ADP). Results show that the searched network outperforms state-of-the-art networks in terms of prediction accuracy and computational complexity. We further conduct extensive experiments to demonstrate the transferability of the searched network to new CPath applications, its robustness against downscaled inputs, and the reliability of its predictions.

Recent years have witnessed great advances in AI-based Computational Pathology (CPath) [22, 15]. Emerging AI techniques have shown their superiority in more accurate, efficient, and large-scale medical diagnoses [4].
In particular, Convolutional Neural Networks (CNNs) have been widely employed to extract meaningful information from medical images for various pathology applications, including disease diagnoses [5, 38], medical image segmentation [27, 31], etc. Yet designing the network architectures has long been a manual process that requires adequate domain knowledge. As a result, it has become common practice to transfer architectures from CV applications (such as ResNet [8] and GoogLeNet [32]) to technical developments in other fields, including CPath [29, 34].

The ultimate question is whether transferring architectures between the two domains is an efficient strategy. To answer this question, we first demonstrate how CV and CPath datasets differ. Here we compare the CIFAR [19] and ADP [11] datasets. Besides the different data structures shown in Table 1, the nature of the images is also different, which makes CV datasets more complicated. First, the pixel resolution in CPath is fixed, corresponding to a fixed field of view (FOV) size. The root cause of this uniformity is that whole slide images are acquired by a scanner in a much more controlled environment, from both the optics and the illumination viewpoint [11]. In contrast, the pixel resolution in CV varies from image to image because of different imaging setups and configurations. CV images are captured in natural scenes where the distance to the subject varies widely. Examples from each imaging modality are shown in Figure 1(a), where the ship images are taken from a greater distance than the dog images, resulting in larger pixel size and lower resolution. Second, target objects in CV images occupy only part of the whole FOV, and the rest is background, which is irrelevant to the class label. Note that the diversity of the background in Figure 1(a) is very high.
This is quite different in CPath, where the background information is obtained from an empty area of the sample using uniform white light illumination [11], leading to more uniform and homogeneous images. This is illustrated in Figure 1(b), where the white part denotes the background. In light of this difference, we hypothesize that the simpler imaging modality in CPath translates to simpler network architectures than in CV. To this end, new network architectures should be designed for CPath applications.

Neural architecture search (NAS) has recently been proposed to automate the design of neural networks by searching for the optimal network structure on a given dataset. In many CV applications, NAS has outperformed state-of-the-art manually designed networks in terms of prediction accuracy and computational complexity [7]. In medical image analysis, it has been used to find suitable networks for various applications, such as image segmentation for Magnetic Resonance Imaging (MRI) [17, 36, 3], ultrasound imaging [35], disease diagnoses from Computed Tomography (CT) scans [16, 9], etc. In pathology, however, NAS has not been fully explored. There is a lack of a general framework that can be easily extended to various CPath applications.

In this work, we propose an architecture search platform based on differentiable architecture search (DARTS) [23]. We choose DARTS because it is gradient-based and thus much more efficient and computation-friendly than other searching strategies, including reinforcement learning [39] and evolutionary algorithms [26]. DARTS achieves this by relaxing the search space to be continuous and dividing the whole pipeline into a search phase and an evaluation phase. However, in CV applications, DARTS is reported to exhibit overfitting issues, and the searched architecture does not generalize well in the evaluation phase [21, 37].
To combat these challenges, we first conduct searching on CIFAR [19] and compute a probing metric, stable rank [12], for each layer. In this way, we can better monitor the searching process and show that the overfitting issue comes from improper hyperparameter tuning. In addition, we use an adaptive optimizer, Adas [12], that automatically tunes the learning rate of each layer based on its probing metric, so that the generalization ability of the searched architecture is improved. We then apply this searching framework to ADP [11], which contains a great variety of histological tissue types that are representative enough for the searched architecture to generalize well across different CPath applications. The searched network outperforms the state-of-the-art architectures in the speed-accuracy tradeoff, which is crucial for real-time high-throughput CPath applications. We further conduct extensive experiments to show the transferability of the searched architecture to new CPath datasets, demonstrate its robustness against downscaled input images, and verify its superiority in extracting label-pertinent features.

Our main contributions are listed below:
• We use a probing metric to show that the existing DARTS framework lacks proper hyperparameter tuning, and use an adaptive optimizer to improve the generalization ability of the searched model;
• We apply the proposed searching platform to CPath applications and show the superiority of the searched model in prediction accuracy and computational complexity;
• We demonstrate the transferability of the searched architecture in various CPath applications, and show its robustness against decreased resolutions and its reliability in prediction.

Table 2. Summary of NAS applications in medical image analysis.
                  Searching Strategy
Task              Gradient-based   Reinforcement Learning   Evolutionary Algorithms
Segmentation      [35, 17, 6]      [3]                      [36]
Classification    [25, 9]          [10]                     [16]

As NAS has achieved promising results in many CV applications [7], several attempts have been made to use NAS techniques to find optimal architectures for applications in medical image analysis. Based on the task and the searching strategy, these works can be categorized as in Table 2. In image segmentation applications, most works adopt a U-Net structure, where the detailed configurations are searched in different manners. The works in [35, 17] use differentiable architecture search to find cell structures as building blocks in the encoder and decoder. Bae et al. [3] utilize reinforcement learning to search for hyper-parameter configurations of the U-Net architecture. Yu et al. [36] first search for cell connections to form a U-Net topology using evolutionary algorithms, and then search for operations within each cell. Dong et al. [6] extend the differentiable searching framework to work in adversarial training. For classification applications, the searching is more task-specific. Using gradient-based searching, Peng et al. [25] develop a network to predict distant metastases on PET-CT images, and He et al. [9] design a network for COVID-19 detection with chest CT scans. Hosseini et al. [10] use a reinforcement learning-based controller to find the best parameter configuration of a CNN model for histological tissue type classification. Jiang et al. [16] search for a network to classify pulmonary nodules with evolutionary algorithms. To the best of our knowledge, no prior work has fully explored the potential of NAS in digital pathology applications.

In this section, we introduce our searching algorithm. We first review the basic concepts of DARTS [23], then show how the existing DARTS framework can be improved using a probing metric and a new optimizer.
Finally, a network size-based searching is proposed to seek a trade-off between prediction accuracy and model complexity.

The goal of DARTS [23] is to search for two types of cells (namely normal and reduction) as building blocks, which are stacked to form a full network. Each cell is represented as a directed acyclic graph with $N$ nodes, including two input nodes, intermediate nodes, and one output node. Every node $x_i$ is a latent representation (e.g., a feature map in a CNN), and every edge $(i, j)$ is a mixture of weighted candidate operations from a pre-defined operation search space $\mathcal{O}$ (e.g., convolution, skip-connection). The output $\bar{o}_{i,j}$ of an edge $(i, j)$ is then a softmax-weighted sum of candidate operations [23]:

$$\bar{o}_{i,j} = \sum_{o \in \mathcal{O}} \frac{\exp\left(\alpha^{o}_{i,j}\right)}{\sum_{o' \in \mathcal{O}} \exp\left(\alpha^{o'}_{i,j}\right)} \, o(x_i),$$

where $\alpha^{o}_{i,j}$ is an architecture parameter for weighting operation $o(x_i)$. The output of an intermediate node $x_j$ is the sum of all input edges, i.e., x j = i