A rotation and translation invariant method for 3D organ image classification using deep convolutional neural networks

Kh Tohidul Islam, Sudanthi Wijewickrema and Stephen O'Leary
Department of Surgery (Otolaryngology), University of Melbourne, Melbourne, Victoria, Australia

Submitted 27 November 2018. Accepted 10 February 2019. Published 4 March 2019.
Corresponding author: Kh Tohidul Islam, kh.tohidulislam@gmail.com
Academic editor: Gang Mei
DOI 10.7717/peerj-cs.181
Copyright 2019 Islam et al. Distributed under Creative Commons CC-BY 4.0. OPEN ACCESS

ABSTRACT
Three-dimensional (3D) medical image classification is useful in applications such as disease diagnosis and content-based medical image retrieval. It is a challenging task for several reasons. First, image intensity values differ vastly depending on the image modality. Second, intensity values within the same image modality may vary depending on the imaging machine, and artifacts may also be introduced during the imaging process. Third, processing 3D data requires high computational power. In recent years, significant research has been conducted in the field of 3D medical image classification. However, most existing methods make assumptions about patient orientation and imaging direction to simplify the problem and/or work with the full 3D images. As such, they perform poorly when these assumptions are not met. In this paper, we propose a method of classification for 3D organ images that is rotation and translation invariant. To this end, we extract a representative two-dimensional (2D) slice along the plane of best symmetry from the 3D image. We then use this slice to represent the 3D image and use a 20-layer deep convolutional neural network (DCNN) to perform the classification task. We show experimentally, using multi-modal data, that our method is comparable to existing methods when the assumptions of patient orientation and viewing direction are met. Notably, it shows similarly high accuracy even when these assumptions are violated, where other methods fail. We also explore how this method can be used with other DCNN models as well as conventional classification approaches.

Subjects: Artificial Intelligence, Computer Vision, Data Mining and Machine Learning
Keywords: Deep Learning, Medical Image Processing, Image Classification, Symmetry, 3D Organ Image Classification

INTRODUCTION
With the rapid growth of medical imaging technologies, a large volume of 3D medical images of different modalities such as magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET) has become available (Research & Markets, 2018). This has resulted in the formation of large medical image databases that offer opportunities for evidence-based diagnosis, teaching, and research. Within this context, the need for the development of 3D image classification methods has risen. For example, 3D medical image classification is used in applications such as computer-aided diagnosis (CAD) and content-based medical image retrieval (CBMIR) (Zhou et al., 2006; Kumar et al., 2013).
In recent years, many algorithms have been introduced for the classification of 3D medical images (Arias et al., 2016; Mohan & Subashini, 2018). Both conventional classification methods and deep learning have been used for this purpose. For example, Öziç & Özşen (2017) proposed a voxel-based morphometry method that transforms 3D voxel values into a vector to be used as features in a support vector machine (SVM) in order to identify patients with Alzheimer's disease using MR images. Bicacro, Silveira & Marques (2012) investigated the classification of 3D brain PET images into three classes: Alzheimer's disease, mild cognitive impairment, and cognitively normal. To this end, they used three different feature extraction approaches (volumetric intensity, 3D Haar-like features (Cui et al., 2007), and histogram of oriented gradients (HoG) (Dalal & Triggs, 2005)) and trained an SVM using these features. Morgado, Silveira & Marques (2013) also performed a similar classification (Alzheimer's disease, mild cognitive impairment, and cognitively normal) for PET brain images. They used 2D and 3D local binary patterns as texture descriptors to extract features and performed the classification task using an SVM. A 3D image classification method was proposed by Liu & Dellaert (1998) for the pathological classification of brain CT images (captured by the same scanner) as normal, (evidence of) blood, or stroke. First, in a pre-processing step, they manually realigned all images so that the mid-sagittal plane was at the middle of the image. Then, exploiting the symmetry of the image, they extracted 50 image features from half of each 2D slice (in the superior-inferior direction) and used kernel regression for classification. A limitation of conventional classification methods such as these is that the most appropriate features for a given problem have to be extracted first in order to train the classifiers. In contrast, deep learning techniques such as deep convolutional neural networks (DCNNs) extract the features as part of the learning process, thereby ensuring that the optimal features for a given task are learned.

A 3D DCNN was used by Ahn (2017) to classify lung cancer (cancer positive or negative) from CT images. The author modified the SqueezeNet (Iandola et al., 2016) architecture (which is traditionally suited to 2D images) to obtain SqueezeNet3D, which is appropriate for 3D image classification. Liu & Kang (2017) introduced a lung nodule classification approach using a multi-view DCNN for CT images. They obtained a 3D volume by considering multiple views of a given nodule (patches of different sizes around the nodule) prior to classification. They performed two classifications: binary (benign or malignant) and ternary (benign, primary malignant, or metastatic malignant). Jin et al. (2017) modified the AlexNet (Krizhevsky, Sutskever & Hinton, 2012) architecture to make it suitable for the classification of 3D CT images of lungs.
They segmented the lungs from the CT image in a pre-processing step and performed a binary classification (cancer or not) on the resulting image. Instead of using 3D DCNNs, other researchers have considered how 2D DCNNs can be used to classify 3D medical images. For example, Qayyum et al. (2017) used every 2D slice of a 3D image as input to a 2D DCNN. They classified the images into 24 classes and used a publicly available 3D medical image database to evaluate their methodology.

Usually, 3D medical images are captured/reconstructed so that they are consistent with respect to viewing direction and patient orientation (rotation and translation). For example, image slices are typically taken in the superior-inferior direction, and an effort is made to align the patient so that the mid-sagittal and mid-coronal planes lie at the middle of the image. Thus, most classification methods assume that these requirements are met (e.g., Qayyum et al., 2017). As such, they may not perform well if these assumptions are violated. Others perform manual pre-processing prior to classification to avoid this issue (e.g., Liu & Dellaert, 1998).

In this paper, we consider the specific case of 3D organ image classification and propose an algorithm that is robust against rotation and translation. To this end, we exploit the fact that the human body is roughly symmetric, and extract a 2D slice from the plane of best symmetry of the 3D organ image in a pre-processing step. We consider this slice to be representative of the 3D image, as it provides a relatively consistent cross-section of the 3D image, irrespective of its orientation. Then, we use this 'representative' 2D image to train a 2D DCNN to classify the 3D image. As discussed later, simplicity is one of the major features of the proposed algorithm. We show, through experiments performed on publicly available multi-modal (CT and MRI) data, that (1) the proposed method is as accurate as other similar methods when the above assumptions are met, (2) it significantly outperforms other methods when faced with rotated and/or translated data, (3) its training time is low, and (4) it achieves similarly high results when used with other DCNN architectures.

MATERIALS AND METHODS
In this section, we discuss the steps of the algorithm we propose for rotation and translation invariant 3D organ image classification: volume reconstruction, segmentation, symmetry plane extraction, and classification using a DCNN.

Volume reconstruction
First, we loaded the 2D slices of a DICOM image into a 3D array, taking the InstanceNumber in the metadata to be the z dimension. As the slice thickness (z spacing) is not necessarily the same as the pixel spacing, this volume does not represent the real-world shape of the imaged organ/body part. To retain the actual shape, we resampled the 3D image using cubic interpolation (Miklos, 2004). The new array size for the resampled image was calculated using Eq. (1), where [n_x, n_y, n_z] is the original array size, ps_x and ps_y are the x and y spacings respectively (PixelSpacing in the metadata), and st is the z spacing (SliceThickness in the metadata). An example of a volume reconstruction is shown in Fig. 1.

$$[n_x, n_y, n_z] = \left[\, n_x,\; \frac{n_y \cdot ps_y}{ps_x},\; \frac{n_z \cdot st}{ps_x} \,\right] \qquad (1)$$
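The following MATLAB sketch illustrates this reconstruction step. It is a minimal illustration only, assuming a hypothetical DICOM folder name and the Image Processing Toolbox function imresize3; it is not the authors' released implementation.

```matlab
% Minimal sketch of the volume reconstruction step (our illustration; not the
% authors' released code). Assumes a hypothetical DICOM folder 'series01' and
% the Image Processing Toolbox (imresize3).
files = dir(fullfile('series01', '*.dcm'));
n = numel(files);
info = cell(1, n); slices = cell(1, n); zIdx = zeros(1, n);
for i = 1:n
    f = fullfile(files(i).folder, files(i).name);
    info{i}   = dicominfo(f);                    % per-slice metadata
    slices{i} = dicomread(f);                    % 2D slice
    zIdx(i)   = double(info{i}.InstanceNumber);  % slice order along z
end
[~, order] = sort(zIdx);                         % stack by InstanceNumber
V = cat(3, slices{order});                       % nx-by-ny-by-nz volume

psx = double(info{1}.PixelSpacing(1));           % x spacing
psy = double(info{1}.PixelSpacing(2));           % y spacing
st  = double(info{1}.SliceThickness);            % z spacing

[nx, ny, nz] = size(V);
newSize = round([nx, ny*psy/psx, nz*st/psx]);    % Eq. (1)
Vr = imresize3(V, newSize, 'cubic');             % cubic resampling
```

Reading the spacing tags from the first slice and rounding the resampled size are simplifications made here; a fuller implementation would check these tags for consistency across the series.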
Figure 1: Volume reconstruction from DICOM images: (A) stack of 2D slices, (B) reconstructed 3D volume, and (C) resampled 3D volume.

Figure 2: Multi-level volume thresholding: (A) resampled volume from the previous step, (B) segmented volume, and (C) resulting point cloud.

3D volume segmentation
To segment the organ(s) from the background, we used multi-level global thresholding based on Otsu's method (Otsu, 1979). We used two thresholds and considered the voxels with intensity values between these thresholds to be the organ(s). This provides a segmentation (point cloud) of the organ(s) and also avoids the inclusion of possible imaging artifacts at the extremes of the intensity spectrum. An example of the segmentation process is shown in Fig. 2. Note that this is a simple segmentation process and, as such, does not provide an exact segmentation. However, from our results, we observed that it was sufficient for our purpose: simplifying the symmetry plane calculation in the next step of the algorithm.

Representative 2D image extraction
We calculated the plane of best symmetry from the point cloud resulting from the previous step using the method of Cicconet, Hildebrand & Elliott (2017). This method calculates the reflection of a point cloud around an arbitrary plane, uses the iterative closest point algorithm (Besl & McKay, 1992) to register the original and reflected point clouds, and solves an eigenvalue problem related to the global transformation applied to the original data during registration. The first eigenvector in this solution is the normal to the plane of best symmetry. We extracted the 2D image resulting from the intersection of this plane with the 3D volume using the nearest neighbour method (Miklos, 2004). We considered the second and third eigenvectors to be the x and y axes of this 2D image respectively. We determined the bounds of the 2D image to be the minimum and maximum values resulting from the projections of the 3D volume vertices onto these axes, and the origin to be the midpoint of these minimum and maximum values. Figure 3 shows the extraction of the plane of best symmetry.

Figure 3: Representative 2D image extraction: (A) segmented volume, (B) segmented point cloud with the plane of best symmetry shown as a circular plane, and (C) 2D image extracted from the symmetry plane.

Although the accuracy of the symmetry plane calculation depends on the segmentation step, and this dependency could be avoided by instead using algorithms that minimize the distance between voxel intensity values (Tuzikov, Colliot & Bloch, 2003; Teverovskiy & Li, 2006), using the segmented point cloud is more efficient. As we found it to be sufficient for our purposes, we used this method of symmetry plane calculation for the sake of efficiency.
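To make the segmentation and slice-sampling steps concrete, the sketch below shows one possible MATLAB implementation. The symmetry-plane computation itself is represented by a hypothetical helper, symmetryPlane, standing in for an implementation of Cicconet, Hildebrand & Elliott (2017); the fixed image half-width and the use of interp3 are also our assumptions rather than details given in the paper.

```matlab
% Sketch of segmentation and representative-slice sampling (our illustration).
% symmetryPlane is a hypothetical helper standing in for the method of
% Cicconet, Hildebrand & Elliott (2017): it is assumed to return a point on
% the plane (p0) and two orthonormal in-plane axes (u, v).
t   = multithresh(Vr, 2);                        % two Otsu thresholds
q   = imquantize(Vr, t);                         % quantize into 3 levels
idx = find(q == 2);                              % voxels between the thresholds
[r, c, s] = ind2sub(size(Vr), idx);
P = [r, c, s];                                   % organ point cloud (N-by-3)

[p0, u, v] = symmetryPlane(P);                   % hypothetical helper

% Sample the cross-section on a regular in-plane grid (fixed half-width here;
% the paper instead derives bounds from projections of the volume vertices).
halfW = 128;
[a, b] = meshgrid(-halfW:halfW-1, -halfW:halfW-1);
R = p0(1) + a*u(1) + b*v(1);                     % row coordinates
C = p0(2) + a*u(2) + b*v(2);                     % column coordinates
S = p0(3) + a*u(3) + b*v(3);                     % slice coordinates
slice2D = interp3(double(Vr), C, R, S, 'nearest', 0);   % nearest-neighbour sampling
```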
Classification using a DCNN
Due to the roughly symmetric nature of the human body, the 2D images resulting from the previous step provide relatively consistent cross-sections of the 3D images. As such, we used these 2D images to train a standard 2D DCNN for the classification task. The DCNN used here consisted of 20 layers: one image input layer, four convolution layers, four batch normalization layers, four rectified linear unit (ReLU) (Nair & Hinton, 2010) layers, four max pooling layers, one fully connected layer, one softmax layer, and one classification output layer. We resized the images to 224×224 and normalized the intensity values to the range [0, 255]. Figure 4 illustrates the DCNN architecture.

Figure 4: Deep convolutional neural network architecture.
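A 20-layer network of the kind described above could be defined as follows using MATLAB's Deep Learning Toolbox. The filter counts, kernel sizes, training options, and imdsTrain (an imageDatastore of the representative 2D slices) are assumptions made for illustration; the paper does not specify them.

```matlab
% Sketch of a 20-layer network matching the description above (our assumptions
% for hyperparameters; not the authors' exact configuration).
layers = [
    imageInputLayer([224 224 1])                  % 1: grayscale input
    convolution2dLayer(3, 16, 'Padding', 'same')  % 2-5: block 1
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 32, 'Padding', 'same')  % 6-9: block 2
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 64, 'Padding', 'same')  % 10-13: block 3
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 128, 'Padding', 'same') % 14-17: block 4
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(4)                        % 18: four organ classes
    softmaxLayer                                  % 19
    classificationLayer];                         % 20
options = trainingOptions('sgdm', 'MaxEpochs', 30, 'Verbose', false);
net = trainNetwork(imdsTrain, layers, options);   % imdsTrain: hypothetical datastore
```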
RESULTS
In this section, we discuss the performance metrics and databases used in the experiments, the implementation/extension of existing methods, and the experimental results. All experiments were performed in MATLAB® (MathWorks Inc., 1998) on an HP Z6 G4 workstation running Windows® 10 Education, with an Intel® Xeon® Silver 4108 CPU (clock speed 1.80 GHz), 16 GB RAM, and an NVIDIA® Quadro® P2000 GPU.

Performance evaluation metrics
To evaluate classification performance, we used commonly utilized metrics (accuracy and the mean values of sensitivity, specificity, precision, F-measure, and G-mean) (Japkowicz, 2006; Powers, 2011; Olson & Delen, 2008). These metrics are defined in Eqs. (2)-(7) with respect to the values of the confusion matrix: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \qquad (2)$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (3)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (4)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (5)$$

$$\text{F-Measure} = 2 \times \frac{\frac{TP}{TP+FP} \times \frac{TP}{TP+FN}}{\frac{TP}{TP+FP} + \frac{TP}{TP+FN}} \qquad (6)$$

$$\text{G-Mean} = \sqrt{\frac{TP}{TP+FN} \times \frac{TN}{TN+FP}} \qquad (7)$$

Databases
We collected the data for our experiments from a publicly available 3D medical image database: The Cancer Imaging Archive (TCIA) (Clark et al., 2013). TCIA stores a large number of multi-modal medical images in the Digital Imaging and Communications in Medicine (DICOM) file format. From this database, we collected data (CT and MRI) for four classes that correspond to different areas of the human body: head, thorax, breast, and abdomen. Some images, such as those with a very low number of 2D DICOM slices and those with inconsistent imaging directions, were removed from our database. A total of 2400 3D images were obtained (600 images per class). Seventy percent of the images were used for training and the remaining thirty percent for testing. In addition to the original testing database, we created two other databases by (1) randomly rotating and translating and (2) randomly swapping the axes of the original test data. The former database was used to test for rotation and translation (patient orientation) invariance and the latter was used to test for robustness against changes in the imaging direction. In addition, we created an augmented training database by randomly rotating and translating 50% of the original training data and randomly swapping the axes of the remaining 50%. Figure 5 illustrates this process.

Figure 5: Database formation.

To generate the transformed data that simulated changes in patient orientation (in the transformed test database and the augmented training database), we performed a random rotation in the range of [−15°, 15°] with respect to each of the three coordinate axes and a random translation in the range of [−5, 5] along the coordinate axes on each image. Figure 6 shows an example of such a 3D transformation. To obtain the axis-swapped data that simulated changes in imaging direction (in the axis-swapped test database and the augmented training database), we randomly changed the axes of the original data. Note that this is equivalent to rotations of 90° around the x, y, or z axes. An example of random axis swapping is shown in Fig. 7.

Figure 6: An example of a random 3D transformation: (A) original volume, (B) transformed volume (a rotation of +15° counterclockwise around the z axis and a translation of [3, 2, 2] in the x, y, and z directions respectively), and (C) mid-axial slices of both the original and transformed volumes.

Figure 7: An example of random 3D axis swapping: (A) original volume, (B) axis-swapped volume with the x axis changed to the y axis and the z axis changed to the x axis, and (C) mid-axial slices of both the original and axis-swapped volumes.
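The sketch below indicates how such transformed and axis-swapped volumes could be generated in MATLAB. The specific functions (imrotate3, imtranslate, permute) and the per-axis rotation scheme are our assumptions; the original implementation may differ.

```matlab
% Sketch of the random transformation and axis swapping used to build the
% transformed and axis-swapped test volumes (our illustration).
Vt = Vr;
rotAxes = {[1 0 0], [0 1 0], [0 0 1]};
for k = 1:3
    ang = -15 + 30*rand;                              % degrees in [-15, 15]
    Vt  = imrotate3(Vt, ang, rotAxes{k}, 'cubic', 'crop');
end
Vt = imtranslate(Vt, randi([-5 5], 1, 3));            % translation in voxels

perm3 = randperm(3);                                  % random imaging direction
Vs = permute(Vr, perm3);                              % equivalent to 90-degree rotations
```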
Performance comparison with other similar methods
We evaluated our method against similar existing methods. We reimplemented the method of Qayyum et al. (2017), which used all 2D slices to represent the 3D volume. We implemented their DCNN in MATLAB and used all slices of the training and testing sets respectively to train and evaluate this method. As the authors used images of size 224×224 as their input, we performed the same resizing of the data in a pre-processing step.

We also implemented the method used for the classification of 3D lung images introduced in Jin et al. (2017). They used thresholds based on the Hounsfield unit (HU) to extract an initial mask from CT images and used morphological operations to fill holes in this mask. Then, they segmented the lungs using this mask and trained a DCNN for the classification task. However, this method cannot be directly used for our problem. First, we have multi-modal data, and hence it is not possible to use the HU scale, which is specific to CT images. Second, we have images of different organs, which would require the definition of organ-specific thresholds. Third, morphological operations require the size of dilation/erosion as input, which varies depending on the type of image. Therefore, we used a process that can be generally applied to all images in our database. First, we created a binary mask using the multi-level global thresholding method discussed earlier. Then, we applied the active contours of Chan & Vese (2001) to the initial mask, with 100 iterations, to fill in holes and obtain a more refined mask. Finally, we extracted the organ from the background using this mask and used the result as the input to the DCNN. Jin et al. (2017) observed that an input image size of 128×128×20 provided the best performance, and therefore we also resized the input images to this size.
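A minimal sketch of this generic masking step, under the assumption that MATLAB's multithresh and activecontour functions are used, is given below; the exact parameterization is not specified in the paper.

```matlab
% Sketch of the generic masking step described above (our illustration; the
% exact parameterization is not given in the paper).
t    = multithresh(Vr, 2);
mask = imquantize(Vr, t) == 2;                    % initial binary mask
mask = activecontour(Vr, mask, 100, 'Chan-Vese'); % refine mask, 100 iterations
Vm   = Vr;
Vm(~mask) = 0;                                    % keep the organ, zero the rest
Vm   = imresize3(Vm, [128 128 20]);               % input size reported by Jin et al. (2017)
```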
Another 3D medical image classification model we implemented was that of Ahn (2017), which was used for lung image classification. The author performed an intensity normalization of their CT images based on the HU scale in a pre-processing step. For the same reasons as above, we did not perform this normalization. We used the same resizing of the data as they did (128×128×128) in our implementation.

As an additional method of comparison, we extended the idea presented in Prasoon et al. (2013) for image segmentation to make it applicable to our problem. They explored the classification of each voxel in 3D MRI images for the purpose of knee cartilage segmentation. They extracted three 2D patches around a voxel in the x, y, and z directions, trained a DCNN for each 2D patch, and combined the results of the three DCNNs in the final layer. We applied this idea to our problem by extracting the mid slices in the three coordinate directions and training three DCNNs similar to theirs.

Performance comparisons with respect to the metrics discussed above, when trained on the original and augmented training datasets, are shown in Tables 1 and 2 respectively. Performance with regards to training time is given in Table 3. Figure 8 shows the classification of some random examples using the proposed method, along with the corresponding confidence levels.

Table 1: Performance comparison with similar existing methods (without data augmentation). Best performance per metric per database is highlighted in bold.

Database | Methodology | Accuracy | Sensitivity | Specificity | Precision | F-Measure | G-Mean
Original | Jin et al. (2017) | 0.9514 | 0.9514 | 0.9838 | 0.9525 | 0.9515 | 0.9675
Original | Qayyum et al. (2017) | 0.9014 | 0.9014 | 0.9671 | 0.9168 | 0.9029 | 0.9337
Original | Prasoon et al. (2013) | 0.9653 | 0.9653 | 0.9884 | 0.9679 | 0.9654 | 0.9768
Original | Ahn (2017) | 0.9972 | 0.9972 | 0.9991 | 0.9973 | 0.9972 | 0.9981
Original | Proposed | 0.9944 | 0.9944 | 0.9981 | 0.9945 | 0.9944 | 0.9963
Transformed | Jin et al. (2017) | 0.8611 | 0.8611 | 0.9537 | 0.9080 | 0.8642 | 0.9062
Transformed | Qayyum et al. (2017) | 0.6694 | 0.6694 | 0.8898 | 0.7670 | 0.5921 | 0.7718
Transformed | Prasoon et al. (2013) | 0.9306 | 0.9306 | 0.9769 | 0.9364 | 0.9306 | 0.9534
Transformed | Ahn (2017) | 0.9333 | 0.9333 | 0.9778 | 0.9378 | 0.9325 | 0.9553
Transformed | Proposed | 0.9917 | 0.9917 | 0.9972 | 0.9919 | 0.9916 | 0.9944
Axis Swapped | Jin et al. (2017) | 0.6222 | 0.6222 | 0.8741 | 0.7549 | 0.6065 | 0.7375
Axis Swapped | Qayyum et al. (2017) | 0.5056 | 0.5056 | 0.8352 | 0.7630 | 0.4491 | 0.6498
Axis Swapped | Prasoon et al. (2013) | 0.8028 | 0.8028 | 0.9343 | 0.8420 | 0.7936 | 0.8660
Axis Swapped | Ahn (2017) | 0.7264 | 0.7264 | 0.9088 | 0.7707 | 0.7122 | 0.8125
Axis Swapped | Proposed | 0.9875 | 0.9875 | 0.9958 | 0.9876 | 0.9875 | 0.9917

Table 2: Performance comparison with similar existing methods (with data augmentation by random transformation and axis swapping on training data). Best performance per metric per database is highlighted in bold.

Database | Methodology | Accuracy | Sensitivity | Specificity | Precision | F-Measure | G-Mean
Original | Jin et al. (2017) | 0.9222 | 0.9222 | 0.9741 | 0.9225 | 0.9213 | 0.9478
Original | Qayyum et al. (2017) | 0.9042 | 0.9042 | 0.9681 | 0.9082 | 0.9037 | 0.9356
Original | Prasoon et al. (2013) | 0.9375 | 0.9375 | 0.9792 | 0.9387 | 0.9370 | 0.9581
Original | Ahn (2017) | 0.9597 | 0.9597 | 0.9866 | 0.9607 | 0.9595 | 0.9731
Original | Proposed | 0.9903 | 0.9903 | 0.9968 | 0.9903 | 0.9903 | 0.9935
Transformed | Jin et al. (2017) | 0.9139 | 0.9139 | 0.9713 | 0.9139 | 0.9132 | 0.9422
Transformed | Qayyum et al. (2017) | 0.8931 | 0.8931 | 0.9644 | 0.9023 | 0.8917 | 0.9280
Transformed | Prasoon et al. (2013) | 0.9306 | 0.9306 | 0.9769 | 0.9314 | 0.9296 | 0.9534
Transformed | Ahn (2017) | 0.9375 | 0.9375 | 0.9792 | 0.9409 | 0.9368 | 0.9581
Transformed | Proposed | 0.9681 | 0.9681 | 0.9894 | 0.9682 | 0.9678 | 0.9786
Axis Swapped | Jin et al. (2017) | 0.8903 | 0.8903 | 0.9634 | 0.8910 | 0.8894 | 0.9261
Axis Swapped | Qayyum et al. (2017) | 0.8597 | 0.8597 | 0.9532 | 0.8811 | 0.8604 | 0.9053
Axis Swapped | Prasoon et al. (2013) | 0.9028 | 0.9028 | 0.9676 | 0.9039 | 0.9018 | 0.9346
Axis Swapped | Ahn (2017) | 0.9194 | 0.9194 | 0.9731 | 0.9222 | 0.9190 | 0.9459
Axis Swapped | Proposed | 0.9653 | 0.9653 | 0.9884 | 0.9658 | 0.9652 | 0.9768

Table 3: Comparison of training time with similar existing methods.

Method | Pre-processing time (min) | Training time (min) | Total time (min)
Jin et al. (2017) | 240 | 1230 | 1470
Qayyum et al. (2017) | N/A | 2732 | 2732
Prasoon et al. (2013) | 1012 | 23 | 1035
Ahn (2017) | 27 | 7252 | 7279
Proposed | 232 | 20 | 252

Figure 8: Performance (confidence level of classification) of the proposed method on some random images: (A) Abdomen, 85.7%, (B) Abdomen, 92.3%, (C) Abdomen, 94.2%, (D) Head, 99.9%, (E) Head, 100%, (F) Head, 98.8%, (G) Breast, 100%, (H) Breast, 100%, (I) Breast, 100%, (J) Thorax, 98.9%, (K) Thorax, 98.9%, and (L) Thorax, 99.7%. The images show the views extracted using the proposed algorithm from 3D CT and MRI images in the test databases. Note that the differences between images of the same class are caused by the simplicity of the segmentation method, which influences the symmetry plane extraction.

Performance when used with other DCNNs
We also investigated the performance of our method when used with existing state-of-the-art DCNNs: AlexNet (Krizhevsky, Sutskever & Hinton, 2012), GoogLeNet (Szegedy et al., 2015), ResNet-50 (He et al., 2016), and VGG-16 (Simonyan & Zisserman, 2014). To enable these DCNNs to be used in our algorithm, we normalized the 2D images (extracted from the plane of best symmetry) prior to classification depending on the requirements of each DCNN. The single-channel 2D grey scale images were converted to three-channel colour images and resized to 227×227×3 for AlexNet and 224×224×3 for GoogLeNet, ResNet-50, and VGG-16. The performance results are shown in Table 4.

Table 4: Performance of the proposed algorithm when used with other state-of-the-art DCNN models.

Database | Methodology | Accuracy | Sensitivity | Specificity | Precision | F-Measure | G-Mean
Original | AlexNet | 0.9958 | 0.9958 | 0.9986 | 0.9958 | 0.9958 | 0.9972
Original | GoogLeNet | 0.9986 | 0.9986 | 0.9995 | 0.9986 | 0.9986 | 0.9991
Original | ResNet-50 | 0.9972 | 0.9972 | 0.9991 | 0.9973 | 0.9972 | 0.9981
Original | VGG-16 | 0.9708 | 0.9708 | 0.9903 | 0.9717 | 0.9708 | 0.9805
Transformed | AlexNet | 0.9944 | 0.9944 | 0.9981 | 0.9945 | 0.9944 | 0.9963
Transformed | GoogLeNet | 0.9931 | 0.9931 | 0.9977 | 0.9931 | 0.9931 | 0.9954
Transformed | ResNet-50 | 0.9958 | 0.9958 | 0.9986 | 0.9958 | 0.9958 | 0.9972
Transformed | VGG-16 | 0.9653 | 0.9653 | 0.9884 | 0.9664 | 0.9653 | 0.9768
Axis Swapped | AlexNet | 0.9931 | 0.9931 | 0.9977 | 0.9931 | 0.9931 | 0.9954
Axis Swapped | GoogLeNet | 0.9917 | 0.9917 | 0.9972 | 0.9917 | 0.9916 | 0.9944
Axis Swapped | ResNet-50 | 0.9931 | 0.9931 | 0.9977 | 0.9930 | 0.9930 | 0.9954
Axis Swapped | VGG-16 | 0.9639 | 0.9639 | 0.9880 | 0.9650 | 0.9640 | 0.9759

Performance when used with conventional classifiers
As conventional classification approaches, used in concert with image feature extraction methods, have been applied extensively to image classification, we also explored how to integrate the concepts discussed here with these methods. For this purpose, we used two image feature extraction methods: bag of words (BoW) (Harris, 1954) and histogram of oriented gradients (HoG) (Dalal & Triggs, 2005). To perform the classification task, we used support vector machines (SVMs) and artificial neural networks (ANNs), as they are widely used in classification. We used five-fold cross-validation for the SVM and 10 hidden neurons for the ANN, and normalised the 2D image slices resulting from the symmetry plane calculation to a size of 224×224.
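As an illustration of one of these conventional pipelines, the following sketch combines HoG features with a multi-class SVM and five-fold cross-validation. The HoG cell size and the slices/labels variables are hypothetical; the BoW and ANN variants are omitted for brevity.

```matlab
% Sketch of one conventional pipeline: HoG features with a multi-class SVM and
% five-fold cross-validation. The cell size and the variables 'slices' (cell
% array of 224-by-224 images) and 'labels' (categorical) are hypothetical.
feats = [];
for i = 1:numel(slices)
    feats(i, :) = extractHOGFeatures(slices{i}, 'CellSize', [8 8]); %#ok<AGROW>
end
svmModel = fitcecoc(feats, labels);               % one-vs-one multi-class SVM
cvModel  = crossval(svmModel, 'KFold', 5);        % five-fold cross-validation
accuracy = 1 - kfoldLoss(cvModel);                % cross-validated accuracy
```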
The performance of these approaches is shown in Table 5.

Table 5: Performance of the proposed algorithm when used with conventional machine learning approaches.

Database | Methodology | Accuracy | Sensitivity | Specificity | Precision | F-Measure | G-Mean
Original | BoW+SVM | 0.8083 | 0.8083 | 0.9361 | 0.8112 | 0.8047 | 0.8699
Original | BoW+ANN | 0.7667 | 0.7667 | 0.9222 | 0.7927 | 0.7568 | 0.8409
Original | HoG+SVM | 0.8500 | 0.8500 | 0.9500 | 0.8648 | 0.8470 | 0.8986
Original | HoG+ANN | 0.7472 | 0.7472 | 0.9157 | 0.7983 | 0.7221 | 0.8272
Transformed | BoW+SVM | 0.7944 | 0.7944 | 0.9315 | 0.7992 | 0.7908 | 0.8602
Transformed | BoW+ANN | 0.7542 | 0.7542 | 0.9181 | 0.7820 | 0.7423 | 0.8321
Transformed | HoG+SVM | 0.8375 | 0.8375 | 0.9458 | 0.8549 | 0.8334 | 0.8900
Transformed | HoG+ANN | 0.7458 | 0.7458 | 0.9153 | 0.8113 | 0.7240 | 0.8262
Axis Swapped | BoW+SVM | 0.7903 | 0.7903 | 0.9301 | 0.7923 | 0.7864 | 0.8573
Axis Swapped | BoW+ANN | 0.7500 | 0.7500 | 0.9167 | 0.7773 | 0.7350 | 0.8292
Axis Swapped | HoG+SVM | 0.8361 | 0.8361 | 0.9454 | 0.8556 | 0.8324 | 0.8891
Axis Swapped | HoG+ANN | 0.7361 | 0.7361 | 0.9120 | 0.7982 | 0.7136 | 0.8194

DISCUSSION
From the results in Table 1, we observe that, when trained on the original (unaugmented) training database and applied to data that satisfied the conditions of consistent patient orientation and imaging direction, the proposed method performs better than the other similar methods, except for Ahn (2017), which shows slightly better performance. However, Ahn (2017) uses a 3D DCNN in its classification and therefore has a much slower training time than the proposed method. As shown in Table 3, even with a relatively high pre-processing time, our method is the fastest in terms of total training time.

Also from Table 1, we can see that the proposed method outperforms the other methods in the face of changes in patient orientation and imaging direction. Although some methods, such as Ahn (2017) and Prasoon et al. (2013), are robust against changes in patient orientation to some degree, they fail when dealing with changes in imaging direction.

The performance of the compared methods on transformed and axis-swapped data improves when they are trained on augmented data, as seen in Table 2. This is the result of the classifiers being trained on images of different orientations. However, the proposed method outperforms the other methods even when training was performed on augmented data. Moreover, the results imply that data augmentation in the training phase is not required for our method. The high accuracy of the proposed method, specifically on transformed data, is mainly due to the fact that a relatively consistent 2D cross-sectional view of the 3D image is used to represent it, irrespective of its orientation. As such, the variation in the input data per class is minimal and better classification can therefore be achieved.

The comparison results shown in Table 4 reflect the robustness of the proposed method irrespective of the DCNN architecture used in the classification step. The performance results of the SVM and ANN classifiers, when combined with the BoW and HoG feature extraction methods, are consistent but lower (Table 5). This indicates that DCNNs may be better suited to our application.

The salient feature of this algorithm is its simplicity.
First, we reduced the 3D classification problem to a 2D one by extracting the 2D image lying on the plane of best symmetry of the 3D volume. In this operation, we used the most efficient calculations available, such as simple thresholding techniques. It can be argued that using more sophisticated segmentation methods would enable more accurate symmetry plane calculation, which in turn would make the extracted 2D views more consistent. Furthermore, we rescaled the data to an 8-bit representation (intensity range of [0, 255]), thereby reducing the resolution of the data. However, we found that even with such simplifications, the proposed method displayed very high levels of performance. As such, we conclude that it achieves a good balance between efficiency and accuracy.

Although the human body is roughly symmetric, most organs, and the way they are aligned inside the body, are not perfectly symmetrical. Furthermore, the data we considered here came from a cancer database, where further asymmetry is caused by tumors, lesions, etc. Our method was observed to perform well under these circumstances. However, we did not consider the effect of more exaggerated forms of asymmetry, for example, that caused by parts of an organ being cut off due to improper patient alignment. In the future, we will investigate how these forms of asymmetry affect the proposed method and how to compensate for them. We will also explore how the method performs on other databases with higher numbers of classes.

CONCLUSION
In this paper, we proposed a 3D organ image classification approach that is robust against changes in patient orientation and imaging direction. To this end, we extracted the plane of best symmetry from the 3D image and extracted the 2D image corresponding to that plane. Then, we used a DCNN to classify the 2D image into one of four classes. We showed that this method is not only efficient and simple, but also highly accurate in comparison to other similar methods. We also showed that this algorithm can be used in concert with other state-of-the-art DCNN models, as well as with conventional classification techniques in combination with feature extraction methods. Although our algorithm was specifically developed for 3D organ image classification, it is applicable to any classification task where a 2D image extracted from the plane of best symmetry of the 3D image is sufficient to represent the 3D image.

ACKNOWLEDGEMENTS
The authors would like to thank Dr. Bridget Copson of the Department of Medical Imaging at St. Vincent's Hospital, Melbourne, Australia, for her input on imaging techniques.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was supported by the University of Melbourne under the Melbourne Research Scholarship (MRS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors:
University of Melbourne.

Competing Interests
The authors declare there are no competing interests.
Author Contributions
• Kh Tohidul Islam and Sudanthi Wijewickrema conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, performed the computation work, authored or reviewed drafts of the paper, and approved the final draft.
• Stephen O'Leary conceived and designed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:
The code is available in Supplemental Information 1.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj-cs.181#supplemental-information.

REFERENCES
Ahn BB. 2017. The compact 3D convolutional neural network for medical images. Stanford: Stanford University.
Arias J, Martínez-Gómez J, Gámez JA, De Herrera AGS, Müller H. 2016. Medical image modality classification using discrete Bayesian networks. Computer Vision and Image Understanding 151:61–71 DOI 10.1016/j.cviu.2016.04.002.
Besl P, McKay ND. 1992. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2):239–256 DOI 10.1109/34.121791.
Bicacro E, Silveira M, Marques JS. 2012. Alternative feature extraction methods in 3D brain image-based diagnosis of Alzheimer's disease. In: 2012 19th IEEE international conference on image processing. Piscataway: IEEE DOI 10.1109/icip.2012.6467090.
Chan T, Vese L. 2001. Active contours without edges. IEEE Transactions on Image Processing 10(2):266–277 DOI 10.1109/83.902291.
Cicconet M, Hildebrand DGC, Elliott H. 2017. Finding mirror symmetry via registration and optimal symmetric pairwise assignment of curves. In: 2017 IEEE international conference on computer vision workshops (ICCVW). Piscataway: IEEE DOI 10.1109/iccvw.2017.206.
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. 2013. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. Journal of Digital Imaging 26(6):1045–1057 DOI 10.1007/s10278-013-9622-7.
Cui X, Liu Y, Shan S, Chen X, Gao W. 2007. 3D Haar-like features for pedestrian detection. Piscataway: IEEE DOI 10.1109/icme.2007.4284887.
Dalal N, Triggs B. 2005. Histograms of oriented gradients for human detection. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), volume 1. Piscataway: IEEE, 886–893 DOI 10.1109/cvpr.2005.177.
Harris ZS. 1954. Distributional structure. WORD 10(2–3):146–162 DOI 10.1080/00437956.1954.11659520.
He K, Zhang X, Ren S, Sun J. 2016. Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR). Piscataway: IEEE DOI 10.1109/cvpr.2016.90.
Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. ArXiv preprint. arXiv:1602.07360.
Japkowicz N. 2006. Why question machine learning evaluation methods. In: AAAI workshop on evaluation methods for machine learning. New York: ACM, 6–11.
Jin T, Cui H, Zeng S, Wang X. 2017. Learning deep spatial lung features by 3D convolutional neural network for early cancer detection. In: 2017 international conference on digital image computing: techniques and applications (DICTA). Piscataway: IEEE DOI 10.1109/dicta.2017.8227454.
Krizhevsky A, Sutskever I, Hinton GE. 2012. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in neural information processing systems 25. Lake Tahoe: Curran Associates, Inc., 1097–1105.
Kumar A, Kim J, Cai W, Fulham M, Feng D. 2013. Content-based medical image retrieval: a survey of applications to multidimensional and multimodality data. Journal of Digital Imaging 26(6):1025–1039 DOI 10.1007/s10278-013-9619-2.
Liu K, Kang G. 2017. Multiview convolutional neural networks for lung nodule classification. International Journal of Imaging Systems and Technology 27(1):12–22 DOI 10.1002/ima.22206.
Liu Y, Dellaert F. 1998. A classification based similarity metric for 3D image retrieval. In: Proceedings. 1998 IEEE computer society conference on computer vision and pattern recognition (Cat. No.98CB36231). Piscataway: IEEE, 800–805 DOI 10.1109/CVPR.1998.698695.
MathWorks, Inc. 1998. MATLAB. Natick: MathWorks, Inc. Available at https://www.mathworks.com/products/matlab.html.
Miklos P. 2004. Image interpolation techniques. In: 2nd Siberian-Hungarian joint symposium on intelligent systems.
Mohan G, Subashini MM. 2018. MRI based medical image analysis: survey on brain tumor grade classification. Biomedical Signal Processing and Control 39:139–161 DOI 10.1016/j.bspc.2017.07.007.
Morgado P, Silveira M, Marques JS. 2013. Diagnosis of Alzheimer's disease using 3D local binary patterns. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 1(1):2–12 DOI 10.1080/21681163.2013.764609.
Nair V, Hinton GE. 2010. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th international conference on machine learning, ICML'10. Madison: Omnipress, 807–814.
Olson DL, Delen D. 2008. Advanced data mining techniques. Berlin, Heidelberg: Springer DOI 10.1007/978-3-540-76917-0.
Otsu N. 1979. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9(1):62–66 DOI 10.1109/tsmc.1979.4310076.
Öziç MÜ, Özşen S. 2017. T-test feature ranking based 3D MR classification with VBM mask. In: 2017 25th signal processing and communications applications conference (SIU). Piscataway: IEEE DOI 10.1109/siu.2017.7960591.
Powers DM. 2011. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies 2(1):37–63.
Prasoon A, Petersen K, Igel C, Lauze F, Dam E, Nielsen M. 2013. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Advanced information systems engineering. Berlin, Heidelberg: Springer, 246–253 DOI 10.1007/978-3-642-40763-5_31.
Qayyum A, Anwar SM, Awais M, Majid M. 2017. Medical image retrieval using deep convolutional neural network. Neurocomputing 266:8–20 DOI 10.1016/j.neucom.2017.05.025.
Research and Markets. 2018. Medical imaging market—global outlook and forecast 2018–2023. Available at https://www.researchandmarkets.com/reports/4455446/ (accessed on 22 October 2018).
Simonyan K, Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition. ArXiv preprint. arXiv:1409.1556.
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. 2015. Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR). Piscataway: IEEE DOI 10.1109/cvpr.2015.7298594.
Teverovskiy L, Li Y. 2006. Truly 3D midsagittal plane extraction for robust neuroimage registration. In: 3rd IEEE international symposium on biomedical imaging: macro to nano, 2006. Piscataway: IEEE, 860–863 DOI 10.1109/isbi.2006.1625054.
Tuzikov AV, Colliot O, Bloch I. 2003. Evaluation of the symmetry plane in 3D MR brain images. Pattern Recognition Letters 24(14):2219–2233 DOI 10.1016/s0167-8655(03)00049-7.
Zhou X, Hayashi T, Hara T, Fujita H, Yokoyama R, Kiryu T, Hoshi H. 2006. Automatic segmentation and recognition of anatomical lung structures from high-resolution chest CT images. Computerized Medical Imaging and Graphics 30(5):299–313 DOI 10.1016/j.compmedimag.2006.06.002.