key: cord-0823304-mcxpduy4
authors: Yang, Ziyuan; Leng, Lu; Li, Ming; Chu, Jun
title: A computer-aid multi-task light-weight network for macroscopic feces diagnosis
date: 2022-02-28
journal: Multimed Tools Appl
DOI: 10.1007/s11042-022-12565-0
sha: dcea68fe24c312bf9eb8bf5a1f6649689cd2d9f5
doc_id: 823304
cord_uid: mcxpduy4

The abnormal traits and colors of feces typically indicate that the patients are probably suffering from tumors or digestive-system diseases. Thus a fast, accurate and automatic health diagnosis system based on feces is urgently needed to speed up examination and reduce the infection risk. The scarcity of pathological images degrades the accuracy of the trained models. To alleviate this problem, we employ augmentation and over-sampling to expand the samples of the classes that have few samples in the training batch. To achieve impressive recognition performance and leverage the latent correlation between the traits and colors of feces pathological samples, a multi-task network is developed to recognize the colors and traits of macroscopic feces images. The parameter number of a single multi-task network is generally much smaller than the total parameter number of multiple single-task networks, so the storage cost is reduced. The loss function of the multi-task network is the weighted sum of the losses of the two tasks. In this paper, the weights of the tasks are determined according to their difficulty levels, which are measured by fitted linear functions. Extensive experiments confirm that the proposed method yields higher accuracies and improves efficiency.

The health condition of the digestive system is reflected in macroscopic feces images, which can help diagnose diseases preliminarily and early. Feces examination requires a large number of medical staff, which leads to a high infection risk, and the unpleasant odor makes it hard for the staff to work efficiently for long periods. The abnormal traits and colors of feces typically indicate that the patients are probably suffering from tumors or digestive-system diseases [8]. In addition, feces can also be used to detect other diseases, such as COVID-19, and to assess prognosis [36]. A computer-aided diagnosis approach has three main advantages. 1. Medical staff no longer need to examine the feces manually, which greatly reduces the infection risk and workload; 2. Automatic recognition is fast and its results are reliable; 3. The machine can work stably for long periods, even under a high-intensity workload.

Feces examinations include macroscopic examination and microscopic examination. In recent years, researchers have proposed different methods for microscopic image-based diagnosis. Du et al. [16] proposed a novel framework, composed of a convolutional neural network (CNN) and principal component analysis, to detect cells in microscopic fecal images. Li et al. [32] proposed a deep learning-based visible component detection approach for microscopic fecal images. Microscopic examination is accurate but time-consuming and expensive, and the laboratory staff must have proficient professional skills.

A reliable automatic macroscopic examination is necessary for assessing the health condition of the digestive system. Computer technologies have been used to produce gastrointestinal surgery robots, which are deemed an important next-generation operation mode [13, 27]. Unfortunately, no automatic machine exists yet for macroscopic feces examination, so we develop one. In this customized medical equipment, the feces sample is placed into a tube, and the trait and color of the feces are automatically recognized by computer vision technology. Some feces images are shown in Fig. 1.

To develop a computer-aid multi-task light-weight network for macroscopic feces diagnosis, the contributions of this paper are summarized as follows.

(1) A database labeled in color and trait

To the best of our knowledge, no database for the two important examination tasks, namely color recognition and trait recognition, existed before our work. The color and trait of the macroscopic feces samples in our database were carefully labeled by professional doctors, so the database has high research and medical value, and the results obtained on it are reliable. For color recognition, the images are labeled as 'Black', 'Red' and 'Normal'; samples are shown in Fig. 2a-c, respectively. For trait recognition, the images are labeled as 'Loose', 'Watery' and 'Normal'; samples are shown in Fig. 2d-f, respectively.

(2) Data augmentation and over-sampling

Although many macroscopic feces images were collected, the database is still class-imbalanced because the occurrence probabilities of the feces classes differ greatly. Traditional data augmentation methods are unsuitable for this task, because the tube position and the camera imaging conditions are fixed in the controlled environment of our machine. Hence, it is unreasonable to arbitrarily rotate the images or add noise for data augmentation. Generative adversarial networks (GANs) are widely used to augment data [39]. There are two main disadvantages for GAN-

(3) Light-weight advantage

Considering the computing resources and the hardware cost of the machine, a light-weight structure is required. Tewari and Gupta [46] proposed an ultra-lightweight mutual authentication protocol to reduce storage and communication costs. Masud et al. [37] proposed a lightweight and physically secure authentication key for the Internet of Medical Things; their experiments and discussions demonstrate that this protocol is robust and consumes fewer resources. In general, a light-weight approach has several advantages. 1. The computation hardware requirement is low, so it is easy for the hospital to update the model when additional data are collected; 2. The computational complexity is low, so the computing speed is fast; 3. The cost of the machine is significantly reduced, so more hospitals can afford our machine.

(4) Multi-task recognition

The color and trait probably have a latent relationship that was neglected in previous works. Thus we develop a fast and accurate multi-task light-weight diagnosis network, in which the parameters are shared by the two tasks, namely color recognition and trait recognition. The parameter number of a single dual-task network is generally much smaller than the total parameter number of two single-task networks. In addition, the multi-task network can mine the latent correlation between the trait and color, so it yields higher accuracy and faster convergence than the single-task networks.

Adaptive weighting can enhance discrimination and improve accuracy [1]. In the multi-task network, the loss function is the weighted sum of the losses of the two tasks. The weights are determined by the learning abilities of the tasks: the training accuracy of each single task is linearly fitted, and the slope and bias of the fitted function are used to measure the learning ability of that task. A higher weight is given to the relatively difficult task.

The rest of this paper is organized as follows: Section 2 introduces related works. Section 3 specifies our framework in detail. The experiments and discussions are in Section 4. The conclusions are drawn in Section 5.

The traditional examination relies on many manual operations that are typically based on physical or chemical principles. Berhe et al. [7] evaluated the recognition rate of Kato-Katz for feces specimen examination, a method recommended by the World Health Organization as effective for diagnosis. Bergquist et al. [6] and Charoensuk et al. [12] compared popular feces examination methods, including Kato-Katz, the formalin-ethyl acetate concentration technique and a fecal parasite concentrator kit. These methods achieve satisfactory performance, but they are difficult to deploy widely in clinics because of stringent requirements, including many personnel with professional skills and expensive consumables.

In terms of cost and accuracy, computer-aided diagnosis is regarded as a next-generation clinical assistant diagnosis technology. Alomari et al. [2] applied the circular Hough transform to count abnormal cells. Carvalho et al. [9] combined a support vector machine and fuzzy theory to diagnose lung disease. Baldoumas et al. [4] employed natural time analysis to design a conventional three-electrode electrocardiography system. Yang et al. [50] predicted computed tomography (CT) images from magnetic resonance imaging (MRI) with nonlinear local descriptors and achieved satisfactory performance. Liu et al. [35] proposed a deep iterative reconstruction estimation strategy with a 3D residual convolutional network structure to reconstruct low-dose CT images. Cheng [14] developed a two-step grading scheme: first, the testing image was sparsely represented by the set of training images; then the image was graded by referencing the selected atoms. Nkamgang et al. [38] applied a neuro-fuzzy classifier to detect intestinal parasites, using histogram of oriented gradients features of the microscopic images as input. Caselli et al. [10] designed a Bayesian approach to count cells. Wang et al. [48] developed an additive least squares support vector machine to reduce the effect of missing data in community health and epidemiological studies. Ginneken et al. [19] presented an active shape model that used a kNN classifier to describe the shapes of tissues or organs in lung and brain images.

The aforementioned methods are based on hand-crafted features, so they require prior knowledge of the related fields. Deep learning models that learn their features automatically, such as CNNs, do not need hand-crafted features, and they are more robust and achieve higher recognition performance. Deep learning models have achieved excellent performance in various computer vision tasks [23, 44, 47, 49], such as denoising [5], detection [15] and segmentation [53]. Accordingly, CNNs have become important in medical image analysis [20].

Kermany et al. [26] established a deep learning framework to diagnose common treatable blinding retinal diseases. Kumar et al. [28] proposed a CNN feature extractor pre-trained on natural images, which was then fine-tuned and optimized for the specific medical task. Tajbakhsh et al. [45] used pre-trained CNNs for classification, detection and segmentation on four distinct medical image datasets. Effective and reasonable fusion can improve recognition accuracy [29, 30]. Chang et al. [11] fused CT image-domain information and corrected images with a CNN structure to generate hybrid images. Savelli et al. [42] presented a multi-context ensemble CNN to detect small lesions in CT images: different image patches were trained individually, and their features were then combined. With these combined features, the CNN can better discover details in medical images. Qureshi et al. [40] utilized a 3D CNN architecture and brain MRI to discriminate patients with schizophrenia from normal healthy control subjects, achieving high detection accuracy. Huang et al. [24] developed a two-stage scheme based on the stationary wavelet transform and a residual neural network to denoise low-dose CT images. Eun et al. [18] presented a novel ensemble CNN for nodule detection and proposed a non-nodule categorization to extend the learning capability of the network. Liu et al. [33] developed a novel cascaded deep CNN to complete two challenging tasks simultaneously, namely the automatic segmentation of brainstem gliomas and the prediction of genotype mutation status from MRI. Liu et al. [34] developed a deep multi-task multi-channel learning framework for brain disease classification and clinical score regression using MRI data and demographic information. Jodeiri et al. [25] studied Mask R-CNN for radiography image segmentation, using transfer learning, multi-task learning and data augmentation to improve the learning ability of the network. Reena and Ameer [41] proposed a deep learning-based method to localize and recognize leukocytes in peripheral blood.

Unlike in other vision tasks, collecting a medical image database is highly difficult for two main reasons. 1. The number of patients with a specific disease is small. 2. The images contain much private information, so patients and their families are usually unwilling to share their pathological samples, and it is highly difficult to obtain approval from the patients, their families and the hospitals. Hence, the scarcity of feces samples seriously hinders research progress in macroscopic feces examination, and few studies focus on this field even though it is an important clinical examination.

Hachuel et al. [21] utilized ResNet [22] to recognize feces traits. They collected a large database, in which the images were labeled as 'Constipation', 'Loose' and 'Normal'. However, these images were collected by the patients in uncontrolled environments, so the method had to use a deep structure to overcome the interference of the uncontrolled environment. Yang et al. [51] proposed StoolNet to recognize the color of feces images; StoolNet is a shallow CNN carefully designed for feces samples collected in real hospitals. Inspired by StoolNet, Leng et al. [31] collected a feces database with various traits and developed a shallow CNN suitable for feces trait recognition.

As discussed above, the existing works mainly rely on uniformly distributed data, so the class-imbalance problem is ignored. In addition, these methods cannot recognize the color and the trait of feces images simultaneously, so the complementarity between the two tasks is not leveraged. Meanwhile, classifiers based on hand-crafted features are not robust enough, which makes them difficult to deploy in real clinical environments. Besides, weight selection in CNN-based multi-task learning is empirical; there is no practical guidance for selecting appropriate weights for different tasks.

Automatic clinical diagnosis is one of the most important research fields in smart cities [3, 17, 52]. To reduce the burden on doctors, feces clinical diagnosis is studied in this paper. A multi-task light-weight CNN is developed for the color and trait recognition of macroscopic feces images. A practical data augmentation method and an over-sampling strategy are employed to address the sample scarcity of minority classes. Meanwhile, the weights of the task losses are determined according to the task difficulties. The whole framework is shown in Fig. 3.

Data augmentation is necessary because the number of pathological samples, especially minority-class samples, is very limited. Typical augmentation methods rotate the images or add noise; however, these are unsuitable for our machine. Because the tube position in our machine is fixed, rotated images, such as those rotated by 90°, 180° or 270°, are unrealistic. Hence, the images of the minority classes are mirrored. Figure 4 shows a sample mirrored image.
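The mirroring step can be sketched in plain Python. This is a minimal sketch, not the authors' code: it assumes each image is stored as a list of pixel rows and that mirroring means a left-right (horizontal) flip, since the text does not state the mirror axis. The helper names `mirror_image` and `augment_minority` are hypothetical.

```python
def mirror_image(image):
    """Return a left-right mirrored copy of an image.

    Assumption: `image` is a list of rows, each row a list of pixels
    (scalars or RGB tuples). Only the column order is reversed.
    """
    return [list(reversed(row)) for row in image]


def augment_minority(samples, minority_labels):
    """Append one mirrored copy of every minority-class sample.

    `samples` is a list of (image, label) pairs and `minority_labels` is a
    set of labels. Majority-class samples are kept but not duplicated.
    """
    augmented = list(samples)
    for image, label in samples:
        if label in minority_labels:
            augmented.append((mirror_image(image), label))
    return augmented
```

For example, `mirror_image([[1, 2, 3], [4, 5, 6]])` returns `[[3, 2, 1], [6, 5, 4]]`. Only minority-class samples gain a mirrored copy, so the class proportions move toward balance without producing rotated views that a fixed tube position could never generate.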

Fig. 3 The framework of our method

Fig. 4 A mirrored image for data augmentation

Unfortunately, even after data augmentation, the augmented database is still class-imbalanced because the occurrence probabilities of the feces classes differ greatly. Random sampling is widely used in training, but here it would degrade convergence and recognition accuracy. If the database is significantly class-imbalanced, a training batch very probably does not contain enough samples of the minority classes, so the batch forces the network to focus on the majority classes. A statistical over-sampling strategy is proposed and employed for training batch selection to alleviate the class imbalance. Let $P_i$ be the probability that the i-th class is selected, and $P_{i,j}$ the probability that the j-th sample of the i-th class is selected; then

$$\sum_{i=1}^{m} P_i = 1,$$

where m is the number of classes (m = 3 in this paper). To ensure class balance, i.e., each class is selected with the same probability,

$$P_1 = P_2 = \cdots = P_m = \frac{1}{m}.$$

Thus

$$P_{i,j} = \frac{P_i}{c_i} = \frac{1}{m\,c_i},$$

where $c_i$ is the number of samples of the i-th class. With over-sampling, the minority classes are selected more frequently and the majority classes less frequently, so in expectation every class contributes the same number of samples to a training batch. To help readers trace our over-sampling strategy, the pseudo code is shown as follows:

Algorithm 1: Over-sampling Strategy
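The following is a minimal Python sketch of a batch sampler consistent with the over-sampling rule described above: each class is drawn with equal probability 1/m, then a sample is drawn uniformly within that class, so sample j of class i is chosen with probability 1/(m·c_i). It is a sketch under the assumptions of sampling with replacement and uniform choice within a class, not the authors' Algorithm 1; the function name `oversample_batch` is hypothetical.

```python
import random
from collections import defaultdict


def oversample_batch(labels, batch_size, rng=random):
    """Draw a class-balanced batch of sample indices, with replacement.

    Each class is picked with probability 1/m, then one of its c_i samples
    is picked uniformly, so sample j of class i has probability 1/(m * c_i).
    Minority samples are therefore drawn more often than under plain random
    sampling, and majority samples less often.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)  # the m classes, sorted for reproducibility

    batch = []
    for _ in range(batch_size):
        cls = rng.choice(classes)                # P_i = 1/m
        batch.append(rng.choice(by_class[cls]))  # 1/c_i given the class
    return batch
```

As a usage example, with 90 'Normal', 7 'Red' and 3 'Black' labels, `oversample_batch(labels, 64, random.Random(0))` returns 64 indices in which each class appears about 64/3 times on average, even though 'Black' makes up only 3% of the data. Sampling with replacement is what allows the three 'Black' samples to fill their share.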

The loss of the multi-task network is the weighted sum of the losses of the two tasks. The weights are determined according to the learning abilities of the tasks, and a higher weight is given to the relatively difficult task. Generally, learning ability can be measured by learning speed and accuracy. Since the accuracy increment during training reflects the learning speed, a mathematical model based on the accuracy increment is established to analyze task difficulty quantitatively.

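The training accuracies and the corresponding epochs are used as the ordinate and abscissa values, respectively, for linear function fitting. The least squares method is adopted to calculate the slope k and the bias b of

$$y = kx + b,$$

where x is the epoch and y is the accuracy. Over the N fitted epochs,

$$k = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \qquad b = \bar{y} - k\bar{x},$$

where $\bar{y}$ is the average accuracy and $\bar{x}$ is the average epoch value.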
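The least squares fit of accuracy against epoch can be sketched in a few lines of standard-library Python. This minimal sketch uses ordinary least squares with the usual closed-form slope and bias; the function name `fit_line` is hypothetical, and in practice a library routine such as a degree-1 polynomial fit would give the same result.

```python
def fit_line(epochs, accuracies):
    """Ordinary least-squares fit y = k*x + b of accuracy (y) on epoch (x).

    Returns (k, b) with
        k = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
        b = y_mean - k * x_mean
    """
    n = len(epochs)
    if n != len(accuracies) or n < 2:
        raise ValueError("need at least two (epoch, accuracy) pairs")
    x_mean = sum(epochs) / n
    y_mean = sum(accuracies) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(epochs, accuracies))
    sxx = sum((x - x_mean) ** 2 for x in epochs)
    k = sxy / sxx
    b = y_mean - k * x_mean
    return k, b
```

For instance, accuracies 0.5, 0.6, 0.7 and 0.8 at epochs 1 to 4 give k ≈ 0.1 and b ≈ 0.4 (up to floating-point rounding). A larger k means faster learning and a larger b a higher starting level, which is how the two values jointly describe how easy a task is.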

If the accuracy oscillation lasts fewer than t epochs, the network is considered converged. If the accuracies of t consecutive epochs are all greater than 95%, the network is considered approximately converged. The value of t can be set differently for different tasks. The main reason for this rule is that, if the maximum training epoch is set very large, the epoch number influences the fitted function much more than the accuracy does. Hence, if the network converges before reaching the maximum epoch, N is the epoch at which it converges; otherwise, N is the maximum epoch.
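The rule for choosing the fitting length N can be sketched as follows. This minimal sketch implements only the 95% criterion (t consecutive epochs above the threshold), with defaults t = 5 and a 300-epoch maximum as used in the experiments; the oscillation-based criterion is omitted because its exact test is not specified. The function name `convergence_epoch` and the choice to report the epoch that completes the qualifying run are assumptions.

```python
def convergence_epoch(accuracies, t=5, threshold=0.95, max_epoch=300):
    """Return N, the number of epochs used for the linear fit.

    Assumption: the network counts as converged at the first epoch that
    completes a run of `t` consecutive epochs with accuracy > `threshold`.
    If no such run occurs within `max_epoch` epochs, N = max_epoch.
    `accuracies[e - 1]` is the training accuracy after epoch e (1-based).
    """
    run = 0
    for epoch, acc in enumerate(accuracies[:max_epoch], start=1):
        run = run + 1 if acc > threshold else 0  # reset on any dip
        if run >= t:
            return epoch
    return max_epoch
```

The returned N then limits which (epoch, accuracy) pairs enter the least squares fit, e.g. only epochs 1 to N. Capping N keeps a long, flat tail of converged epochs from dominating the slope.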

Different tasks naturally have different difficulty levels. If one task is more difficult than the other, the network should pay more attention to the more difficult one; hence, a quantitative difficulty assessment is necessary. The fitted linear function reflects the learning speed and the learning ability of the network for a specific task, and its slope and bias jointly measure that learning ability: the greater the two values are, the better the network learns the task. Since the network should focus more on the difficult task than on the easy one, the loss weight of the difficult task should be higher than that of the easy one.
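The text specifies only that the total loss is a weighted sum of the task losses and that the harder task receives the larger weight; it gives no closed-form mapping from (k, b) to weights. The sketch below therefore uses a purely hypothetical heuristic: predict each task's accuracy at a reference epoch from its fitted line (k·x_ref + b), treat one minus that prediction as the difficulty, and normalize the difficulties into weights that sum to 1. The names `task_weights` and `multitask_loss` are hypothetical.

```python
def task_weights(fits, ref_epoch):
    """Hypothetical heuristic: give the harder task the larger weight.

    `fits` maps task name -> (k, b) from the linear fit of accuracy on epoch.
    Difficulty = 1 - predicted accuracy at `ref_epoch` (clipped to [0, 1]);
    weights are the difficulties normalized to sum to 1.
    """
    difficulty = {}
    for task, (k, b) in fits.items():
        predicted = min(max(k * ref_epoch + b, 0.0), 1.0)
        difficulty[task] = 1.0 - predicted
    total = sum(difficulty.values())
    if total == 0:  # every task predicted perfect: fall back to equal weights
        return {task: 1.0 / len(fits) for task in fits}
    return {task: d / total for task, d in difficulty.items()}


def multitask_loss(losses, weights):
    """Weighted sum of per-task losses: L = sum_i w_i * L_i."""
    return sum(weights[task] * losses[task] for task in losses)
```

For example, fitted lines predicting 0.8 accuracy for color and 0.9 for trait at epoch 300 give weights of about 2/3 and 1/3, so the color loss counts twice as much. When the two tasks have similar fitted lines, as reported for the multi-task network, this heuristic yields similar weights.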

Compared with traditional recognition methods, CNNs are typically more robust and accurate. For traditional methods, it is almost impossible to know intuitively which features are ideal for a specific task, especially in the multi-task setting, and it is difficult to hand-design discriminative features, which are generally abstract and carry semantic information. CNNs are free from hand-crafted design and are more robust. Meanwhile, their nonlinear multi-layer structure can effectively learn to extract strong semantic information, so their recognition performance is impressive. Hence, a CNN is selected as the basic structure in our method.

A multi-task light-weight CNN is developed to recognize macroscopic feces images; the network structure is shown in Fig. 3. The original 480 × 640 images are resized to 100 × 100 as the network input. The network is composed of three convolutional layers and a fully connected layer; the numbers of filters in the three convolutional layers are 32, 64 and 128, respectively, and all convolutional filters are 3 × 3. The input of the fully connected layer is a 256-dimensional vector, and the color recognition and trait recognition outputs both have 3 classes. The batch size is set to 64 and the learning rate to 0.0001. To make the structure easy to reproduce, the key functions are defined as follows:
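To illustrate why sharing the convolutional trunk reduces storage, the sketch below counts the weights and biases of the three 3 × 3 convolutional layers (32, 64 and 128 filters) and of the two 256-to-3 output heads. Assumptions: the input has 3 (RGB) channels, and the layers that map the convolutional features to the 256-dimensional vector are not specified in the text, so they are left out of the count; pooling layers, if any, have no parameters. The helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights (k * k * in_ch * out_ch) plus one bias per filter."""
    return k * k * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output."""
    return n_in * n_out + n_out


def trunk_params(in_ch=3, filters=(32, 64, 128)):
    """Parameters of the stacked 3x3 convolutional layers."""
    total, prev = 0, in_ch
    for f in filters:
        total += conv_params(prev, f)
        prev = f
    return total


trunk = trunk_params()       # shared convolutional layers
head = dense_params(256, 3)  # one 3-way output head per task

multi_task = trunk + 2 * head          # one trunk, color head + trait head
two_single_tasks = 2 * (trunk + head)  # two independent networks

print(trunk, head, multi_task, two_single_tasks)
# 93248 771 94790 188038
```

Under these assumptions the dual-task network needs roughly half the parameters of two single-task networks, because the convolutional trunk is stored once. The count does not depend on the 100 × 100 input size, since convolutional weights are shared across spatial positions.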

ReLU is the activation function:

$$\mathrm{ReLU}(x) = \max(0, x)$$

Dropout is used on the fully connected layers to avoid over-fitting. Dropout is a regularization approach for training CNNs that randomly drops out some nodes during training [43]. With dropout, training is effectively performed over a set of thinned networks; this is similar to ensemble learning and is powerful in reducing over-fitting. Dropout forces each neuron to learn features of the input independently, so that it does not rely on other neurons. Here the dropout rate is set to 0.5. The outputs of the network are normalized by the softmax function:

$$y_i = \frac{e^{z_i}}{\sum_{j=1}^{m} e^{z_j}},$$

where i is the class index, $z_i$ is the i-th output of the fully connected layer, $y_i$ is the probability of the i-th class, and m is the number of classes.
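Dropout with rate 0.5 and the softmax normalization can be sketched without a deep learning framework. This minimal sketch uses inverted dropout, the variant in which surviving units are scaled by 1/(1 - rate) during training so that nothing changes at inference; that is the convention of TensorFlow's dropout operations, but the text does not say which variant the authors used. Softmax subtracts the largest output before exponentiating, which leaves the result unchanged but avoids overflow. The function names are hypothetical.

```python
import math
import random


def dropout(values, rate=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), keeping the expected value
    unchanged. At inference time the input is returned as a copy.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def softmax(logits):
    """y_i = exp(z_i) / sum_j exp(z_j), computed stably via z - max(z)."""
    z_max = max(logits)
    exps = [math.exp(z - z_max) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For a 3-class head, `softmax([1.0, 2.0, 3.0])` returns three positive probabilities that sum to 1, with the largest on the third class. With rate 0.5, each surviving activation is doubled during training, so the next layer sees the same expected input as at inference.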

The experiment configuration includes an Intel Core i5-7400 U CPU, 8 GB of memory and an NVIDIA GeForce GTX 1050 GPU. All code was written in Python, and the deep learning framework was TensorFlow. It is highly difficult to collect medical feces images and to find professional doctors who can label the samples in both color and trait. We collected a feces image database of 435 images, all of which were labeled by professional doctors in color and trait. 75% of the images are used as training samples, and the remaining images are used as testing samples. Our method is trained on the training data and then tested on the testing data to assess its accuracy and efficiency.
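The 75%/25% split can be sketched as a seeded shuffle followed by a cut. This is a minimal sketch under stated assumptions: the training size is rounded down, so 435 images give 326 training and 109 testing images, and the split is not stratified by class; the text states neither detail. The function name `split_dataset` is hypothetical.

```python
import random


def split_dataset(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of `items` and split it into (train, test).

    Assumption: the training size is int(len(items) * train_fraction),
    i.e. rounded down, and no per-class stratification is applied.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```

For example, `split_dataset(range(435))` returns lists of 326 and 109 indices. With only a few minority-class images, a stratified split would be a reasonable alternative to guarantee that every class appears in the test set.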

It is important to measure the learning abilities of the model for the different single tasks, since the weights of the tasks are roughly selected according to their learning abilities. As discussed in Section 3.3, t = 5 and the maximum epoch is set to 300. Figure 5 shows the fitting results of the two single tasks, namely color and trait recognition, on the non-augmented database. According to the slopes and biases in Table 1, color recognition is more difficult than trait recognition. However, the degree of class imbalance probably affects these results, because imbalanced data are highly unfavorable to minority classes. In the following experiments, we explore the influence of imbalanced data. Figure 6 shows the fitting results of the two tasks with data augmentation, which demonstrates that our data augmentation improves convergence, both accelerating it and making it easier to reach. The slopes and biases with different weights are shown in Table 2. The difficulty levels of the two tasks in the multi-task light-weight CNN are similar, so their weights should also be similar.

If the class-imbalance problem is not solved, it is difficult to extract discriminative features of minority classes. Data augmentation relieves the class-imbalance problem: benefiting from it, the single-task networks converge stably and fast, whereas without it they cannot converge, as can be observed in Fig. 5.

To compare the regressed values intuitively, Fig. 7 shows the values of the fitted functions at the 300th epoch with different weights. In general, the accuracies of the multi-task light-weight CNN on the two tasks are satisfactory when the weights of the two tasks are similar. This also indicates that the two clinical examinations are highly related, because the learning performance of the network on them is very similar. Single-task learning cannot converge very fast, whereas multi-task learning converges much faster, because the knowledge learned in multi-task training simultaneously improves the accuracies of both tasks. The accuracies on the testing set are shown in Table 3. The multi-task strategy can significantly improve performance, especially in clinical diagnosis tasks, because there is a latent relationship between different clinical examinations, such as feces color examination and feces trait examination. The information from the different tasks is mutually complementary and improves recognition performance. Because the network is shared by the tasks, the neurons jointly extract more robust and general features.

Figure 8 reveals the relationship between the network depth and the accuracy of the multi-task network with augmentation. In this experiment, all hyperparameters are fixed. The results demonstrate that "deeper is not always better". Even if two tasks have similar difficulty levels, the structures best suited to them may differ. Moreover, even for a fixed task, its difficulty level may vary across network structures. For the color recognition task, a shallow network is more suitable, whereas for the trait recognition task, a deeper network is more appropriate. Hence, for multi-task recognition, especially when there are latent relationships between the tasks, a difficulty level assessment is necessary.

In addition, the efficiencies are compared in Table 4. The running time is strongly related to the hardware; our running time is satisfactory, and there is clear room for improvement if higher-performance hardware is used. Our method is slightly slower than the existing state-of-the-art feces diagnosis methods. However, both [31] and [51] handle a single task, whereas our method diagnoses the trait and color simultaneously with higher accuracies, so its running time is much less than the sum of the running times of the two methods.

Feces recognition is an important clinical examination. A multi-task light-weight automatic approach is proposed to recognize abnormal colors and traits in macroscopic feces images. A practical data augmentation method and an over-sampling training strategy are developed to overcome the scarcity of pathological samples. At the same time, the threshold strategy is proposed based on the linear functions. The multi-task light-weight structure requires far fewer parameters than several single-task structures combined. Besides, the recognition accuracies of both tasks are improved by the multi-task network. Our method can be easily adapted to other tasks, such as biometric recognition, clinical diagnosis and image classification. If the tasks are related, i.e., there is a latent relationship between them, the multi-task network can achieve satisfactory performance. Meanwhile, the proposed data augmentation and over-sampling method can also be applied to other imbalanced-data tasks. However, our method is designed specifically for the controlled environment of a medical clinical machine. For users' self-detection, the imaging devices and environments are diverse and complex. Hence, in future work, we will try to develop a method that can be used for users' self-detection.

Multi-task CNN model for attribute prediction

Automatic detection and quantification of WBCs and RBCs using iterative structured circle detection algorithm

Impact of digital fingerprint image quality on the fingerprint recognition accuracy

A prototype photoplethysmography electronic device that distinguishes congestive heart failure from healthy individuals by applying natural time analysis

Half quadratic splitting method combined with convolution neural network for blind image deblurring

Diagnostic dilemmas in helminthology: what tools to use and when

Variations in helminth faecal egg counts in Kato-Katz thick smears and their implications in assessing infection status with Schistosoma mansoni

Global burden of irritable bowel syndrome: trends, predictions and risk factors

Lung-nodule classification based on computed tomography using taxonomic diversity indexes and an SVM

A Bayesian approach for coincidence resolution in microfluidic impedance cytometry

A CNN hybrid ring artifact reduction algorithm for CT images

Comparison of stool examination techniques to detect Opisthorchis viverrini in low intensity infection

Low-dose CT with a residual encoder-decoder convolutional neural network

Sparse range-constrained learning and its application for medical image grading

Object detection based on multi-layer convolution feature fusion and online hard example mining

Automatic classification of cells in microscopic fecal images using convolutional neural networks

Blockchain-based authentication and authorization for smart city applications

Single-view 2D CNNs with fully automatic non-nodule categorization for false positive reduction in pulmonary nodule detection

Active shape model segmentation with optimal features

Reliable label-efficient learning for biomedical image recognition

Augmenting gastrointestinal health: a deep learning approach to human stool recognition and characterization in macroscopic images

Deep residual learning for image recognition

An efficient CNN model based on object-level attention mechanism for casting defects detection on radiography images

Two stage residual CNN for texture denoising and structure enhancement on low dose CT image

Fully automatic estimation of pelvic sagittal inclination from anterior-posterior radiography image using deep learning framework

Identifying medical diagnoses and treatable diseases by image-based deep learning

Next-generation robotics in gastrointestinal surgery

An ensemble of fine-tuned convolutional neural networks for medical image classification

Alignment-free row-co-occurrence cancelable palmprint fuzzy vault

Dual-source discrimination power analysis for multi-instance contactless palmprint recognition

A light-weight practical framework for feces detection and trait recognition

FecalNet: automated detection of visible components in human feces using deep learning

Joint classification and regression via deep multi-task multichannel learning for Alzheimer's disease diagnosis

A cascaded deep convolutional neural network for joint segmentation and genotype prediction of brainstem gliomas

Deep iterative reconstruction estimation (DIRE): approximate iterative reconstruction estimation for low dose CT imaging

Manifestations and prognosis of gastrointestinal and liver involvement in patients with COVID-19: a systematic review and meta-analysis

A lightweight and robust secure key establishment for internet of medical things in COVID-19 patients care

Automatic the clinical stools exam using image processing integrated in an expert system

Augmenting data with GANs to segment melanoma skin lesions

3D-CNN based discrimination of schizophrenia using resting-state fMRI

Localization and recognition of leukocytes in peripheral blood: a deep learning approach

A multi-context CNN ensemble for small lesion detection

Dropout: a simple way to prevent neural networks from overfitting

JPEG steganalysis based on ResNeXt with gauss partial derivative filters

Convolutional neural networks for medical image analysis: full training or fine tuning

Cryptanalysis of novel ultra-lightweight mutual authentication protocol for IoT devices using RFID tags

Image denoising using deep CNN with batch renormalization

Tracking missing data in community health studies using additive LS-SVM classifier

Multi-task CNN for restoring corrupted fingerprint images

Predicting CT images from MRI data through feature matching with learned nonlinear local descriptors

StoolNet for color classification of stool medical images

Four-image encryption scheme based on quaternion Fresnel transform, chaos and computer generated hologram

Mask-refined R-CNN: a network for refining object details in instance segmentation
