key: cord-0527070-helvtwea
authors: Zhang, Jinghua; Li, Chen; Yin, Yimin; Zhang, Jiawei; Grzegorzek, Marcin
title: Applications of Artificial Neural Networks in Microorganism Image Analysis: A Comprehensive Review from Conventional Multilayer Perceptron to Popular Convolutional Neural Network and Potential Visual Transformer
date: 2021-08-01
journal: nan
DOI: nan
sha: 2b62629eedfde673362296a5cdeacf60ffc8da50
doc_id: 527070
cord_uid: helvtwea

Microorganisms are widely distributed in the human daily living environment. They play an essential role in environmental pollution control, disease prevention and treatment, and food and drug production. Analyzing microorganisms is essential for making full use of them. Conventional analysis methods are laborious and time-consuming, so automatic image analysis based on artificial neural networks has been introduced to improve them. However, automatic microorganism image analysis faces many challenges. Robust algorithms are required because the application scenarios are diverse. The image characteristics lead to inconspicuous features and easy under-segmentation. The analysis tasks are also numerous. Therefore, we conduct this review to comprehensively discuss the characteristics of microorganism image analysis based on artificial neural networks. In this review, the background and motivation are introduced first. Then, the development of artificial neural networks and representative networks is presented. After that, papers on microorganism image analysis based on classical and deep neural networks are reviewed from the perspective of different tasks. Finally, the methodology analysis and potential directions are discussed.

Microorganisms are tiny living organisms that can be unicellular, multicellular, or acellular [101]. Examples are shown in Fig. 1. Some microorganisms are beneficial. For example, Lactobacteria can decompose substances to provide nutrients to plants [75]. Actinophrys can digest the organic waste in sludge and improve the quality of freshwater [180]. Rhizobium leguminosarum can help soybeans fix nitrogen and thus supply food to human beings [8]. However, there are also many harmful microorganisms. For example, Mycobacterium tuberculosis can cause disease and death [46], and the novel coronavirus disease 2019 (COVID-19) constitutes a global public health emergency [124]. Therefore, microorganism research plays a vital role in pollution monitoring, environmental management, medical diagnosis, agriculture, and food production [75, 85], and the analysis of microorganisms is essential for related research and applications [88].

Fig. 1 The environmental microorganism images provided in [93].

In general, microorganism analysis methods can be summarised into four categories: chemical (e.g., chemical component analysis), physical (e.g., spectrum analysis), molecular biological (e.g., DNA and RNA analysis), and morphological (e.g., manual observation under a microscope) methods [85]. Their main advantages and disadvantages are compared in Table 1. The chemical method is highly accurate but often causes secondary pollution by chemical reagents [85]. The physical method also has high accuracy, but it requires expensive equipment [85]. The molecular biological method distinguishes microorganisms by sequence analysis of the genome [171]. This strategy requires expensive equipment, considerable time, and professional researchers. The morphological method is the most direct and simple approach: a microorganism is observed under a microscope and recognized manually based on its shape [117]. It is the most cost-effective of the above methods, but it is still laborious, tedious, and time-consuming [180]. Moreover, the objectivity of this manual analysis is unstable, because it depends significantly on the experience, workload, and mood of the biologist. Therefore, developing an automatic microorganism image analysis system is of great significance. Nevertheless, such a system faces many challenges. Among the microorganism examples in Fig. 1, the backgrounds of Epistylis contain a lot of noise and impurities. Actinophrys has many filopodia that are prone to under-segmentation. The examples of Arcella and Noctiluca demonstrate that different lighting conditions can lead to entirely different color features, and the high transparency of Noctiluca results in inconspicuous texture features. Furthermore, the broad range of application scenarios requires flexible and robust Microorganism Image Analysis (MIA) algorithms. Additionally, the related algorithms vary widely because of the numerous analysis tasks.
Table 1 A comparison of traditional methods for microorganism analysis [82] .

Artificial Intelligence (AI) technology has developed rapidly in recent years [185]. It achieves outstanding performance in many fields of image analysis and processing, such as autonomous driving [137, 21, 161], face recognition [98, 160, 32], and disease diagnosis [146, 170, 183, 2]. AI can undertake laborious and time-consuming work and quickly extract valuable information from image data, which makes it promising for MIA. Besides, AI provides objective analysis in MIA and avoids the subjective differences caused by manual analysis. To some extent, it can reduce misjudgments by biologists and improve efficiency.

As an essential part of AI technology, the Artificial Neural Network (ANN) was originally designed according to the biological neuron [103]. Early ANN development once fell into stagnation [172]. Three factors contributed: limited computer performance, the difficulty of training, and the popularity of the Support Vector Machine (SVM). Later, as computer performance improved, the Convolutional Neural Network (CNN) showed an overwhelming advantage in image recognition [72], and ANNs attracted attention again and developed rapidly. Through a thorough investigation, we find that ANNs are widely used in MIA because they can learn valuable patterns from enormous amounts of data and features.

AI is an umbrella term encompassing the techniques that enable a machine to mimic or go beyond human intelligence, primarily in cognitive capabilities [128]. Fig. 2 provides the structure of AI technology [95, 185]. AI has several important sub-domains, such as Machine Learning (ML), computer vision, and natural language processing. Among them, ML has been widely used in MIA tasks, such as Environmental Microorganism (EM) segmentation [88, 180], Herpesvirus detection [35], and Tuberculosis Bacilli (TB) classification [133, 114]. As shown in Fig. 2, ML can be grouped into conventional methods and ANNs. Among conventional methods, SVM, k-Nearest Neighbor (KNN), Random Forest (RF), and other methods have been applied to MIA tasks. For example, [84] proposes an automatic EM classification system based on content-based image analysis techniques. Four features are extracted from the Sobel edge detector-based segmentation result: histogram descriptor, geometric feature, Fourier descriptor, and internal structure histogram. These features are used to train an SVM for classification. For the ten EM classes tested in this work, the mean of the average precisions obtained by the system amounts to 94.75% [75]. An automatic TB identification approach is proposed in [43]. The Canny edge technique is applied for edge detection, followed by non-maxima suppression and hysteresis thresholding. After that, a morphological closing operation is applied. Then, compactness and eccentricity features are extracted in one branch, while the same segmented images are passed through k-means clustering in the second branch. Each branch is classified independently with a nearest neighbor classifier. The first and second branches achieve average sensitivities of 93.30% and 100%, respectively [75].
In [36], an MIA work based on the ZooImage automated system is introduced, where 1437 objects of 11 taxa are classified using four shape features and an RF classifier, achieving a classification accuracy of 83.92% [82].

ANNs also play a vital role in MIA tasks. As Fig. 2 shows, they include classical neural networks and deep neural networks. In the early years, due to limited computer performance, classical neural networks were applied to MIA tasks. Representative examples are the Multilayer Perceptron (MLP), the Radial Basis Function Neural Network (RBF), and the Probabilistic Neural Network (PNN). For example, [30] compares human experts' performance in identifying Dinoflagellates with four automatic classifiers: two ANN classifiers (MLP and RBF) and two statistical techniques, KNN and Quadratic Discriminant Analysis (QDA). The data set used for training and testing comprises 60 features extracted from the specimen images. The results show that the ANN classifiers outperform the classical statistical techniques. In extended trials, human experts achieve 85% accuracy, while the RBF achieves the best performance of 83%, followed by the MLP (66%), KNN (60%), and QDA (56%). The work [76] uses PNN to select the best identification parameters from the features extracted from microorganism images. PNN is then used to classify the microorganisms with 100% accuracy using nine identification parameters.
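A PNN is essentially a Parzen-window classifier. Each training sample contributes one Gaussian kernel in the pattern layer. The summation layer averages the kernel responses per class, and the output layer picks the class with the largest response. The following is a generic NumPy sketch, not the implementation used in [76]. The kernel width `sigma` and the toy two-dimensional feature vectors are assumptions chosen for illustration.

```python
import numpy as np


def pnn_predict(X_train, y_train, X_query, sigma=1.0):
    """Probabilistic Neural Network (Parzen-window) classifier.

    Pattern layer: one Gaussian kernel per training sample.
    Summation layer: mean kernel response per class.
    Output layer: class with the largest mean response.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    # Squared Euclidean distance between every query and every training sample.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    scores = np.stack([kernel[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    return classes[np.argmax(scores, axis=1)]


# Hypothetical feature vectors (e.g., two shape features) for two classes.
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
y = np.array([0, 0, 1, 1])
print(pnn_predict(X, y, np.array([[0.1, 0.0], [2.9, 3.0]]), sigma=0.5))
```

Unlike an MLP, a PNN needs no iterative training: the training samples themselves form the network. The only choice is the smoothing parameter `sigma`.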

Later, deep neural networks came to show an overwhelming advantage in image analysis, including microorganism image analysis. Three developments drove this: the significant improvement of computer performance, progress in neural network theory, and the proposal of CNN. For example, [102] uses U-Net to segment the Rift Valley virus and achieves a Dice score of 90% and an Intersection over Union (IoU) of 83.1%. In [157], transfer learning based on Xception is applied to bacterial classification. Seven bacteria varieties that might be lethal to humans are chosen for the experiment. The method is evaluated on 230 test images of these seven varieties and achieves approximately 97.5% prediction accuracy in bacteria image classification.
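Dice and IoU both measure the overlap between a predicted segmentation mask and the ground truth. Below is a minimal NumPy sketch for binary masks. Returning 1.0 when both masks are empty is an assumed convention, not one stated in [102].

```python
import numpy as np


def dice_and_iou(pred, target):
    """Dice coefficient and Intersection over Union for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    dice = 2.0 * inter / total if total else 1.0   # 2|A∩B| / (|A| + |B|)
    iou = inter / union if union else 1.0          # |A∩B| / |A∪B|
    return float(dice), float(iou)


gt = np.zeros((4, 4), dtype=bool)
gt[0, :] = True                 # 4 foreground pixels
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 2:4] = True           # 4 predicted pixels, 2 of them correct
print(dice_and_iou(pred, gt))   # Dice = 4/8, IoU = 2/6
```

For a single pair of masks, the two scores are linked by Dice = 2·IoU / (1 + IoU). This identity does not necessarily hold for scores averaged over a whole data set.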

In conclusion, conventional machine learning methods and classical neural network methods have similar workflows in MIA tasks. They typically rely heavily on feature engineering [4]. These workflows usually consist of image acquisition, image pre-processing, segmentation, feature extraction, classifier design, and evaluation, and their accuracy depends largely on the design and extraction of features [4]. In recent years, CNNs, one of the most important families of deep neural networks, have been developed and popularized. As a result, MIA tasks can be performed without feature engineering. Compared with classical ANN methods, a CNN can directly extract valuable features from the image through its convolutional kernels. This ability has led to a rapid increase in the research and application of CNNs in MIA, where they have gained overwhelming advantages.

This paper focuses on the development and application of ANNs in MIA tasks. We present a comprehensive overview of techniques for microorganism image analysis using classical and deep neural networks. The motivation is to clarify the development history of ANNs and to understand the popular technologies and trends of ANN applications in the MIA field. Besides, this paper also discusses potential ANN techniques for microorganism image analysis. To the best of our knowledge, several review papers summarize research related to MIA tasks, such as [75, 83, 85, 86, 181]. In the following part, we go through these review papers.

The review [75] comprehensively analyzes studies on microorganism image segmentation methods from 1989 to 2018. These methods include classical methods (e.g., edge-based, threshold-based, and region-based segmentation methods) and machine learning-based methods (supervised and unsupervised). About 85 related papers are summarized in this review, and ANN-based segmentation methods are only one part of it. The review [83] aims to clarify the potential and applications of different clustering techniques in MIA. It summarizes related works from 1997 to 2017 and points out the specific challenges in each work (area) together with the corresponding suitable clustering algorithm. More than 20 related research papers are summarized in this review. The review [85] summarizes the development history of microorganism classification using content-based microscopic image analysis approaches. It introduces classification methods from different application domains, including agricultural, food, environmental, industrial, medical, water-borne, and scientific microorganisms. Around 240 related works are summarized. The classification methods discussed in this review include ANNs and many other ML methods, such as SVM, KNN, and RF. The review [181] focuses on microorganism counting based on image analysis approaches. It summarizes more than 144 related papers from 1980 to 2020, covering both classical image analysis methods and deep learning methods. ANNs are only a part of the methods discussed in this review.

To sum up, the review [83] only summarizes MIA methods based on clustering techniques. The reviews [75, 85, 181] focus on segmentation, classification, and counting tasks, respectively. Although they discuss some ANN methods, ANNs are not the central topic of these papers. Hence, we conduct this review based on our previous work [86]. Our aim is to comprehensively understand the development history, popular technologies, and trends of ANNs in the MIA field.

Our previous work [86] presents a brief review of content-based MIA using classical and deep neural networks. It briefly summarizes 55 related papers from 1992 to 2017. To illustrate the differences, Table 2 summarizes the characteristics of the current review and the previous work. The current review summarizes 96 papers from 1992 to 2020. These papers cover classification, segmentation, detection, counting, feature extraction, image enhancement, and data augmentation. It comprehensively analyzes the applications of ANNs in the MIA field by introducing representative ANNs and discussing the method in each paper. Besides, it also provides summary tables, statistical analysis, and potential directions.

Fig. 3 details the collection process of related papers. Our previous review covers 55 related papers from 1992 to 2017. We collect 51 papers from several databases, including Google Scholar, IEEE, Elsevier, Springer, ACM, and other academic databases. The search keywords include "microorganism image analysis", "classification", "detection", "segmentation", "counting", "artificial neural network", "convolutional neural network", and "deep learning". Their combinations are used to search for related papers in these databases. Besides, a naming rule based on the publication year and article title is used to avoid duplicate downloads, e.g., "2010-Bacteria classification using neural network". After careful reading, 17 papers on other topics are excluded, and six related papers are added. Finally, a total of 95 research papers are retained for our review. Fig. 4 shows the popularity and trend of ANNs for microorganism image analysis. Before deep ANNs became popular, most computer vision research focused on developing feature engineering, which is usually based on specific domain knowledge [97]. After the proposal of AlexNet, deep learning methods (especially deep CNNs) began to lead the research trend in computer vision [97]. These methods can learn powerful feature representations with multiple levels of abstraction directly from raw images. Much research is devoted to new network structures, loss functions, attention mechanisms, and other hot topics. This development trend of ANNs is also reflected in MIA tasks. Before 2014, most works adopted classical ANNs based on feature engineering. Since then, most studies have employed deep ANNs, especially deep CNNs, to perform MIA tasks.

This review is structured as follows: we begin by introducing the development of ANN and some representative networks used in MIA tasks in Sec. 2; Sec. 3 introduces the MIA work using classical ANN methods; in Sec. 4, deep ANN methods employed in the MIA tasks are summarized; Sec. 5 presents the method analysis and potential directions; Sec. 6 concludes this paper.

In this section, to better understand the development of ANN in the MIA task, we briefly introduce the evolution history of ANN. Besides, some representative ANN structures in MIA tasks are introduced.

The development of ANNs has a long history. As shown in Fig. 5, it can be divided into three stages [172]. ANN research can be traced back to the M-P Neuron, proposed in [103] in 1943 and designed according to the biological neuron. This model is a physical model made up of elements such as resistors. The perceptron, whose learning rule builds on the original M-P Neuron, is proposed in [131]. Through training, a perceptron can determine the connection weights of its neurons [172]. This starts the first upsurge of ANN research. However, Minsky et al. point out in [105] that the perceptron cannot solve the XOR problem (a linearly inseparable problem), which sends ANN research into a trough. In the second stage, the proposal of the Hopfield network [61] in 1982 brings ANN research attention again. After that, with the proposal of the Back-Propagation (BP) algorithm [134], the XOR problem can be solved by training an MLP with BP. Besides, LeCun et al. propose CNN by introducing the convolutional layer, inspired by the biological primary visual cortex, into the neural network [79, 80]. Although BP enables CNNs to be trained, the computer performance and neural network theory of the time are limited. As a result, problems such as excessively long training time and over-fitting remain. With the popularization of SVM, ANN research falls into a trough again.
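Minsky's objection is easy to reproduce. The NumPy sketch below trains a single perceptron with the classic error-correction rule on AND, which is linearly separable, and on XOR, which is not. It is a minimal illustration, not code from [131] or [105]. The step threshold at zero, the learning rate of 1, and the 100-epoch cap are assumptions.

```python
import numpy as np


def train_perceptron(X, t, epochs=100, lr=1.0):
    """Single-layer perceptron trained with the error-correction (perceptron) rule."""
    X = np.asarray(X, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, ti in zip(X, t):
            out = 1 if xi @ w + b > 0 else 0   # step activation
            delta = lr * (ti - out)            # zero when the output is already correct
            if delta != 0:
                w += delta * xi
                b += delta
                errors += 1
        if errors == 0:                        # every sample classified correctly
            break
    return w, b


def accuracy(w, b, X, t):
    pred = (np.asarray(X, dtype=float) @ w + b > 0).astype(int)
    return float((pred == np.asarray(t)).mean())


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
AND = np.array([0, 0, 0, 1])
XOR = np.array([0, 1, 1, 0])

for name, target in [("AND", AND), ("XOR", XOR)]:
    w, b = train_perceptron(X, target)
    print(name, accuracy(w, b, X, target))
```

AND reaches an accuracy of 1.0. XOR can never exceed 0.75, because no single straight line separates its two classes.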

Although ANN research is in a trough, Hinton, Bengio, and others continue to focus on ANNs [53, 54, 55, 108, 136, 13, 14, 49, 78]. Benefiting from their research progress, ANNs show overwhelming advantages in speech and image recognition [72], and the third rise of ANNs begins. Unlike in the second stage, computer performance is significantly improved, so training a deep neural network no longer takes as long as before. Besides, with the popularization of the Internet, more and more data can be used for training, which reduces the over-fitting problem.

The ANN-based MIA research also follows the development trend of ANN. Some representative ANN structures in MIA tasks are introduced in the following part.

ANNs play an essential role in MIA tasks. Our investigation of related research papers shows that early MIA tasks based on classical neural networks rely on the "Feature Engineering + Classifier" workflow. Features are extracted from images based on existing experience or domain knowledge and are then used to train classifiers for the corresponding tasks. Among these classifiers, MLP is the most widely used. With the widespread use of CNNs, MIA tasks no longer need to rely on feature engineering. Among CNNs, AlexNet [72], VGGNet [145], ResNet [52], and Inception [149] are popular for microorganism image classification. U-Net [130] and its improved variants are widely used for microorganism segmentation, and YOLO [126] is widely used for microorganism detection. To illustrate the characteristics of these networks, we briefly describe MLP, AlexNet, VGGNet, ResNet, Inception, U-Net, and YOLO below.

As shown in Fig. 6, an MLP usually consists of three layers: an input layer, a hidden layer, and an output layer. The early MLP is a class of feed-forward ANN. Like the perceptron, the early MLP can determine the connection weights between two layers by error-correction learning. This rule adjusts the weights according to the error between the expected output and the actual output. However, error-correction learning cannot be applied across multiple layers, because the expected outputs of the hidden neurons are unknown. Therefore, the early MLP uses random numbers to set the connection weights between the input layer and the hidden layer. Error-correction learning is used only to adjust the weights between the hidden layer and the output layer. With the proposal of the BP algorithm, an MLP can adjust the connection weights layer by layer.
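BP removes this restriction by propagating the output error back through the hidden layer, so both weight matrices are updated. The minimal NumPy sketch below trains a 2-8-1 MLP on XOR. Every setting in it is an illustrative assumption, not a setting from the reviewed works: the tanh hidden units, the sigmoid output, the cross-entropy loss, the learning rate, the epoch count, and the random seed.

```python
import numpy as np


def train_mlp_xor(hidden=8, lr=1.0, epochs=10000, seed=0):
    """Train a 2-hidden-1 MLP on XOR with back-propagation; return 0/1 predictions."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    W1 = rng.normal(0.0, 1.0, size=(2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, size=(hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        # Forward pass: input -> hidden (tanh) -> output (sigmoid).
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        # Backward pass for mean binary cross-entropy: dL/dz_out = p - y.
        d_out = (p - y) / len(X)
        d_hid = (d_out @ W2.T) * (1.0 - h ** 2)   # chain rule through tanh
        # Gradient-descent updates, layer by layer (d_hid used the old W2).
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return (p.ravel() > 0.5).astype(int)


print(train_mlp_xor())
```

With these settings, the network is expected to output `[0 1 1 0]`. With very few hidden units, XOR training can occasionally stall in a poor local minimum, which is why a wider hidden layer is used in this sketch.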

AlexNet adopts an architecture with consecutive convolutional layers. It wins ILSVRC 2012 [123], and the champion architectures in the following years are all deep CNNs. As shown in Fig. 7 (a), AlexNet consists of eight layers: five convolutional layers and three fully connected layers. A key contribution of AlexNet is its use of the Rectified Linear Unit (ReLU) as the activation function instead of the sigmoid or hyperbolic tangent function [123]. AlexNet is trained on multiple GPUs, which reduces the training time. Besides, unlike conventional pooling methods, AlexNet introduces overlapping pooling. AlexNet has 60 million parameters [135]; to avoid over-fitting, data augmentation and dropout are employed.
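In overlapping pooling, the pooling window is larger than the stride, so neighboring windows share pixels. The sketch below assumes a single-channel feature map and no padding. AlexNet's setting corresponds to a 3 × 3 window with stride 2. Conventional non-overlapping pooling uses equal window size and stride (e.g., 2 × 2 with stride 2).

```python
import numpy as np


def max_pool2d(x, k, s):
    """Max pooling of a 2D map with a k x k window and stride s (no padding)."""
    h, w = x.shape
    oh = (h - k) // s + 1
    ow = (w - k) // s + 1
    out = np.empty((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].max()
    return out


x = np.arange(25).reshape(5, 5)
print(max_pool2d(x, 2, 2))   # non-overlapping: windows do not share pixels
print(max_pool2d(x, 3, 2))   # overlapping: row 2 and column 2 are shared by windows
print(max_pool2d(np.zeros((55, 55)), 3, 2).shape)
```

With k = 3 and s = 2, a 55 × 55 map shrinks to 27 × 27, since (55 − 3) // 2 + 1 = 27.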

[145] proposes a CNN model named VGGNet, which wins second place in ILSVRC 2014. VGGNet is characterized by its simplicity. It uses only 3 × 3 convolutional filters, the smallest size that captures the notions of left/right, up/down, and center. VGGNet has 16 or 19 weight layers. Fig. 7 (b) shows the architecture of VGG-16. It consists of 13 convolutional layers, five max pooling layers, three fully connected layers, and a softmax layer. VGG-19 has three more convolutional layers than VGG-16. VGG-16 and VGG-19 comprise 138 and 144 million parameters, respectively [145]. The significance of VGGNet lies in its 3 × 3 convolutional filters. A stack of them obtains the same receptive field as the larger convolutional filters used in AlexNet with fewer parameters. This saving allows VGGNet to have more weight layers and thus better performance [123]. [149] proposes GoogLeNet, which wins first place in ILSVRC 2014. As Fig. 7 (c) shows, it consists of 22 convolutional layers and five pooling layers, yet it has only 7 million parameters [123]. The significance of GoogLeNet is that it introduces the Inception structure, shown in Fig. 7 (e). This structure first uses 1 × 1 convolutional filters to reduce the number of channels of the previous feature map, which reduces the total number of network parameters. Besides, it uses convolutional filters of different sizes to obtain multi-level features and improve performance [180].

In Inception-V2, batch normalization is added. In addition, a stack of two 3 × 3 convolutional filters replaces each 5 × 5 convolutional filter to increase the network depth and reduce the number of parameters [65]. In Inception-V3, an n × n convolutional filter is replaced by a stack of 1 × n and n × 1 filters [150]. In Inception-V4, the idea of the residual learning block is incorporated [148].
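The parameter savings of VGGNet and the Inception family follow from simple counting. A k_h × k_w convolution from c_in to c_out channels has k_h·k_w·c_in·c_out weights. The sketch below ignores biases. It uses hypothetical channel counts chosen only for illustration: 64 throughout, and 256 → 32 → 64 for the 1 × 1 reduction.

```python
def conv_weights(k_h, k_w, c_in, c_out):
    """Number of weights in one convolutional layer (biases ignored)."""
    return k_h * k_w * c_in * c_out


def stacked_receptive_field(n_layers, k=3):
    """Receptive field of n stacked k x k convolutions with stride 1."""
    return n_layers * (k - 1) + 1


C = 64  # hypothetical channel count, used for both input and output

# VGGNet: two stacked 3x3 layers cover a 5x5 region with fewer weights than one 5x5 layer.
vgg_stack = 2 * conv_weights(3, 3, C, C)           # 73,728
single_5x5 = conv_weights(5, 5, C, C)              # 102,400

# GoogLeNet: hypothetical 1x1 reduction from 256 to 32 channels before a 5x5 conv to 64.
direct_5x5 = conv_weights(5, 5, 256, 64)           # 409,600
bottleneck_5x5 = conv_weights(1, 1, 256, 32) + conv_weights(5, 5, 32, 64)  # 59,392

# Inception-V3: a 1x7 layer followed by a 7x1 layer replaces one 7x7 layer.
full_7x7 = conv_weights(7, 7, C, C)                # 200,704
asymmetric_7 = conv_weights(1, 7, C, C) + conv_weights(7, 1, C, C)  # 57,344

for name, value in [("two 3x3", vgg_stack), ("one 5x5", single_5x5),
                    ("direct 5x5", direct_5x5), ("1x1 + 5x5", bottleneck_5x5),
                    ("7x7", full_7x7), ("1x7 + 7x1", asymmetric_7)]:
    print(f"{name:>10}: {value:,} weights")
```

The stacked and factorized versions also insert extra nonlinearities between layers, which is a second motivation beyond the parameter count.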

From AlexNet (five convolutional layers) and VGGNet (16 or 19 weight layers) to GoogLeNet (22 convolutional layers), CNNs become deeper and deeper. Deeper CNNs allow more complex feature extraction, which in theory leads to better results. However, extending the depth of a CNN by simply adding layers may lead to vanishing or exploding gradients [15, 49]. To address these problems, [52] proposes ResNet, whose structure is shown in Fig. 7 (d). ResNet uses shortcut connections that perform identity mapping, which allows the network to be deepened to improve performance. Fig. 7 (f) shows the residual learning block based on identity mapping. In this block, H(x) denotes the underlying mapping to be fit by the stacked layers, and x denotes the input feature map of these layers. The stacked layers instead fit the residual function F(x) = H(x) − x, so the block outputs F(x) + x. The motivation is a construction argument. A deeper network can be built from a shallower one by copying the shallower network's weight layers and setting the additional layers to identity mappings. Such a deeper network should perform no worse than the shallower one, and residual learning makes these identity mappings easy to represent.
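A residual block can be written in a few lines. The NumPy sketch below assumes a fully connected block with two weight matrices. Compared with [52], it omits biases, batch normalization, and the ReLU applied after the addition. It shows that when the last weight layer is zero, the block reduces exactly to the identity mapping. This is why extra layers in a deeper residual network can easily fall back to identity mappings.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, W1, W2):
    """Return F(x) + x, where F(x) = W2 @ relu(W1 @ x) (biases omitted)."""
    f = W2 @ relu(W1 @ x)   # residual function F(x) = H(x) - x
    return f + x            # identity shortcut gives H(x) = F(x) + x


x = np.array([1.0, -2.0, 3.0])
W1 = np.ones((4, 3))
W2 = np.zeros((3, 4))                             # residual branch switched off
print(residual_block(x, W1, W2))                  # equals x: pure identity mapping
print(residual_block(x, np.eye(3), np.eye(3)))    # relu(x) + x
```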

U-Net is a CNN initially proposed for microscopic image segmentation [179]. As shown in Fig. 8, U-Net is symmetrical. It consists of a contracting path (left side) and an expansive path (right side), which gives it its U-shaped architecture. The training strategy of U-Net relies heavily on data augmentation to use the available annotated samples more efficiently [130]. Besides, in the end-to-end structure of U-Net, feature maps from the contracting path are passed to the expansive path. This design can retrieve the shallow information of the network [130].
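The original U-Net uses unpadded 3 × 3 convolutions. Its output is therefore smaller than its input, and the contracting-path feature maps must be center-cropped before they are concatenated into the expansive path. The Python sketch below tracks only the spatial size. It assumes four down-sampling steps, two 3 × 3 convolutions per level, and the 572 × 572 input tile described in [130].

```python
def unet_output_size(input_size=572, depth=4):
    """Track the spatial size through a U-Net with unpadded 3x3 convolutions,
    2x2 max pooling, and 2x2 up-convolutions. Returns the output size and
    (skip_size, target_size) crop pairs for each expansive level."""
    size = input_size
    skips = []
    for _ in range(depth):
        size -= 4              # two unpadded 3x3 convolutions (each removes 2 pixels)
        skips.append(size)     # feature map copied to the expansive path
        size //= 2             # 2x2 max pooling, stride 2
    size -= 4                  # bottleneck: two more 3x3 convolutions
    crops = []
    for skip in reversed(skips):
        size *= 2              # 2x2 up-convolution doubles the size
        crops.append((skip, size))  # skip map is center-cropped from `skip` to `size`
        size -= 4              # two unpadded 3x3 convolutions after concatenation
    return size, crops


out, crops = unet_output_size()
print(out)    # 388
print(crops)  # [(64, 56), (136, 104), (280, 200), (568, 392)]
```

The mismatch between 572 and 388 is why [130] predicts large images tile by tile with overlapping input tiles.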

[126] proposes a novel framework called YOLO, which uses the whole topmost feature map to predict both confidences for multiple categories and bounding boxes [184]. The main idea of YOLO is to divide the input image into an S × S grid. Each grid cell is responsible for detecting the object whose center falls in that cell [126]. Each grid cell predicts B bounding boxes and confidence scores for those boxes [126]. These confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the predicted box is [126]. Each grid cell also predicts C conditional class probabilities. Note that the classification error is only counted for grid cells that contain an object [184]. The structure of YOLO is shown in Fig. 9. It consists of 24 convolutional layers and two fully connected layers. Instead of the Inception modules used by GoogLeNet, YOLO uses 1 × 1 reduction layers followed by 3 × 3 convolutional layers.
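YOLO's output can be sketched as follows. S = 7, B = 2, and C = 20 are the PASCAL VOC setting of the original YOLO [126]. Each box carries five values: x, y, w, h, and confidence. The helper names are hypothetical.

```python
import numpy as np


def yolo_output_shape(S=7, B=2, C=20):
    """Each of the S x S cells predicts B boxes (x, y, w, h, confidence) and C class probabilities."""
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, S=7):
    """Grid cell (row, col) containing an object's center; cx, cy normalized to [0, 1)."""
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    return row, col


def class_specific_scores(box_conf, class_probs):
    """Test-time score for each box and class: Pr(Class_i | Object) * box confidence."""
    return np.outer(box_conf, class_probs)


print(yolo_output_shape())             # (7, 7, 30)
print(responsible_cell(0.5, 0.5))      # center of the image -> cell (3, 3)
print(class_specific_scores(np.array([0.9, 0.2]), np.array([0.5, 0.5])))
```

Because each cell proposes only B boxes and a single set of class probabilities, the original YOLO struggles with several small objects in one cell. This limitation matters for crowded microorganism images.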

This section gives an overview of MIA tasks using classical neural networks, grouped by task. Classification accounts for most of the MIA work using classical neural networks, so the related classification works are analyzed according to the networks they use. Each of the other tasks involves only a limited number of papers, so those works are analyzed individually. Besides, we summarize the characteristics of MIA based on classical neural networks and provide a table to help readers find relevant research works conveniently.

Our investigation of MIA based on classical neural networks shows that MLP is the most widely used network. Around 30 papers adopt MLP-based methods for microorganism image classification. However, they involve various microorganism categories and differ in image pre-processing, feature extraction, and MLP structure. Therefore, we critically and comparatively analyze these papers according to the details of their methods.

Applying an MLP directly to image data requires connecting every pixel to every neuron of the first hidden layer, which results in a very large number of weights. For example, fully connecting a 256 × 256 grayscale image to a hidden layer of 100 neurons already requires 256 × 256 × 100 = 6,553,600 weights. Therefore, most related works adopt MLPs based on feature engineering. Before feature extraction, image data usually need to be pre-processed. To comprehensively understand the characteristics of MLP-based microorganism image classification methods, we analyze image pre-processing, feature extraction, and MLP implementation details in turn.

Image Pre-processing: Most MLP-based microorganism image classification methods apply image pre-processing before extracting features. In most of these works, the purpose of pre-processing is to segment the Region of Interest (ROI). The segmentation methods can be grouped into thresholding-based, edge detection-based, region growing-based, active contour-based, and manual methods. [47, 5, 92, 89, 90, 19, 91, 159, 73] employ thresholding-based segmentation methods to separate the microorganism objects from the images. [47] proposes a framework to recognize protozoa and metazoa. Rough ROIs are first selected manually after image contrast enhancement and denoising. Then, accurate ROIs are generated by a thresholding-based approach, whose threshold value can be defined manually or by the Otsu or entropy method. As an extension of this work, [5] uses a similar segmentation approach to identify stalked protozoa. [92, 89] propose a framework for bacteria classification. It presents an approach named ITSMM, which combines iterative threshold segmentation with mathematical morphology edge detection to detect the edges of bacteria. As extensions, [90, 19, 91] use similar segmentation approaches for edge detection of different microorganisms. [159] focuses on the identification of powdery mildew spores. Sequential pre-processing operations are applied first: illumination compensation, greying, and image enhancement. The segmentation process then consists of Otsu-based binarization, image smoothing, and connected domain analysis. Analogously, [73] uses an Otsu-based segmentation method before feature extraction in a soil microorganism recognition task. [30, 16, 107] adopt edge-based segmentation methods. [30] compares different Dinoflagellate classification methods, including human experts, two ANN methods (MLP and RBF), and two ML methods (KNN and QDA). Sobel edge detection is applied in the ANN and ML classification methods to segment the Dinoflagellate specimen from the background, debris, and clutter. [16] uses the Marr-Hildreth operator to perform edge detection-based segmentation for classifying individual bacteria and non-bacteria. [107] introduces a preliminary study on an automated freshwater algae recognition and classification system. In this system, image contrast enhancement, greying, and binarization are applied first. The Canny edge detection algorithm is then used to isolate individual objects. [29, 139, 111, 114] use region-based methods to perform segmentation.

[29] develops a recognition system to automatically categorize marine Dinoflagellates. After a wavelet transform of the input image, it applies local maximum detection and a region growing algorithm as pre-processing. [139] develops an automated analysis system for the identification of phytoplankton. It uses a region growing approach to separate the organisms from the background of the microscopic image. Edge detection with the Sobel operator and contrast adjustment are performed first to obtain better segmentation results. Then, watershed segmentation is used to obtain rough segmentation results. Finally, region growing is applied for accurate segmentation. [111, 114] develop methods based on image analysis techniques and ANNs to detect Mycobacterium tuberculosis in tissue sections. A CY-based color filter and k-means clustering are first applied to remove pixels unrelated to the red color. Then, a median filter and region growing are used to obtain the segmentation results.
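The three segmentation families discussed above can each be illustrated with a short NumPy sketch:

- Otsu thresholding, the technique named in [159, 73]
- Sobel edge magnitude, the operator named in [30, 139]
- Seeded region growing, the technique named in [29, 139, 111, 114]

These are generic textbook versions written for illustration, not the exact pipelines of the cited works. The 8-bit greyscale input, edge-replicated borders, 4-connectivity, and fixed intensity tolerance are assumptions.

```python
from collections import deque

import numpy as np


def otsu_threshold(img):
    """Otsu's method: the grey level that maximizes between-class variance.

    Assumes an 8-bit greyscale image (integer values 0..255).
    """
    hist = np.bincount(np.asarray(img, dtype=np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()
    omega = np.cumsum(p)                   # probability of class "<= t" for each t
    mu = np.cumsum(p * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                      # skip thresholds that leave one class empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))         # pixels > threshold form the foreground


def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel operator (edge-replicated borders)."""
    img = np.asarray(img, dtype=float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]   # neighbor at offset (i - 1, j - 1)
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


def region_grow(img, seed, tol):
    """4-connected region growing from `seed`, accepting pixels within `tol` of the seed value."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_value = img[seed]
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not mask[nr, nc]
                    and abs(img[nr, nc] - seed_value) <= tol):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```

In practice, these steps are usually chained, as in [139], where edge detection and contrast adjustment prepare the image for a coarse segmentation that region growing then refines.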

Besides thresholding-based, edge detection-based, and region growing-based segmentation, some works use active contour-based and manual methods. For the active contour-based method, [121] proposes an object-level and image-level classification method for digital tuberculosis images. The method is based on an MLP activated by the SVM learning algorithm. In pre-processing, a non-uniform illumination correction technique is applied to reduce the influence of non-uniform illumination. Then, the active contour segmentation method is used to identify bacilli and outliers. For manual methods, [41] develops a phytoplankton identification and counting system, which uses manually selected threshold values to segment the ROIs. Additionally, the pre-processing in [163, 164] also contains image segmentation, but the specific methods are not described.

Although most papers use image pre-processing for object segmentation, some apply pre-processing without segmentation before feature extraction. For instance, [28] proposes an automatic taxonomic classification method in which feature extraction follows a Cymatocylis image pre-processing step that only includes digitization, binarization, and manual noise reduction. Besides, [155] introduces an automatic method to detect TB in sputum smears, in which edge detection, region labeling and removal, edge pixel linking, and boundary tracing are applied in sequence before shape feature extraction. Additionally, [166] proposes an MLP-based method for identifying two protozoa, Cryptosporidium parvum and Giardia lamblia, in which the images are directly cropped to fit the network input.

The above comparison of pre-processing in MLP-based classification tasks shows that, although the implemented methods vary, most pre-processing aims to segment the microorganism objects. Most of the remaining pre-processing methods focus on reducing image noise and highlighting the objects, which facilitates effective feature extraction. However, some MLP-based works [10, 133, 70] perform microorganism image classification without image pre-processing, and some works [45, 68, 48, 177] do not state whether pre-processing is used.

Feature Extraction: Most MLP-based microorganism image classification methods employ feature extraction in their workflows, which is a crucial step for classification performance. Therefore, to summarize their characteristics, we analyze them from different feature categories, including shape, texture, and color.

Shape is an important visual feature that is widely used to describe microorganism image content in MLP-based classification methods. The shape features used in these methods mainly include geometrical descriptors, spectral descriptors, invariant moments, and junction descriptors. Geometrical descriptors are common and simple shape features [178], which usually include area, perimeter, eccentricity, circularity, and similar measurements. Spectral descriptors overcome the noise sensitivity and boundary variation problems by analyzing shape in the spectral domain [178], and the Fourier descriptor is one of the most widely used among them. Invariant moments are also widely used to describe shape information, and some works extract junction descriptors as shape features.
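The geometrical descriptors listed above can be computed from a binary object mask. The sketch below uses common textbook definitions, which are assumptions because the reviewed papers define them in slightly different ways. Area is the pixel count and perimeter is the number of pixel edges facing the background. Circularity is 4πA/P², and eccentricity is derived from the eigenvalues of the pixel-coordinate covariance matrix.

```python
import math

def shape_descriptors(mask):
    """Compute simple geometrical descriptors from a binary mask (2-D list of 0/1).

    area        : number of foreground pixels
    perimeter   : number of pixel edges shared with background (or the image border)
    circularity : 4*pi*area / perimeter**2
    eccentricity: sqrt(1 - lambda_min / lambda_max) of the coordinate covariance
    """
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pts)
    fg = set(pts)
    # Count the 4-neighbour sides of each foreground pixel that are not foreground.
    perimeter = sum(
        (r + dr, c + dc) not in fg
        for r, c in pts
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )
    circularity = 4 * math.pi * area / perimeter ** 2
    # Second-order central moments of the pixel coordinates.
    mr = sum(r for r, _ in pts) / area
    mc = sum(c for _, c in pts) / area
    srr = sum((r - mr) ** 2 for r, _ in pts) / area
    scc = sum((c - mc) ** 2 for _, c in pts) / area
    src = sum((r - mr) * (c - mc) for r, c in pts) / area
    # Eigenvalues of the 2x2 covariance matrix [[srr, src], [src, scc]].
    half_trace = (srr + scc) / 2
    disc = math.sqrt(((srr - scc) / 2) ** 2 + src ** 2)
    lam_max, lam_min = half_trace + disc, half_trace - disc
    eccentricity = math.sqrt(1 - lam_min / lam_max) if lam_max > 0 else 0.0
    return {"area": area, "perimeter": perimeter,
            "circularity": circularity, "eccentricity": eccentricity}
```

With this pixel-edge perimeter, a square object has circularity π/4 and eccentricity 0, while a one-pixel-thick line has eccentricity 1, which matches the intuition that elongated objects score high.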

Geometrical descriptors are widely used in MLP-based microorganism classification tasks, but the implementation details vary. Some works use only or mainly geometrical descriptors. For instance, [45] adopts simple and common geometrical descriptors (area, eccentricity, circularity, and skeleton length) as the microorganism features. [133] extracts geometrical features for TB classification, including perimeter, area, radii, circularity, compactness, eccentricity, and tortuosity. Besides, [114, 143] also employ geometrical features for TB classification. Different from simple measurements of whole objects, [68] measures the geometry of microorganism structures, including hooks, the dorsal bar, and other structures. Other works combine geometrical features with other feature types. For example, [41] extracts not only geometrical descriptors but also color and other shape features, such as color distribution and Fourier descriptors, for subsequent analysis. [163, 164] also extract multiple types of features as the original features, including the axis, compactness, circularity, and other geometrical descriptors; a classification tree algorithm then selects the effective features from these original features to train the neural networks. [92, 89] extract geometrical features, contour invariant moments, and texture features, and use Principal Component Analysis (PCA) to reduce the feature dimensionality. [48, 47] adopt geometrical descriptors (area, perimeter, length, width, etc.) together with other shape features to perform the semi-automatic recognition of several protozoa and metazoa commonly found in sewage treatment. [10, 90, 19, 5, 91, 70, 107, 159] also combine geometrical features with other features.

Spectral descriptors include Fourier descriptors and wavelet descriptors, which are derived from spectral transforms of 1-D shape signatures [178]. Fourier descriptors are the main spectral descriptors in MLP-based microorganism classification works. [28] adopts the 2-D Fast Fourier transform for shape feature extraction in Cymatocylis classification. [155] uses the discrete Fourier transform to calculate the Fourier coefficients that represent the shape of each object for TB classification. [16] employs LabVIEW's built-in signal-processing library to compute the Fourier transform and obtain shape features for bacteria classification. [121] also uses 15 Fourier descriptors to describe the shape of TB objects, but, unlike previous works, it applies fuzzy entropy to select the most prominent descriptors. Additionally, [30, 29, 163, 139, 48, 47] extract multiple types of features that also include Fourier descriptors. Beyond geometrical and spectral features, some works employ invariant moments and junction descriptors as shape features. [92, 89] extract Hu invariant moments, which are invariant to translation, scaling, and rotation, for bacteria classification. [159] also uses Hu invariant moments to identify powdery mildew spores. [111] extracts six affine moment invariants to train an ANN for TB classification. Besides, [30, 29] adopt the junction descriptor as one of their features.
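To show how Fourier descriptors achieve the invariance described above, here is a minimal sketch using one common normalization, which is an assumption since the cited papers do not all use the same one. Each boundary point becomes a complex number z = x + iy, and the discrete Fourier transform is taken. The DC term (position) is dropped. Taking magnitudes removes rotation and starting-point dependence, and dividing by |F[1]| removes scale. The number of retained coefficients, `n_coeffs`, is a hypothetical parameter.

```python
import cmath
import math

def fourier_descriptors(contour, n_coeffs=8):
    """Translation-, scale-, rotation- and start-point-invariant Fourier descriptors.

    contour: list of (x, y) boundary points in traversal order.
    """
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    # Plain O(n^2) discrete Fourier transform of the complex boundary signal.
    F = [sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
         for k in range(n)]
    scale = abs(F[1])
    # Skip F[0] (position) and F[1] (used as the scale reference).
    return [abs(F[k]) / scale for k in range(2, min(n_coeffs + 2, n))]
```

Rotating, scaling, translating, and re-starting the same contour leaves these descriptors unchanged up to rounding, which is why they suit microorganisms imaged at arbitrary positions and orientations.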

Texture is a vital element of human visual perception, and texture features are widely applied in computer vision systems [63]. They are also widely used in MLP-based classification tasks. In addition to shape features, [30] extracts texture features for Dinoflagellate classification, and [29] adopts the same texture features. [163, 164] extract multiple types of features, including texture features. [92, 89] extract both shape and texture features, as does [107] for algae identification.

In addition to shape and texture features, the color feature also plays an essential role in MLP-based microorganism classification tasks. For example, [10] uses an optical plankton analyzer to measure color features to distinguish Cyanobacteria from other algae. Besides extracting shape and texture features, [163, 164] also use color features in their works. [73] adopts color features to assist the identification of soil microorganisms.

The above analysis of feature extraction in MLP-based microorganism classification shows that shape features are the most frequently used feature type, while texture and color features are used in some works. Most extracted features are fed directly to the subsequent classifiers without selection, so the most discriminative features are not identified and classification becomes less efficient. There are some exceptions; for example, [163, 164] use a classification tree algorithm to select effective features from the original ones. Exploring efficient features therefore remains important for microorganism image classification.

MLP Implementation Details: MLP is the most widely used classical neural network in MIA tasks. Its basic information and characteristics are provided in Sec. 2.2.1. Most MLP-based microorganism classification works adopt a three-layer structure trained with the BP algorithm, consisting of an input layer, a hidden layer, and an output layer, although there are some exceptions. In this part, the MLP-based works are introduced by their implementation details.

Among microorganism classification tasks, most three-layer MLPs are trained with the BP algorithm. For instance, [28] adopts a three-layer BP-based MLP with 15 input, three hidden, and five output neurons for Cymatocylis classification. [16] employs a BP-based three-layer MLP with 15 input, five hidden, and three output neurons to classify individual bacteria and non-bacteria objects. [92, 89] also train three-layer MLPs with the BP algorithm for bacterial image classification. [45] uses a BP-based three-layer MLP classifier to investigate the influence of reactors on fungal morphology. Besides, [107, 143, 159] adopt BP-based 21-8-5, 2-15-2, and 7-3-1 (input-hidden-output neurons) MLP structures for algae, TB, and powdery mildew spore classification, respectively. Instead of the standard BP algorithm, [90, 19, 91] use an adaptive accelerated BP algorithm to train MLPs for bacteria image classification.
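The three-layer BP-trained MLP that dominates these works can be sketched in plain Python. This is a generic textbook version, not the exact network of any cited paper. It assumes sigmoid units in both the hidden and output layers, a squared-error loss, per-sample (online) weight updates, and random initialization in [-1, 1]. The toy training data below are hypothetical two-feature vectors.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class ThreeLayerMLP:
    """Input layer -> one sigmoid hidden layer -> sigmoid output layer.

    Weights are stored per receiving neuron; the last weight in each row
    is the bias (its input is fixed to 1).
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target, lr):
        """One back-propagation update on a single sample (squared error)."""
        xb, hb, y = self.forward(x)
        # Output-layer error terms: dE/dnet = (y - t) * sigmoid'(net).
        d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
        # Hidden-layer error terms, propagated back through w2 (bias column excluded).
        d_hid = [hb[j] * (1 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(hb) - 1)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(hb):
                self.w2[k][j] -= lr * dk * hj
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(xb):
                self.w1[j][i] -= lr * dj * xi

    def loss(self, data):
        return sum(0.5 * (yk - tk) ** 2
                   for x, t in data for yk, tk in zip(self.forward(x)[2], t))

def train(net, data, epochs=2000, lr=0.5):
    for _ in range(epochs):
        for x, t in data:
            net.train_step(x, t, lr)
    return net
```

A 15-3-5 layout such as the one reported for [28] would correspond to `ThreeLayerMLP(15, 3, 5)`, with the 15 inputs being the extracted features and the five outputs one per class.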

The above discussion shows that MLP structures vary, mainly because the number of input neurons usually depends on the dimension of the feature vector [41]. For works without feature engineering, the number of input neurons instead equals the number of pixels in the input image. For instance, [166] adopts MLPs to identify Cryptosporidium parvum and Giardia lamblia; because the input images are 40 × 40 and 95 × 95 pixels, respectively, the MLPs have 1600 and 9025 input neurons. The number of output neurons often equals the number of classification categories [163]. There are exceptions, such as the BP-based MLP in [41] for phytoplankton classification, whose output layer has only one neuron with a value ranging from -1 to 1 that indicates the classification result; [159] uses a similar output layer. For the hidden layer, however, most works do not justify the chosen number of neurons.
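The pixel-count arithmetic for feature-free MLPs can be checked with a small sketch. It assumes, as is common but not stated in [166], that the image is flattened row by row into one input vector with one input neuron per pixel.

```python
def flatten_image(img):
    """Flatten a 2-D grey-level image (list of rows) into one MLP input vector,
    one input neuron per pixel, in row-major order."""
    return [float(v) for row in img for v in row]

for side in (40, 95):  # patch sizes reported for C. parvum and G. lamblia in [166]
    patch = [[0] * side for _ in range(side)]
    print(side, "x", side, "->", len(flatten_image(patch)), "input neurons")
```

Running the loop prints `40 x 40 -> 1600 input neurons` and `95 x 95 -> 9025 input neurons`, matching the counts above.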

The MLP configuration is chosen to achieve good performance, so some works evaluate several configurations. [163, 164] evaluate the classification performance of different four-layer MLP configurations on sedimentary organic matter. Similarly, in the classification and retrieval of macroinvertebrate images, [70] evaluates the accuracy of different MLP configurations, including both three-layer and four-layer structures. In addition to [163, 70], [139] develops an automatic image analysis system to recognize phytoplankton that employs a four-layer MLP. Besides three-layer and four-layer structures, some works adopt a two-layer MLP without a hidden layer. [47] proposes a semi-automatic image analysis program for protozoa and metazoa classification that uses a two-layer feed-forward MLP trained with the BP algorithm. [5] introduces a semi-automatic method to recognize stalked protozoa species in sewage, which also uses a two-layer MLP with 15 input and ten output neurons.

Some works also compare MLP with other ML methods. For example, [30] compares the performance of human experts in identifying Dinoflagellates with that of two ANN classifiers (MLP and RBF) and two statistical techniques, KNN and Quadratic Discriminant Analysis (QDA). [155] proposes an automatic method to detect TB in sputum smears and compares classifiers from statistical discriminant methods and neural networks; the BP-based MLP performs best. [68] develops a classification system based on statistical methods, including linear discriminant analysis, nearest neighbors, MLP, and projection pursuit regression, which can successfully discriminate closely related pathogen species within the same host. [29] compares MLP, KNN, RBF, and QDA for Dinoflagellate classification; as the number of species increases, human performance declines, while the BP network (BPN) shows the most obvious performance increase. Similar comparisons are made in [48, 70, 73, 121].

The above discussion shows that three-layer BP-based MLPs dominate microorganism image classification tasks. Some works adopt two-layer or four-layer MLPs, but no work employs an MLP with more than four layers. A likely reason is that deeper structures have more trainable parameters and may over-fit when training data are insufficient; moreover, at that time computer hardware was limited, and microorganism image data were difficult to collect. The numbers of MLP input and output neurons usually depend on the feature vector dimension and the number of classification categories, whereas the hidden-layer settings in most works are not supported by detailed theoretical or experimental analysis. In addition, some works do not provide MLP implementation details: [10, 177] do not state whether they use the BP algorithm for training, and [133, 111, 114] do not provide the detailed MLP configuration.
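The parameter argument above can be made concrete by counting trainable weights. The sketch assumes fully connected layers with one bias per non-input neuron, which the cited papers do not always state. Under that assumption, the 15-3-5 network of [28] has 68 parameters and the 21-8-5 network of [107] has 221. A pixel-input network with the 9025 inputs of [166] and a hypothetical 10-neuron hidden layer with two outputs already has 90282.

```python
def mlp_param_count(layer_sizes):
    """Number of trainable weights and biases in a fully connected MLP.

    layer_sizes lists the neurons per layer, e.g. [15, 3, 5] for a
    15-3-5 network; each non-input neuron has one weight per incoming
    neuron plus one bias.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
```

Because each layer contributes roughly the product of its width and the previous width, the count grows quickly with large inputs or extra wide layers. This is consistent with the small networks preferred when training data were scarce.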

Besides MLP, RBF is one of the most popular classical neural networks in MIA tasks. More than ten papers employ RBF-based methods for microorganism image classification. These methods cover several microorganism categories and differ in image pre-processing, feature extraction, and RBF structure. Therefore, as in Sec. 3.1.1, we analyze their image pre-processing, feature extraction, and RBF implementation details in turn.

Image Pre-processing: As in MLP-based microorganism image classification methods, most RBF-based methods apply pre-processing before feature extraction, and almost all of these operations aim to obtain the ROI. Since there are only about ten related works, some of which belong to the same series, the pre-processing details are introduced one by one.

[56] proposes a method for classifying different cell growth phases, in which adaptive global thresholding is applied to segment the cell images after several morphological operations (erosion, reconstruction, and dilation). As extensions of this work, [59, 58] employ the same segmentation method: [59] introduces an image analysis method for cocci bacterial cell classification, and [58] focuses on classifying spiral bacterial cells into three categories: vibrio, spirillum, and spirochete. However, within the same series, [57] adopts the active contour method to segment images in the analysis of cocci bacterial cells, and [60] also uses the active contour method to segment spiral bacterial cells. Besides, [120] uses active contour segmentation with a level set formulation and the Mumford-Shah technique to identify TB in digital sputum smear images. Some works compare multiple ANN classifiers in microorganism classification; for the works involving both RBF and MLP, including [30, 29, 73], the pre-processing details are provided in Sec. 3.1.1. Besides, the pre-processing in [164] also contains image segmentation, but the specific methods are not described.
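The text does not specify how the "adaptive global threshold" in [56] is chosen. A widely used way to pick one global threshold from an image histogram is Otsu's method, sketched here as an assumed stand-in rather than the paper's actual procedure. It selects the grey level that maximizes the between-class variance of the two resulting pixel groups and assumes integer grey values in 0..255.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t maximising between-class variance when
    pixels <= t form one class and pixels > t the other (Otsu's method)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # intensity sum of the lower class
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor (1 / total**2).
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

The morphological clean-up steps mentioned for [56] would run before this, and the resulting binary mask (pixels above the threshold) would feed the feature extraction stage.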

As with MLP, although the pre-processing operations vary, they all aim to segment the microorganism objects and reduce the influence of noise. Additionally, [70] performs macroinvertebrate image classification without image pre-processing.

Feature Extraction: Most RBF-based microorganism image classification methods extract image features after pre-processing, and feature extraction and selection have a significant influence on RBF performance. As with pre-processing, we analyze the feature extraction of the related works in sequence. [56] uses geometrical features to train RBF and other classifiers, namely circularity, compactness, eccentricity, tortuosity, and length-width ratio. As extensions, [57, 59, 58, 60] also adopt geometrical features with some differences: [57, 59, 58] employ the same five geometrical features, whereas [60] uses only three of them: circularity, eccentricity, and tortuosity. [120] also uses geometrical features but, unlike the other works, selects the most significant ones from the original features with Student's t-test; compactness, eccentricity, circularity, and tortuosity are finally chosen to train the classifier. As an extension of [163], [164] extracts the same original features, including shape (geometrical descriptors, Fourier descriptors, and others), texture, and color features, and then applies a classification tree algorithm to select the effective ones. [70] uses both geometrical and statistical features. Because [30, 29, 73] evaluate both MLP and RBF, their feature extraction details are provided in Sec. 3.1.1.
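Feature screening with Student's t-test, as in [120], can be sketched for a two-class problem. Each feature column gets the pooled-variance two-sample t statistic, and the k features with the largest |t| are kept. Ranking by |t| with a fixed `k` is an assumption; the paper may instead keep features whose p-values fall below a significance level.

```python
import math
import statistics

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def select_features(class_a, class_b, k):
    """Rank feature columns by |t| between two classes and keep the top k.

    class_a, class_b: lists of feature vectors (one per object).
    Returns the indices of the k most discriminative features.
    """
    n_feat = len(class_a[0])
    scores = []
    for j in range(n_feat):
        t = student_t([v[j] for v in class_a], [v[j] for v in class_b])
        scores.append((abs(t), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]
```

A feature whose class means differ by much more than its within-class spread (such as column 0 in the test data) receives a large |t| and is selected first.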

The above summary shows that most RBF-based microorganism classification works extract features directly from the pre-processed images without further feature screening. However, [120, 164] screen significant features from the original features, and using more specific features can improve the efficiency of a microorganism classification system.

RBF Implementation Details: RBF is the second most widely used ANN in microorganism image classification tasks and is commonly used for function approximation problems [142]. The popular RBF form is a three-layer feed-forward neural network with input, hidden, and output layers, where the hidden layer comprises several nonlinear radial basis function activation units [42]. These activation functions are conventionally Gaussian functions [42]. RBF is distinguished from other ANNs by its universal approximation capability and faster learning speed [142]. As with image pre-processing and feature extraction, the implementation details are introduced in sequence. [56] evaluates four kinds of classifiers for classifying the growth phases of bacilli bacterial cells, comparing RBF with 3σ, KNN, and fuzzy classifiers. The experimental results show that all classifiers except the fuzzy classifier perform worse than RBF. As extensions, [57, 59] evaluate 3σ, KNN, and RBF classifiers for classifying cocci bacterial cells, and RBF outperforms the others. [58] evaluates RBF, 3σ, KNN, and fuzzy classifiers on spiral bacterial cell groups; RBF, 3σ, and fuzzy classifiers achieve 100% classification accuracy for different bacterial cells. Different from [58], [60] replaces the fuzzy classifier with an adaptive neuro-fuzzy inference system, and both RBF and the neuro-fuzzy classifier obtain 100% accuracy in classifying different spiral bacterial cells.
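The three-layer RBF structure with Gaussian hidden units can be sketched as follows. To keep the example short, it assumes one hidden unit centred on each training sample and a shared width `sigma`. Because the output layer is linear, its weights are found with a single linear solve (exact interpolation), which illustrates the fast learning mentioned above. Practical RBF classifiers usually use fewer centres, for example chosen by clustering, and a least-squares fit; those choices are not taken from the cited papers.

```python
import math

def gaussian(r2, sigma):
    """Gaussian radial basis function of a squared distance r2."""
    return math.exp(-r2 / (2 * sigma ** 2))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

class RBFNet:
    """Input -> Gaussian hidden units (one per centre) -> linear output."""

    def __init__(self, centres, sigma):
        self.centres, self.sigma = centres, sigma

    def hidden(self, x):
        return [gaussian(sqdist(x, c), self.sigma) for c in self.centres]

    def fit(self, X, y):
        # The output layer is linear, so its weights come from one linear solve;
        # with one centre per training sample this is exact interpolation.
        A = [self.hidden(x) for x in X]
        self.w = solve(A, y)
        return self

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w, self.hidden(x)))
```

Even the XOR pattern, which no linear classifier can separate, is reproduced exactly at the training points, illustrating the approximation power of the Gaussian hidden layer.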

[120] adopts RBF to identify TB in digital sputum smear images and compares it with the Adaptive Neuro-fuzzy Inference System (ANFIS) and the Complex-valued Adaptive Neuro-fuzzy Inference System (CANFIS); CANFIS achieves better accuracy than both RBF and ANFIS. [164] replaces some of the MLPs in [163] with RBFs to classify sedimentary organic matter; in the best case, the correct recognition rate improves by 4% to just over 91%. [70] focuses on the automatic classification and retrieval of macroinvertebrate images and evaluates four classifiers, including MLP and RBF, among which MLP performs best. Besides, [30, 29, 73] also evaluate both MLP and RBF, and in all of them the RBF outperforms the MLP.

The above discussion shows that RBF performs better than MLP in most works. However, MLP performs best in [70]. Notably, [70] evaluates several MLP configurations, including different layer structures, to find the best one, whereas the RBF contains only one hidden layer. This may explain why MLP outperforms RBF in that work.

In addition to MLP and RBF, some works adopt other classical ANNs for microorganism image classification. Because each of these networks appears in only a few papers, we discuss each paper individually in terms of research motivation, contribution, data, workflow, and results.

Image recognition methods have lagged behind the wide application of optical imaging samplers in plankton ecology research, and most methods require manual post-processing to correct their results. To address this, [62] proposes a dual-classification method. In the first classification, four kinds of shape features (moment invariants, morphological measurements, Fourier descriptors, and granulometry curves) are used to train a Learning Vector Quantization Neural Network (LVQ-NN); in the second, texture-based features (co-occurrence matrix) are used to train an SVM. The LVQ-NN has two layers, namely a competitive layer and a linear output layer, and its complexity depends on the number of training images and the number of taxa in the classifier. A sample is assigned to a class only when the two classifiers agree, which effectively reduces the false positive rate. The experimental data contain about 20000 images of seven plankton categories. The results show that, compared with previous methods, the dual-classification system after correction can effectively estimate abundance and reduce the error by 50% to 100%.
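The dual-classification idea can be sketched in two parts. The first is a basic LVQ1 learner, in which the nearest prototype moves toward samples of its own class and away from samples of other classes. The second is an agreement rule that accepts a label only when two classifiers agree. In [62], the second classifier is an SVM on co-occurrence texture features, so here its output is just passed in as a label. The learning rate, epoch count, and prototype initialization are assumptions.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lvq1_train(data, prototypes, lr=0.1, epochs=20):
    """LVQ1: move the nearest prototype toward a sample of the same class
    and away from a sample of a different class.

    data: list of (feature_vector, label); prototypes: list of [vector, label].
    """
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, label in data:
            p = min(prototypes, key=lambda q: sqdist(q[0], x))
            sign = 1 if p[1] == label else -1
            p[0] = [w + sign * rate * (xi - w) for w, xi in zip(p[0], x)]
    return prototypes

def lvq_predict(prototypes, x):
    """Label of the nearest prototype."""
    return min(prototypes, key=lambda q: sqdist(q[0], x))[1]

def dual_decision(label_a, label_b):
    """Accept a label only when both classifiers agree; otherwise reject (None)."""
    return label_a if label_a == label_b else None
```

Rejected samples (`None`) are the ones that would otherwise risk becoming false positives. This trades a few unassigned images for a cleaner abundance estimate.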

The conventional manual detection of Mycobacterium tuberculosis is an inefficient but necessary part of tuberculosis diagnosis. In [112], a method based on image analysis techniques and ANNs is introduced to detect Mycobacterium tuberculosis in tissue sections. Fifteen tissue slides are collected in the Department of Pathology, USM Hospital, Kelantan, and each slide is captured to generate 30 to 50 images. A total of 607 objects are obtained, comprising 302 definite TB and 305 possible TB objects. In the experiments, moving k-means clustering is applied to segment the images, and geometrical features based on Zernike moments are extracted. Then, a Hybrid Multilayered Perceptron (HMLP) is used to test the performance of different feature combinations in the detection task. With the best feature combination, the HMLP achieves an accuracy of 98.07%, a sensitivity of 100%, and a specificity of 96.19%.

TB detection in tissue is more complex and challenging than detection in sputum. As an extension of [111], [110] introduces a method based on image processing techniques and ANNs to detect and classify TB in tissue. A total of 1603 objects in three categories (TB, overlapped TB, and non-TB) are collected from 150 tissue slide images. The captured images are segmented first, and six affine moment invariants are then extracted to train the network for classification. Instead of the single-layer feed-forward neural network trained by an Extreme Learning Machine in [111], this work uses an HMLP network trained by combining the Modified Recursive Prediction Error algorithm and the Extreme Learning Machine. With 35 hidden nodes, this network achieves a highest testing accuracy of 77.33% and an average testing accuracy of 74.62%.

The traditional microorganism detection and identification method is laborious and time-consuming. In [76], a rapid and cost-effective PNN-based method is applied to microorganism image classification. In this method, features including various geometrical, optical, and textural parameters are extracted from the pre-processed images to train the network.

In the experiments, the dataset includes five categories: Bacillus thuringiensis, Escherichia coli K12, Lactobacillus brevis, Listeria innocua, and Staphylococcus epidermidis. The results show that the PNN based on nine features (45° run length non-uniformity, width, shape factor, horizontal run length non-uniformity, mean grey level intensity, the 10th and 99th percentile values of the grey level histogram, sum entropy, and entropy) classifies the microorganisms with 100% accuracy.
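Two of the listed features are run length non-uniformity (RLN) values from the grey level run length matrix, one computed along horizontal runs and one along 45° runs. The sketch below uses the standard RLN definition: the sum, over run lengths, of the squared number of runs with that length, divided by the total number of runs. The direction convention ((-1, 1) for 45°) and the absence of grey-level quantization are assumptions; [76] does not specify them.

```python
from collections import Counter

def run_lengths(img, direction):
    """Collect (grey level, run length) pairs along lines in `direction`.

    direction = (dr, dc): (0, 1) for horizontal runs, (-1, 1) for 45-degree runs.
    """
    h, w = len(img), len(img[0])
    dr, dc = direction
    runs = []
    for r in range(h):
        for c in range(w):
            pr, pc = r - dr, c - dc
            # Start a run only at pixels whose predecessor is outside the image
            # or has a different grey level.
            if 0 <= pr < h and 0 <= pc < w and img[pr][pc] == img[r][c]:
                continue
            length, rr, cc = 0, r, c
            while 0 <= rr < h and 0 <= cc < w and img[rr][cc] == img[r][c]:
                length += 1
                rr, cc = rr + dr, cc + dc
            runs.append((img[r][c], length))
    return runs

def run_length_nonuniformity(img, direction):
    """RLN = sum over run lengths j of (number of runs with length j)^2 / total runs."""
    runs = run_lengths(img, direction)
    per_length = Counter(length for _, length in runs)
    return sum(n ** 2 for n in per_length.values()) / len(runs)
```

RLN is low when run lengths are evenly spread and high when many runs share the same length, so it summarizes how regular a cell's texture is along the chosen direction.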

Algae classification is one of the fundamental problems in water resources management. In [25], an automatic real-time microalgae identification and enumeration method based on image analysis is introduced, which integrates segmentation, shape feature extraction, pigment signature determination, and ANN classification. The ANN is a Self-Organizing Map (SOM), whose details can be found in [127, 138, 144]. The number of SOM neurons is fixed to the estimated number of categories, and the neurons are interconnected through neighborhood relations. The dataset used in the experiments is private. The method achieves an accuracy of 98.6% on 53869 images from 23 categories.
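A SOM whose neuron count equals the estimated number of categories, as in [25], can be sketched as a 1-D chain of neurons. For each sample, the best-matching neuron and, more weakly, its chain neighbours are pulled toward the sample, while the learning rate and the Gaussian neighbourhood width decay. The 1-D topology, the decay schedules, and initializing the neurons from data samples are assumptions. Mapping neurons to class names after training (for example, by majority vote of labelled samples) is not shown.

```python
import math

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(data, init_weights, epochs=100, lr0=0.5, sigma0=1.0):
    """1-D self-organizing map: neurons on a line, one per expected category.

    The winner (best-matching unit) and, to a lesser degree, its grid
    neighbours are pulled toward each sample; the learning rate and the
    Gaussian neighbourhood width both decay over training.
    """
    w = [list(v) for v in init_weights]
    for e in range(epochs):
        frac = e / epochs
        lr = lr0 * (1 - frac) + 0.01       # decays toward 0.01
        sigma = sigma0 * (1 - frac) + 0.05  # decays toward 0.05
        for x in data:
            bmu = min(range(len(w)), key=lambda i: sqdist(w[i], x))
            for i in range(len(w)):
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                w[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(w[i], x)]
    return w

def winner(weights, x):
    """Index of the neuron closest to feature vector x."""
    return min(range(len(weights)), key=lambda i: sqdist(weights[i], x))
```

After training on two separated clusters of feature vectors, each cluster is won by a different neuron, so the neuron index can serve as a category code.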

Continuous monitoring of cells during fermentation can significantly increase the fermentation yield. In [141], a prototype Self-Tuning Vision System (STVS) is developed to monitor morphological changes of the yeast-like fungus Aureobasidium pullulans during the fermentation of pullulan polysaccharide. The workflow of this system consists of pre-processing, segmentation, intermediate processing, feature extraction, and classification, and a SOM network is employed for image segmentation. The experimental results show that the system is reliable, accurate, and versatile.

Ziehl-Neelsen tissue slide image segmentation is a crucial part of computer-aided tuberculosis diagnosis. An HMLP-based method is introduced to perform this segmentation task in [113]. In the experiments, 5608 samples are used for training, and 2268, 1634, and 2800 samples are used for testing on samples A, B, and C, respectively. In the segmentation process, the hue and saturation components of each pixel in a 3 × 3 kernel are used as the inputs of the HMLP network, and the network output corresponds to the center pixel of the kernel: if the pixel is classified as belonging to the segmentation object, it is assigned '1'; otherwise, it is assigned '0'. The method obtains accuracies of 98.72%, 99.45%, and 97.75% for samples A, B, and C, respectively.
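The pixel-wise input construction described for [113] can be sketched as follows. For each pixel, the hue and saturation of the nine pixels in its 3 × 3 neighbourhood form an 18-value input vector, and a trained classifier labels the centre pixel 1 or 0. Several details are assumptions: the HSV conversion via Python's `colorsys` (the paper may use another hue/saturation model), clamping at image borders, and the stand-in `classify` function, which replaces the trained HMLP.

```python
import colorsys

def hs_kernel_inputs(rgb, r, c):
    """Build one input vector for the pixel at (r, c): the hue and saturation
    of every pixel in the surrounding 3x3 kernel (9 x 2 = 18 values).

    rgb: 2-D list of (R, G, B) tuples with channels in 0..255.
    Pixels outside the image are clamped to the nearest border pixel.
    """
    h, w = len(rgb), len(rgb[0])
    features = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            red, green, blue = (v / 255.0 for v in rgb[rr][cc])
            hue, sat, _ = colorsys.rgb_to_hsv(red, green, blue)
            features.extend([hue, sat])
    return features

def segment(rgb, classify):
    """Label every pixel 1 (object) or 0 (background) with a per-pixel classifier."""
    return [[1 if classify(hs_kernel_inputs(rgb, r, c)) else 0
             for c in range(len(rgb[0]))] for r in range(len(rgb))]
```

In the input vector, the centre pixel's hue and saturation sit at positions 8 and 9, and the surrounding pixels supply local context. A trained network can therefore reject isolated colored noise that a single-pixel rule would accept.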

The conventional microscope-based tuberculosis diagnosis method is widely used in developing countries, but it is time-consuming. Therefore, [26] proposes a novel method based on image processing techniques to assist the diagnosis, which contains three steps: image acquisition, segmentation, and post-processing. In the segmentation step, an SVM and an ANN (a three-layer feed-forward neural network) are used. The classifier inputs are combinations of pixel color features selected from four color spaces, and the best features are selected by a scalar feature selection technique. The segmentation step determines whether each pixel belongs to a bacillus, and the post-processing step removes the non-bacilli parts. The training data for the segmentation step consist of 1200 bacilli pixels and 1200 background pixels from 120 images. The SVM achieves the best sensitivity of 96.80% with an error rate of 3.38%, while the ANN classifier achieves its best sensitivity of 91.53% with an error rate of 5.20%.

In [167], a feed-forward MLP with 81 input neurons, ten hidden neurons, and one output neuron is proposed for image enhancement and the subsequent enumeration of microorganisms adhering to a solid substrate. Unlike traditional classical ANNs, this network does not rely on feature engineering: its inputs are the grey values of 9 × 9 pixel sub-images taken from the 512 × 512 pixel microscope image. These grey values are processed to yield an output value for every pixel in the original image, after which microorganisms are indicated with a marking circle. For high-contrast images, nearly all adhering organisms are detected correctly. For lower-quality images of bacteria on metal or silicone rubber substrata, ANN enhancement enables the adhering bacteria to be enumerated with an accuracy of 93%-98%, and 98% of the yeast cells adhering to silicone rubber substrata are enumerated correctly after enhancement. These results suggest that ANNs perform well on low-quality and complicated images.

The manual method for bacteria classification is complex and inefficient. In [186], a bacteria image classification method is introduced based on features generated by a Pulse Coupled Neural Network (PCNN), a biologically inspired ANN based on the work in [40]. In the workflow, the acquired images are first pre-processed by cropping them and saving the region of interest. Then, entropy sequence features are generated by the PCNN. Finally, classification is performed by a classifier based on the Euclidean distance. The experimental results show that this method is feasible and efficient.
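The PCNN entropy-sequence feature can be illustrated with a commonly used simplified PCNN, not necessarily the exact model of [186] or [40]. Each pixel drives one neuron. The linking input counts firing 8-neighbours from the previous iteration, and a neuron fires when its internal activity S(1 + βL) exceeds a dynamic threshold that decays exponentially and jumps after firing. The Shannon entropy of the binary firing map at each iteration forms the feature sequence, and a nearest-template rule with Euclidean distance assigns the class. All parameter values (`beta`, `alpha_theta`, `v_theta`, the iteration count) and the template dictionary are assumptions.

```python
import math

def pcnn_entropy_sequence(img, iterations=10, beta=0.2, alpha_theta=0.2, v_theta=20.0):
    """Simplified PCNN: return the binary-output entropy after each iteration.

    img: 2-D list of grey values scaled to [0, 1] (the feeding input F = S).
    """
    h, w = len(img), len(img[0])
    Y = [[0] * w for _ in range(h)]
    theta = [[1.0] * w for _ in range(h)]
    seq = []
    for _ in range(iterations):
        newY = [[0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                # Linking input: number of 8-neighbours that fired last iteration.
                L = sum(Y[rr][cc]
                        for rr in range(max(r - 1, 0), min(r + 2, h))
                        for cc in range(max(c - 1, 0), min(c + 2, w))
                        if (rr, cc) != (r, c))
                U = img[r][c] * (1 + beta * L)
                newY[r][c] = 1 if U > theta[r][c] else 0
        # Threshold decays exponentially and jumps by v_theta where a neuron fired.
        for r in range(h):
            for c in range(w):
                theta[r][c] = theta[r][c] * math.exp(-alpha_theta) + v_theta * newY[r][c]
        Y = newY
        p1 = sum(map(sum, Y)) / (h * w)
        seq.append(0.0 if p1 in (0.0, 1.0)
                   else -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1))
    return seq

def nearest_class(seq, templates):
    """Assign the label whose template sequence is closest in Euclidean distance."""
    return min(templates, key=lambda lab: math.dist(seq, templates[lab]))
```

In practice, the templates could be mean entropy sequences computed from labelled training images of each bacterial class.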

Based on the above discussion, MIA based on classical neural networks has the following characteristics. Firstly, because microorganisms are applied in a wide range of fields, such as medical diagnosis, environmental management, and food production, most datasets used in the related papers are privately collected. The lack of public datasets makes it challenging to develop general MIA algorithms and difficult to compare different algorithms. Secondly, classical neural networks usually cannot be applied directly to image data, so their MIA workflows usually contain three steps: pre-processing, feature extraction, and classification. Feature engineering therefore has a significant influence on algorithm performance. Besides, the classical neural networks used in MIA tasks usually have shallow structures, because deep networks have numerous connections between layers, which result in many parameters and make training difficult. Thirdly, most MIA tasks based on classical neural networks are classification tasks. Among the other tasks, some works apply classical neural networks directly to image data, an approach that differs from the CNNs that appeared later.

In addition, to help readers find and understand relevant works quickly, we briefly summarize these papers of MIA based on classical neural network in Tab. 3, in which years, application tasks, references, datasets, object species, category number, methods, and results are provided. 

This section introduces MIA tasks based on deep neural networks, organized by task: classification, segmentation, feature extraction, detection, counting, and data augmentation. As with MIA based on classical neural networks, classification papers make up most of the MIA works based on deep neural networks, so they are analyzed according to the neural networks used. In the segmentation part, U-Net is the most popular network, so the details are presented for U-Net and for other ANNs. The networks used in the other tasks are more fragmented, so the related papers are discussed individually. Finally, a summary of the characteristics of MIA based on deep neural networks is provided, together with a table that helps readers find relevant papers conveniently.

In the MIA tasks based on deep neural networks, networks built from convolutional filters are grouped into the CNN category, except for networks that already have established names, such as AlexNet, ResNet, and DeepSea. Because CNNs apply convolutional filters directly to images, MIA no longer has to rely on feature engineering, and the workflow usually includes little or no image pre-processing and feature extraction. Therefore, the introduction of microorganism image classification based on CNNs mainly focuses on the microorganism categories, methodology, and performance.
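
To make the point about convolutional filters concrete, the following is a minimal NumPy sketch of the operation a convolutional layer applies to raw pixels (implemented, as in most deep learning libraries, as cross-correlation without kernel flipping). The 5 x 5 toy image and the hand-set vertical-edge kernel are illustrative assumptions; in a CNN the kernel weights are learned from data rather than designed by hand, which is what removes the need for manual feature engineering.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` without padding (cross-correlation)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=float)
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the kh x kw neighborhood currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-set vertical-edge kernel: responds where intensity rises from left to right
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

# Toy 5x5 "image": dark on the left, bright in the two rightmost columns
toy = np.zeros((5, 5))
toy[:, 3:] = 1.0
response = conv2d_valid(toy, edge_kernel)
```

Each output value depends only on a 3 x 3 neighborhood, and the same nine weights are reused at every position; the response is 0 in flat regions and 3 where the window straddles the dark-to-bright edge.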

Designing a plankton recognition system is challenging, mainly because most traditional methods rely on feature engineering. When the recognition task involves many categories, extracting and selecting effective features becomes a tough challenge. To address this, [38] proposes a recognition system called SYRACO2 to classify coccoliths. A simple CNN containing two convolutional layers and two fully connected layers is designed in this system. Unlike classical neural networks, the convolutional filters free the system from relying on feature engineering. The coccolith data used in the experiment contain 13 species and one non-coccolith class. Each coccolith species has about 100 images, and the non-coccolith class has about 600 images. The average effective recognition rate of the system over the 13 species is 86%. As an extension, [11] introduces a novel CNN with a parallel structure to improve the identification rate. The structure is shown in Fig. 10. Building on the network proposed in SYRACO2 [38], motor modules are introduced into the parallel neural network. Through training, the motor modules can dynamically apply translation, rotation, dilatation, contrast, and symmetry transformations to the images. Besides, the new system introduces a secondary classification stage to further improve classification. The results show that it recognizes approximately 96% of coccoliths belonging to 11 Pleistocene taxa during routine work. Fig. 10 The novel CNN in [11].

Besides, [81, 122, 173, 100] also employ CNNs for plankton classification. [81] adopts the WHOI-Plankton dataset, which consists of 3.4 million expert-labeled plankton images in 103 classes. However, the dataset is severely imbalanced. Therefore, a CNN based on transfer learning is introduced to address the imbalance: a balanced sub-dataset is extracted from WHOI-Plankton to pre-train the network, and the original dataset is then used to fine-tune it. The results show that this method achieves a classification accuracy of 92.80% and an F1-score of 33.39%. [122] resizes the original images to multiple sizes and trains a simple CNN on them in parallel so that it can adapt to different inputs. The dataset used in this study is Plankton Set 1.0. The best model achieves a softmax loss of 0.613, which indicates that the performance needs further improvement. The 2015 Kaggle data science competition released a plankton dataset of 30336 images in 121 categories. [173] proposes a novel CNN based on an exploration of classical CNN models, such as CaffeNet, VGG-19, and ResNet. The structure is shown in Fig. 11. The results show that the CNN significantly reduces the model size and improves the frame rate while achieving similar performance. [100] uses a CNN consisting of 13 convolutional layers and 12 pooling layers to classify plankton and evaluates it on the Kaggle plankton dataset. The results show that a completely random classification assessment obtains an average accuracy of 84% and a recall of 40% over all groups. Fig. 11 The proposed CNN in [173].
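
The text summarizes [81] only as pre-training on a balanced sub-dataset. The sketch below shows one simple way such a subset could be drawn, by capping the number of samples per class. The cap (`per_class`), the random seed, and the class names are hypothetical, and the sampling rule is an assumption rather than the procedure of [81].

```python
import random
from collections import defaultdict

def balanced_subset(labels, per_class, seed=0):
    """Return sorted indices of at most `per_class` samples for every class label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subset = []
    for indices in by_class.values():
        # Rare classes contribute everything they have; frequent classes are subsampled.
        k = min(per_class, len(indices))
        subset.extend(rng.sample(indices, k))
    return sorted(subset)

# Hypothetical, heavily imbalanced label list
labels = ["diatom"] * 900 + ["ciliate"] * 80 + ["detritus"] * 20
idx = balanced_subset(labels, per_class=50)
```

The resulting subset holds 50 diatoms, 50 ciliates, and all 20 detritus images; the network pre-trained on it would then be fine-tuned on the full, imbalanced list.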

Bacteria classification plays an essential role in many applications, such as disease diagnosis and food production. However, manual classification methods are inefficient, so several works use CNNs to assist this process. [9] develops a smartphone-based optical device that uses a CNN to detect bacteria and analyze their motion. The CNN consists of two convolutional layers and two fully connected layers. The results show that it achieves an accuracy of 83% in classifying E. coli and B. subtilis. [119] introduces a model based on region covariance and a CNN to classify bacteria. Region covariance is used to segment the input image by comparing textures, and the resulting segments are fed into the CNN for classification. The experimental data contain three categories: rod-shaped bacteria, round or nearly round bacteria, and mixed bacterial strains. The results show that the classification accuracies for rod-shaped bacteria and for round or nearly round bacteria exceed 91% and 78%, respectively. [17] proposes a system that uses a 3D CNN to identify bacterial growth. Unlike the other works, it analyzes laser speckle images. The 3D CNN can encode the spatial variance of the speckle and its changes over time. The results show that it reaches an accuracy of 0.95. [24] evaluates several CNN training strategies to find the most suitable one for classifying bacteria images into four categories. The evaluated models include a simple CNN, a simple CNN optimized by particle swarm optimization, a CNN pre-trained with autoencoders, a CNN pre-trained with autoencoders and optimized by particle swarm optimization, and a CNN optimized by genetic algorithms. The results show that the autoencoder-pre-trained CNN optimized by particle swarm optimization performs best, with an accuracy of 94.9% and an F1-score of 95.6%. In [104], a method based on a 3D CNN is presented for bacterial image classification, and it obtains an accuracy of 95%. [152] proposes a CNN-based method to classify B. subtilis cells. The original images are first processed with a binary segmentation algorithm and annotated manually, and then they are augmented to train and test the CNN. The results show that the proposed CNN achieves an accuracy of 86%.
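
The augmentation step in [152] is not detailed here. A common, label-preserving choice for microscopy images, where cells have no canonical orientation, is the set of 90-degree rotations and mirror flips; the sketch below generates all eight such views of an image with NumPy. Treating this as the augmentation actually used in [152] would be an assumption.

```python
import numpy as np

def dihedral_augment(image):
    """Return the 8 rotations/reflections of a 2D image (the dihedral group D4)."""
    views = []
    for k in range(4):
        rotated = np.rot90(image, k)       # rotate by k * 90 degrees
        views.append(rotated)
        views.append(np.fliplr(rotated))   # and its mirror image
    return views

cell = np.arange(9).reshape(3, 3)          # toy image with no internal symmetry
augmented = dihedral_augment(cell)
```

For an image without symmetry, all eight views are distinct, so the training set grows eightfold without any new annotation.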

Besides, some bacteria classification works combine CNNs with other networks. In [109], a framework for bacteria segmentation and classification is proposed. The original images are cropped into patches, and a Convolutional Deep Belief Network (CDBN) extracts patch-level features, which are used to train an SVM for patch-level segmentation that distinguishes foreground from background. After that, the foreground patches are further classified by a simple CNN, and the predicted patch labels vote for the final bacterial category. The data consist of 17 species. The results show that the framework achieves an accuracy of 97.14% in patch-level segmentation and 62.10% in classification. In [66], a food-borne pathogenic bacteria classification method based on CNNs and hyperspectral microscope imaging is proposed. The data contain five species. Hyperspectral microscope imaging captures both spatial and spectral information. Two CNN models are used in this work: U-Net first segments the ROIs, and a 1D CNN then classifies the bacteria using the spectral information extracted from the ROIs. The results show that U-Net achieves an average accuracy of 96% and an average mIoU of 88%, and the 1D CNN obtains a classification accuracy of 90%. As an extension of [66], [67] proposes a hybrid deep learning framework named "Fusion-Net". After data acquisition, three kinds of features (morphological features, intensity images, and spectral profiles) are extracted from the image data and used to train three networks: an LSTM, a ResNet, and a 1D CNN. The schematic diagram is shown in Fig. 12. The results show that the three networks achieve classification accuracies of 92.2%, 93.8%, and 96.2%, respectively, and fusing them increases the classification accuracy to 98.4%.
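
The patch-voting step of [109] can be sketched as follows: only patches kept as foreground by the segmentation stage vote, and the most frequent predicted label becomes the image label. The tie-breaking rule (the first-encountered label wins, which is how `Counter.most_common` orders equal counts) and the example labels are assumptions for illustration.

```python
from collections import Counter

def vote_image_label(patches):
    """patches: iterable of (is_foreground, predicted_label) pairs for one image."""
    votes = Counter(label for is_foreground, label in patches if is_foreground)
    if not votes:
        return None  # no patch survived the foreground/background segmentation stage
    return votes.most_common(1)[0][0]

# Hypothetical patch-level results for one image
patches = [(True, "E. coli"), (True, "S. aureus"), (False, "S. aureus"),
           (True, "E. coli"), (False, "S. aureus")]
image_label = vote_image_label(patches)
```

Here the foreground votes give "E. coli" two votes against one, whereas counting the background patches as well would have produced a tie.
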

In addition to plankton and bacteria, CNNs are also used for TB classification. Here, TB refers to Mycobacterium tuberculosis, the pathogenic bacterium that causes tuberculosis. [147] proposes a framework for the automatic and rapid classification of TB in sputum images. The image is first pre-processed by noise reduction and intensity modification and then segmented by Channel Area Thresholding (CAT). After that, HOG and SURF features are extracted from the segmented images to train the CNN classifier. The experimental results show that the proposed framework achieves an accuracy of 99.5%, a sensitivity of 94.7%, and a specificity of 99%.
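
Accuracy, sensitivity, and specificity, as reported for [147], all derive from the four confusion-matrix counts. The counts below are hypothetical, not taken from [147]; they simply show how the three measures relate.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # fraction of truly positive samples that are detected
    specificity = tn / (tn + fp)   # fraction of truly negative samples correctly rejected
    return accuracy, sensitivity, specificity

# Hypothetical test set: 100 positive and 100 negative samples
acc, sens, spec = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

Here accuracy (0.925) lies between sensitivity (0.90) and specificity (0.95), because on a single test set accuracy is the average of the two weighted by the numbers of positive and negative samples.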

The above discussion shows that some CNN-based works [38, 11, 9] adopt simple and shallow structures, because of the limited hardware performance and image data available. With the significant improvement in computing performance and available data, most CNN structures adopted in the recent decade are deeper and more complex, such as the CNNs in [173, 100]. Besides differences in network structure, [81] introduces CNN-based transfer learning to address the imbalance of microorganism image data. Additionally, one of the most significant advances of CNNs is that convolutional filters free the network from relying on feature engineering. Nevertheless, some works still use feature engineering to further improve CNN performance, such as [147].

AlexNet is one of the most classical deep neural networks; its basic information and structure are provided in Sec. 2.2.2. It is widely used in MIA tasks, especially classification. In [31], a hybrid CNN model is proposed for plankton classification. The network consists of three AlexNet-based branches whose inputs are the original images, global feature images, and local feature images. At the end of the network, fully connected layers join these branches, and softmax is applied for classification. The dataset used here is WHOI-Plankton, and the hybrid CNN achieves an accuracy of 95.83%. Similarly, [27] introduces a novel texture feature extraction method and a hybrid AlexNet-based CNN for plankton classification. Texture and shape images are generated from the original image and concatenated with it, and the concatenated image is used to train AlexNet. This study also uses WHOI-Plankton, and the network with three inputs obtains the best accuracy of 96.58% on 30 categories and 94.32% on 103 categories. [115] applies transfer learning with an AlexNet pre-trained on ImageNet to classify diatoms and studies the performance of the network fine-tuned on different forms of data. The results show that when the original data are combined with the normalized data, the network achieves the best average accuracy of 99.51%.

VGGNet is also popular in many image analysis tasks; its characteristics are described in Sec. 2.2.3. VGGNet has several variants, and the related classification works are summarized here. In [3], a CNN based on VGG-16 is proposed to classify plankton; its structure is shown in Fig. 13. Three sub-datasets from SIPPER are used for evaluation, and the network achieves an accuracy of 80.54% on SIPPER-77. [158] presents a transferred parallel neural network for classifying a large-scale imbalanced plankton dataset. It consists of a model pre-trained on the small classes and an untrained model. During training, the data are fed into the whole network, and the pre-trained model acts as a feature extractor that enhances the features of small classes, improving classification on the imbalanced dataset. The dataset is WHOI-Plankton, and the transferred parallel neural network based on VGG-16 achieves the best accuracy of 94.98% and an F1-score of 54.44%. Besides, in [176], several popular CNN models with and without pre-trained weights are compared in a microorganism (bacteria and fungi) classification task. These models include VGG-16, VGG-19, Xception, ResNet-50, Inception-V3, MobileNetV2, and DenseNet201. The experimental results show that Xception and ResNet-50 perform well, and that using pre-trained weights is not always beneficial.
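
The gap between accuracy (94.98%) and F1-score (54.44%) in [158], and similarly in [81] and [96], is typical of imbalanced datasets when the F1-score is averaged over classes. The sketch below assumes macro-averaging (the text does not state which averaging these works use) and shows, on a hypothetical 95:5 split, that a classifier that always predicts the majority class obtains high accuracy but a low macro F1-score.

```python
import numpy as np

def accuracy_and_macro_f1(y_true, y_pred, n_classes):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    f1_scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # Per-class F1 = 2TP / (2TP + FP + FN); taken as 0 if the class never occurs
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return accuracy, float(np.mean(f1_scores))   # unweighted mean over classes

# 95 samples of class 0, 5 of class 1; the classifier always predicts class 0
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred, n_classes=2)
```

Here accuracy is 0.95 while the macro F1-score is about 0.49, because the minority class contributes an F1 of 0.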

The Inception series also plays an essential role in many image analysis tasks; it is introduced in Sec. 2.2.4. Some works use Inception-related networks for microorganism image classification. For instance, [156] uses a modified deep CNN based on Inception-V1 for bacterial classification. The network is pre-trained on a million images and then re-trained on a dataset of five bacteria species selected from several online resources, achieving an accuracy of about 95%. As an extension, an Xception-based bacteria classification method is tested in [157]. Xception is pre-trained on ImageNet and then fine-tuned on experimental data comprising 740 images in seven categories collected from several online datasets. The Xception-based method achieves an accuracy of 97.5%. Besides, [176] also compares Xception and Inception-V3 with other networks, as described in the preceding paragraph.

In addition to the above networks, some works adopt other deep ANNs to classify microorganism images. Because each of these networks appears in only a few works, we analyze each related paper directly, discussing its research motivation, contribution, data, workflow, and results.

Rapid identification of microbial pathogens is of great significance in treating infections. [69] presents a method that distinguishes bacterial species using 3D refractive index images and compares DenseNet with the Wide Residual Network (WRN) [175]. The data consist of seven categories, each containing more than a thousand images. The highest accuracy obtained by WRN is 73.2%, and the highest accuracy obtained by DenseNet is 85%.

Plankton image classification is of great significance to the study of plankton. In [96] , a CNN called PyramidNet for plankton image classification is proposed. The dataset used in this study is WHOI-Plankton. The experimental results show that PyramidNet performs better than AlexNet, GoogLeNet, VGG-16, and ResNet. It gets the best performance with 86.30% accuracy and 41.64% F1-score.

Analysis of viruses using transmission electron microscopy images is an essential step in understanding the formation mechanism of infectious virions. [35] compares the performance of a CNN trained from scratch with that of pre-trained CNN models and existing image analysis methods. Private data containing 190 images for training and validation and 21 images for testing are used. The pre-trained ResNet50 fine-tuned on the virus image data performs best, with an accuracy of 95.44% and an F1-score of 95.22%. Different species of bacteria have different effects on humans, so it is essential to distinguish them. In [132], a deep CNN is used to classify 33 bacteria species in the DIBaS dataset. To improve classification accuracy, pre-processing operations, including color masking and image augmentation, are applied first, and then MobileNetV2 initialized with ImageNet weights is used as the base model for classification. The model achieves an average accuracy of 95.09%.

[153] studies whether image classification and deep learning methods can recognize standard- and high-resolution images of bacteria and yeast. The standard-resolution data are provided by the authors and include three classes of bacteria and one class of yeast, each with more than 200 images. The high-resolution images, of similar bacteria and yeasts, are taken from [187]. The network used in the experiment is LeNet, and more than 80% accuracy is obtained on the standard-resolution data.

In addition to classification, segmentation plays a vital role in MIA based on deep neural networks. U-Net is the most popular network for this task, while some works adopt other networks. To facilitate the analysis, U-Net-based works and works based on other networks are introduced separately.

U-Net is widely used in many image segmentation tasks, including microorganism image segmentation; its essential information and structure are provided in Sec. 2.2.6. Because segmentation annotation is expensive, [102] proposes a simple U-Net trained with minimal annotation to segment the Rift Valley virus. The dataset contains 143 TEM images [77]. The minimal annotation only marks the virus centers, and the ground truth images are then generated by a dilation operation. The network achieves a Dice of 90% and an IoU of 83.1%. In [37], a method for identifying the boundaries of irregular yeast cells is proposed. U-Net is used to segment the cells from the background, but it cannot separate adhering cells. Therefore, the watershed algorithm is combined with the prediction scores of U-Net to achieve instance segmentation, and the method achieves a mean accuracy of 94%. [88] develops a Multiple Receptive Field U-Net (MRFU-Net), inspired by GoogLeNet and U-Net, for environmental microorganism image segmentation. The EMDS-5 dataset [94, 93, 74], which contains 21 microorganism categories, is used for evaluation. The Dice, Jaccard, recall, accuracy, and VOE obtained by MRFU-Net are 87.23%, 79.74%, 87.65%, 97.30%, and 20.26%, respectively. As an extension of [88], a CNN-CRF framework for environmental microorganism image segmentation is proposed in [180]; Fig. 14 provides its details. The framework includes two parts: pixel-level segmentation and patch-level segmentation. For pixel-level segmentation, a low-cost U-Net (called mU-Net-B3 or LCU-Net) is proposed, which performs better than U-Net while requiring less than one-third of its memory. For patch-level segmentation, transfer learning based on VGG-16 is used to classify patches, and the predicted labels are used to reconstruct the patch-level segmentation results. The Dice, Jaccard, recall, accuracy, and VOE obtained in pixel-level segmentation are 87.13%, 79.74%, 87.12%, 96.91%, and 20.26%, respectively.
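
The overlap measures reported for [88] and [180] can be computed from binary masks as in the NumPy sketch below (toy masks, not data from those works). It also shows why the Jaccard and VOE values above sum to 100%: VOE is defined as one minus the Jaccard index. Empty prediction and ground-truth masks would cause a division by zero and are not handled in this sketch.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Dice, Jaccard (IoU) and volumetric overlap error for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2 * intersection / (pred.sum() + truth.sum())
    jaccard = intersection / union
    voe = 1 - jaccard
    return float(dice), float(jaccard), float(voe)

truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True          # 4 ground-truth pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True           # 6 predicted pixels, 4 of them correct
dice, jaccard, voe = overlap_metrics(pred, truth)
```

For these masks, Dice = 2 x 4 / (6 + 4) = 0.8, Jaccard = 4 / 6 ≈ 0.667, and VOE ≈ 0.333.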

Accurate segmentation is one of the critical steps in microscopic cell analysis. In [6], a SegNet-based segmentation method is proposed and applied to multi-modal fluorescence microscopy images of yeast cells. The data are fluorescence microscope images of yeast cell division, consisting of 6000 training samples, 1200 validation samples, and 1200 test samples. The method achieves an mIoU of 71.72%. Tuberculosis is one of the top 10 causes of death worldwide. In [140], a method based on CNNs and a mosaic image approach is proposed. The data used in this study include positive and negative patches, which are used to generate a total of 5000 mosaic images. Each mosaic comprises 100 patches, about half of which are negative and half positive; Fig. 15 provides an example. Three CNNs are proposed to segment the mosaic images so that the bacilli can be detected and counted; Fig. 16 provides their structures. Network performance is evaluated by counting the number of segmented bacilli, and the deepest network, CNN1, obtains the best accuracy of 99.665%.
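
A mosaic of the kind used in [140] can be assembled by tiling patches on a 10 x 10 grid. In this sketch, exactly half of the 100 tiles are positive and patches are drawn at random with replacement; the patch size, seed, and sampling scheme are assumptions, since the text only states that each mosaic has 100 patches, about half of them positive. The returned label map records which tiles contain bacilli.

```python
import numpy as np

def make_mosaic(pos_patches, neg_patches, grid=10, seed=0):
    """Tile grid*grid patches (half positive) into one image; also return the tile labels."""
    rng = np.random.default_rng(seed)
    n = grid * grid
    n_pos = n // 2
    labels = np.array([1] * n_pos + [0] * (n - n_pos))
    rng.shuffle(labels)                            # random positions for positive tiles
    ph, pw = pos_patches[0].shape
    mosaic = np.zeros((grid * ph, grid * pw), dtype=pos_patches[0].dtype)
    for k, label in enumerate(labels):
        source = pos_patches if label else neg_patches
        patch = source[rng.integers(len(source))]  # sample a patch with replacement
        r, c = divmod(k, grid)
        mosaic[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = patch
    return mosaic, labels.reshape(grid, grid)

# Hypothetical 8x8 patches: all-ones "positive", all-zeros "negative"
pos = [np.ones((8, 8)) for _ in range(3)]
neg = [np.zeros((8, 8)) for _ in range(3)]
mosaic, label_map = make_mosaic(pos, neg)
```

Because the label map is produced together with the mosaic, per-tile ground truth for counting comes for free.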

Because the microorganism image detection works adopt different deep ANNs and each network appears in only a few works, each paper is analyzed individually. In [64], a two-stage detection and classification method is proposed for malaria parasite images. In the detection stage, Faster RCNN detects the targets and only distinguishes whether they are red blood cells. In the second stage, AlexNet classifies the targets labeled as non-red blood cells. All networks are pre-trained on ImageNet and fine-tuned on the malaria parasite image data. The method achieves an accuracy of 59% in the detection stage and 98% in the second classification stage. Similarly, [7] introduces a framework based on Fast RCNN and a CNN to classify and quantify five species of cyanobacteria, as shown in Fig. 17. Fast RCNN detects and classifies the targets, and the CNN then performs the counting. The classification accuracies of Fast RCNN for the five species are 0.929, 0.973, 0.829, 0.890, and 0.890, and the CNN obtains an R² of 0.85 and an RMSE of 23 cells in the counting task. Fig. 17 The procedure of algae classification and cell counting in [7].
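
The counting scores reported for [7] can be computed as below, assuming R² is the coefficient of determination 1 - SS_res/SS_tot (some works instead report the squared Pearson correlation; the text does not say which). The cell counts are hypothetical.

```python
import numpy as np

def counting_scores(true_counts, predicted_counts):
    """Coefficient of determination (R^2) and root-mean-square error of predicted counts."""
    y = np.asarray(true_counts, dtype=float)
    y_hat = np.asarray(predicted_counts, dtype=float)
    residual = y - y_hat
    rmse = float(np.sqrt(np.mean(residual ** 2)))   # same unit as the counts (cells)
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = float(1 - ss_res / ss_tot)
    return r2, rmse

r2, rmse = counting_scores([10, 20, 30, 40], [12, 18, 33, 37])
```

For these counts the RMSE is sqrt(6.5) ≈ 2.55 cells and R² = 1 - 26/500 = 0.948.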

[151] proposes a new fungus dataset and develops a CNN-based method for fungus detection. The dataset contains 40800 images of five classes of fungi and one extra class of dirt, divided into 30000 training images and 10800 test images. The CNN proposed in this paper, shown in Fig. 18, performs both the detection and the classification of fungi and achieves a detection accuracy of 94.8%. In [116], a comparison is conducted to test whether recent deep learning detection networks, including RCNN and YOLO, are suitable for diatom detection. The private data contain nearly 11000 images in ten categories. YOLO is more effective, with an F1-score of 72%, whereas RCNN reaches only 10%.

Besides the classification, segmentation, and detection tasks, some works employ deep neural networks to perform the image feature extraction task. The details are provided individually.

In [4], a hybrid plankton classification algorithm based on a CNN is proposed; it is illustrated in Fig. 19. First, the plankton image data are used to train a CNN with three hidden layers. Then, the features generated by each hidden layer are combined with different classifiers, including RF and SVM. The SVM trained on the features generated by the first hidden layer achieves the best performance, with an accuracy of 96.70%. Fig. 19 The illustration of the hybrid algorithm in [4]. [71] proposes a classification framework based on a deep neural network and a CRF to classify environmental microorganisms. The EMDS-4 dataset, which contains 400 images of 20 categories, is used. The framework combines global and local features to build the CRF model. The local features are generated by DeepLab-VGG-16, a network reorganized from DeepLab and a pre-trained VGG-16, which is trained on EMDS-4 to generate a feature vector for each pixel. These feature vectors are fed into an RF whose outputs serve as the unary potentials of the CRF. Compared with SIFT and simple features, the DeepLab-VGG-16 features prove superior.

In [129], transfer learning is used to remedy the problem of a small dataset. Two public datasets and one private dataset are used. The ISIIS dataset is used to pre-train a DeepSea network and a first AlexNet, while ImageNet is used to pre-train a second AlexNet. These pre-trained networks serve as feature extractors for the small private dataset, and an SVM is used to evaluate the extracted features. The results show that DeepSea pre-trained on ISIIS performs best, achieving a classification accuracy of 84%.

As an extension of [156], a bacteria classification system based on Inception-V3 and SVM is presented in [1]. The data consist of seven categories. The pre-trained Inception-V3 is fine-tuned on about 800 training images, after which an SVM performs classification using the fine-tuned Inception-V3 as the feature extractor. The system achieves an accuracy of around 96%.

In [22] , an end-to-end framework for plankton image identification and enumeration is presented. An algorithm is proposed to extract and enhance the ROI from the input image first. Then, the local grayscale values are used to enhance the local features of ROIs. After that, CNN is used to extract the features from the enhanced ROIs. These features are fed into SVM for multiclass classification. In the experiments, CNNs, including AlexNet, VGGNet, GoogLeNet, and ResNet, are compared. The private data used in this study contains six plankton categories and one 'other' category. The experimental results show that the best accuracy (94.13%) and recall (94.52%) are obtained by ResNet50 with SVM.
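
Several works above ([4], [129], [1], [22]) use a CNN as a feature extractor and an SVM as the classifier. The sketch below shows the classifier half with a from-scratch linear SVM (hinge loss with L2 regularization, minimized by full-batch subgradient descent), combined one-vs-rest for multiclass problems. It is not the SVM implementation used in those papers, the hyperparameters are arbitrary, and synthetic 2-D points stand in for high-dimensional CNN features.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Binary linear SVM with labels y in {-1, +1}.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))
    by full-batch subgradient descent.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1   # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def train_one_vs_rest(X, labels, n_classes, **kwargs):
    """One binary SVM per class: class c versus all other classes."""
    params = [train_linear_svm(X, np.where(labels == c, 1.0, -1.0), **kwargs)
              for c in range(n_classes)]
    W = np.stack([w for w, _ in params])
    B = np.array([b for _, b in params])
    return W, B

def predict(W, B, X):
    return np.argmax(X @ W.T + B, axis=1)   # class with the largest decision value

# Synthetic stand-ins for CNN features: three well-separated 2-D clusters
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 30)
X = centers[labels] + rng.normal(scale=0.5, size=(90, 2))
W, B = train_one_vs_rest(X, labels, n_classes=3)
train_accuracy = float(np.mean(predict(W, B, X) == labels))
```

In practice the feature vectors would come from a pre-trained network's penultimate layer, and a library SVM (possibly with a nonlinear kernel) would usually replace this minimal solver.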

In [125] , a generic framework for plankton classification is proposed. The data used in this study contains 235 images in five categories. In this experiment, Inception-V3, VGG-16, and VGG-19 are used to extract common features, and then CNN, Logistic Regression, SVM, and KNN are used as the classifiers to test the performance of different features. The results show that the CNN with Inception-V3 feature extractor achieves the best accuracy of 99.5%.

Plankton is one of the essential components of marine ecosystems. In [99], different transfer learning methods based on CNNs are investigated with the aim of designing an ensemble plankton classifier that exploits their diversity. The transfer learning methods include one-round tuning, two-round tuning, and pre-processing tuning. The datasets include three public datasets (WHOI, ZooScan, and Kaggle) and one additional dataset used in the two-round tuning. In the experiments, the three transfer learning methods, as well as SVMs trained on features extracted from the one-round-tuned models, are tested. The ensemble of models generated by one-round tuning and two-round tuning obtains the best performance, with accuracies of 0.9527 on WHOI, 0.8826 on ZooScan, and 0.9413 on Kaggle.
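
The text does not specify how [99] combines its models. A common rule, assumed in this sketch, is to average the softmax probabilities of all ensemble members and take the most probable class; the logits below are hypothetical.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # subtract row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ensemble_predict(logits_per_model):
    """Average the class probabilities of all models; return (labels, probabilities)."""
    probs = np.mean([softmax(np.asarray(logits, dtype=float)) for logits in logits_per_model],
                    axis=0)
    return probs.argmax(axis=1), probs

model_a = [[2.0, 1.0, 0.0]]   # hypothetical logits of model A for one image, 3 classes
model_b = [[0.0, 3.0, 0.0]]   # hypothetical logits of model B for the same image
labels, probs = ensemble_predict([model_a, model_b])
```

Model A favors class 0 and model B strongly favors class 1; the averaged probabilities (about 0.36, 0.58, 0.07) select class 1, so a confident member can outvote a hesitant one.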

Convolutional modules are largely translation-invariant but not rotation-invariant, so a network may fail to recognize a target that has been rotated by some angle. In [23], a method combining translational and rotational features is proposed to address this problem. The original images (in Cartesian coordinates) are first transformed into polar images (in polar coordinates). Then, CNN models trained on the polar images and on the original images are used as feature extractors, and classification is performed by an SVM. The method is tested on an in situ plankton dataset and on the CIFAR-10 dataset. The DenseNet201+Polar+SVM model obtains the highest classification accuracy (97.989%) and recall (97.986%) on the in situ plankton dataset, and the highest classification accuracy (94.91%) and recall (94.76%) on CIFAR-10.
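
The motivation for the polar transform in [23] is that a rotation about the image center becomes a circular shift along the angle axis in polar coordinates, and CNNs handle shifts well. The NumPy sketch below resamples a square image onto a (radius, angle) grid with nearest-neighbor sampling; the grid sizes are arbitrary assumptions, and [23] may use a different interpolation.

```python
import numpy as np

def to_polar(image, n_radii, n_angles):
    """Nearest-neighbor resampling of an image onto a (radius, angle) grid around its center.

    Rows of the result index the radius and columns index the angle, so a rotation
    of the input about its center turns into a circular shift along the columns.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    radii = np.linspace(0.0, min(cy, cx), n_radii)
    angles = np.arange(n_angles) * 2 * np.pi / n_angles
    rows = np.rint(cy + radii[:, None] * np.sin(angles)[None, :]).astype(int)
    cols = np.rint(cx + radii[:, None] * np.cos(angles)[None, :]).astype(int)
    return image[np.clip(rows, 0, h - 1), np.clip(cols, 0, w - 1)]

img = np.arange(121).reshape(11, 11)          # toy image with all-distinct pixel values
polar = to_polar(img, n_radii=6, n_angles=8)
polar_of_rotated = to_polar(np.rot90(img), n_radii=6, n_angles=8)
```

With 8 angles, a 90-degree rotation of the input (`np.rot90`) corresponds exactly to rolling the polar image by 2 columns.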

Environmental microorganism image datasets usually contain limited data. In [168], an enhanced GAN framework is proposed for environmental microorganism image data augmentation. The dataset used in the experiment is EMDS-5, which contains 21 classes of microorganisms, each with 20 images and their corresponding ground truth images [93]. Because microorganisms appear in different orientations, it is difficult for a GAN to generate images in various orientations directly, and with such a small dataset, directly generated images may lose many details. To address these problems, the framework uses the ground truth images to transform the original images to the same orientation. In addition, color space transformation is applied to increase the amount of training data and enable the GAN to generate more details. To evaluate the framework, VGG-16 is trained on the generated images for classification, and the results show that the generated data effectively improve classification performance.
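
The text does not say exactly how [168] uses the ground truth to bring all images to the same orientation. One plausible approach, assumed here, is to estimate the principal-axis angle of the ground-truth mask from its second-order central moments and then rotate the image so that this axis becomes horizontal, using any image-rotation routine. The sketch below computes only the angle.

```python
import numpy as np

def mask_orientation(mask):
    """Principal-axis angle (radians) of a binary mask from second-order central moments.

    The angle is measured from the column (x) axis, with rows (y) increasing downward.
    """
    ys, xs = np.nonzero(mask)
    x = xs - xs.mean()
    y = ys - ys.mean()
    mu20, mu02, mu11 = np.mean(x * x), np.mean(y * y), np.mean(x * y)
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

horizontal = np.zeros((9, 9), dtype=bool)
horizontal[4, 1:8] = True               # a horizontal bar
vertical = horizontal.T                 # the same bar, vertical
diagonal = np.eye(9, dtype=bool)        # the main diagonal
```

A horizontal bar gives 0, a vertical bar ±π/2, and the main diagonal π/4 (angles measured with rows increasing downward).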

After the above review, compared with studies based on classical neural networks, we find that MIA based on deep neural networks has the following characteristics. First, unlike MIA based on classical neural networks, more open-access microorganism datasets are used, such as SIPPER, WHOI-Plankton, and the EMDS series. Second, the deep neural networks used in MIA tasks are mainly CNNs, whose convolutional filters can be applied directly to image data. Therefore, unlike MIA based on classical neural networks, MIA based on deep neural networks usually does not rely on feature engineering: the network automatically extracts effective features from the image through its convolutional filters, which is more flexible than hand-crafted features. Besides, deep neural networks are usually deeper than classical neural networks, because a convolutional layer, whose small kernels are shared across all image positions, has far fewer parameters than a fully connected layer. However, deep neural networks require large amounts of training data and high-performance computing hardware. Third, owing to the flexibility of deep neural network architectures, MIA tasks based on deep neural networks are more varied, including microorganism image data augmentation, detection, and counting. In addition, compared with MIA based on classical neural networks, more MIA works based on deep neural networks focus on optimizing and improving the networks themselves.
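
The parameter argument can be checked with a short calculation. The sketch below compares a 3 x 3 convolutional layer with a fully connected layer that maps the same 128 x 128 RGB input to the same 16-channel output; the layer sizes are arbitrary assumptions, and biases are counted once per output unit (dense) or per output channel (convolution).

```python
def dense_params(in_features, out_features):
    """Fully connected layer: one weight per input-output pair plus one bias per output."""
    return in_features * out_features + out_features

def conv_params(in_channels, out_channels, kernel_size):
    """Convolutional layer: one shared kernel per (input, output) channel pair plus biases."""
    return in_channels * out_channels * kernel_size ** 2 + out_channels

# A 128x128 RGB image mapped to 16 feature maps of the same spatial size
h = w = 128
dense = dense_params(h * w * 3, h * w * 16)
conv = conv_params(3, 16, 3)
```

The convolutional layer needs 448 parameters, while the fully connected layer would need 12,885,164,032 (about 12.9 billion), because every output unit has its own weight for every input pixel.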

To help readers find and understand relevant works quickly, we briefly summarize the papers on MIA based on deep neural networks in Tab. 4, which lists the year, application task, reference, dataset, object species, number of categories, method, and results of each work.

The papers related to MIA based on classical and deep neural networks are analyzed in Sec. 3 and Sec. 4. To further understand the characteristics of these works, a corresponding in-depth analysis is provided in this section. Besides, we also discuss the research limitations and potential development directions.

The works related to MIA based on classical neural networks are analyzed according to their target tasks, including classification, segmentation, counting, and feature extraction. Because most classical neural networks rely on feature engineering, most works are analyzed in three parts: image pre-processing, feature extraction, and network analysis.

In the classification tasks, the image pre-processing operations mainly include noise reduction, image enhancement, and segmentation. For example, the pre-processing in [28, 47, 107] contains noise reduction, performed either manually or with image processing algorithms. Image enhancement is used in the pre-processing of [159]. Besides, most works adopt image segmentation in their pre-processing to highlight the object regions and reduce the influence of noise. This step is vital for subsequent feature extraction. The segmentation methods are of several types, such as edge-based, region-based, and thresholding-based methods; for instance, edge detection, iterative thresholding, and adaptive global thresholding are employed in [16, 163, 92, 56]. In feature extraction, shape, texture, and color features are the most commonly used. For example, shape features are extracted in [155, 45, 62, 89]: Fourier descriptors are extracted in [155, 62], area, eccentricity, circularity, and other geometrical features are used in [45], and invariant moments are extracted in [62, 89]. Texture features are extracted for training classifiers in [30, 62, 89, 107], and [10, 163, 164] use color features for training the networks. In the network analysis, it can be found that the classical neural networks used in MIA tasks usually have shallow structures. This is because computer hardware performance was limited in the early years, and deep structures have numerous connections between layers, which result in large numbers of parameters and make training difficult.

In the other tasks, the workflow differs from that of the classification tasks. In the classification tasks, image segmentation is usually performed by conventional image processing algorithms or manual methods, whereas in the segmentation tasks ANNs are applied to perform it. For example, the SOM is applied to segment yeast-like fungus images in [141], and HMLP is used to segment Ziehl-Neelsen tissue slide images in [113]. In addition, unlike the shape and texture features used in the classification tasks, some feature extraction works adopt ANNs to generate the features for the subsequent analysis. For instance, in [186], a PCNN extracts entropy sequence features, which are then used for classification based on Euclidean distance.
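The classical "segment, then extract shape features" pipeline described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the method of any cited work: the threshold rule is the Ridler-Calvard (isodata) iterative scheme, the perimeter is crudely approximated by counting boundary pixels, eccentricity is derived from second-order moments, and the synthetic disk and rod images are invented for the example.

```python
import numpy as np

def iterative_threshold(img, tol=0.5, max_iter=100):
    """Isodata (Ridler-Calvard) threshold: start at the global mean, then
    move the threshold to the midpoint of the two class means until stable."""
    t = float(img.mean())
    for _ in range(max_iter):
        fg, bg = img[img > t], img[img <= t]
        if fg.size == 0 or bg.size == 0:
            break
        new_t = 0.5 * (fg.mean() + bg.mean())
        converged = abs(new_t - t) < tol
        t = float(new_t)
        if converged:
            break
    return t

def shape_features(mask):
    """Geometric features of a binary object mask (crude pixel-level estimates)."""
    mask = mask.astype(bool)
    area = int(mask.sum())
    # A foreground pixel is interior if all four 4-neighbours are foreground.
    p = np.pad(mask, 1)
    interior = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    perimeter = int((mask & ~interior).sum())  # boundary-pixel count
    circularity = 4 * np.pi * area / perimeter ** 2 if perimeter else 0.0
    # Eccentricity from the eigenvalues (ascending) of the pixel-coordinate covariance.
    rows, cols = np.nonzero(mask)
    lam_min, lam_max = np.linalg.eigvalsh(np.cov(np.vstack([rows, cols])))
    eccentricity = float(np.sqrt(1 - lam_min / lam_max)) if lam_max > 0 else 0.0
    return {"area": area, "perimeter": perimeter,
            "circularity": float(circularity), "eccentricity": eccentricity}

# Synthetic example: a bright disk on a noisy dark background.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[:64, :64]
disk = (yy - 32) ** 2 + (xx - 32) ** 2 <= 10 ** 2
img = 50 + 150 * disk + rng.normal(0, 10, disk.shape)
t = iterative_threshold(img)
disk_feats = shape_features(img > t)

# An elongated rod-shaped object for comparison.
rod = np.zeros((64, 64), dtype=bool)
rod[30:34, 10:50] = True
rod_feats = shape_features(rod)
```

The resulting feature dictionaries would then be concatenated into vectors and passed to a shallow classifier such as an MLP. Because the boundary-pixel perimeter is only an approximation, the circularity of a digital disk can exceed 1; it is still useful for ranking round versus elongated cells.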

After discussing MIA tasks based on classical neural networks, we present a statistical analysis. As the statistics in Fig. 20 show, MLP and RBF are the most widely used networks. The basic information and structure of MLP are provided in Sec. 2.2.1. Among the MLP structures used in MIA, the three-layer MLP is the most common, because it is suitable for classifying nonlinearly separable patterns and approximating functions [106]. It has been reported that three-layer MLPs can approximate any continuous or discontinuous function [20]. Besides, MLP is the most widely used network in early research on ANNs [118]. An RBF network is an ANN that uses radial basis functions as activation functions, and its output is a linear combination of radial basis functions of the inputs and neuron parameters. It is also one of the best-known classical neural networks because of its short training process and its efficiency in mapping arbitrary nonlinear input-output relationships [12].

Fig. 20 The statistics of the classical neural networks used in MIA tasks.
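A minimal sketch of why RBF training can be short: if we assume (our assumption, not a detail taken from the cited works) that the Gaussian centres are fixed to the training samples and the width sigma is chosen by hand, the output layer is linear, so training reduces to a single least-squares solve. The XOR data below is a standard toy example of a nonlinearly separable pattern; `fit_rbf` and `predict_rbf` are hypothetical helper names.

```python
import numpy as np

def rbf_design(X, centers, sigma):
    """Gaussian radial basis activations of every input for every centre, plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-d2 / (2 * sigma ** 2))
    return np.hstack([phi, np.ones((X.shape[0], 1))])

def fit_rbf(X, y, centers, sigma):
    # The output is a linear combination of basis functions, so the weights
    # come from one linear least-squares problem (no iterative training).
    w, *_ = np.linalg.lstsq(rbf_design(X, centers, sigma), y, rcond=None)
    return w

def predict_rbf(X, centers, sigma, w):
    return rbf_design(X, centers, sigma) @ w

# XOR: not linearly separable, but easy for an RBF network.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
w = fit_rbf(X, y, centers=X, sigma=0.5)
pred = predict_rbf(X, X, 0.5, w)
```

With one centre per training sample the network interpolates the data exactly; in practice fewer centres (for example chosen by clustering) are used so that the model generalizes instead of memorizing.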

In conclusion, most MIA works based on classical neural networks adopt feature engineering, so their performance relies not only on the networks but also on feature extraction and selection. This characteristic makes it difficult to transfer the methods developed in these works directly to other microorganism analysis tasks. However, there are also cases in which a classical neural network is applied directly to microorganism images without any feature engineering. For example, the feed-forward MLP used in [167] does not rely on feature engineering; it is trained and used for classification directly on the images. This innovation provides a new idea for subsequent CNN-based microorganism image analysis.
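The idea of feeding raw pixels to a classical network, as in [167], can be illustrated with a minimal three-layer MLP trained by backpropagation. Everything here is a hypothetical toy set-up rather than the configuration of [167]: 8×8 synthetic images containing either a horizontal bar (label 0) or a vertical bar (label 1), 16 tanh hidden units, a sigmoid output, binary cross-entropy loss, and full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_bars(n, size=8):
    """Hypothetical toy images: a horizontal (label 0) or vertical (label 1) bar."""
    X = np.zeros((n, size, size))
    y = rng.integers(0, 2, n)
    pos = rng.integers(size // 2 - 1, size // 2 + 1, n)  # bar at row/column 3 or 4
    for i in range(n):
        if y[i] == 0:
            X[i, pos[i], :] = 1.0
        else:
            X[i, :, pos[i]] = 1.0
    X += rng.normal(0, 0.1, X.shape)
    return X.reshape(n, -1), y.astype(float)  # raw pixels, no hand-crafted features

def train_mlp(X, y, hidden=16, lr=0.5, epochs=500):
    """Input -> tanh hidden layer -> sigmoid output, trained with backpropagation."""
    n, d = X.shape
    W1 = rng.normal(0, 1 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)
        p = 1 / (1 + np.exp(-(h @ W2 + b2)))
        g_out = (p - y) / n                       # dLoss/dlogit for mean binary cross-entropy
        g_h = np.outer(g_out, W2) * (1 - h ** 2)  # backpropagate through tanh
        W2 = W2 - lr * (h.T @ g_out)
        b2 = b2 - lr * g_out.sum()
        W1 = W1 - lr * (X.T @ g_h)
        b1 = b1 - lr * g_h.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2 > 0).astype(float)

X, y = make_bars(200)
params = train_mlp(X, y)
train_acc = float((predict(params, X) == y).mean())
```

Note that every pixel is connected to every hidden unit, so the weight count grows with image size; this is one reason such fully connected models were later replaced by CNNs.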

As the overview in Sec. 4 shows, the tasks in MIA based on deep neural networks include classification, segmentation, detection, counting, feature extraction, and data augmentation. Unlike the classical neural networks, the deep neural networks used in MIA are mainly CNNs, which use convolutional filters to extract potentially useful features directly from the image data. MIA based on deep neural networks is therefore not limited by feature engineering, and the CNN structure plays a vital role in task performance.
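To illustrate how a convolutional filter extracts features directly from pixel data, the sketch below implements a "valid" 2D cross-correlation (the operation usually called convolution in CNN layers) followed by a ReLU. In a CNN the kernel weights are learned; here a fixed Sobel kernel and a tiny step-edge image are used purely as assumed illustrative inputs.

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2D cross-correlation, as computed by a CNN convolutional layer."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel weights are reused at every spatial position.
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

# Toy image: dark left half, bright right half (a vertical edge).
img = np.zeros((6, 6))
img[:, 3:] = 1.0
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
fmap = relu(conv2d(img, sobel_x))  # each row of fmap is [0, 4, 4, 0]
```

The feature map responds only where the edge is, without any hand-crafted edge descriptor being computed beforehand; training replaces the fixed kernel with learned ones.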

Statistics of the deep neural networks used in MIA tasks are given in Fig. 21. In these statistics, self-designed CNNs are counted in the CNN category, and networks optimized from a public network are counted in the category of that public network. As Fig. 21 shows, CNNs are the most widely used. Among them, some distinctive self-designed networks have been proposed. For example, in [11], the motor modules of a parallel neural network learn, through training, to dynamically apply translation, rotation, dilatation, contrast, and symmetry transformations to the images, and in [151], a CNN is proposed to perform both the detection and the classification of fungi. CNNs are widely applicable in MIA because convolutional filters take image data directly as input, so manual feature extraction is no longer needed. Moreover, a CNN usually has fewer parameters than a fully connected neural network of the same depth, because each convolutional filter is connected only to a local region and its weights are shared across all spatial positions. In addition, various optimization algorithms and training approaches have been proposed that make CNNs easier to train. This makes the self-design and training of CNNs more flexible.
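The parameter saving from weight sharing can be checked with simple arithmetic. The sketch below counts the weights and biases of one 3×3 convolutional layer and of a fully connected layer that produces the same number of output units; the layer sizes (a 32×32 grayscale input and 8 output channels) are hypothetical.

```python
def conv_params(k, c_in, c_out):
    # Each of the c_out filters has k*k*c_in shared weights plus one bias.
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # Every input unit connects to every output unit, plus one bias per output.
    return n_in * n_out + n_out

H, W, C_IN, C_OUT = 32, 32, 1, 8
conv = conv_params(3, C_IN, C_OUT)               # 3x3 conv with "same" padding -> 32x32x8 output
dense = dense_params(H * W * C_IN, H * W * C_OUT)  # dense layer with the same 32*32*8 outputs
print(conv, dense)  # prints: 80 8396800
```

The convolutional layer's parameter count does not depend on the image size at all, whereas the dense layer's grows with the square of the number of pixels.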

Among the public networks, the five most widely used are VGGNet, Inception, U-Net, AlexNet, and ResNet. Some distinctive networks have been optimized from these public networks. For example, LCU-Net, which improves the original U-Net by increasing the diversity of the receptive field and decreasing the memory requirement of the network, is proposed for microorganism image segmentation in [180, 179], and a transferred parallel neural network, which combines a pre-trained deep learning model used as a feature extractor with an untrained model, is introduced in [158]. These public networks are widely used because of their outstanding performance in image analysis. AlexNet won ILSVRC 2012, the first CNN to do so, and is one of the symbols of the rise of CNNs. GoogLeNet (Inception-V1) and VGGNet won first and second places, respectively, in the ILSVRC 2014 classification task, and ResNet won ILSVRC 2015. U-Net is one of the most popular segmentation networks. These networks have significantly influenced the development of deep neural networks, and their characteristics are described in Sec. 2.2. In addition to optimizing these networks, the MIA works based on deep neural networks contain other innovations. For instance, in [27], the original plankton images are converted into texture and shape images to increase feature diversity, and the original, texture, and shape images are concatenated for the subsequent AlexNet training. In [23], translational and rotational features are used to address recognition failures caused by changes in the target angle. The rotational features are extracted by a CNN from polar images, which are obtained by transforming the original images from Cartesian to polar coordinates.
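The benefit of the Cartesian-to-polar conversion used in [23] can be sketched as follows: after resampling around the image centre, rotating the image becomes (approximately) a circular shift along the angle axis, that is, a translation in the polar image, to which convolution is equivariant. This is a minimal nearest-neighbour resampling under our own assumptions (a square image, the image centre as the pole, and the grid sizes `n_r` and `n_theta`), not the transformation code of [23].

```python
import numpy as np

def to_polar(img, n_r=16, n_theta=64):
    """Resample a square image onto a (radius, angle) grid centred on the image centre."""
    n = img.shape[0]
    c = (n - 1) / 2.0
    radii = np.linspace(0.0, c, n_r)
    thetas = 2 * np.pi * np.arange(n_theta) / n_theta
    rows = np.rint(c + radii[:, None] * np.sin(thetas)[None, :]).astype(int)
    cols = np.rint(c + radii[:, None] * np.cos(thetas)[None, :]).astype(int)
    return img[rows, cols]  # shape (n_r, n_theta), nearest-neighbour sampling

rng = np.random.default_rng(0)
img = rng.random((33, 33))
polar = to_polar(img)
polar_rot = to_polar(np.rot90(img))         # image rotated by 90 degrees
shifted = np.roll(polar, -64 // 4, axis=1)  # quarter-turn circular shift along the angle axis
agreement = float(np.mean(polar_rot == shifted))
```

Because of rounding in the nearest-neighbour sampling, the correspondence is only approximate for arbitrary angles; bilinear interpolation would give smoother polar images.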

Besides, transfer learning also plays an essential role in MIA tasks based on deep neural networks: according to our statistics, more than 15 papers involve transfer learning. Transfer learning focuses on storing knowledge learned from one task and applying it to a different but related task [165, 50]. It has many applications; for example, it can be used for tasks with limited training data [180], and a pre-trained model can serve as a feature extractor for potential high-level features [158]. Among the MIA works based on transfer learning, there are some distinctive ones. For example, in [81], to address the class imbalance problem, a balanced sub-dataset is built from the original dataset to pre-train the CNN, and the original dataset is then used to fine-tune the network. In [129], the plankton dataset ISIIS and the public dataset ImageNet are used to pre-train the deep learning models in order to determine the influence of different pre-training datasets.
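The balanced pre-training subset described for [81] could be sketched as below. The helper name `balanced_subset`, the class names, and the rule of capping every class at the size of the smallest class are our assumptions; [81] may build its sub-dataset differently.

```python
import random
from collections import Counter, defaultdict

def balanced_subset(labels, per_class=None, seed=0):
    """Indices of a class-balanced subset (hypothetical helper).

    Each class contributes at most `per_class` samples; by default this is
    the size of the smallest class, so every class is equally represented.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    k = per_class if per_class is not None else min(len(v) for v in by_class.values())
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        chosen.extend(rng.sample(members, min(k, len(members))))
    return sorted(chosen)

# Imbalanced toy labels: the subset is used for pre-training, the full set for fine-tuning.
labels = ["copepod"] * 100 + ["diatom"] * 10 + ["ciliate"] * 25
pretrain_idx = balanced_subset(labels)
pretrain_counts = Counter(labels[i] for i in pretrain_idx)
```

Pre-training on the balanced subset keeps the early feature learning from being dominated by the majority class, while the later fine-tuning stage still sees all available samples.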

In this part, we present three potential directions from the perspectives of the fusion of existing MIA methods, dataset characteristics, and advanced methods from other fields.

Among existing methods, an enhanced framework of GANs is proposed in [168] for the data augmentation of microorganism images, but it is affected by differences in object orientation. To solve this problem, it uses the corresponding segmentation ground truth images to unify the orientations of the objects, so this approach is not applicable to datasets without segmentation ground truth images. The work [23] faces the same challenge caused by different object orientations but addresses it differently: it converts the original images (in Cartesian coordinates) into polar images (in polar coordinates) to reduce the influence of object orientation. This idea could therefore be applied to further improve the framework used in [168]. Besides, data augmentation is one of the steps in the workflows of [115, 88, 180], where it is performed by rotation and flipping; the enhanced GAN frameworks in [168, 169] may be helpful in these works. Although some existing MIA methods can be fused for optimization, many advanced techniques should also be imported from other computer vision fields.
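The rotation-and-flipping augmentation mentioned for [115, 88, 180] can be sketched as generating the eight variants obtained from 90-degree rotations with and without a horizontal flip. The exact set of transformations used in those works may differ; `rotate_flip_augment` is a hypothetical helper.

```python
import numpy as np

def rotate_flip_augment(img):
    """Return the 8 rotated/flipped variants of a 2D image (90-degree steps, with and without a flip)."""
    variants = []
    for k in range(4):
        rotated = np.rot90(img, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants

img = np.arange(9).reshape(3, 3)  # asymmetric toy "image"
variants = rotate_flip_augment(img)
```

These geometric copies add no new object appearance, which is the gap that generative augmentation such as the GAN frameworks in [168, 169] aims to fill.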

From the dataset perspective, most microorganism image datasets used in MIA tasks based on classical neural networks are private, and only a few open-access datasets are used in MIA tasks based on deep neural networks. Some datasets, such as the EMDS series, have limited data samples, and existing methods using the EMDS series usually apply data augmentation to solve this problem. In recent years, few-shot learning and domain adaptation have become hot topics in AI research. Few-shot learning aims to achieve good performance with limited training data by using prior knowledge at the level of the data (augmenting the training set), the model (constraining the hypothesis space), or the algorithm (altering the strategy used to search the hypothesis space for the parameters of the best hypothesis) [162]. Domain adaptation aims to learn a model for a target domain by leveraging knowledge from a labeled source domain [34, 33]. Although few-shot learning and domain adaptation may help overcome the limitation caused by insufficient data, they still face data acquisition challenges. Ideal few-shot learning and domain adaptation algorithms could solve the insufficient data problem; however, there is still a performance gap between current few-shot learning and domain adaptation technologies and conventional supervised (deep learning) algorithms.
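As one concrete example of the "model" route to few-shot learning, the sketch below classifies query samples by their distance to class prototypes (the mean embeddings of the few labeled support samples), in the spirit of prototypical networks. This particular technique is not discussed in the source; the 2D embeddings are invented and would in practice come from a pre-trained feature extractor.

```python
import numpy as np

def prototypes(support_x, support_y):
    """Mean embedding (prototype) of each class in the few-shot support set."""
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query_x, classes, protos):
    # Assign each query to the class whose prototype is nearest in Euclidean distance.
    d2 = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

# A 3-way, 2-shot toy task with made-up 2D embeddings.
support_x = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0], [0.0, 5.0], [0.1, 5.2]])
support_y = np.array([0, 0, 1, 1, 2, 2])
query_x = np.array([[0.1, 0.1], [4.9, 5.1], [0.0, 4.8]])
classes, protos = prototypes(support_x, support_y)
pred = classify(query_x, classes, protos)
```

Only two labeled samples per class are needed at test time, but the quality of the result depends entirely on the embedding, which is where the data acquisition challenge reappears.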

In recent years, the transformer based on the self-attention mechanism [154], which is widely used in natural language processing, has become a new research hotspot in computer vision. One reason is that a CNN is concerned with detecting certain features but does not consider their positions relative to each other. Besides, the pooling operations used in CNNs discard much valuable information, such as the precise locations of relevant features. In contrast, the self-attention mechanism of the transformer has a stronger ability to represent global information. Additionally, transformers have been reported to perform well in image analysis tasks with fewer parameters and lower computation than CNNs. Therefore, the transformer is also a possible development direction for MIA tasks.
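The global-information argument can be made concrete with single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, as defined in [154]: every output token is a weighted mix of all input tokens, not only of a local neighbourhood. The random weights and sizes below are placeholders, not values from any cited model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (n, n): every token attends to every token
    return A @ V, A

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 8, 4
X = rng.normal(size=(n_tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

The attention matrix `A` has one row per token, and each row is a probability distribution over all tokens; its size grows quadratically with the number of tokens, which is a practical cost of this global view.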

Building on these advantages of the transformer, the Vision Transformer (ViT) is proposed for image classification in [39]. It is one of the most remarkable visual transformer methods, and it takes sequences of image patches (with position information) directly as input. ViT linearly projects the patches into embeddings, feeds them into the original transformer encoder, and classifies the images with the multi-head attention mechanism, in the same way as in natural language processing tasks. Fig. 22 shows the architecture of ViT. It is a potential method for microorganism image classification.
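The ViT input stage can be sketched as follows: cut the image into non-overlapping 16×16 patches, flatten each patch, project it linearly, prepend a class token, and add position embeddings, as described in [39]. The embedding width and the random matrices below stand in for learned parameters and are assumptions; the helper names are hypothetical.

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into flattened non-overlapping patches, row by row."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)           # (patch rows, patch cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)  # (num_patches, patch*patch*C)

def embed_patches(patches, w_proj, cls_token, pos_emb):
    tokens = patches @ w_proj                # linear projection of each flattened patch
    tokens = np.vstack([cls_token, tokens])  # prepend the class token
    return tokens + pos_emb                  # add position embeddings

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
patches = patchify(img)                      # 14 x 14 = 196 patches of 16*16*3 = 768 values
d_model = 64                                 # hypothetical embedding width
tokens = embed_patches(patches,
                       rng.normal(size=(768, d_model)),
                       rng.normal(size=(1, d_model)),
                       rng.normal(size=(197, d_model)))
```

The resulting 197 tokens would then be processed by a stack of transformer encoder layers, and the output at the class-token position is used for classification.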

In addition to classification, transformers also perform well in object detection. In [18], a novel network named DEtection TRansformer (DETR) is proposed for object detection. As Fig. 23 shows, DETR consists of a CNN backbone for feature extraction, an encoder-decoder transformer, and a feed-forward prediction network that outputs the detections. Experimental results on COCO show that DETR performs comparably to Faster R-CNN, which suggests that DETR has the potential to be employed in microorganism image detection.
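One ingredient of DETR that is not spelled out above is that, during training, its fixed-size set of predictions is matched one-to-one to the ground-truth objects by a minimum-cost bipartite matching (DETR uses the Hungarian algorithm), and unmatched predictions are assigned a "no object" class. The exhaustive search below is only a readable stand-in for small examples, and the cost values are invented; in DETR the cost combines class-probability and box-distance terms.

```python
from itertools import permutations

def match_predictions(cost):
    """Exhaustive minimum-cost one-to-one matching of ground-truth objects to predictions.

    cost[g][p] is the matching cost between ground-truth object g and prediction p;
    there must be at least as many predictions as ground-truth objects.
    """
    n_gt, n_pred = len(cost), len(cost[0])
    best, best_assign = float("inf"), None
    for preds in permutations(range(n_pred), n_gt):
        total = sum(cost[g][p] for g, p in enumerate(preds))
        if total < best:
            best, best_assign = total, list(enumerate(preds))
    return best_assign, best

# 2 ground-truth objects, 3 predictions (invented costs).
cost = [[0.9, 0.1, 0.8],
        [0.2, 0.7, 0.6]]
assignment, total = match_predictions(cost)
unmatched = sorted(set(range(3)) - {p for _, p in assignment})  # these predict "no object"
```

Exhaustive search grows factorially with the number of predictions, which is why a polynomial-time assignment algorithm is used in practice.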

Although transformers show great potential in some computer vision tasks, they still face the data acquisition challenge: the best-known weakness of the transformer is that it requires massive training data. The original transformer was proposed for natural language processing, where sufficient training data are easier to obtain than in computer vision. Therefore, many computer vision works adopt pre-trained transformers, and the application of pure transformers in MIA still has limitations.

Fig. 22 The architecture of ViT in [39].

Fig. 23 The architecture of DETR in [18].

In this paper, we review MIA based on classical and deep neural networks; a total of 95 papers are collected and reviewed. In Sec. 1, the background of MIA based on ANNs, the motivation of this paper, the literature collection process, and the organization of this paper are introduced. In Sec. 2, the development trend of ANNs and some representative networks, including MLP, VGGNet, Inception, ResNet, U-Net, and YOLO, are presented. Sec. 3 introduces MIA based on classical neural networks, summarizing the related papers according to their applied tasks, and ends with a table summarizing the characteristics of the related papers. In Sec. 4, MIA based on deep neural networks is introduced from the perspectives of different tasks, and a summary table of the related papers is presented at the end of the section. Sec. 5 provides analysis and statistics of the methods used in MIA tasks based on classical and deep neural networks and discusses the potential development directions from the perspectives of the fusion of existing methods, dataset characteristics, and advanced methods from other fields.

This review does not include a dataset summary, because most of the datasets used in related research are private, so there is no unified standard dataset for evaluating different MIA methods. To address this limitation, our group has been developing open-access microorganism image datasets for several years. Recently, our group has released several open-access microorganism image datasets, including EMDS-6 [51, 182], EMDS-7 [174], and a SARS-CoV-2 (COVID-19) microscopic image dataset [87]. In the future, it will be important to develop more open-access microorganism image datasets and practical MIA algorithms based on ANNs.

Combining deep convolutional neural network with support vector machine to classify microscopic bacteria images

A state-of-the-art review for gastric histopathology image analysis approaches and future development

Intelligent plankton image classification with deep learning

Performance evaluation of hybrid CNN for SIPPER plankton image classification

Stalked protozoa identification by image analysis and multivariable statistical techniques

CNN based yeast cell segmentation in multi-modal fluorescent microscopy data

Identification and enumeration of cyanobacteria species using a deep neural network

Agricultural microbiology

Detecting and discriminating between different types of bacteria with a low-cost smartphone based optical device and neural network models

Automatic identification of algae: neural network analysis of flow cytometric data

Automatic recognition of coccoliths by dynamical neural networks

Woven textile structure: Theory and applications

Learning deep architectures for AI

Greedy layer-wise training of deep networks

Learning long-term dependencies with gradient descent is difficult

Rapid determination of bacterial abundance, biovolume, morphology, and growth by neural network-based image analysis

Embedded neural network system for microorganisms growth analysis

End-to-end object detection with transformers

A new wastewater bacteria classification with microscopic image analysis

A robust backpropagation learning algorithm for function approximation

Multi-view 3D object detection network for autonomous driving

Enhanced convolutional neural network for plankton identification and enumeration

Method for training convolutional neural networks for in situ plankton image recognition and classification based on the mechanisms of the human eye

Novel methods based on CNN for improved bacteria classification

Water monitoring: automated and real time identification and classification of algae using digital microscopy

Automatic identification of tuberculosis mycobacterium

Texture and shape information fusion of convolutional neural network for plankton image classification

Automatic categorisation of five species of cymatocylis (protozoa, tintinnida) by artificial neural network

DiCANN: a machine vision solution to biological specimen categorisation

Automatic classification of field-collected dinoflagellates by artificial neural network

A hybrid convolutional neural network for plankton classification

ArcFace: Additive angular margin loss for deep face recognition

Joint clustering and discriminative feature alignment for unsupervised domain adaptation

Informative feature disentanglement for unsupervised domain adaptation

Detection of herpesvirus capsids in transmission electron microscopy images using transfer learning

Using ZooImage automated system for the estimation of biovolume of copepods from the northern Argentine Sea

A convolutional neural network segments yeast microscopy images with high accuracy

Fat neural network for recognition of position-normalised objects

An image is worth 16x16 words: Transformers for image recognition at scale

A neural network for feature linking via synchronous activity

Automated counting of phytoplankton by pattern recognition: a comparison with a manual counting method

Evolving radial basis function networks using moth-flame optimizer

Automatic identification techniques of tuberculosis bacteria

Identification of tuberculosis bacteria based on shape and color

Influence of reactor systems on the morphology of Aspergillus awamori. Application of neural network and cluster analysis for characterization of fungal morphology

Medical microbiology and infection at a glance

Recognition of protozoa and metazoa using image analysis tools, discriminant analysis, neural networks and decision trees

Recognition of protozoa and metazoa using image analysis tools

Understanding the difficulty of training deep feedforward neural networks

Deep learning

Deep residual learning for image recognition

Training products of experts by minimizing contrastive divergence

A fast learning algorithm for deep belief nets

Reducing the dimensionality of data with neural networks

Automatic identification and classification of bacilli bacterial cell growth phases

Digital image analysis of cocci bacterial cells using active contour method

Digital microscopic image analysis of spiral bacterial cell groups

Identification and classification of cocci bacterial cells in digital microscopic images

Spiral bacterial cell image analysis using active contour method

Neural networks and physical systems with emergent collective computational abilities

Accurate automatic quantification of taxa-specific plankton abundance using dual classification with correction

Texture feature extraction methods: A survey

Applying Faster R-CNN for object detection on malaria images

Batch normalization: Accelerating deep network training by reducing internal covariate shift

Classification of foodborne bacteria using hyperspectral microscope imaging technology coupled with convolutional neural networks

Single-cell classification of foodborne pathogens using hyperspectral microscope imaging coupled with deep learning frameworks

Towards an automated system for the identification of notifiable pathogens: using Gyrodactylus salaris as an example

Automated identification of bacteria using three-dimensional holographic imaging and convolutional neural network

Classification and retrieval on macroinvertebrate image databases

Environmental microorganism classification using conditional random fields and deep convolutional neural networks

ImageNet classification with deep convolutional neural networks

Computerized classification system for the identification of soil microorganisms

A new pairwise deep learning feature for environmental microorganism image analysis

A state-of-the-art survey for microorganism image segmentation methods and future potential

Rapid detection of microorganisms using image processing parameters and neural network

Segmentation of virus particle candidates in transmission electron microscopy images

Representational power of restricted boltzmann machines and deep belief networks

Backpropagation applied to handwritten zip code recognition

Gradient-based learning applied to document recognition

Plankton classification on imbalanced large scale database via convolutional neural networks with transfer learning

Content-based microscopic image analysis

A review of clustering methods in microorganism image analysis

Application of content-based image analysis to environmental microorganism classification

A survey for the applications of content-based microscopic image analysis in microorganism classification domains

A brief review for content-based microorganism image analysis using classical and deep neural networks

A SARS-CoV-2 microscopic image dataset with ground truth images and visual features

MRFU-Net: A multiple receptive field U-Net for environmental microorganism image segmentation

A novel bacteria recognition method based on microscopic image analysis

A novel wastewater bacteria recognition method based on microscopic image analysis

An improved BP neural network for wastewater bacteria recognition based on microscopic image analysis

A novel bacteria classification scheme based on microscopic image analysis

EMDS-5: Environmental microorganism image dataset fifth version for multiple image analysis tasks

A new microorganism dataset for image segmentation and classification evaluation

Machine learning in safety critical industry domains

Deep pyramidal residual networks for plankton image classification

Deep learning for generic object detection: A survey

SphereFace: Deep hypersphere embedding for face recognition

Deep learning and transfer learning features for plankton classification

Automated plankton image analysis using convolutional neural networks

Brock biology of microorganisms

Minimal annotation training for segmentation of microscopy images

A logical calculus of the ideas immanent in nervous activity

A 3D convolutional neural network for bacterial image classification

Perceptrons: An introduction to computational geometry

Magnetic optimization algorithm for training multi layer perceptron

A preliminary study on automated freshwater algae recognition and classification system

Rectified linear units improve restricted boltzmann machines

A deep framework for bacterial image segmentation and classification

Hybrid multilayered perceptron network trained by modified recursive prediction error-extreme learning machine for tuberculosis bacilli detection

Tuberculosis bacilli detection in Ziehl-Neelsen stained tissue using affine moment invariants and extreme learning machine

Detection of Mycobacterium tuberculosis in Ziehl-Neelsen stained tissue images using Zernike moments and hybrid multilayered perceptron network

Segmentation of tuberculosis bacilli in Ziehl-Neelsen tissue slide images using hibrid multilayered perceptron network

Online sequential extreme learning machine for classification of Mycobacterium tuberculosis in Ziehl-Neelsen stained tissue

Automated diatom classification (part b): a deep learning approach

Lights and pitfalls of convolutional neural networks for diatom identification

Environmental microbiology

21st European Symposium on Computer Aided Process Engineering

Bacteria shape classification by the use of region covariance and convolutional neural network

Automated identification of tuberculosis objects in digital images using neural network and neuro fuzzy inference systems

Automated object and image level classification of TB images using support vector neural network classifier

Plankton classification with deep convolutional neural networks

A survey for cervical cytopathology image analysis using deep learning

Identification of COVID-19 samples from chest X-ray images using deep learning: A comparison of transfer learning approaches

A deep learning based CNN framework approach for plankton classification

You only look once: Unified, real-time object detection

Rough set theory: fundamental concepts, principals, data extraction, and applications

Digital image analysis in breast pathology: from image processing techniques to artificial intelligence

Evaluation of transfer learning scenarios in plankton image classification

U-Net: Convolutional networks for biomedical image segmentation

The perceptron: a probabilistic model for information storage and organization in the brain

Bacteria classification using image processing and deep convolutional neural network

Automatic classification of tuberculosis bacteria using neural network

Learning representations by backpropagating errors

ImageNet large scale visual recognition challenge

Deep boltzmann machines

Deep reinforcement learning framework for autonomous driving

Hybrid self organizing map for overlapping clusters

PlanktoVision: an automated analysis system for the identification of phytoplankton

Automatic bacillus detection in light field microscopy images using convolutional neural networks and mosaic imaging approach

Monitoring microbial morphogenetic changes in a fermentation process by a self-tuning vision system (STVS)

Chapter 7: Numerical modeling and simulation

Development of algorithm tuberculosis bacteria identification using color segmentation and neural networks

A hybrid parallel SOM algorithm for large maps in data-mining

Very deep convolutional networks for large-scale image recognition

Gastric histopathology image segmentation using a hierarchical conditional random field

Convolutional neural network based automated detection of mycobacterium bacillus from sputum images

Inception-v4, Inception-ResNet and the impact of residual connections on learning

Going deeper with convolutions

Rethinking the inception architecture for computer vision

A fungus spores dataset and a convolutional neural network based approach for fungus detection

Automated classification of bacterial cell subpopulations with convolutional neural networks

Microorganism image recognition based on deep learning application

Attention is all you need

Image processing and neural computing used in the diagnosis of tuberculosis

Classification of microscopic images of bacteria using deep convolutional neural network

Deep convolutional neural network for microscopic bacteria image classification

Transferred parallel convolutional neural network for large imbalanced plankton database classification

The identification of powdery mildew spores image based on the integration of intelligent spore image sequence capture device

CosFace: Large margin cosine loss for deep face recognition

Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving

Generalizing from a few examples: A survey on few-shot learning

The semi-automated classification of sedimentary organic matter in palynological preparations

Two supervised neural networks for classification of sedimentary organic matter images from palynological preparations

Spring research presentation: A theoretical foundation for inductive transfer

Use of artificial neural networks to accurately identify Cryptosporidium oocyst and Giardia cyst images

Application of an artificial neural network in the enumeration of yeasts and bacteria adhering to solid substrata

An enhanced framework of generative adversarial networks (EF-GANs) for environmental microorganism image augmentation with limited rotation-invariant training data

Microscopic image augmentation using an enhanced WGAN

An application of transfer learning and ensemble learning techniques for cervical histopathology image classification

In situ DNA-hybridization chain reaction (HCR): a facilitated in situ HCR system for the detection of environmental microorganisms

An illustrated guide to deep learning

A more efficient CNN architecture for plankton classification

EMDS-7 dataset

Wide residual networks

Deep learning approach to the classification of selected fungi and bacteria

Automated quality assessment of autonomously acquired microscopic images of fluorescently stained bacteria

Review of shape representation and description techniques

LCU-Net: A novel low-cost U-Net for environmental microorganism image segmentation

A multiscale CNN-CRF framework for environmental microorganism image segmentation

A comprehensive review of image analysis methods for microorganism counting: from classical image processing to deep learning approaches

A comparative study of deep learning classification methods on a small environmental microorganism image dataset (EMDS-6): From convolutional neural networks to visual transformers

A survey of sperm detection techniques in microscopic videos

Object detection with deep learning: A review

A comprehensive review for breast histopathology image analysis using classical and deep neural networks

Bacteria classification using neural network

Deep learning approach to bacterial colony classification

This work is supported by the "National Natural Science Foundation of China" (No. 61806047) and the "Fundamental Research Funds for the Central Universities" (No. N2019003). We also thank Miss Zixian Li and Mr. Guoxian Li for their valuable discussions in this work. Chen Li is both the co-first author and the corresponding author of this paper.

The authors declare that they have no conflict of interest.