key: cord-0943990-jsf91bry
authors: Rey, Juan S.; Li, Wen; Bryer, Alexander J.; Beatson, Hagan; Lantz, Christian; Engelman, Alan N.; Perilla, Juan R.
title: Deep-learning in situ classification of HIV-1 virion morphology
date: 2021-10-05
journal: Comput Struct Biotechnol J
DOI: 10.1016/j.csbj.2021.10.001
sha: 2cd577774867277ebe7b346ccc38bb75d50309d4
doc_id: 943990
cord_uid: jsf91bry

Transmission electron microscopy (TEM) has a multitude of uses in biomedical imaging due to its ability to discern ultrastructure morphology at the nanometer scale. Through its ability to directly visualize virus particles, TEM has for several decades been an invaluable tool in the virologist's toolbox. As applied to HIV-1 research, TEM is critical to evaluate activities of inhibitors that block the maturation and morphogenesis steps of the virus lifecycle. However, both the preparation and analysis of TEM micrographs require time-consuming manual labor. Through the dedicated use of computer vision frameworks and machine learning techniques, we have developed a convolutional neural network backbone of a two-stage Region Based Convolutional Neural Network (RCNN) capable of identifying, segmenting and classifying HIV-1 virions at different stages of maturation and morphogenesis. Our results outperformed common RCNN backbones, achieving 80.0% mean Average Precision on a diverse set of micrographs comprising different experimental samples and magnifications. We expect that this tool will be of interest to a broad range of researchers.

Transmission electron microscopy (TEM) has long been used as a diagnostic tool in virology. Investigations of fluid samples from patients' skin lesions in the 1940s enabled the variola virus, which is the poxvirus that causes smallpox, to be discerned from the much larger varicella-zoster virus, which is a herpesvirus that causes chickenpox [1]. The introduction of negative stain materials, such as uranyl acetate and phosphotungstic acid, in the late 1950s significantly improved ultrastructure resolution and thus was a springboard development for the use of TEM in modern day virology [2]. TEM has been invaluable to the discovery and diagnosis of many viral diseases that still plague the world today.
For example, TEM data was instrumental in the initial classification of the AIDS virus, since named HIV-1 for human immunodeficiency virus 1, as a retrovirus [3]. TEM-based techniques are today used to diagnose pathologies associated with infection by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is the cause of the worldwide COVID-19 pandemic (reviewed in [4]).

HIV-1 is taxonomically a lentivirus, which is one of six genera that comprise the Orthoretrovirinae subfamily of Retroviridae. As such, the viral structural proteins and replication enzymes are expressed in infected cells as Gag and Gag-Pol polyproteins that become cleaved by the viral protease (PR) enzyme during the process of virus maturation (reviewed in [5]). Typical mature HIV-1 particles, which are approximately 90 to 120 nm in diameter, harbor an internal core that is composed of a conical shell of capsid protein that houses the viral ribonucleoprotein complex (RNP), composed of two copies of the viral RNA genome, the structural protein nucleocapsid (NC), and the reverse transcriptase and integrase (IN) enzymes [6] (Fig. 1). The RNP is the most electron-dense component of HIV-1 particles [7]. In immature particles, which are non-infectious, the electron density presents as a toroidal structure in proximity to the viral membrane [7] (Fig. 1).

TEM has been invaluable to studies of HIV-1 inhibitors that interfere with proper virion maturation. Compounds that inhibit HIV-1 PR activity block polyprotein processing and hence arrest HIV-1 replication at the maturation step (reviewed in [8]). A second class of HIV-1 maturation inhibitor, which is typified by bevirimat, binds to the protein substrate to inhibit the final cleavage of Gag processing between capsid and spacer peptide 1 (reviewed in [9]). Removal of the IN domain from the C-terminus of Gag-Pol can also increase the frequency of immature particles in HIV-1 virion preparations [10,11]. IN missense mutations can moreover elicit eccentric HIV-1 particle formation, where the electron-dense RNP appears outside the viral core, often in association with the viral membrane [10-13] (Fig. 1). The allosteric IN inhibitor (ALLINI) class of preclinical HIV-1 compounds elicits eccentric particle formation (reviewed in [9]). In this way, the inhibitors hypermultimerize IN to preclude its binding to RNA in the virus particle [13,14].

Fig. 1. … The color scheme cartoons the following components from exterior to interior: blue, envelope glycoproteins; green, lipid bilayer; yellow, matrix protein; black, capsid protein; red, RNA. B Schematic representation showing the configuration of a mature virion and samples from TEM micrographs. C Schematic representation showing the configuration of an eccentric virion and samples from TEM micrographs.

Over the last five years, the application of Machine Learning (ML) in biomedical image processing has increased significantly [15]. For instance, image classification has enabled diagnostic prediction of Alzheimer's disease in patients from brain MRIs [16] and SARS-CoV-2 detection from chest X-ray scans [17]. For microscopy image analysis, where individual detection and classification of substructures of images are necessary, two main frameworks have been applied: object segmentation and object detection.
While the first aims to classify the pixels in an image, predicting the probability that they belong to a certain class, object detection uses a per-region approach for classifying object instances. These techniques have been applied for the detection of cancer cell nuclei [18], the segmentation of neural membranes [19], segmentation of feline calicivirus [20] and virus classification [21]. So-called Convolutional Neural Networks (CNNs) have proven useful for the semantic segmentation of small extracellular vesicles (sEVs) from TEM micrographs [22,23]. These approaches include U-Net [24], a CNN based on the combination of downsampling and upsampling layers with connections between the convolutional layers. Although so-called deep learning models like Sparse Autoencoders (SAE) and Recurrent Neural Networks (RNNs) have been applied to medical imaging [25,26], the most popular approach continues to be CNNs. The latter class of models has proven its usefulness in classification, detection and segmentation tasks across a broad range of fields and applications, producing results up to par with medical experts [27,28].

In the present manuscript, we present an end-to-end Deep Learning-based method for the automated detection and classification of HIV-1 virion morphologies from input TEM micrographs. Our pipeline is composed of two main components: an object detection pipeline based on a Faster RCNN [29] architecture, and a stratification layer that can employ different classification backbones. The classification backbones supported include ResNets and a novel CNN, named TEMNet; the latter was designed and trained from scratch for the detection of HIV-1 virions from TEM micrographs. ResNets are pretrained using the ImageNet database; our pipeline supports ResNet101, ResNet101v2 and Inception-ResNet-v2. Although ResNets are pretrained with photon-based photographs, here we show that ResNet101, ResNet101v2 and Inception-ResNet-v2 accurately derive morphology distributions across mutant and clinical isolate samples after fine-tuning. In addition, our novel classifier (TEMNet) proves to be a competitive classification backbone, trading off some accuracy (2% less accurate) for an 18% increase in speed and a 94% decrease in memory usage. Altogether, we find that our method is efficient and robust for in situ HIV-1 virion detection across different morphotypes, predicting statistical distributions that agree with results from end-user visual inspection while being up to three orders of magnitude faster. The application of our method to TEM micrographs achieved a mean Average Precision (mAP) of up to 80.0%. Our implementation is based on the TensorFlow [30] Keras framework. The source code has been made available through https://github.com/Perilla-lab/TEMNet.

All viruses analyzed in this study were generated from proviral DNA molecular clones. HIV-1 strain NL4-3 (HIV-1 NL4-3) was generated from pNL4-3 [31] or pNL43/XmaI [32], while HIV-1 YU-2 and HIV-1 JR-CSF were generated from the respective plasmids pYU-2 [33] and pYK-JRCSF [34]. Mutations in pol corresponding to IN changes D116N, N184L, and delIN, as well as PR active site mutation D25A, were introduced into pNL43/XmaI by site-directed mutagenesis using the primers listed in Table 1. The presence of desired mutations and absence of unwanted secondary changes were verified by Sanger sequencing.
Previously described wild type (WT) HIV-1 NL4-3 produced in the presence of dimethyl sulfoxide or the ALLINI BI-D, as well as IN mutant L241A, E96A, and N18I viral micrographs [7,13,35], were additionally used for RCNN training in this study. Viruses were generated from plasmid DNAs by transfecting HEK293T cells, which were grown in Dulbecco's modified Eagle's medium supplemented to contain 10% fetal bovine serum, 100 IU/ml penicillin, and 100 µg/ml streptomycin at 37 °C in the presence of 5% CO2. Briefly, cells grown in two 15-cm dishes (10^7 cells per dish) were transfected with 30 µg plasmid DNA using PolyJet DNA transfection reagent as recommended by the manufacturer (SignaGen Laboratories). Two days after transfection, cell supernatants were filtered through 0.22 µm filters and pelleted by ultracentrifugation using a Beckman SW32-Ti rotor at 26,000 rpm for 2 h at 4 °C. Virus pellets were fixed with 1 mL fixative (2.5% glutaraldehyde, 1.25% paraformaldehyde, 0.03% picric acid, 0.1 M sodium cacodylate, pH 7.4) overnight at 4 °C. The following steps were conducted at the Harvard Medical School Electron Microscopy core facility. Samples were washed with 0.1 M sodium cacodylate, pH 7.4, and postfixed with 1% osmium tetroxide and 1.5% potassium ferrocyanide for 1 h, washed twice with water, once with maleate buffer (MB), and incubated in 1% uranyl acetate in MB for 1 h. Samples washed twice with water were dehydrated in ethanol by subsequent 10 min incubations with 50%, 70%, 90%, and then twice with 100%. The samples were then placed in propylene oxide for 1 h and infiltrated overnight in a 1:1 mixture of propylene oxide and TAAB Epon (Marivac Canada Inc.). The following day, the samples were embedded in TAAB Epon and polymerized at 60 °C for 48 h. Ultrathin sections (about 60 nm) were cut on a Reichert Ultracut-S microtome, transferred to copper grids, stained with lead citrate, and examined in a JEOL 1200EX transmission electron microscope with images recorded on an AMT 2k CCD camera. Images were captured at 30,000×, 25,000× or 20,000× magnification. Micrographs were stored in an 8-bit single-channel lossless TIFF format. In contrast to photon-based microscopy, where each pixel in the TIFF files encodes the wavelength of the photon, the TIFF files used in the present study contained electron intensities.

Table 1. Oligonucleotides used to introduce indicated changes into pNL43/XmaI DNA.

In order to build a robust neural network capable of identifying HIV-1 virions across different experimental conditions, we built training and validation datasets from micrograph samples using IN and PR mutant viruses to mimic eccentric and immature particle morphologies, respectively. In total, 59 micrographs imaged at 30,000× magnification were assigned morphology labels. This dataset is freely available at doi:10.5281/zenodo.5149062. The raw TEM micrographs were then pre-processed. First, the TEM micrographs were cropped, removing the image labeling information added by the microscope and standardizing the micrograph size to 4,000 × 2,620 pixels. Additionally, since object detection tasks perform segmentation and classification of image-based objects by passing regions of interest through a convolutional network, bounding box coordinates $(x_i, y_i, w_i, h_i)$ were assigned to each of the labeled virions in each micrograph.
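As a concrete illustration of this pre-processing step, the short sketch below loads an 8-bit single-channel TIFF, crops it to the standardized 4,000 × 2,620 pixel size and reads per-virion $(x, y, w, h)$ bounding boxes. The annotation file format, its field names and the assumption that the microscope label bar is removed by cropping from the top-left corner are illustrative choices of ours, not taken from the released implementation.

```python
# Illustrative pre-processing sketch; file layout and annotation format are assumptions.
import json
import numpy as np
from PIL import Image

STD_W, STD_H = 4000, 2620  # standardized micrograph size in pixels

def load_and_crop(tiff_path):
    """Load an 8-bit single-channel TIFF of electron intensities and crop it
    to the standardized size (assumed crop from the top-left corner)."""
    img = np.array(Image.open(tiff_path))      # shape (H, W), dtype uint8
    return img[:STD_H, :STD_W]

def load_annotations(json_path):
    """Read per-virion bounding boxes (x, y, w, h) and morphology labels from a
    hypothetical JSON file such as
    [{"x": 120, "y": 64, "w": 180, "h": 175, "label": "mature"}, ...]."""
    with open(json_path) as fh:
        records = json.load(fh)
    boxes = np.array([[r["x"], r["y"], r["w"], r["h"]] for r in records], dtype=np.float32)
    labels = [r["label"] for r in records]      # "mature", "immature" or "eccentric"
    return boxes, labels
```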
A reasonably sized dataset is vital for training a deep neural network. This especially applies to CNNs, where the number of learnable parameters can reach millions and the model can quickly overfit if the number of training samples is too small. A known paradigm to solve this issue is transfer learning, where a network is first pre-trained on a massive dataset like ImageNet [36] and then trained on the smaller target dataset. However, it has been shown that for object detection tasks [37], results on par with ImageNet pre-trained networks can be achieved when training from a random initialization (from scratch) with a dataset as small as 10k samples [38], given sufficient training time. Two approaches were implemented to effectively increase the size of our data. First, each micrograph was cropped into overlapping regions of 1,024 × 1,024 pixels. To generate the virion classes and box coordinates inside each cropped region, HIV-1 particles were counted as ground truth only if at least 75% of the area spanned by their respective bounding box was inside the cropped region. The features of the detected particles accordingly remained consistent across the datasets, and misclassification noise, where a small section of a virion is mistakenly classified, was downplayed. This method generated between 1 and 48 regions where particles were present and increased the number of images in the dataset to 2,730. The second approach consisted of applying offline augmentations to the cropped images as follows: each input image was transformed by applying horizontal flipping, vertical flipping, 180° rotations and Gaussian noise with a mean of 0 and a standard deviation of 1. This increased the dataset by a factor of four, generating new images and labels that were consistent with the features and morphologies of the HIV-1 virions in the micrographs, while the modifications of the images reduced overfitting during training. Together, the strategy described above, which is represented in Fig. A.9, increased our dataset to 13,650 images. The latter dataset was divided for network optimization, yielding 10,725 images for training and 2,925 images for validation.

Developing an algorithm to identify and classify HIV-1 virions from TEM micrographs is in essence an object detection problem, where the goal is to classify individual object instances in an image and localize each one using a bounding box. For this task we employed the Region-based Convolutional Neural Network [39] (RCNN) architecture. RCNNs are based on applying a Convolutional Neural Network (CNN) to evaluate classification on a number of candidate Regions of Interest (RoI) delimited by bounding boxes. In this sense, RCNNs are two-stage object detection architectures, since a network proposing the candidate RoIs is first necessary before the backbone CNN can be applied for classification. To efficiently generate RoIs, Ren et al. [29] proposed the Faster RCNN architecture where, as shown in Fig. 2A, a Region Proposal Network (RPN) shares the convolutional backbone used for classification and outputs a set of rectangular RoIs along with an objectness score indicating the probability of an object inside the RoI belonging to a class vs. the background. The RPN works by generating anchors, i.e., sliding windows of different sizes and scale ratios over the last convolutional feature map output of the backbone CNN, as sketched below.
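The following minimal sketch enumerates such an anchor grid over a feature map. The anchor scales and aspect ratios shown here are placeholder values chosen for illustration; the text does not list the values used in the released implementation.

```python
import numpy as np

def generate_anchors(fmap_h, fmap_w, stride, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Enumerate (x_center, y_center, w, h) anchors over a feature map of size
    fmap_h x fmap_w whose cells are `stride` input pixels apart.
    The scales (anchor side lengths, in pixels) and aspect ratios are illustrative."""
    anchors = []
    for y in range(fmap_h):
        for x in range(fmap_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # anchor centre in image pixels
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)    # equal area, different aspect ratio
                    anchors.append((cx, cy, w, h))
    return np.array(anchors, dtype=np.float32)

# A 32 x 32 feature map from a 512 x 512 input (stride 16) yields 32*32*9 = 9,216 anchors.
anchors = generate_anchors(32, 32, stride=16)
```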
Each anchor is mapped to an intermediate low-dimensional feature map and then connected to two fully connected layers, for regression of bounding box coordinates and computation of a class score that determines whether or not there is an object in the region. The RoIs proposed by the RPN are then passed through a RoI pooling layer [40] or RoIAlign layer [41], where their features are extracted via average or max pooling and then passed to the classifier heads. In this way, the final fully connected and softmax layers assign a per-class probability to each of the proposed RoIs. Since the RCNN is a two-stage method that predicts both the bounding box localization $(\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i)$ as well as the classification probabilities $\hat{p}_i$ of object instances in an image with ground truth classes $u_i$ and bounding box localization $(x_i, y_i, w_i, h_i)$, the error function minimized during network training is a multi-task loss consisting of two parts [40]:

$$L(\{\hat{p}_i\},\{\hat{t}_i\}) = \frac{1}{N_{class}}\sum_i L_{cls}(\hat{p}_i, u_i) + \frac{1}{N_{RoI}}\sum_i [u_i \geq 1]\, L_{loc}(\hat{t}_i, t_i),$$

where $t_i = (x_i, y_i, w_i, h_i)$, $\hat{t}_i = (\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i)$, and the normalizing parameters are the number of classes $N_{class}$ and the number of regions of interest proposed $N_{RoI}$. The first loss is the cross-entropy log loss that the RoI proposal $i$ belongs to the class $u_i$ with a probability $\hat{p}_{i,u_i}$, i.e., $L_{cls}(\hat{p}_i, u_i) = -\log \hat{p}_{i,u_i}$. The second loss is calculated only when the predicted region is not classified as background (class $u = 0$), as indicated by the Iverson bracket $[u \geq 1]$, and is given by a smooth L1 loss

$$L_{loc}(\hat{t}_i, t_i) = \sum_{j \in \{x,y,w,h\}} \mathrm{smooth}_{L1}(\hat{t}_{i,j} - t_{i,j}), \qquad \mathrm{smooth}_{L1}(z) = \begin{cases} 0.5\,z^2 & \text{if } |z| < 1,\\ |z| - 0.5 & \text{otherwise.} \end{cases}$$

As shown in Fig. 2B, TEMNet is a sequential architecture composed of four convolutional blocks and max pooling layers. Each convolutional block consists of a 2D convolution followed by a normalization and a ReLU activation. Because the convolutional blocks use padding to conserve the tensor size of the previous feature map, the feature map size is reduced by a factor of 1/2 only by the max pooling layer of kernel size 2 applied after each convolutional block. Convolutional blocks 1, 2, 3 and 4 use kernel sizes of 13 × 13, 9 × 9, 7 × 7 and 5 × 5, respectively, sequentially decreasing in order to adapt to the reduced feature map size after max pooling at each of the network's stages. As a way to mitigate overfitting, we added a Gaussian noise layer with a standard deviation of 0.1 after the first max pooling layer, to act as a regularization layer during training while being inert for inference.

Normalization is essential for convergence of a deep network during training. However, batch normalization requires a sufficiently large batch size [45], which is not attainable in object detection tasks where a small batch size is necessary to keep a high image resolution. In this case, batch normalization can lead to inaccurate batch statistics. On the one hand, pretraining and transfer learning are crucial techniques to facilitate the convergence of the network loss. In the case of ResNet backbones, weights trained on ImageNet [36], a massive-scale image classification database, are readily available online. These pretrained weights were used as a starting point for training on our dataset, where batch normalization layers were frozen, effectively transforming them into linear layers, so that the batch statistics learned on the massive-scale dataset were transferred to the new network. Results with and without pretraining for ResNet101 are shown in Fig. A.10. On the other hand, normalization in a network trained from scratch cannot benefit from transfer learning. Instead, for TEMNet we implemented group normalization [46], which normalizes along the channel axis instead of the batch axis.
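A minimal Keras sketch of one such stage and of the four-block backbone is given below. The filter counts, the number of normalization groups and the input shape are illustrative assumptions; GroupNormalization ships with tf.keras.layers in recent TensorFlow releases (older versions provide it through tensorflow_addons).

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, kernel_size, groups=32):
    """One TEMNet-style stage: Conv2D ('same' padding) -> group normalization -> ReLU,
    so only the subsequent pooling layer halves the spatial size."""
    x = layers.Conv2D(filters, kernel_size, padding="same")(x)
    x = layers.GroupNormalization(groups=groups)(x)   # channel-axis normalization
    return layers.Activation("relu")(x)

def temnet_backbone_sketch(input_shape=(1024, 1024, 1), filters=(64, 128, 256, 512)):
    """Four convolutional blocks with decreasing kernel sizes (13, 9, 7, 5), each followed
    by 2x2 max pooling; a Gaussian noise layer after the first pooling acts as a
    regularizer during training and is inert at inference. Filter counts are assumed."""
    inputs = layers.Input(shape=input_shape)
    x = conv_block(inputs, filters[0], 13)
    x = layers.MaxPooling2D(2)(x)                     # C1
    x = layers.GaussianNoise(0.1)(x)
    x = conv_block(x, filters[1], 9)
    x = layers.MaxPooling2D(2)(x)                     # C2
    x = conv_block(x, filters[2], 7)
    x = layers.MaxPooling2D(2)(x)                     # C3
    x = conv_block(x, filters[3], 5)
    x = layers.MaxPooling2D(2)(x)                     # C4
    return tf.keras.Model(inputs, x, name="temnet_sketch")
```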
To generate multi-scale feature maps on which to make predictions, we used a Feature Pyramid Network [47] (FPN) for both our TEMNet and ResNet backbones. The ResNet implementation was done according to the original FPN paper [47], and zero padding was added to make the layers of Inception-ResNet-v2 compatible; for TEMNet, in a similar manner, the output of each of the max pooling layers {C1, C2, C3, C4} was used to generate the pyramid feature maps {P2, P3, P4}. For this procedure, every layer was passed through a 1 × 1 convolution to standardize the number of filters (256); this convolution is known as a lateral connection. The top-down pathway was then built, starting with the coarsest resulting feature map P4 (generated from C4). The latter was upsampled with a 2 × 2 kernel and added to the underlying C3 feature map to generate P3; afterwards, P3 itself was upsampled and added to C2 to generate the feature map P2. Finally, a 3 × 3 convolution was applied on each feature map {P2, P3, P4}. These feature maps work as pyramid "levels" to which RoIs were mapped according to their size. Specifically, following [47], a RoI with height $h$ and width $w$ was assigned to the pyramid level $P_k$ with $k = \lfloor k_0 + \log_2(\sqrt{wh}/2620) \rfloor$, where 2,620 is the number of pixels constituting the smaller side of a micrograph and $k_0 = 4$ is the single-scale level for a RoI. Predictions on each of the pyramid levels were then funneled to two fully connected layers, with 64 neurons for TEMNet and 1,024 for ResNet and Inception-ResNet, and finally to a softmax layer where a per-class probability was assigned on 3 + 1 channels: three for our virion classifications (eccentric, mature, immature) and one for background.

For the training procedure, ResNets were initialized from ImageNet pretrained weights and then fine-tuned on a dataset of 1,806 isolated HIV-1 virion samples. TEMNet was trained from scratch on the same dataset. Afterwards, the CNN backbones were initialized with their fine-tuned weights and trained individually on the RPN network for RoI proposal generation for 50 epochs on the cropped micrograph dataset consisting of 13,650 images. Finally, the RPN-trained weights were used as initialization for training the CNN backbones on the full RCNN architecture for 50 epochs using the cropped micrograph dataset. The input image was resized to 512 × 512 pixels, allowing a batch size of 8 images on a single NVIDIA V100 GPU. Stochastic Gradient Descent (SGD) was used to train the model with a starting learning rate of 0.01, which was decreased by a factor of 10 every time a plateau was encountered in the validation loss. Weight decay was set to 0.0001, learning momentum to 0.9 to avoid the training getting stuck in a local minimum, and the gradient clipping norm to 5.0 to avoid exploding gradients. We used 50 RoIs per image for training and 20 for validation. Training Faster RCNN with TEMNet took less than 9 h on our cropped dataset. Training errors with the ResNet and TEMNet backbones are shown in Fig. 3. From this training, the weight checkpoint that achieved the lowest validation error was chosen to avoid overfitting (early stopping). The feature map activations learned by the convolutional blocks and FPN levels of TEMNet are presented in Fig. 4. As a result of training the network on a dataset composed of micrograph croppings, our Faster RCNN network can generate predictions on 1,024 × 1,024 pixel croppings of TEM micrographs.
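The sketch below illustrates the top-down FPN pathway and the level-assignment rule described above. The exact form of the rule is our reconstruction of the formula of [47] adapted to the constants named in the text (canonical size 2,620 px and level $k_0 = 4$); the clamping bounds are assumptions.

```python
import math
from tensorflow.keras import layers

def build_fpn(C2, C3, C4, filters=256):
    """Lateral 1x1 convolutions standardize the filter count, the coarsest map is
    upsampled and added to the next-finer lateral map, and a 3x3 convolution is
    applied to each resulting pyramid level (C1 is not part of the top-down pathway)."""
    P4 = layers.Conv2D(filters, 1)(C4)
    P3 = layers.Add()([layers.UpSampling2D(2)(P4), layers.Conv2D(filters, 1)(C3)])
    P2 = layers.Add()([layers.UpSampling2D(2)(P3), layers.Conv2D(filters, 1)(C2)])
    P2 = layers.Conv2D(filters, 3, padding="same")(P2)
    P3 = layers.Conv2D(filters, 3, padding="same")(P3)
    P4 = layers.Conv2D(filters, 3, padding="same")(P4)
    return P2, P3, P4

def roi_pyramid_level(w, h, k0=4, canonical=2620, k_min=2, k_max=4):
    """Assumed reconstruction of the level-assignment rule:
    k = floor(k0 + log2(sqrt(w*h) / canonical)), clamped to the available levels."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return min(max(k, k_min), k_max)
```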
To generate end-to-end predictions on raw TEM images, we devised a method to segment a micrograph via a sliding window. As illustrated in Fig. 5, we scanned an input micrograph by translating a sliding window across the image and generating overlapping segmented regions. The segmented regions were compiled into batches and used as input for the RCNN network, which generated RoI (rectangular bounding box) coordinates and classification probability predictions for each virion instance detected in the segmented regions (see Fig. A.11 for details). Then, the predicted RoIs were shifted by the position of the sliding window and gathered on the input full-scale micrograph. Since the sliding window generated overlapping segmented regions, the network predicted multiple times on virion instances localized in overlapping regions, generating overlapping RoIs with different classification probabilities that describe the same virion. To glean final predictions, non-max suppression was applied to the predicted RoIs to eliminate RoIs whose area overlapped by more than a 30% threshold, retaining the RoI with the highest confidence score (i.e., prediction probability) and discarding the overlapping regions with lower confidence. In the case of confidence score ties, final RoIs were chosen by a larger-area criterion, a larger RoI generally being better at encompassing a viral instance in the full-scale micrograph and providing better feature extraction through the Feature Pyramid Network. Finally, the resulting RoIs and class probability scores were displayed and a per-class count was performed on the processed predictions.

Fig. 5. Micrograph segmentation via a sliding window: A A windowed region was translated across the image and predictions were generated on the segmented regions. B The predictions were gathered on the full scale micrograph and C Non-max suppression (NMS) was applied to determine classifications with highest confidence from overlapping Regions of Interest (RoIs), to glean final predictions. Numbers above each bounding box correspond to prediction "confidence" or certainty, which may ultimately be used to filter predictions (see Fig. 7). Scale bars are shown in the lower left portions of panels B and C.

The default sliding window size for 30,000× magnification micrographs was 1,024 × 1,024 pixels (569 × 569 nm), consistent with the cropping size and magnification used for building the training dataset. The sliding window approach provided advantages to the prediction pipeline. For instance, translation variance of the predicted class for a given viral instance was handled by considering predictions from different segmented regions and keeping the predicted RoI with the highest confidence score. Furthermore, this approach allowed for prediction generation on multi-magnification micrograph sets. Image size scales linearly with magnification; therefore, a window-size-to-magnification ratio $r = W_t / M_t$ (in pixels) can be calculated based on the cropping size $W_t$ and magnification $M_t$ used for training the network. This ratio was used internally by the network to calculate the appropriate sliding window size $W_{new}$ for an input micrograph with a given magnification $M_{new}$,

$$W_{new} = r\, M_{new}, \qquad (6)$$

which preserves the physical dimensions (in nm) of the regions segmented by the RCNN process, allowing consistent predictions across multiple magnifications. Samples of predicted micrographs at 30,000× and 20,000× magnification are presented in Fig. 6.
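To make the prediction pipeline concrete, the sketch below combines the magnification-aware window size of Eq. (6), an overlapping tiling of the micrograph, and non-max suppression with the 30% overlap threshold. The 50% window overlap and the use of intersection-over-union as the overlap measure are our assumptions; tie-breaking by larger area follows the criterion described above.

```python
import numpy as np

def window_size(mag_new, w_train=1024, mag_train=30000):
    """Eq. (6): W_new = r * M_new with r = W_t / M_t. E.g. window_size(20000) -> 683 px."""
    return int(round(w_train / mag_train * mag_new))

def sliding_windows(img_h, img_w, win, overlap=0.5):
    """Yield (y, x) top-left corners of overlapping windows covering the micrograph.
    The 50% overlap is an assumption; edge windows are shifted inward to stay in bounds."""
    stride = max(1, int(win * (1 - overlap)))
    bottom, right = max(img_h - win, 0), max(img_w - win, 0)
    ys = list(range(0, bottom + 1, stride))
    xs = list(range(0, right + 1, stride))
    if ys[-1] != bottom:
        ys.append(bottom)
    if xs[-1] != right:
        xs.append(right)
    for y in ys:
        for x in xs:
            yield y, x

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Keep the highest-confidence box among overlapping detections; ties go to the larger area."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    areas = boxes[:, 2] * boxes[:, 3]
    order = np.lexsort((areas, scores))[::-1]   # by score, then area, descending
    keep = []
    while order.size:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        overlaps = np.array([iou(boxes[best], boxes[i]) for i in rest])
        order = rest[overlaps <= iou_threshold]
    return keep
```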
Additional predictions on 25,000× magnification micrographs are presented in supplemental Fig. A.12.

To evaluate the performance of our network, we measured the mean Average Precision (mAP), as is traditional for object detection models [48]. mAP computes the average precision over all classes for recall values from 0 to 1. Precision, recall and F1 scores were calculated as

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$

where $TP$, $FP$ and $FN$ denote true positive, false positive and false negative detections [48], and the F1 score is the harmonic mean between precision and recall. Pairs of precision and recall values were calculated for increasing subsets of detections such that precision vs. recall curves could be built by plotting the pairs. The precision vs. recall curve was interpolated so that dips in precision were replaced by the maximum precision for a given recall value. The mAP was then calculated as the area under the precision ($p$) vs. recall ($r$) curve,

$$\mathrm{mAP} = \int_0^1 p(r)\, dr.$$

Since mAP takes into account precision, recall and the IoU overlap of predicted RoIs with ground truth, it is regarded as the de facto 'gold standard' to evaluate accuracy on object detection tasks across many datasets [49-51]. mAP scores for each of our Convolutional Neural Network backbones were measured on a validation dataset composed of 13 full-scale TEM micrographs pertaining to different experimental conditions. As presented in Table 2, TEMNet achieved the highest recall score. Interestingly, the Inception-ResNet-v2 architecture, which was the most complex, achieved the highest mAP among the ResNet backbones along with the highest recall of the three. However, Inception-ResNet-v2 lacked the precision of the other ResNet modules, perhaps due to the fact that zero padding was necessary to make the shapes of the inception blocks compatible with the upscaling and adding layers present in the FPN.

Fig. 6. … (30,000×). B A magnification lower than is discernible by a trained expert (20,000×). Our RCNN network calculates the appropriate sliding window size to segment a micrograph according to its magnification. Scale bars are shown beneath the micrographs.

In addition to being accurate, the network prediction pipeline derived in the present work significantly improved workflow times. While 30 min on average was required to manually ascribe 200 HIV-1 particle classifications on a single micrograph, our network offered a significant speedup, processing 130 micrographs in five minutes on one GPU and generating bounding box coordinates and classification probabilities as well as count histograms for each of the micrographs processed. Furthermore, in order to evaluate the statistical distribution of classifications predicted by our Faster RCNN + FPN implementation, we measured the percentage of each particle morphology class across different experimental conditions. The in situ ground truth classification counts from manually tabulated micrographs were compared to the predicted distributions from the virion detection counts performed by our AI. Side-by-side histograms are shown in Fig. 7 for predictions using the TEMNet backbone, filtering for predictions whose confidence score was above a 0.5 threshold. Ground truth labeled micrograph samples as well as their detection and classification predictions for each experiment are also presented in supplemental Fig. A.13. Additionally, we compared the morphology distributions for an independent set of NL4-3 viruses and primary isolate samples that were not used as part of the training or validation micrographs. Side-by-side histograms are shown in
Fig. 8 for the TEMNet backbone with a confidence threshold of 0.5; histograms for other backbones with increasing confidence scores are presented in supplemental Fig. A.14.

Fig. 7. … (D116N, N184L, delIN) and PR mutant (D25A) viruses. A Ground truth distribution from manually ascribed micrograph sets. B Resulting distributions from TEMNet's predictions on the same micrographs. Predictions with a confidence score c above 0.5 were counted while those under this confidence threshold were rejected. Numbers over each distribution indicate the number of virus particles counted and the number of independent micrographs analyzed (*). Error bars represent standard deviation from experimental replicates.

As summarized in Tables 3 and 4, ascribed morphology distributions were in accordance with the associated ground truth measures, with a root mean squared error (RMSE) lower than 10% for each virus type independent of the CNN backbone used in the Faster RCNN architecture. Among the analyzed viruses, WT NL4-3 and delIN were the most challenging to classify, corresponding to micrographs that presented the lowest contrast between particle instances and background along with the most image noise out of the validation samples (see Fig. A.13). By contrast, PR D25A mutant viral samples, which consisted of only immature virions, were classified most accurately (Table 3).

Table 3. Root Mean Square Error (RMSE) calculated between the predicted and ground truth distributions for each mutant virus. All predicted distributions were in accordance with the ground truth, showing an error lower than 10% for all experiments independent of the convolutional backbone used. Among the convolutional backbones, ResNet101v2 provided the least error across all mutant viruses, followed by ResNet101, Inception-ResNet-v2 and TEMNet. Increasing the confidence threshold at which generated predictions were counted as true positives for the distributions in general reduced the average RMSE across mutants for the TEMNet, ResNet101 and Inception-ResNet-v2 backbones and increased it for the ResNet101v2 backbone, helping in the first two cases to reduce the error for the WT and delIN mutants, which proved to be the most challenging micrographs to predict, while PR D25A distributions were perfectly predicted (no error) due to the homogeneity of immature virions across these samples.

Table 5. Pearson's $\chi^2$ test p-values calculated between the predicted and ground truth distributions for each experimental condition. Within the probability threshold of p = 5%, there was no statistical difference between the distributions calculated from the predictions and the distributions calculated from ground truth counts, with the exception of the WT virus with the TEMNet and Inception-ResNet-v2 backbones, as well as for N184L at the lower confidence threshold using Inception-ResNet-v2. In accordance with the RMSE values (…).

We have developed an end-to-end deep learning solution to the automated detection and classification of HIV-1 virion particle morphologies from TEM micrographs across different maturation stages. Our overall methodology is not limited to HIV-1 particles and can be extended to other enveloped viruses provided that enough training data are available. In our approach, we have overcome the limitations of comparatively small datasets to produce reliable particle classifications and counts. Our network, named TEMNet, is a new CNN architecture for object detection and has been trained from scratch as a backbone for a two-stage Faster RCNN [29] object detection network.
In line with [37,38], we demonstrated that our model converges when trained from scratch, thanks to Group Normalization [46] and the construction of a reasonably sized dataset consisting of 13,650 labeled croppings of TEM micrographs for training and validation. Importantly, the training dataset was built from different experiments. Outcomes of particle classification were pitted head-to-head versus manual ascriptions of the same micrographs, allowing the model to be robust and generalizable for HIV-1 virions under diverse experimental setups. We have demonstrated that networks developed to handle photon-based images are competent at identifying and classifying objects from electron-based imaging. Comparing TEMNet with ImageNet-pretrained ResNet [42-44] backbones, we found that while both types of networks worked with high accuracy on validation micrographs from different experiments, TEMNet reported the highest mAP score at 80.0%, surpassing ResNets by over 1 mAP point (Table 2). When comparing the predicted morphology percentages for in situ micrographs of different IN and PR mutant viruses with manually ascribed ground truth distributions, all backbones produced statistically consistent predictions. In this regard, however, ResNet backbones outperformed TEMNet, presenting the lowest RMSE and the highest p-values for the Pearson $\chi^2$ test, as summarized in Tables 3 and 5. The WT and delIN mutant viruses, whose samples had the most noise, performed the poorest across techniques, while the PR D25A active site mutant virus performed best, owing to the uniformity of the immature particle morphology across samples (Figs. 7, A.15 and A.16).

Faster RCNN combined with our TEMNet backbone also proved to be a highly efficient method for generating predictions on raw TEM micrographs, offering a significant speedup over manual classification. While 30 min on average was required to manually ascribe each micrograph that contained approximately 200 viral particles, the model processed and generated predictions for 130 micrographs in five minutes with a single GPU. This translates to an approximate 780-fold improvement in speed. In addition, our prediction method could also handle particle predictions for multi-magnification micrograph sets, as demonstrated in Figs. 6 and A.12. Finally, the TEMNet backbone was accurate, efficient and also lightweight. The memory footprint of TEMNet's training weights was only 15 MB, compared to ResNet's 235 MB and Inception-ResNet-v2's 292 MB, which renders TEMNet appropriate for software implementations under hardware constraints and therefore useful for web and mobile deployment.

Summarizing, here we present a robust Convolutional Neural Network for the automated detection and classification of HIV-1 particle morphologies from TEM micrographs. Our TEMNet backbone has the capability to accurately and efficiently detect HIV-1 virions and classify them according to their maturation stage across varying experimental conditions. Furthermore, the statistical distributions across experimental conditions agreed with manually ascribed results while being obtained significantly faster.

Table 6. Pearson's $\chi^2$ test p-values calculated between the predicted and ground truth distributions for WT NL4-3 and primary isolate YU-2 and JR-CSF HIV-1 viruses.
TEMNet is the only backbone that predicts a distribution with no statistically significant difference for both primary isolates within a probability threshold of p = 5% when the confidence threshold is increased to 0.9. The highest average p-values are encountered for ResNet101v2, TEMNet, ResNet101 and Inception-ResNet-v2, in descending order. p-values greater than 0.05 indicate no significant statistical difference.

Given that Gag-interacting maturation inhibitors and ALLINIs, each of which disrupts particle maturation, are in preclinical development, our methodology could prove useful in highly promising antiretroviral drug development programs. We moreover expect that our tool could prove useful to a broader range of scientists, including virologists and medical researchers, as long as there is sufficient raw data on which to first train the machine learning methodology. The latter could especially apply to histopathological detection of SARS-CoV-2 infection (see [52] for review), where cell organelles that are similar in size to virus particles often confound data interpretation.

References
The use of the electron microscope in diagnosis of variola, vaccinia, and varicella
A negative staining method for high resolution electron microscopy of viruses
Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS)
Detection methods for SARS-CoV-2 in tissue
HIV-1 assembly, budding, and maturation. Cold Spring Harbor
Isolation of human immunodeficiency virus type 1 cores: Retention of vpr in the absence of p6gag
Distribution and redistribution of HIV-1 nucleocapsid protein in immature, mature, and integrase-inhibited virions: a role for integrase in maturation
Viral Protease Inhibitors
HIV-1 maturation: Lessons learned from inhibitors
Multiple effects of mutations in human immunodeficiency virus type 1 integrase on viral replication
Allosteric integrase inhibitor potency is determined through the inhibition of HIV-1 particle maturation
Noncatalytic site HIV-1 integrase inhibitors disrupt core maturation and induce a reverse transcription block in target cells
Integrase-RNA interactions underscore the critical role of integrase in HIV-1 virion morphogenesis
HIV-1 integrase binds the viral RNA genome and is essential during virion morphogenesis
A survey on deep learning in medical image analysis
A multimodel deep convolutional neural network for automatic hippocampus segmentation and classification in Alzheimer's disease
A deep learning approach to detect COVID-19 coronavirus with X-ray images
Automatic nuclei segmentation in H&E stained breast cancer histopathology images
An automatic learning-based framework for robust nucleus segmentation
Virus particle detection by convolutional neural network in transmission electron microscopy images
Deep learning and handcrafted features for virus image classification
Deep-learning-based segmentation of small extracellular vesicles in transmission electron microscopy images
ScanEV - a neural network-based tool for the automated detection of extracellular vesicles in TEM images
U-Net: Convolutional networks for biomedical image segmentation
Stacked sparse autoencoder (SSAE) based framework for nuclei patch classification on breast cancer histopathology
Spatial clockwork recurrent neural network for muscle perimysium segmentation
Deep learning in microscopy image analysis: a survey
Statistical and machine learning techniques in human microbiome studies: Contemporary challenges and solutions
Faster R-CNN: Towards real-time object detection with region proposal networks
TensorFlow: A system for large-scale machine learning
Production of acquired immunodeficiency syndrome-associated retrovirus in human and nonhuman cells transfected with an infectious molecular clone
Structure-based mutagenesis of the human immunodeficiency virus type 1 DNA attachment site: Effects on integration and cDNA synthesis
Molecular characterization of human immunodeficiency virus type 1 cloned directly from uncultured human brain tissue: identification of replication-competent and -defective viral genomes
Dual infection of the central nervous system by AIDS viruses with distinct cellular tropisms
Dominant negative MA-CA fusion protein is incorporated into HIV-1 cores and inhibits nuclear entry of viral preintegration complexes
ImageNet: a large-scale hierarchical image database
Object detection from scratch with deep supervision
Rethinking ImageNet pre-training
Rich feature hierarchies for accurate object detection and semantic segmentation
IEEE International Conference on Computer Vision (ICCV)
Deep residual learning for image recognition
Identity mappings in deep residual networks
Inception-v4, Inception-ResNet and the impact of residual connections on learning
Batch normalization: Accelerating deep network training by reducing internal covariate shift
Group normalization
Feature pyramid networks for object detection
The PASCAL visual object classes challenge: a retrospective
Microsoft COCO: Common objects in context
The Pascal visual object classes (VOC) challenge
ImageNet large scale visual recognition challenge
Hunting coronavirus by transmission electron microscopy - a guide to SARS-CoV-2-associated ultrastructural pathology in COVID-19 tissues

Acknowledgments
The authors acknowledge funding from the US National Institutes of Health awards R01AI070042 (to A.N.E.), P50AI1504817 (to A.N.E. and J.R.P.) and P20GM104316 (to J.R.P.). Funding for methodology development for the present work was provided by the National Science Foundation award MCB-2027096, funded in part by the Delaware Established Program to Stimulate Competitive Research (EPSCoR). This research is part of the Frontera computing project at the Texas Advanced Computing Center. Frontera is made possible by NSF award OAC-1818253. This work used the Extreme Science and Engineering Discovery Environment, which is supported by the National Science Foundation (Grant ACI-1548562). Christian Lantz was supported by NSF award CHEM-1560325. The present work was also funded by the Gordon and Betty Moore Foundation and the Research Corporation. The authors also thank Gulcin Pekurnnaz and Abhishek Chaterjee for insightful discussions.

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.csbj.2021.10.001.