title: High performance machine learning models can fully automate labeling of camera trap images for ecological analyses
authors: Whytock, Robin; Świeżewski, Jędrzej; Zwerts, Joeri A.; Bara-Słupski, Tadeusz; Pambo, Aurélie Flore Koumba; Rogala, Marek; Bahaa-el-din, Laila; Boekee, Kelly; Brittain, Stephanie; Cardoso, Anabelle W.; Henschel, Philipp; Lehmann, David; Momboua, Brice; Opepa, Cisquet Kiebou; Orbell, Christopher; Pitman, Ross T.; Robinson, Hugh S.; Abernethy, Katharine A.
date: 2020-09-24
journal: bioRxiv
DOI: 10.1101/2020.09.12.294538

Ecological data are increasingly collected over vast geographic areas using arrays of digital sensors. Camera trap arrays have become the 'gold standard' method for surveying many terrestrial mammals and birds, but these arrays often generate millions of images that are challenging to process. This causes significant latency between data collection and subsequent inference, which can impede conservation at a time of ecological crisis. Machine learning algorithms have been developed to improve camera trap data processing speeds, but these models are not considered accurate enough for fully automated labeling of images. Here, we present a new approach to building and testing a high performance machine learning model for fully automated labeling of camera trap images. As a case study, the model classifies 26 Central African forest mammal and bird species (or groups). The model was trained on a relatively small dataset (c.300,000 images) but generalizes to fully independent data and outperforms humans in several respects (e.g. detecting 'invisible' animals). We show how the model's precision and accuracy can be evaluated in an ecological modeling context by comparing species richness, activity patterns (n = 4 species tested) and occupancy (n = 4 species tested) derived from machine learning labels with the same estimates derived from expert labels. Results show that fully automated labels can be equivalent to expert labels when calculating species richness, activity patterns (n = 4 species tested) and estimating occupancy (n = 3 of 4 species tested) in completely out-of-sample test data (n = 227 camera stations, n = 23868 images). Simple thresholding (discarding uncertain labels) improved the model's performance when calculating activity patterns and estimating occupancy, but did not improve estimates of species richness. We provide the user community with a multi-platform, multi-language user interface for running the model offline, and conclude that high performance machine learning models can fully automate labeling of camera trap data.

The urgent need to understand how ecosystems are responding to rapid environmental change has driven a 'big data' revolution in ecology and conservation (Farley, Dawson, Goring, & Williams, 2018). High resolution ecological data are now streamed in real time from satellites, Global Positioning System tags, bioacoustic detectors, cameras and other sensor arrays. The data generated offer considerable opportunities to ecologists, but challenges such as data processing, data storage and data sharing cause latency between data gathering and ecological inference (i.e. creating derived ecological metrics, testing ecological hypotheses and quantifying ecological change), sometimes on the order of years or more.
Overcoming these challenges could open the gateway to ecological 'forecasting', where directional changes in ecological processes are detected in real time and near-term responses are predicted effectively using an iterative cycle of data gathering, model updating and model prediction (Dietze et al., 2018).

Digital camera traps or wildlife 'trail cams' have revolutionized wildlife monitoring and are now the 'gold standard' for monitoring many medium to large terrestrial mammals (Glover-Kapfer, Soto-Navarro, & Wearn, 2019). Animals and their behavior are identified in images either by manual labeling, using citizen science platforms (Swanson et al., 2015) or, more recently, by using machine learning models (Norouzzadeh et al., 2018; Tabak et al., 2019; Willi et al., 2019). Machine learning models can, at a minimum, separate true animal detections from non-detections (Wei, Luo, Ran, & Li, 2020) or, in the most advanced examples, identify species, count individuals and describe behavior (Norouzzadeh et al., 2018). These recent advances in machine learning have increased the speed at which camera trap data are analyzed but, in all cases we are aware of, the outputs (e.g. species labels) are not used to make ecological inference directly. Instead, machine learning models are typically used as a 'first pass' to identify and group images belonging to individual species for full or partial manual validation at a later stage, or to cross-validate labels from citizen science platforms (Willi et al., 2019). This can substantially reduce manual labeling effort, but many hundreds or thousands of photos might still need to be labeled manually. Thus, although machine learning models are reducing manual data processing times, ecologists are not yet comfortable using the outputs (e.g. species labels) as part of a completely automated workflow. This is despite the development of advanced machine learning models that classify species in camera trap images with accuracy that matches or exceeds humans (Norouzzadeh et al., 2018; Tabak et al., 2019).

One significant challenge limiting the application of machine learning models to camera trap data is that models rarely generalize well to completely out-of-sample data (i.e. data from new, spatially and temporally independent studies), particularly when used to classify animals to species level (Beery, Van Horn, & Perona, 2018). Models can quickly learn the features of specific camera 'stations' (the spatial replicate in camera trap studies), such as the general background, instead of learning features of the animal itself. This problem is further amplified by the fact that rare species in the training data might only ever appear at a limited number of camera stations, so training and validation data are rarely independent. Various approaches can be used to reduce these biases, such as carefully ensuring that training and validation data are independent (e.g. by using data from multiple studies), and using data augmentation, such as adding noise to training data in the form of image transformations. Until the problem of generalization can be overcome, machine learning models for classifying camera trap images will remain an important tool for reducing manual labeling effort, but they will not achieve their full potential for creating fully automated pipelines for data analysis.

Machine learning models also have the potential to be deployed inside camera trap hardware in the field at the 'edge' (i.e.
on micro-computers installed inside the hardware that collects the data), with summarized results (e.g. species labels) transmitted in real time via Global System for Mobile Communications (GSM) networks or via satellite (Glover-Kapfer et al., 2019). In geographically remote areas or time-sensitive situations (e.g. law enforcement) this would greatly reduce the latency between data capture and interpretation, and reduce the expense and effort required to collect data in remote regions by removing the need to transfer data-heavy images across wireless networks. However, before 'smart' cameras become a reality, it is essential that users understand how uncertainty in machine learning model predictions might affect derived ecological metrics and analyses, which are often sensitive to biases (e.g. false positives in occupancy models). To achieve this, there is a need to develop workflows that test the performance of machine learning models in an ecological modeling context that goes beyond simple measures of precision and accuracy.

Ideally, if machine learning models had 100% precision and accuracy (e.g. for species identification), camera trap data could be collected, labeled automatically using the model, and the results used directly to calculate ecological metrics or as variables in ecological models. However, the reality is that machine learning models are imperfect. It is therefore uncertain what levels of precision and accuracy are needed to meet the requirements of ecological analyses. This is particularly the case for spatial and temporal analyses of animal distributions in camera trap data, which require specialized ecological models (e.g. occupancy models) that account for imperfect detection (MacKenzie et al., 2002).

In this paper, we describe the approach used to build a new high-performance machine learning model that identifies species in camera trap images (26 species/groups of Central African forest mammals and birds) and that generalizes to spatially independent data. To evaluate how well the machine learning model's labeling precision and accuracy perform in an ecological modeling context, we (1) evaluate how uncertainties in the precision and accuracy of machine learning labels affect ecological inference (derived metrics of species richness, activity patterns and occupancy) compared with the same metrics calculated using expert, manually generated labels, and (2) propose a workflow to 'ground truth' the performance of machine learning models for camera trap data in an ecological modeling context. We discuss the implications of these results for making fully automated ecological inference from camera trap data using the outputs of machine learning models. We also provide the user community with an easily installed, open-source graphical user interface that requires no understanding of machine learning to run the model offline on both camera trap images and videos.

As a case study, the model was developed for classifying terrestrial forest mammals and birds in Central Africa (see Table S1 for further details on species and groups), where camera traps are now frequently deployed over large spatial scales to survey secretive birds and mammals in remote and inaccessible landscapes (Bahaa-el-din & Cusack, 2018; Bessone et al., 2020; O'Brien et al., 2020). Training data were obtained from multiple countries and sources (c.1.6 million images; reduced to n = 347120 images after data processing; Table 1).
Each source used different camera trap models (Reconyx, Bushnell, Cuddeback, Panthera Cams) and images were diverse in resolution, quality (e.g. sharpness, illumination) and color. Individual studies also used different field protocols for camera deployment, but all were focused on detecting terrestrial forest mammals, with cameras installed on trees approximately 30-40 cm above ground level. The exception was the data from Cardoso et al. (2020), who installed cameras at a height of approximately 1 m for the primary purpose of detecting forest elephants Loxodonta cyclotis. Camera trap sensitivity was set to be high in some cases and images were often captured in a series of rapid, short bursts (e.g. taking 10 images consecutively). This resulted in long sequences of very similar images, for example showing an animal walking in front of the camera (Figure S1). It was important to account for image sequences when selecting a validation set during the model training phase, since there was a risk of highly similar images being present in both the training and validation sets. To address this issue, the training and validation split was performed based on image metadata (timing of images and image source) to identify unique 'events' and camera locations that were not replicated across the training and validation split (Norouzzadeh et al., 2018). This solution posed a challenge for maintaining class balance in the training and validation sets, but it reduced the risk of non-independent training and validation sets.

A total of 27 classes were used to train the model: mostly mammals or mammal groups (n = 21), plus birds (n = 4), humans (n = 1) and 'blank' images (n = 1; i.e. no mammal, bird or human). Details of taxonomy and the justification for species groups are in Table S1.

Our 'real-life' training data had not been pre-processed or professionally curated for the purposes of training machine learning models and naturally contained errors arising from hardware faults, human error and different approaches to manual species labeling by experts. We identified three primary sources of error. The first was over-exposed images (a hardware fault), where the image foreground was 'flooded' by the flash (usually at night), making the image appear mostly white. Animals in these images were sometimes partially visible and could be classified by a skilled human observer, despite the loss of color information, texture and other detail. However, over-exposed images presented a challenge for the machine learning model because white dominated the image regardless of the species. The second main source of error was under-exposed images. This error was revealed after inspecting model outputs during the training phase, which showed that highly under-exposed images appeared almost entirely or entirely black to a human observer, yet the machine learning model was capable of using information in the image to detect and correctly classify the species (Figure 1). The final source of error in the training data was mis-labeled images (e.g. confusing similar species, such as chimpanzee Pan troglodytes and gorilla Gorilla gorilla) and differing approaches to labeling; for example, one data source combined all primates into 'monkey', whereas other data sources separated apes from other primates. We used an iterative approach to address these issues that consisted of model training, validation, error correction (correcting mis-labeled images in the training data) and model updating.
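As an illustration of the event-based training/validation split described above, the sketch below groups images into 'events' using a time gap within each source and station, then assigns whole stations to either the training or validation set so that no event (or station) spans the split. This is a minimal outline under assumed column names and an assumed 30-minute gap, not the authors' actual preprocessing code.

```python
import pandas as pd

# Hypothetical column names: 'source', 'station_id', 'timestamp', 'filepath', 'label'.
GAP_MINUTES = 30  # assumed time gap that separates two 'events' at the same station

def assign_events(df: pd.DataFrame) -> pd.DataFrame:
    """Give every burst of temporally clustered images at a station a shared event id."""
    df = df.sort_values(["source", "station_id", "timestamp"]).copy()
    gap = df.groupby(["source", "station_id"])["timestamp"].diff()
    new_event = gap.isna() | (gap > pd.Timedelta(minutes=GAP_MINUTES))
    df["event_id"] = new_event.cumsum()
    return df

def split_by_station(df: pd.DataFrame, val_frac: float = 0.2, seed: int = 42):
    """Hold out whole stations so no event appears in both training and validation."""
    stations = df["station_id"].drop_duplicates().sample(frac=1.0, random_state=seed)
    val_stations = set(stations.iloc[: int(len(stations) * val_frac)])
    is_val = df["station_id"].isin(val_stations)
    return df[~is_val].copy(), df[is_val].copy()

# meta = pd.read_csv("image_metadata.csv", parse_dates=["timestamp"])
# train_df, val_df = split_by_station(assign_events(meta))
```

Splitting by station rather than by image is what keeps near-duplicate frames from the same sequence out of both sets, at the cost of less even class balance.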
As part of this iterative error correction, we carefully inspected images that appeared to be incorrectly labeled by the model but which were labeled with high confidence. This approach revealed hidden problems in the data, such as the presence of animals in under-exposed images, that would have otherwise led us to underestimate the model's performance.

We chose the established ResNet50 architecture to build the model (He, Zhang, Ren, & Sun, 2016). Transfer learning was used to speed up training and we used weights pre-trained on the ImageNet dataset. We identified species using the entire image frame, without bounding boxes, and used basic augmentation (horizontal flips, rotations, zoom, lighting and contrast adjustments, and warps) during training, but not during model validation. We used one-cycle policy training (Smith, 2018) and trained using progressive resizing in two stages. Details of the training scheme and implementation can be found in our GitHub repository (https://github.com/Appsilon/gabon_wildlife_training). Most of the training approaches and many of the mechanisms we used to enhance training were taken directly or almost directly from the fast.ai Python library (https://github.com/fastai), which illustrates how robust the library is. We trained the models on various virtual machines equipped with GPUs, run on Google Cloud Platform with resources granted by a Google Cloud Education grant.

One of the major limitations to model performance for camera trap images is the ability to generalize to new data. We therefore evaluated the final model on a dataset that was completely spatially and temporally independent of the data used to train it. These out-of-sample data consisted of images from 227 camera stations surveyed between 16 January 2018 and 4 October 2019 in closed-canopy forest in central and southern Gabon. The cameras (Panthera Cam V4 and V5) also differed from the models used in the training data, but field protocols were similar: cameras were placed approximately 30 cm above the ground on a tree, at a distance of c. 3-5 m perpendicular to the center of animal trails. Single-frame images were captured using medium sensitivity settings, and images were separated by a minimum of 1 s. The aim of the study was to survey the small-to-large mammal community, with a particular focus on great apes (Pan troglodytes, Gorilla gorilla), forest elephants Loxodonta cyclotis, leopard Panthera pardus and golden cat Caracal aurata. These data (n = 23868 images; median 75, range 1-545 images per station) were manually labeled by an expert (co-author CO).

To allow general comparison of our model's performance with other similar models in the literature (Norouzzadeh et al., 2018; Tabak et al., 2019; Willi et al., 2019), we calculated top-one and top-five accuracies using the out-of-sample data. Top-one accuracy is the percentage of expert labels that match the top-ranking label generated by the machine learning model. Top-five accuracy is the percentage of expert labels that match any of the top five ranking machine learning generated labels. Top-one accuracy for the overall machine learning model was 77.63% and top-five accuracy was 94.24% (Table S2). These accuracies were calculated for individual images rather than for image sequences (which can be faster to label manually). We also compared the precision and recall for each species from our optimal model (see later, Table 2) with the precision and recall for the same species reported for the model used by the Wildlife Insights web platform (www.wildlifeinsights.org).
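To make the training recipe above concrete, the sketch below shows how a ResNet50 classifier with ImageNet weights, basic augmentation, one-cycle scheduling and two-stage progressive resizing can be set up with the fast.ai library. It is an illustrative outline only (the authors' actual training code is in the GitHub repository linked above); the dataframe columns, image sizes and hyperparameters are assumptions.

```python
from fastai.vision.all import *

# 'df' is assumed to hold one row per image with columns 'filepath', 'label' and
# 'is_valid' (True for validation rows), e.g. the output of the split sketched earlier.
def make_dls(df, img_size, bs=64):
    return ImageDataLoaders.from_df(
        df, path=".", fn_col="filepath", label_col="label", valid_col="is_valid",
        item_tfms=Resize(img_size),
        batch_tfms=aug_transforms(max_rotate=10.0, max_zoom=1.1,
                                  max_lighting=0.2, max_warp=0.2),
        bs=bs,
    )

# Stage 1: transfer learning at a lower resolution with one-cycle scheduling.
learn = cnn_learner(make_dls(df, img_size=224), resnet50,
                    metrics=[accuracy, top_k_accuracy])  # top_k_accuracy defaults to k=5
learn.fine_tune(5, base_lr=1e-3)

# Stage 2: progressive resizing -- continue training the same weights at a higher resolution.
learn.dls = make_dls(df, img_size=448, bs=32)
learn.fine_tune(5, base_lr=1e-4)

# learn.export("model_export.pkl")  # save the trained learner for offline batch labeling
```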
The Wildlife Insights platform uses a deep convolutional neural network trained using Google's TensorFlow framework and a training dataset of 8.7 million images comprising 614 species.

We calculated three common ecological metrics for the out-of-sample data (raw species richness at individual camera stations, activity patterns for four focal species, and occupancy for four focal species), separately using the manually generated expert labels and the machine learning generated labels. Species richness (the number of species in a discrete unit of space and time) can be used to quantify temporal and spatial changes in biodiversity. Although other measures of species diversity exist, we chose this simple metric because it is widely used in the ecology literature despite its limitations. Activity patterns describe the diel activity of focal species (Rowcliffe, 2019) and are typically calculated to understand fundamental life history traits and behavior, such as temporal niche partitioning. Occupancy models are hierarchical models commonly fitted to camera trap data because they account for imperfect detection (detection probability rarely equals 1) when estimating the probability that a site is 'occupied' by a species (MacKenzie et al., 2002). Covariates such as measures of vegetation cover can be included in both the detection and occupancy component models. These models are relatively complex, and small changes in detection histories (the presence or absence of a species during a discrete time interval), false positives or false negatives can dramatically affect results (Royle & Link, 2006). We therefore predicted that occupancy estimates obtained using machine learning generated labels would compare poorly with estimates using expert, manually generated labels. The four focal species used for calculating activity patterns and occupancy were African golden cat, chimpanzee, leopard and African forest elephant. These species were chosen because they were the focus of the camera trap survey that generated the out-of-sample test data and because they are conservation priority species in Central Africa. We also initially included western lowland gorilla, but we had too few unique captures of this species (only seven of 227 stations had > 5 captures) to fit either activity pattern models or occupancy models.

All three metrics derived from machine learning labels were re-calculated using a threshold approach, where labels were excluded if the model's predicted confidence was below a given threshold. The thresholds tested ranged from 0 (no threshold) to 90%, increasing in 10% intervals. For each of the three ecological metrics, we then re-calculated results using the machine learning labels and compared these with results from the expert-labeled dataset using various statistical measures (see later). We also calculated the effect of removing data on sample size, on top-one balanced accuracy and top-five accuracy for the overall model, and on four standard measures of precision and accuracy per species (precision, recall, F1 score and balanced accuracy), using the confusionMatrix function in the caret R package (Kuhn, 2020).

Estimated species richness from machine learning generated labels and from expert labels was compared using linear regression fitted by least squares. Species richness from expert labels was used as the predictor variable and species richness from machine learning labels was used as the response. For each threshold, we evaluated how well species richness from machine learning labels correlated with species richness from expert labels by calculating the slope coefficient and the variance explained (R²).
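The thresholding and species richness comparison described above can be outlined as follows. This is a hedged Python sketch with assumed column and class names ('Blank', 'Human'); the authors performed the equivalent analysis in R.

```python
import numpy as np
import pandas as pd

# 'labels_df' is assumed to hold one row per image with columns 'station_id',
# 'expert_label', 'ml_label' and 'ml_confidence' (the model's top-label probability).

def richness_per_station(df: pd.DataFrame, label_col: str) -> pd.Series:
    """Raw species richness: number of distinct species classes per camera station."""
    animals = df[~df[label_col].isin(["Blank", "Human"])]  # assumed non-species classes
    return animals.groupby("station_id")[label_col].nunique()

def compare_at_threshold(df: pd.DataFrame, threshold: float):
    """Regress ML-derived richness on expert-derived richness after thresholding."""
    kept = df[df["ml_confidence"] >= threshold]          # discard low-confidence labels
    ml = richness_per_station(kept, "ml_label")
    expert = richness_per_station(df, "expert_label")
    both = pd.concat([expert.rename("expert"), ml.rename("ml")], axis=1).fillna(0)
    slope, intercept = np.polyfit(both["expert"], both["ml"], deg=1)
    r2 = np.corrcoef(both["expert"], both["ml"])[0, 1] ** 2
    return slope, intercept, r2

# for t in np.arange(0.0, 1.0, 0.1):  # thresholds from 0 to 90% in 10% steps
#     print(round(t, 1), compare_at_threshold(labels_df, t))
```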
Single-season, single-species occupancy models were fitted using the occu function from the unmarked R package (Fiske & Chandler, 2011). Mean distance to the nearest road (m) and mean distance to the nearest village (m) were included as continuous predictors without interactions. All covariates were mean-centered and scaled by 1 SD to prevent convergence issues. We did not perform model selection, and we predicted occupancy for the 227 camera stations using the full model. We then compared occupancy predictions (n = 227 camera stations) for no threshold (i.e. using all data) and for the nine thresholds using linear regression fitted by least squares, as described previously for the species richness comparison.

Regardless of the threshold used, top-five accuracy for the overall model predictions on the out-of-sample data was consistently close to or above 95% (Figure 2). To achieve a top-one balanced accuracy of 90% or more for the overall model, a threshold of ≥ 70% confidence was required and > 25% of the data were discarded (Figure 2). With a threshold of 70% confidence (i.e. excluding labeled images below 70% confidence), top-one balanced accuracies for 16 of the 27 classes were > 90% and a further five were > 75% (Table 2). Top-one balanced accuracies for the remaining seven classes ranged from 50% to 70% (Table 2). All other measures of accuracy and precision at all thresholds are in Table S3, and Figure 3 shows the confusion matrix for the out-of-sample data after excluding labels below 70% confidence (see Figure S5 for the confusion matrix of aggregated labels after thresholding).

Figure 2. Relationship between the threshold level used to accept the top label, the % of data discarded, and overall top-five and top-one balanced accuracy (+/- 95% CI) for predictions on out-of-sample test data.
Table 2 (note). Where there were insufficient training and validation data, this is indicated as 'needs more data' (NMD).
Figure 3. Confusion matrix for the out-of-sample test data after excluding labels below a confidence threshold of 70% (each row is normalized independently). Figure S6 shows the confusion matrix with absolute numbers.

Species richness estimated from machine learning labels and from expert labels was strongly correlated at all thresholds used (Figure 4). There was a general tendency for species richness to be underestimated by machine learning as the threshold increased, and the slope of the relationship was close to 1 with no threshold. Above a threshold of 70% there was no significant difference between diel activity patterns estimated from machine learning labels and from expert labels for any of the four focal species in the out-of-sample test data (Figure 5; Table S4). As expected, occupancy estimates made using machine learning labels were sometimes inconsistent with those made using expert labels, and thresholding had a dramatic impact on inference in some cases (Figure 6). For golden cat and leopard, which are predicted with high accuracy and precision by our machine learning model, occupancy estimates from machine learning labels and expert labels were highly correlated at all thresholds (Figure S8). African elephant occupancy estimates using machine learning labels improved dramatically as the threshold increased, but chimpanzee occupancy estimates from machine learning labels were consistently uncorrelated with those estimated using expert labels (Figure 6).

Figure 6. Relationship between estimated occupancy probability for n = 227 camera stations (points) from machine learning (ML) labels (y-axis) and expert labels (x-axis) for the four focal species after discarding labels below a 90% threshold of predicted confidence. Plots for all thresholds tested are shown in Figure S8.
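The occupancy comparisons above start from species detection histories (presence or absence in discrete time intervals at each station) built from the labels. A minimal sketch of that step is shown below; the occasion length, column names and species code are assumptions, and the occupancy models themselves were fitted with the occu function in the R package unmarked rather than in Python.

```python
import pandas as pd

OCCASION_DAYS = 7  # assumed occasion length; not specified in the text

def detection_history(df: pd.DataFrame, species: str, threshold: float = 0.7) -> pd.DataFrame:
    """Stations x occasions matrix of 0/1 detections from thresholded ML labels.

    'df' is assumed to have datetime columns 'timestamp' and 'deploy_date', plus
    'station_id', 'ml_label' and 'ml_confidence'. Occasions without survey effort
    are not handled here (they would normally be set to NA before model fitting).
    """
    det = df[(df["ml_label"] == species) & (df["ml_confidence"] >= threshold)].copy()
    det["occasion"] = (det["timestamp"] - det["deploy_date"]).dt.days // OCCASION_DAYS
    hist = (det.groupby(["station_id", "occasion"]).size().gt(0).astype(int)
               .unstack(fill_value=0))
    # Stations with no accepted detections become all-zero rows.
    return hist.reindex(df["station_id"].unique(), fill_value=0)

# y = detection_history(labels_df, "Leopard_African", threshold=0.7)  # species code assumed
# y.to_csv("leopard_detection_history.csv")  # e.g. export for unmarked::occu in R
```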
Machine learning models have the potential to fully automate labeling of camera trap images without the need for manual validation. This would allow ecologists to rapidly process data and use the outputs (e.g. species labels) directly in ecological analyses, but it has been uncertain how this can be achieved. In particular, models published to date do not evaluate their predictive performance in an ecological modeling context (Beery et al., 2018; Norouzzadeh et al., 2018; Tabak et al., 2019; Willi et al., 2019). Here, we compared ecological metrics calculated on an out-of-sample test dataset using machine learning labels with the same metrics calculated using expert, manually generated labels. Using our new, high performance species classification model, which generalizes to out-of-sample data, we show that machine learning labels can be used in a fully automated workflow that removes the need for manual validation prior to conducting ecological analyses.

We used an established architecture for the machine learning model. However, other more recent architectures, such as ResNeXt (Xie, Girshick, Dollar, Tu, & He, 2017) and ResNeSt (Zhang et al., 2020), could yield further increases in performance. Performance might also be improved by first classifying broader aggregated classes (see Figure S4), with dedicated models later trained to identify the individual species within these aggregated classes.

We used a relatively small training set (c.300,000 images here vs. 3.2 million in Norouzzadeh et al. (2018) and 8.7 million in Ahumada et al. (2020)) and a large number of individual classes, yet our model achieved high precision and accuracy even when tested on completely out-of-sample data, which is considered a significant challenge for the field (Beery et al., 2018, 2019). We believe this encouraging result can be explained both by the machine learning approaches used (e.g. the fast.ai framework and image augmentation) and by the fact that forest camera traps in the tropics are often deployed in very similar settings, with animals captured at a predictable distance from the camera (usually on a path) against a general background of green and brown vegetation. This is in contrast to camera trap images from more open habitats, where animals are detected across a wide range of distances and backgrounds (Beery et al., 2018). On the other hand, the informational richness of the background in photos taken in forest settings poses a significant challenge to machine learning models as well as to human experts, as illustrated in Figure 7, which shows a nkulengu rail in which most of the body is hidden behind branches and leaves. A section of its characteristic red legs is visible between the leaves. The model used features from the beak and head region to identify the bird (see Figure S9).

Thresholding improved the overall performance of the model and its performance for individual species. In our tests we 'discarded' labels with low confidence, but these data could equally be classified manually if sample sizes were small. It is important to note, however, that this additional effort to manually label low confidence images would not have improved inference in our example ecological analyses, with the exception of chimpanzee occupancy estimates. Chimpanzee images had the lowest recall among the four focal species, which suggests that true detection events were probably missed frequently, resulting in false negatives (Figure S2).
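A hybrid workflow of this kind, in which labels at or above a confidence threshold are accepted automatically and the remainder are routed for manual review, can be sketched as follows. The exported-model filename, image folder and use of fast.ai batch prediction are illustrative assumptions; this is not the released software tool.

```python
from pathlib import Path
import pandas as pd
from fastai.vision.all import load_learner

learn = load_learner("model_export.pkl")                  # placeholder for a trained, exported learner
images = sorted(Path("camera_trap_images").glob("*.jpg"))   # placeholder image folder

dl = learn.dls.test_dl(images)
probs, _ = learn.get_preds(dl=dl)        # per-image class probabilities
conf, idx = probs.max(dim=1)

results = pd.DataFrame({
    "file": [p.name for p in images],
    "ml_label": [str(learn.dls.vocab[int(i)]) for i in idx],
    "ml_confidence": conf.numpy(),
})
results["accepted"] = results["ml_confidence"] >= 0.70    # threshold used in our analyses
results[results["accepted"]].to_csv("auto_labels.csv", index=False)
results[~results["accepted"]].to_csv("needs_manual_review.csv", index=False)
```

The low-confidence file list can then be labeled manually (if sample sizes warrant it) and merged back with the accepted labels before any ecological analysis.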
Species that were classified with the highest precision and accuracy were either relatively unique in their shape, color and pattern (e.g. African leopard, the 'Genet' group) or were well represented in the training data. We recommend that users of our model in Central Africa accept labels at a threshold of 70%, and we have created an offline, multi-platform software tool that can label large batches of images or videos and display simple maps of species presence/absence and species richness (available at https://github.com/Appsilon/wildlife-explorer). The software also outputs the labels in a format that can be used for calculating activity patterns or for use in occupancy models. We do not fully automate these analyses at present (in part because of logistical constraints and delays caused by the COVID-19 pandemic), but we anticipate that these features will be integrated into future releases.

If machine learning models can fully automate the labeling of camera trap images, the first question likely to be posed by most ecologists is 'Should we?'. Camera trap images contain a wealth of information beyond species identity that would be missed using our model, such as behavior, demography, individual phenotype and body condition. A trained model is also limited to detecting and classifying the species in the training dataset and by definition cannot detect new species. Some machine learning models can already classify behavior (Norouzzadeh et al., 2018), and future models will achieve this and much more. In our opinion, fully automated labels can and should be used in ecological analyses, but only after validation (and re-validation) from an ecological perspective, and to answer clearly defined questions. Each use-case will also differ in the benefits that can be gained from fully automated analysis. A conservation manager with tens of thousands of images collected on a rolling basis might accept a trade-off between increased speed of data analysis and having to discard images with uncertain labels, but a scientist testing hypotheses for peer-reviewed publication might prefer to view all of the images manually. We recommend that in all cases models should be validated on a continual basis using sub-sampled data to detect potentially new or hidden biases. Model accuracy could change if field protocols or environmental conditions change in unexpected ways (e.g. heavy snowfall in temperate zones). However, during model evaluation we found that the expert labels in the training and validation data were themselves never 'perfect', and perhaps high performance machine learning models offer a more consistent means of analyzing camera trap data than manual labeling, because their biases are predictable and can be quantified explicitly.

Camera traps are commonly used worldwide by conservation practitioners whose normal scope of work might not allow sufficient time for handling, processing and analyzing large quantities of digital data. The authors personally know of several large camera trap databases that have not been analyzed years after data collection ended, often because of a lack of resources or technical expertise.
New web-based platforms for ecological data are seeking to address this problem by allowing users to upload data to the cloud, where they are stored and analyzed using machine learning models (Aide et al., 2013; Ahumada et al., 2020), but a lack of fast internet access can be a barrier to using such platforms, and our offline application can fill this important gap. The next generation of camera traps will also have embedded machine learning models, following the current rise in edge-computing technology. Together, edge and cloud computing will open the door to national and international real-time ecological forecasting at unprecedented spatial and temporal scales. We anticipate that the model, software and validation workflow presented here could revolutionize how camera trap data are processed and analyzed, and we conclude that high performance machine learning models can be used for fully automated labeling of camera trap data for ecological analyses.

Table S2. Measures of precision, accuracy and prevalence (%) for the 27 species/groups (see Table S1 for further details on species groups) in the out-of-sample test data.
Table S3. Precision, recall, F1 score and prevalence (%) for the 27 species/groups (see Table S1 for further details on species groups) in the out-of-sample test data at all thresholds used (10-90% confidence).
Figure S7 shows the confusion matrix with absolute numbers.
Figure S2 shows the confusion matrix with each row normalized independently.
Figure S8. Relationship between estimated occupancy probability for n = 227 camera stations (points) from machine learning (ML) labels (y-axis) and expert labels (x-axis) for the four focal species at each threshold (row) from 0 to 90%, in 10% intervals.
Figure S9. The image from Figure 7 with an added layer illustrating the most important regions of the image for the model when identifying the nkulengu rail. The brightest spot (yellow) near the center of the image encompasses part of the bird's beak and head, which apparently were crucial during identification. We used the Grad-CAM technique to create this image.

References:
The Role of Forest Elephants in Shaping Tropical Forest-Savanna Coexistence
Iterative near-term ecological forecasting: Needs, opportunities, and challenges
Situating Ecology as a Big-Data Science: Current Advances, Challenges, and Solutions
unmarked: An R Package for Fitting Hierarchical Models of Wildlife Occurrence and Abundance
Camera-trapping version 3.0: current constraints and future priorities for development
Identity Mappings in Deep Residual Networks (Computer Vision - ECCV 2016)
caret: Classification and Regression Training. R package version 6.0-86
Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One
Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning
Camera trapping reveals trends in forest duiker populations in African National Parks
Quantifying levels of animal activity using camera trap data
Generalized Site Occupancy Models Allowing for False Positive and False Negative Errors
A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna
Machine learning to classify animal species in camera trap images: Applications in ecology
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (arXiv:1905.11946 [cs, stat])
Zilong: A tool to identify empty images in camera-trap data
Identifying animal species in camera trap images using deep learning and citizen science
Aggregated Residual Transformations for Deep Neural Networks
ResNeSt: Split-Attention Networks
Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization