key: cord-0811751-w7rtm8c1
authors: Arumugam, S.; Ma, J.; Macar, U.; Han, G.; McAulay, K. K.; Ingram, D.; Ying, A.; Colburn, D. A. M.; Stanciu, R.; Grys, T. E.; Chang, S.-F.; Sia, S. K.
title: Adaptable Automated Interpretation of Rapid Diagnostic Tests Using Few-Shot Learning
date: 2021-06-25
journal: nan
DOI: 10.1101/2021.06.23.21258927
sha: 7f8396ecbdf88ecac72a1ee172a631b72a8c4e43
doc_id: 811751
cord_uid: w7rtm8c1

Point-of-care lateral flow assays (LFAs) are becoming increasingly prevalent for diagnosing individual patient disease status and surveying population disease prevalence in a timely, scalable, and cost-effective manner, but a central challenge is to assure correct assay operation and results interpretation when the assays are performed manually in decentralized settings. Smartphone-based software can automate interpretation of an LFA kit, but such algorithms typically require a very large number of images of assays tested with validated specimens, which is challenging to collect for different assay kits, especially for those released during a pandemic. Here, we present an approach - AutoAdapt LFA - that uses few-shot learning, a technique used in other applications such as computer vision and robotics, for accurate and automated interpretation of LFA kits while requiring only a small number of validated images for training. The approach consists of three components: extraction of membrane and zone areas from an image of the LFA kit, a self-supervised encoder that employs a feature extractor trained with edge-filtered patterns, and few-shot adaptation that enables generalization to new kits using limited validated images. From a base model pre-trained on a commercial LFA kit, we demonstrated the ability of adapted models to interpret results from five new COVID-19 LFA kits (three detecting antigens for diagnosing active infection, and two detecting antibodies for diagnosing past infection). Specifically, using just 10 to 20 images of each new kit, we achieved accuracies of 99% to 100% for each kit.
The server-hosted algorithm has an execution time of approximately 4 seconds, which can potentially enable quality assurance and linkage to care for users operating new LFAs in decentralized settings.

signatures to discriminate positive cases from negative cases under diverse conditions (e.g., color, intensity, and width of bands); the feature-extraction network also adapts to new LFA kits with a small number of images. From the latent feature vector for each zone, a binary classifier recognizes colored rectangular bands, the form factor seen in the vast majority of LFAs 32,33, and determines whether a band is present or absent in each zone. Finally, an assessment of the LFA kit result is obtained by comparing the output of the binary classifier with a lookup table containing all combinations of possible zone-level classification results; this kit-level classification is sent to the user's smartphone as the interpreted LFA result. The server-hosted algorithm has a mean execution time of 3.55 ± 2.28 seconds. Overall, this rapid automated interpretation pipeline could fit within a larger digital platform 34-37 that collects demographic data for epidemiology and provides instructions for carrying out the test as well as follow-up linkage to care (Supplementary Fig. 1).

To develop this pipeline, we framed the objective as learning the optimal parameters of the feature-extraction network and the classifier module by minimizing the loss functions given a set of training images for an assay kit. Unlike methods that require de novo training on a new LFA kit, we developed two novel methods that achieve adaptation with only a small number of images of new kits. First, to ensure that the underlying feature representation is robust against variations in LFA images, we developed a feature extractor that learns robust latent representations of zone images for classification; these latent representations are also used to reconstruct edge-enhanced images, so the extractor can be learned in a self-supervised manner (not requiring manually assigned labels) by using the output of an automatic edge-detection algorithm (Sobel filter 38) as the ground truth for decoding. Second, as shown in Fig. 2b, we developed a few-shot adaptation procedure that enables generalization to new kits using a limited number of validated images.

The pipeline first extracts the kit region from the input image using a segmentation mask (Fig. 2); this mask was generated using Mask R-CNN 39, an instance segmentation model (more details in Methods). The kit membrane in the perspective-corrected image was then localized, and individual test zones were cropped out using the kit-specific dimensions listed in a JSON file. For this study, the test-specific dimensions, such as kit height, kit width, membrane width, membrane height, and zone dimensions, were measured from images of LFA kits using Adobe Photoshop v21.0.2 and saved as a JSON file. These dimensions could be provided directly by kit manufacturers in the future. To measure the accuracy of the automatic membrane segmentation step, we measured the intersection-over-union (IoU) score between the segmented membrane and the manually annotated ground-truth membrane region; IoU scores were greater than 90% for all assay kits.
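As an illustration of the membrane- and zone-cropping step described above, the following is a minimal sketch in Python, assuming a hypothetical JSON schema (field names such as "membrane_x" and "zones" are illustrative; the paper's actual dimensions file format is not specified):

```python
# Minimal sketch of cropping test zones from a perspective-corrected kit
# image using kit-specific dimensions from a JSON file. The JSON field
# names are assumptions; physical measurements (e.g., in mm) are mapped
# to pixels with uniform scale factors.
import json
import numpy as np

def crop_zones(kit_image: np.ndarray, dims_path: str) -> list:
    """kit_image: perspective-corrected photo of the kit housing (H, W, 3)."""
    with open(dims_path) as f:
        dims = json.load(f)

    kit_h, kit_w = kit_image.shape[:2]
    sx = kit_w / dims["kit_width"]    # physical-to-pixel scale, x
    sy = kit_h / dims["kit_height"]   # physical-to-pixel scale, y

    # Locate the membrane window inside the housing.
    mx, my = int(dims["membrane_x"] * sx), int(dims["membrane_y"] * sy)
    mw, mh = int(dims["membrane_width"] * sx), int(dims["membrane_height"] * sy)
    membrane = kit_image[my:my + mh, mx:mx + mw]

    # Zone coordinates are assumed relative to the membrane's top-left corner.
    zones = []
    for zone in dims["zones"]:        # e.g., control line, test line(s)
        zx, zy = int(zone["x"] * sx), int(zone["y"] * sy)
        zw, zh = int(zone["width"] * sx), int(zone["height"] * sy)
        zones.append(membrane[zy:zy + zh, zx:zx + zw])
    return zones
```

In the full pipeline, the Mask R-CNN segmentation and perspective correction would run upstream of this function; those steps are omitted from this sketch.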
The cropped test zones were fed into a feature extractor, and the extracted features were passed into a binary classifier and a decoder (Fig. 2a). The binary classifier (a fully connected layer) outputs '0' or '1' to denote the absence or presence, respectively, of a band in the cropped zone. The images from the base LFA kit were manually annotated with binary labels, and the classifier was trained to learn prototypes associated with the positive and negative classes using a cross-entropy (CE) loss.

Images of kits with faint bands can lead to false negatives, while stained membranes and lighting artifacts can lead to false positives (Fig. 1c). To address these failure modes, we introduced a self-supervised image reconstruction task to improve the generalizability of the feature extractor (Fig. 2a). The network was trained to detect the edges of the image pattern (pixels at the junction between the membrane background and the band in the zone) and reconstruct the corresponding edge-enhanced image. This task is self-supervised: starting with RGB images of zones from the base LFA kit, the model converted each image to grayscale and applied a Sobel filter 38 to generate the ground-truth image set (the Sobel filter is a basic image-processing algorithm that produces an image emphasizing edges). In parallel, the model fed the extracted features into the decoder to reconstruct an image resembling the edge-enhanced ground truth, with reconstruction quality measured by mean squared error (MSE).

The pre-trained model from the base LFA kit was adapted to a new LFA kit with minimal retraining via few-shot adaptation (Fig. 2b). We mixed the labeled data of the base LFA kit and the new LFA kit and used this mixture as the training set; we specifically mixed data from the new kit and the base kit to avoid overfitting to the small number of images of the new kit. In addition to the CE loss used to train the binary classifier for the new LFA kit, we also used supervised contrastive learning, between the cropped zone images of both the new kit and the base kit, to refine the feature extractor.

We gathered the cropped zone images of both the base kit and the new kit, resampled the data, and calculated the supervised contrastive (SupCT) loss 40. First, we extracted features of the base-kit cropped zone images for both positive and negative classes and treated them as anchors. Next, we extracted features from the cropped zone images of the new kit and compared them with all of the anchors using cosine similarity. The feature extractor was then trained to maximize the cosine similarity between features of the same class.

We evaluated both zone-level and kit-level classification on the base kit. "Zone-level" classification accuracy is the model's performance on all zones in the entire evaluation dataset, and "kit-level" classification accuracy is the model's performance in classifying all constituent zones of a single kit (e.g., a kit-level result is incorrect if any zone in that kit is classified incorrectly). Details regarding the dataset for the five new kits are provided in Supplementary Table 2.
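The pretraining objective described above (cross-entropy for the band/no-band classifier plus Sobel edge-image reconstruction) can be sketched in PyTorch as follows; the network sizes, the 64x32 zone-crop resolution, and the loss weight alpha are illustrative assumptions, not the paper's reported hyperparameters:

```python
# Sketch of self-supervised pretraining: an encoder (feature extractor)
# feeds both a binary classifier (CE loss on annotated labels) and a
# decoder that reconstructs a normalized Sobel edge map (MSE loss).
import torch
import torch.nn as nn
import torch.nn.functional as F

def sobel_target(rgb: torch.Tensor) -> torch.Tensor:
    """Self-supervised ground truth: grayscale -> normalized Sobel magnitude."""
    gray = rgb.mean(dim=1, keepdim=True)                     # (B,1,H,W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=rgb.device)
    ky = kx.t().contiguous()
    gx = F.conv2d(gray, kx.view(1, 1, 3, 3), padding=1)
    gy = F.conv2d(gray, ky.view(1, 1, 3, 3), padding=1)
    edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    return edges / (edges.amax(dim=(2, 3), keepdim=True) + 1e-8)

class Encoder(nn.Module):                                    # feature extractor
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, feat_dim))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):                                    # edge reconstruction
    def __init__(self, feat_dim: int = 128, out_hw=(64, 32)):
        super().__init__()
        self.out_hw = out_hw
        self.fc = nn.Linear(feat_dim, out_hw[0] * out_hw[1])
    def forward(self, z):
        h, w = self.out_hw
        return torch.sigmoid(self.fc(z)).view(-1, 1, h, w)

encoder, decoder = Encoder(), Decoder()
classifier = nn.Linear(128, 2)          # band absent (0) / present (1)
alpha = 1.0                             # assumed weight between the two losses

def pretrain_step(zone_rgb, labels):
    """zone_rgb: (B,3,64,32) zone crops; labels: (B,) long tensor."""
    z = encoder(zone_rgb)
    ce = F.cross_entropy(classifier(z), labels)              # supervised head
    mse = F.mse_loss(decoder(z), sobel_target(zone_rgb))     # self-supervised head
    return ce + alpha * mse
```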
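The supervised contrastive term described above, with base-kit features as anchors compared to new-kit features by cosine similarity, can be sketched as below; this follows the general SupCon formulation 40, and the temperature and normalization details are assumptions rather than the paper's exact implementation:

```python
# Sketch of the SupCT adaptation term: new-kit zone features are pulled
# toward base-kit anchor features of the same class (cosine similarity),
# and pushed away from anchors of the other class.
import torch
import torch.nn.functional as F

def supct_loss(new_feats, new_labels, anchor_feats, anchor_labels, tau=0.1):
    """new_feats: (N,D) new-kit features; anchor_feats: (M,D) base-kit anchors.
    tau is an assumed temperature hyperparameter."""
    new_feats = F.normalize(new_feats, dim=1)        # unit norm -> dot = cosine
    anchor_feats = F.normalize(anchor_feats, dim=1)
    sim = new_feats @ anchor_feats.t() / tau                       # (N, M)
    # Positive mask: anchor shares the class of the new-kit image.
    pos = (new_labels.unsqueeze(1) == anchor_labels.unsqueeze(0)).float()
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability over same-class anchors, per new-kit image.
    loss = -(pos * log_prob).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return loss.mean()
```

During adaptation, this term would be added to the CE loss computed on the mixed base-kit and new-kit batches described above.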
We compared the accuracy of the base model when applied directly and with the proposed adaptation method using 10 shots (20 zone images), highlighting the significant performance improvement achieved by our few-shot adaptation strategy (Table 2). On top of the pretrained base model, adaptation consistently improved performance while requiring only a few training images of each new LFA kit. The EcoTest housing 2 kit was identical in all respects to the base kit except for the housing, so direct application of the base model without any adaptation achieved 100% zone-level and kit-level accuracies.

In Fig. 3, we plot the classification accuracy, at the zone level and the kit level, against the number of zone images used during the adaptation process, ranging from 0 (direct testing) to the entire training dataset. These figures also serve as an ablation study evaluating the separate contributions of self-supervision in pretraining the feature extractor and of supervised contrastive learning during adaptation. We compare our adaptation approach with three alternatives: 1) the proposed approach without the self-supervision component in the pretraining stage, 2) the proposed approach without the supervised contrastive loss during adaptation, and 3) training the network for a new kit from scratch without either component. The second alternative can be considered a fine-tuning process that takes the pretrained base model and fine-tunes it with the standard CE loss. For all approaches, the base-kit and new-kit images were mixed for network training, and the same data sampling strategy was used to ensure a fair comparison.

For example, we were able to adapt the base model to the Flowflex kit (Fig. 3a) using only eight zone images per class (16 zone images in total) and reach the same performance (99.8% and 99.6% at the zone and kit levels, respectively) as a model trained from scratch using all available training data (200 zone images). The results confirm that both self-supervised pretraining and the supervised contrastive loss help, and that the combination of these two key ideas reaches the highest attainable performance. Between the two, supervised contrastive learning is more effective: it requires fewer training images during adaptation to reach the performance upper bound achieved by using the entire training dataset.

In addition, because the feature extractor is pretrained under self-supervision, the extracted features are sensitive to edges and work well even on zones with faint bands. Even though the ACON IgG/IgM kit had the highest frequency of faint bands in our dataset, our approach reached the same performance as using the entire training dataset
(Fig. 3d) using only nine images of each class (18 zone images in total). Adaptation without supervised contrastive learning also reached this performance, but only with 40-shot adaptation. For the model trained without self-supervised pretraining, 70 images (with the SupCT loss) or 100 images (without the SupCT loss) of each class were required to reach the best performance. In addition, the direct-testing performance (0-shot adaptation) of the model pretrained on the base kit was higher when trained with self-supervision than when trained with only the CE loss.

Table 3 shows the confusion matrices of the optimum-shot adaptation evaluated on the evaluation dataset. By starting with a base model pretrained on an existing LFA kit (the AssureTech EcoTest COVID-19 IgG/IgM antibody assay kit), we have shown that it is possible to adapt the existing model to different assay kits, with different numbers of test lines and form factors, using a small fraction of the images needed to train the base model and with no loss in accuracy. In addition to evaluating the confusion matrix on samples in the evaluation set, we devised an ambiguity region to evaluate the distribution of detection scores (probability of the positive class). The ambiguity region is bounded by detection-score thresholds such that an image will be correctly classified only if the probability of the ground-truth class is high. The thresholds can be either set manually or estimated statistically with 95% area under the curve (more details in Methods). We checked the detection scores of all images in the evaluation dataset against the ambiguity regions, and images with scores falling in the ambiguity region were not classified. We computed the percentage of images categorized as ambiguous as well as the accuracy over the images that were classified. Since the detection scores of the false predictions were close to 0.5, they fell into the ambiguity region. Therefore, by using the ambiguity region we were able to treat most failure cases as ambiguous while keeping to a minimum the number of true predictions that fell into the ambiguity region. This consistently increased the classification accuracy among classified samples across the four new target kits (Table 3).

In summary, we have demonstrated adaptation of a base model to new LFA kits that differ in the number of test lines and form factor. We showed that this adaptation can be carried out using a much smaller subset of images than was used for training the base model. Compared to de novo training on every new assay kit, this reduction in the number of images was achieved by adopting a modular approach to the machine-learning pipeline: starting from an image of the kit, the perspective-corrected membrane and individual zones are extracted, followed by extraction of features preserving edge information, and finally a binary output indicating whether a band is present in the cropped zone. A robust feature extractor is important for handling challenging images of LFA kits, such as those with faint or partially formed lines.
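The ambiguity-region rule described above can be expressed as a small decision function; the threshold values below are placeholders, since the paper sets the thresholds manually or estimates them statistically (see Methods):

```python
# Sketch of ambiguity-region filtering at inference: detection scores
# inside [t_neg, t_pos] are withheld as "ambiguous" rather than forced
# into a hard decision. Threshold values here are illustrative only.
from typing import List, Optional

def classify_with_ambiguity(scores: List[float],
                            t_neg: float = 0.35,
                            t_pos: float = 0.65) -> List[Optional[int]]:
    """scores: per-zone probabilities of the positive class.
    Returns 1 (band present), 0 (band absent), or None (ambiguous)."""
    results = []
    for s in scores:
        if s >= t_pos:
            results.append(1)
        elif s <= t_neg:
            results.append(0)
        else:
            results.append(None)   # falls within the ambiguity region
    return results

# Example: a faint-band score of 0.52 is flagged as ambiguous instead of
# being classified, mirroring how false predictions near 0.5 are caught.
print(classify_with_ambiguity([0.97, 0.52, 0.08]))  # -> [1, None, 0]
```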
Our approach of using self-supervision to extract features preserving edge information addressed this issue; to our knowledge, this use of self-supervised learning to reconstruct edge-enhanced images has not been previously demonstrated. Likewise, the application of few-shot learning, including this adaptation framework, has not to our knowledge been demonstrated for the interpretation of LFA kit images. Thus, we have shown that, using this novel approach, we can train accurate classification models with a fraction of the kit images that would be required for de novo training.

In terms of impact for medicine, this reduction in the number of training images needed to achieve assured user interpretation of rapid-test images is significant given the rising use of rapid diagnostic tests. Most immediately, the COVID-19 pandemic has pushed to the forefront the need for rapid testing and population surveillance to track and control the spread of the disease in a scalable and timely manner. If effectively implemented, point-of-care testing can contribute significantly to a rapid and effective public health response - as well as to patients' individual safety, privacy, physical health, and mental well-being - by enabling widespread, timely testing in a manner that does not overwhelm the limited capacity of testing facilities or provoke social crowding at selected testing sites. By expediting the process of adapting a model to newly available rapid diagnostic tests, the AutoAdapt LFA approach could facilitate reliable decentralized testing and real-time monitoring of disease prevalence. In the longer term, the need for assured user interpretation will grow as patients and consumers more frequently monitor their health via self-testing for both infectious diseases and chronic conditions, in an age of precision health. Future work includes validation on a wider variety of rapid tests, and generalization to LFA kits beyond rectangular bands (for example, some vertical flow assays) and beyond bands of single colors (for example, some urinalysis kits with color-based readouts).

Both the binary classification and the edge-image reconstruction tasks were carried out to learn a good feature extractor. One alternative is a feature extractor pretrained on ImageNet1K as a fully-supervised image classification task among 1,000 classes. For the negative class, the detection score is still used as input to find the classification score threshold $\tau^{-}$.
For convenience of presentation, $[\tau^{-}, \tau^{+}]$ denotes the ambiguity region: an image with detection score $s$ satisfying $\tau^{-} \le s \le \tau^{+}$ is not classified, since it falls within the region, while images with $s \le \tau^{-}$ or $s \ge \tau^{+}$ are classified as negative or positive, respectively. The ratio of unclassified images to the entire evaluation set is reported as the percentage of ambiguous cases (as shown in Table 3).

Fig. 2. (a) The model is pretrained on the base kit using a self-supervised learning task over edge-filtered patterns and a fully-supervised binary classification task. For each zone, fully-supervised binary classification is carried out with a cross-entropy loss on the annotated binary labels. A Sobel filter is used to highlight the edge pixels between the band and the background of the membrane. The normalized edge image is used as ground truth, and the learning process reconstructs an image that resembles the ground-truth edge image, with quality measured by MSE (mean squared error). The solid and dashed arrows indicate forward processing and gradient backpropagation, respectively, during the learning process. (b) Model adaptation is carried out by supervised contrastive learning to regularize the feature extractor and fully-supervised learning to learn an adapted classifier for the new kit. A sampling strategy builds an episode with Q (e.g., 32) images per class: for each class (positive or negative), given K (e.g., 10) available images, P (e.g., 4) images are subsampled from the new kit and mixed with Q-P images of the base kit.
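A minimal sketch of this episode-building strategy, using the caption's example values (Q = 32, P = 4) and assuming each class pool is simply a list of images:

```python
# Sketch of episode sampling for adaptation: each episode holds Q images
# per class, of which P come from the new kit's K available shots and
# Q-P from the base kit. The data representation and RNG are assumptions.
import random

def build_episode(base_pool, new_pool, Q=32, P=4):
    """base_pool/new_pool: dicts mapping class label (0 or 1) to image lists.
    Requires len(new_pool[c]) >= P and len(base_pool[c]) >= Q - P.
    Returns a shuffled list of (image, label, is_new_kit) tuples."""
    episode = []
    for label in (0, 1):                                   # negative, positive
        new_imgs = random.sample(new_pool[label], P)       # P of K new-kit shots
        base_imgs = random.sample(base_pool[label], Q - P) # padded with base kit
        episode += [(img, label, True) for img in new_imgs]
        episode += [(img, label, False) for img in base_imgs]
    random.shuffle(episode)
    return episode
```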
References

Applications of digital technology in COVID-19 pandemic planning and response.
On the accuracy of the Sobel edge detector.
Auto-encoding variational bayes.
2017 IEEE Symposium Series on Computational Intelligence (SSCI).
Retrospective clinical evaluation of 4 lateral flow assays for the detection of SARS-CoV-2 IgG.
Detection with Region Proposal Networks.
Proceedings of the IEEE International Conference on Computer Vision.
Image segmentation using deep learning: A survey.
IEEE Conference on Computer Vision and Pattern Recognition.
The inverse Gaussian distribution and its statistical application - a review.
The inverse Gaussian distribution: theory, methodology, and applications.

Acknowledgments

We thank Ken Mayer for his involvement in the coordination of the study. Assay kits were kindly provided to Mayo Clinic for evaluations from ACON, EcoTest (EcoTest Housing 2), and BTNX (EcoTest Housing 1). The work was supported by the Herbert Irving Comprehensive Cancer Center in partnership with the Irving Institute for Clinical and Translational Research via a SARS-CoV-2 Research Pilot Grant, a Technology Innovations for Urban Living in the Face of COVID-19 Pilot Grant, and a gift from Bing Zhao.

Author contributions (recovered in part): supervised the project. U.M. developed the object detection module.