key: cord-0775760-6vu1ohx9 authors: Heino, H.; Rieppo, L.; Männistö, T.; Sillanpää, M.; Mäntynen, V.; Saarakkala, S. title: Diagnostic performance of attenuated total reflection Fourier-transform infrared spectroscopy for detecting COVID-19 from routine nasopharyngeal swab samples date: 2021-11-29 journal: nan DOI: 10.1101/2021.11.29.21266906 sha: aad8dcf76343bfb0a48d1bbf6ce32cd064c0f463 doc_id: 775760 cord_uid: 6vu1ohx9

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which has caused the global COVID-19 pandemic since 2019, has prompted a growing body of research on fast screening and diagnosis to efficiently detect COVID-19 positive cases and on preventing the spread of the virus. Our research objective was to study whether SARS-CoV-2 could be detected from routine nasopharyngeal swab samples by using attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy coupled with partial least squares discriminant analysis (PLS-DA). The advantage of ATR-FTIR is that measurements can be conducted without any sample preparation and no reagents are needed. Our study included 558 positive and 558 negative samples collected from Northern Finland. Overall, we found moderate diagnostic performance for ATR-FTIR when polymerase chain reaction (PCR) was used as the gold standard: the average area under the receiver operating characteristic curve (AUROC) was 0.67-0.68 (min. 0.65, max. 0.69) with 20-, 10- and 5-fold cross-validation. Mean accuracy, sensitivity and specificity were 0.62-0.63 (min. 0.60, max. 0.65), 0.61 (min. 0.58, max. 0.65) and 0.64 (min. 0.59, max. 0.67), respectively, with 20-, 10- and 5-fold cross-validation. In conclusion, our study, with a relatively large sample set, clearly indicates that the measured ATR-FTIR spectrum contains information specific to SARS-CoV-2 infection (P<0.001 in label permutation test).
However, the diagnostic performance of ATR-FTIR remained only moderate, potentially due to the low concentration of viral particles in the transport medium. Further studies are needed before ATR-FTIR can be recommended for fast screening of SARS-CoV-2 from routine nasopharyngeal swab samples.

Attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy coupled with machine learning-based analysis was applied to detect severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from nasopharyngeal swab samples originally collected and processed for polymerase chain reaction (PCR) analysis. Even though our results showed moderate performance, we think that our carefully designed and conducted work is valuable in the field of SARS-CoV-2 diagnostics, as it included as many as 1116 nasopharyngeal swab samples (558 negative and 558 positive) collected from individual patients in a real clinical setting. Here, a real clinical setting means that the nasopharyngeal swab samples were collected from people with symptoms typical of COVID-19 or from asymptomatic individuals exposed to SARS-CoV-2. The presented technique could be relatively easy to use for point-of-care testing, as ATR-FTIR can be performed with a portable instrument without sample preparation, and a machine learning-based model could give a result immediately after the ATR-FTIR measurement.

Introduction

The global COVID-19 pandemic has created an urgent need for an accurate, fast and cheap test to efficiently detect infected people and prevent the spread of the coronavirus (1,2). The polymerase chain reaction (PCR) method is the gold standard for detecting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from respiratory secretions (3). However, PCR is not the best modality for quick screening, as it requires sample preparation and transportation to centralized laboratories. Therefore, new cost-effective tests for SARS-CoV-2 are being developed. Furthermore, in the future, the possibility of adapting a cost-effective test to novel viruses could provide a way to keep new infectious diseases from developing into pandemics.

According to the scientific literature, attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy is a potentially suitable method for the fast detection of SARS-CoV-2 infection (4-6). In an ATR-FTIR measurement, infrared (IR) light is guided to a sample to measure how the sample molecules interact with the IR light. The collected data show molecular bond vibrations related to the chemical composition of the sample, i.e., they reveal a chemical fingerprint of the studied sample. Besides the short measurement time of the ATR-FTIR method (a few minutes), several ATR-FTIR instruments are already portable, which could allow analysis of biological samples directly in public places such as border control stations, shopping centres and airports.

ATR-FTIR has been used earlier in diverse studies to detect different conditions from human liquid biopsies, such as breast cancer from saliva (7), dengue fever from blood and serum (8) and hepatitis B and C from sera (9). During the last year, the ATR-FTIR method has already been applied to detect SARS-CoV-2 from blood (both serum and plasma (10,11)) and saliva samples (12,13).
Even though the results from these studies are extremely promising, the number of investigated samples is typically relatively small. For example, in the study by Barauna et al., excellent results were obtained for detecting SARS-CoV-2 infection directly from pharyngeal swab samples (accuracy, sensitivity, and specificity of 90%, 95%, and 89%, respectively) with a total of 181 samples (12). Furthermore, Nogueira et al. had 65 nasopharyngeal swab samples in viral transport medium 1 (sensitivity 84%, specificity 66% and accuracy 76.9%) and 178 nasopharyngeal swab samples in viral transport medium 2 (sensitivity 87%, specificity 64% and accuracy 78.4%) (14). Despite these promising preliminary results, more research is needed, especially with larger sample sizes, to judge the true diagnostic performance of the ATR-FTIR method in a realistic clinical scenario.

Our aim was to investigate the diagnostic performance of ATR-FTIR spectroscopy in detecting SARS-CoV-2 infection from the same routine nasopharyngeal swab samples that are used in PCR. Compared to earlier studies, our sample set was large: it contained 558 positive and 558 negative samples, making this the largest ATR-FTIR study so far to detect SARS-CoV-2 from the very same samples that were originally collected and processed for PCR.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted November 29, 2021. https://doi.org/10.1101/2021.11.29.21266906

Materials and methods

The studied nasopharyngeal swab samples were originally collected by Finnish public healthcare in the Northern Ostrobothnia region, Northern Finland, for PCR tests to detect patients with SARS-CoV-2 infection.
Residues from these nasopharyngeal swab samples were stored in a freezer (-20 degrees) and thawed prior to the ATR-FTIR measurements. A total of 558 negative and 558 positive samples, collected between January 1st 2020 and November 24th 2020, were included in this study. Ethical permissions were obtained both locally from the Ethical Committee of North Ostrobothnia's Hospital District and nationally from the Finnish Medicines Agency (www.fimea.fi).

The PCR analysis, used as the gold standard for the ATR-FTIR measurements, was conducted by the Northern Finland Laboratory Centre NordLab (www.nordlab.fi), Oulu, Finland. The nasopharyngeal swab samples were collected with a flocked swab and immediately immersed in viral transport medium. During the sample collection period, patients were instructed to get tested for SARS-CoV-2 infection if they had any symptoms indicative of COVID-19. Asymptomatic individuals were tested in Finland at the discretion of physicians working in disease control. The samples were then transported at room temperature to the laboratory performing the analyses. The laboratory had no information on the patients' symptoms. The PCR methods were performed according to instructions. Before the PCR measurements, the swab samples were dissolved into the viral transport medium, and the residues from these samples were stored in the freezer for the ATR-FTIR measurements. Before the ATR-FTIR measurements, all the swab samples were inactivated with viral lysis buffer liquid.

A Bruker Alpha II FTIR spectrometer (Bruker Optics GmbH, Ettlingen, Germany) equipped with an ATR module (Platinum ATR, Bruker Optics GmbH, Ettlingen, Germany) was used to measure every sample one by one after thawing. The spectral resolution was set to 2 cm⁻¹, the number of scans to 64, and the collected spectral range was 4000 to 400 cm⁻¹.
A droplet (volume: 1 µl) of the sample solution was pipetted onto the ATR crystal (Figure 1). Repeated measurement was started immediately after the sample droplet was placed onto the ATR crystal, collecting a total of 10 successive spectra. The spectrum from the last measurement was selected for the final data analysis to ensure that the water had evaporated. This measurement procedure was repeated three times for each sample to minimize differences due to the pipetting process. To prevent sample contamination, the ATR crystal was cleaned with ethanol and Virkon disinfectant after every completed measurement procedure.

Figure 1. A routine nasopharyngeal swab sample was dissolved into the viral transport medium and analyzed with PCR, and the remnant of that sample was stored in the freezer. The frozen swab sample was thawed and inactivated by adding a viral lysis buffer liquid before the ATR-FTIR measurements. The measurement was conducted by pipetting one drop of the sample solution, containing the nasopharyngeal swab sample, the viral lysis buffer and the viral transport medium, onto the ATR crystal. Pipetting and measurement were repeated three times to obtain three spectra from each sample. Those three measured spectra were finally averaged so that there was one representative spectrum for each nasopharyngeal swab sample. Before the ATR-FTIR measurement, the sample was vortexed to make the material distribution as homogeneous as possible, and spun with [...]

The ATR-FTIR spectra were truncated to the fingerprint region of 1800-900 cm⁻¹, vector normalized, and the three spectra of each sample were averaged to create one average spectrum per sample.
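
The preprocessing steps named above (truncation, vector normalization, averaging of the three repeat spectra) can be sketched with NumPy as follows. This is a minimal illustration and not the authors' code: the function name `preprocess_spectra`, the array layout and the synthetic input are hypothetical, and vector normalization is assumed here to mean division by the Euclidean (L2) norm without prior mean-centering.

```python
import numpy as np


def preprocess_spectra(wavenumbers, spectra, lo=900.0, hi=1800.0):
    """Truncate, vector-normalize and average repeated spectra.

    wavenumbers: shape (n_points,), in cm^-1
    spectra:     shape (n_samples, n_repeats, n_points)
    Returns the kept wavenumbers and one averaged spectrum per sample.
    """
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    # 1) Truncation to the chosen spectral window (fingerprint region by default).
    keep = (wavenumbers >= lo) & (wavenumbers <= hi)
    truncated = spectra[..., keep]
    # 2) Vector normalization (assumption: plain L2 norm of each spectrum).
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    normalized = truncated / norms
    # 3) Averaging over the repeated measurements of each sample.
    return wavenumbers[keep], normalized.mean(axis=1)


# Synthetic stand-in data (not study data): 4 samples, 3 repeats each,
# on a 4000-400 cm^-1 axis sampled every 2 cm^-1.
rng = np.random.default_rng(0)
wn = np.linspace(4000, 400, 1801)
raw = rng.random((4, 3, wn.size)) + 0.1
wn_fp, X = preprocess_spectra(wn, raw)
print(X.shape)
```

Passing `lo=1180.0, hi=1490.0` would select the narrower 1490-1180 cm⁻¹ window in the same way.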
In addition, the spectral region of 1490-1180 cm⁻¹ was analyzed in the same way as the fingerprint region, as preliminary testing indicated classification performance similar to that of the fingerprint region. Preprocessing steps were performed with Anaconda3 (Conda 4.8.3 package manager and Python version 3.8.3) using the NumPy, scikit-learn and SciPy packages.

Partial least squares discriminant analysis (PLS-DA) is a popular classification method for multivariate datasets. PLS-DA performs dimensionality reduction by creating latent variables, i.e., PLS components, that maximize the covariance between the new predictor and response scores (15-17). According to the authors' own experience and the literature (9,10), PLS-DA is particularly suitable for classification of FTIR spectra, and it was therefore selected as the primary data analysis approach in this study.

Anaconda3 (Conda 4.8.3 package manager and Python version 3.8.3) and the PLS regression implementation of the scikit-learn library were used to create the PLS-DA model. The performance of the model was studied using k-fold cross-validation with k values of 5, 10 and 20. Every k-fold cross-validation was repeated 100 times with different random seed initializations to make sure that the results were not affected by the random seed used to divide the data into training and validation sets. The presented results were calculated as the average, minimum, and maximum values of the repeated k-fold cross-validations. The receiver operating characteristic (ROC) curve, the area under the ROC curve (AUROC), accuracy, sensitivity, specificity, precision, and the confusion matrix were used to assess the model performance (18,19).

We repeated the same predictive analysis 1000 times with randomly permuted sample labels in each k-fold case.
This permutation analysis was performed to obtain an empirical null distribution of the AUROC values, from which we can see how extreme a position (quantile) the AUROC value obtained with the real data occupies in this distribution. This was done to show that the PLS-DA model has real skill in differentiating between the positive and negative sample groups, i.e., that AUROC values of a similar magnitude cannot be obtained by chance by randomly dividing the samples into negative and positive classes. P-values were calculated using test 1 of Ojala and Garriga (20). For permutation tests, see also Efron and Hastie (21). In practice, one random seed for dividing the data into training and validation sets was chosen, and the 1000 permutations were performed to obtain the null distribution of AUROC values for each k-fold training in question, i.e., k = 20, 10 or 5.

The dataset containing ATR-FTIR spectra from the 558 positive and 558 negative nasopharyngeal swab samples, dissolved into the viral transport medium and inactivated by adding the viral lysis buffer liquid, was analyzed with the PLS-DA method, with PCR analysis as the gold standard. In the spectral data, the averaged spectra (the mean spectrum from three repetitive measurements per one [...] 0.69) were achieved from the PLS-DA k-fold cross-validation trainings repeated 100 times for k-folds 20, 10 and 5, respectively (Table 1).
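
The empirical P-value of the label permutation test can be sketched as follows. The sketch assumes the usual form of the estimate, (b + 1) / (n + 1), where n is the number of permutations and b is the number of permuted runs whose AUROC is at least as high as the one obtained with the real labels. The function names and the `score_fn` callback (any function that maps a label vector to a cross-validated AUROC) are hypothetical.

```python
import numpy as np


def permutation_p_value(observed_auroc, null_aurocs):
    """Empirical P-value (b + 1) / (n + 1) for a 'higher is better' score."""
    null_aurocs = np.asarray(null_aurocs, dtype=float)
    # b: permuted runs that score at least as well as the real labels.
    b = int(np.sum(null_aurocs >= observed_auroc))
    return (b + 1) / (null_aurocs.size + 1)


def null_distribution(score_fn, y, n_permutations=1000, seed=0):
    """AUROC values obtained after randomly permuting the sample labels."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    return np.array([score_fn(rng.permutation(y)) for _ in range(n_permutations)])


# Toy check with made-up numbers: none of 1000 permuted runs reaches the
# observed AUROC, so the P-value takes its smallest attainable value, 1/1001.
p = permutation_p_value(0.68, np.full(1000, 0.5))
print(p)
```

With 1000 permutations the smallest attainable value is 1/1001, i.e., about 0.000999, which is consistent with the P-values of 0.000999001 listed in Table 2 of the supplementary material.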
The P-value obtained for each k-fold showed a significant difference between the mean AUROC value achieved in our study and the mean AUROC value obtained when the same analysis was repeated with randomly permuted sample labels (P<0.001 for each P-value; see Table 2 in the supplementary material). Averaged ROC curves were also calculated (Figure 4).

When the mean accuracies, specificities, sensitivities, and precisions were calculated, a global threshold of 0.5 was set to divide the PLS-DA analyzed samples into the positive and negative classes, which also resulted in the mean confusion matrices shown in Figure 4. Mean accuracies of 0.63, 0.63 and 0.62 and mean specificities of 0.64, 0.64 and 0.64 were achieved for k-folds 20, 10 and 5, respectively (Table 1). Mean sensitivities of 0.61, 0.61 and 0.61 and mean precisions of 0.63, 0.63 and 0.63 were obtained for k-folds 20, 10 and 5, respectively (Table 1).

The spectral region of 1490-1180 cm⁻¹ (Figure 5) within the fingerprint region yielded results close to, or almost the same as, those of the whole fingerprint region when 13 latent variables were set for the PLS-[...] respectively (Table 3 in the supplementary material). The P-value obtained for each k-fold showed a significant difference between the mean AUROC value from the analysis of the spectral region of 1490-1180 cm⁻¹ and that from the same analysis repeated with randomly permuted sample labels (P<0.001 for each P-value; see Table 4 in the supplementary material).

Figure 2. Mean spectra of the positive and negative groups in the fingerprint region (1800-900 cm⁻¹). Vector normalization was applied before calculating the mean spectra. The difference spectrum, multiplied by a factor of 20, is also presented to illustrate the differences between the sample groups in the fingerprint region. Only minor visual differences can be seen in the spectra between the sample groups.

Figure 3. Mean AUROC and mean accuracy values against the number of latent variables for the fingerprint region (1800-900 cm⁻¹) and the region of 1490-1180 cm⁻¹. As seen from the figure, 32 latent variables yielded the best performance for the fingerprint region when k-fold was 20 or 10, whereas for k-fold 5, 31 latent variables were the best choice. For the region of 1490-1180 cm⁻¹, 13 latent variables showed the best performance for k-folds 20, 10 and 5.

[...] measured with ATR-FTIR spectroscopy. Every sample was measured three times. The measured samples were preprocessed before PLS-DA by applying spectrum truncation to the spectral region of 1800-900 cm⁻¹, vector normalization and spectrum averaging to create one average spectrum for every measured nasopharyngeal swab sample.

Figure 4. Averaged ROC curves and confusion matrices from the PLS-DA k-fold cross-validation predictive ability repeated 100 times with k-folds 20, 10 and 5. Training was accomplished with ATR-FTIR data from the spectral region of 1800-900 cm⁻¹.

Figure 5. Mean spectra of the positive and negative groups in the fingerprint region (1800-900 cm⁻¹) and in the spectral region of 1490-1180 cm⁻¹. Vector normalization was applied before calculating the mean spectra.
Overall, only minor spectral changes were visually observed between the sample groups in the spectral region of 1490-1180 cm⁻¹, although the diagnostic performance in the cross-validation setup was almost the same as with the whole fingerprint region (see Figure 2, Table 1, and Table 3 in the supplementary material for the spectral region of 1490-1180 cm⁻¹).

This preprint is made available under a CC-BY-NC-ND 4.0 International license.

The aim of our study was to investigate whether the diagnostic performance of ATR-FTIR spectroscopy coupled with PLS-DA is adequate to detect SARS-CoV-2 infection from nasopharyngeal swab samples originally collected and processed for PCR analysis. The AUROC values obtained in our study showed moderate performance (between 0.66 and 0.68) when the fingerprint region and the spectral region of 1490-1180 cm⁻¹ were studied. The analyzed dataset contained ATR-FTIR spectra from 558 negative and 558 positive samples from separate patients within a real clinical setting in the city of Oulu, Finland.

When analyzing the results of our study, it seems that the spectral region of 1490-1180 cm⁻¹ contains enough information to classify ATR-FTIR spectra, even though the whole fingerprint region (1800-900 cm⁻¹) has often been used in earlier studies. On the other hand, it was not possible to identify individual wavenumbers that could be used to classify the spectra. Therefore, at least in this dataset, the meaningful information is spread over a spectral region rather than concentrated at individual wavenumbers.
Moreover, it is notable that the spectral region of 1490-1180 cm⁻¹ was classified with 13 latent variables, whereas 31-32 latent variables were needed for the whole fingerprint region (1800-900 cm⁻¹). Fewer latent variables mean a simpler model, which usually implies better generalization ability. In that sense, the spectral region of 1490-1180 cm⁻¹ could be a more suitable choice for spectral analysis.

We observed that only simple spectral preprocessing was needed to prepare the mean spectra for classification, i.e., spectrum truncation to the fingerprint region or the region of 1490-1180 cm⁻¹, vector normalization, and spectrum averaging. In practice, this means that our workflow for classifying spectra is relatively easy to set up, and the PLS-DA model could also be easily retrained when more training data become available.

Our results with ATR-FTIR spectroscopy showed clearly weaker performance than the results of previous studies using similar methods for SARS-CoV-2 detection. Zhang et al. (10) studied serum with ATR-FTIR spectroscopy using a total of 115 samples (41 of them were confirmed to be COVID-19 positive, and the others were from healthy donors and patients with other infections or inflammatory diseases). They analyzed the measured dataset with the PLS-DA method and reported an AUROC value as high as 0.9561. Furthermore, coronavirus detection from saliva (pharyngeal swabs) was conducted with a total of 111 negative and 70 positive samples, analyzed by combining a genetic algorithm with linear discriminant analysis (GA-LDA). That saliva study showed 90% accuracy, 95% sensitivity and 89% specificity (12). Another saliva-based study was performed by asking donors to dribble into a container with added viral transport medium and analyzing the data with the PLS-DA method.
From the collected samples, 29 were confirmed as positive and 28 as negative (171 transflection infrared spectra overall), yielding a sensitivity of 93% and a specificity of 82% (13). In a very recent study, ATR-FTIR spectroscopy coupled with partial least squares (PLS) and cosine k-nearest neighbours (KNN) analysis was applied to detect SARS-CoV-2 from nasopharyngeal swabs (14). There were samples from 243 patients (714 ATR-FTIR spectra in total), of whom 40 + 111 patients were confirmed as COVID-19 positive. The samples were placed into viral transport medium 1 (liquid 1) or viral transport medium 2 (liquid 2). For liquid 1, the sensitivity was reported to be 84%, the specificity 66% and the accuracy 76.9%. For liquid 2, the sensitivity was 87%, the specificity 64% and the accuracy 78.4%.

There can be various reasons for the overall weaker performance of ATR-FTIR spectroscopy observed in our study. First and foremost, as the nasopharyngeal swab sample was dissolved into a relatively large amount of solvent, the low concentration of viral particles in the viral transport medium (and the viral lysis buffer) may explain the lower diagnostic performance. In addition, it should be taken into account that swab samples were also collected from people without symptoms who had been exposed to COVID-19. It is possible that the performance would be different if the swab samples had been collected from hospitalized patients, whose swab samples typically contain higher viral loads, and the control samples had been collected from healthy volunteers.

As a second reason, even though the sample was vortexed before the ATR-FTIR measurements, the sample solution may still have been distributed unevenly on the ATR crystal. This assumption of uneven distribution of viral material is supported by the observation from preliminary tests that the averaged spectra (from three repetitive measurements) provided better diagnostic performance than training and validation with each separate spectrum. Consequently, it is possible that the viral transport medium used (and the viral lysis buffer) affects the samples, causing them to be unevenly distributed, which induces unwanted variation in the ATR-FTIR measurements. Ideally, liquid sample drops should be as homogeneous as possible to obtain the most accurate results.

It may also well be that the physical sensitivity of ATR-FTIR spectroscopy at these low concentration levels is inadequate for high diagnostic performance. This speculation is also supported by the recent study in which excellent diagnostic performance (accuracy of 90%) was reported when ATR-FTIR spectroscopic measurements were conducted directly from pharyngeal swab samples, i.e., without dissolving the sample in viral transport medium (12). If that is truly the case, ATR-FTIR spectroscopy could not be recommended as a SARS-CoV-2 screening tool for nasal swab samples dissolved into viral transport medium. Instead, the measurement should be conducted directly from the swab stick, or at least without any added chemicals.

Finally, sample freezing can also affect the performance of ATR-FTIR spectroscopy. It is possible that the freeze-thaw cycle is deleterious to the ultrastructure of viral particles, which would inevitably also affect the sensitivity of ATR-FTIR spectroscopic measurements.
Obviously, it would have been best to conduct all the ATR-FTIR spectroscopy measurements on the same day as the PCR analysis without freezing the samples in between. Unfortunately, this was not logistically possible in this study.

In this study, we used a constant global threshold (0.5) in the PLS-DA model to separate the negative and positive classes. Obviously, with a different threshold, the mean accuracies, specificities, sensitivities, and precisions would be different. If a portable and fast SARS-CoV-2 test is developed in the future, the practical choice could be to maximize the specificity, i.e., the ability to detect persons without the disease as effectively as possible.

The results from our study, with a relatively large sample set, indicate that ATR-FTIR spectroscopy coupled with PLS-DA has the potential to detect SARS-CoV-2 infection from nasopharyngeal swab samples. However, the diagnostic performance of ATR-FTIR spectroscopy remained only moderate, potentially due to the low concentration of viral particles in the transport medium (and the viral lysis buffer). Further studies are needed before ATR-FTIR spectroscopy can be recommended for fast screening of SARS-CoV-2 from routine nasopharyngeal swab samples.

Supplementary material

Table 2.
P-values of randomly permuted sample labels for the presented mean AUROC values from the fingerprint region. Sample labels were permuted 1000 times when calculating P-values for k-folds 20, 10 and 5. P-values were calculated using test 1 of Ojala and Garriga (20).

k-fold    P-value
20        0.000999001
10        0.000999001
5         0.000999001

Table 3. Averaged results from the PLS-DA model k-fold cross-validation predictive ability repeated 100 times with k-fold values 20, 10 and 5. The number of latent variables was set to 13, as it yielded the best predictive performance. There were 558 negative and 558 positive nasopharyngeal swab samples (dissolved into the viral transport medium and inactivated by adding viral lysis buffer liquid) measured with ATR-FTIR spectroscopy. Every sample was measured three times. The measured samples were preprocessed before PLS-DA by applying spectrum truncation to the spectral region of 1490-1180 cm⁻¹, vector normalization and spectrum averaging to create one average spectrum for every measured nasopharyngeal swab sample.

Table 4. P-values of randomly permuted sample labels for the presented mean AUROC values from the spectral region of 1490-1180 cm⁻¹. Sample labels were permuted 1000 times when calculating P-values for k-folds 20, 10 and 5. P-values were calculated using test 1 of Ojala and Garriga (20).

Acknowledgements

Financial support for this research project was received from the European Regional Development Fund (ERDF).

References

1. Coronavirus disease 2019 (COVID-19): A literature review
2. Characteristics of SARS-CoV-2 and COVID-19
3. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT
4. ATR-FTIR spectroscopy for virus identification: A powerful alternative
5. Using Fourier transform IR spectroscopy to analyze biological materials
6. ATR-FTIR spectroscopy: Its advantages and limitations
7. Attenuated Total Reflection-Fourier Transform Infrared (ATR-FTIR) Spectroscopy Analysis of Saliva for Breast Cancer Diagnosis
8. ATR-FTIR spectroscopy coupled with multivariate analysis techniques for the identification of DENV-3 in different concentrations in blood and serum: a new approach
9. Spectroscopy goes viral: Diagnosis of hepatitis B and C virus infection from human sera using ATR-FTIR spectroscopy
10. Fast Screening and Primary Diagnosis of COVID-19 by ATR-FT-IR
11. Rapid Classification of COVID-19 Severity by ATR-FTIR Spectroscopy of Plasma Samples
12. Ultrarapid On-Site Detection of SARS-CoV-2 Infection Using Simple ATR-FTIR Spectroscopy and an Analysis Algorithm: High Sensitivity and Specificity
13. Infrared Based Saliva Screening Test for COVID-19. Angew Chemie Int Ed
14. Rapid diagnosis of COVID-19 using FT-IR ATR spectroscopy and machine learning
15. So you think you can PLS-DA?
16. A tutorial review: Metabolomics and partial least squares-discriminant analysis - a marriage of convenience or a shotgun wedding
17. Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: a review of contemporary practice strategies and knowledge gaps
18. An introduction to ROC analysis
19. Better decisions through science
20. Permutation Tests for Studying Classifier Performance
21. Computer Age Statistical Inference