key: cord-1015068-pmosgp8z
authors: Spick, M. P.; Longman, K.; Frampas, C.; Costa, C.; Dunn-Walters, D.; Stewart, A.; Wilde, M.; Greener, D.; Evetts, G.; Trivedi, D. K.; Barran, P.; Pitt, A.; Bailey, M.
title: Changes to the sebum lipidome upon COVID-19 infection observed via non-invasive and rapid sampling from the skin
date: 2020-09-29
journal: nan
DOI: 10.1101/2020.09.29.20203745
sha: 0e8547a7f41ec155950375ebf6a9bf5f1f81443f
doc_id: 1015068
cord_uid: pmosgp8z

The COVID-19 pandemic has led to an urgent and unprecedented demand for testing, both for diagnosis and prognosis. Here we explore the potential for using sebum, collected via swabbing of a patient's skin, as a novel sampling matrix to fulfil these requirements. In this pilot study, sebum samples were collected from 67 hospitalised patients (30 PCR positive and 37 PCR negative). Lipidomics analysis was carried out using liquid chromatography mass spectrometry. Total fatty acid derivative levels were found to be depressed in COVID-19 positive participants, indicative of dyslipidemia. Orthogonal Partial Least Squares-Discriminant Analysis (OPLS-DA) modelling showed promising separation of COVID-19 positive and negative participants when comorbidities and medication were controlled for. Given that sebum sampling is rapid and non-invasive, this work may offer the potential for diagnostic and prognostic testing for COVID-19.

INTRODUCTION

SARS-CoV-2, a novel coronavirus, was identified by the World Health Organization as originating in the city of Wuhan, China, in late 2019, 1,2 and causes coronavirus disease 2019 (COVID-19). SARS-CoV-2 combines an R0 of 3 to 4 (R0 being the reproduction number absent any controls such as lockdown), 3 and an estimated case fatality ratio (CFR) of 1%, 4 making the virus faster spreading with higher mortality than seasonal influenza. 
5, 6, 7 The threat of SARS-CoV-2, therefore, derives from this combination of exponential transmission and relatively high mortality rates, on top of additional mortality due to 'crowding out' of the treatment of other illnesses. In many countries, high mortality rates were mitigated through lockdown measures. But such measures are far from costless, with GDP in the OECD contracting by 9.8% in Q2 2020, 8 leading to meaningful disruption and welfare harm.

Entry of SARS-CoV-2 to the human body occurs via receptors on the surfaces of cells, specifically the angiotensin-converting enzyme related carboxy-peptidase (ACE2) receptor. 9 Many cases of COVID-19 will be asymptomatic; those that are symptomatic most commonly present with pathologies related to the lower respiratory tract, although attacks on other organs are also well described. 10 In the most severe cases, the disease leads to hyperinflammation and acute respiratory distress syndrome (ARDS), driven by an excess of pro-inflammatory cytokines, sometimes referred to as a cytokine storm. 11 These symptoms reflect both the direct impact of the virus on specific tissues and also the host body's immune response. By taking account of both the presence of symptoms and pre-existing conditions that may influence the immune response, progress has been made in stratification of patients admitted to hospital with COVID-19, 12 and treatment options have also improved. 13 Nonetheless, the disease still represents a major threat to health and welfare.

Mass testing has been identified by the World Health Organization as a key weapon in the battle against COVID-19 to contain outbreaks and reduce hospitalisations. 
14 Currently deployed approaches to testing require the detection of SARS-CoV-2 viral RNA collected from the upper respiratory tract via polymerase chain reaction (PCR). Whilst these approaches are easily deployable and highly selective for the virus, they suffer from a significant proportion of false negative events. These arise due to a limited time window during the course of infection for sampling, as well as difficulties with sample collection. Furthermore, currently deployed approaches carry no prognostic information. Approaches that measure the effect of the virus on the host (as opposed to direct measurement of the virus itself) may offer a complementary solution in clinical or mass testing settings. As a coronavirus requiring lipids for reproduction, SARS-CoV-2 can be expected to disrupt the lipidome. 15 Evidence of dysregulated lipidomes has recently been observed in patients with COVID-19 via analyses of blood plasma and also by lipidomic analysis of nasopharyngeal swabs, 16, 17 and dysregulation of skin lipids would be consistent with the ability of canines to differentiate COVID-19 positive and negative individuals by smell. 18 Lipidomics therefore offers a promising route to better understanding of COVID-19, and potentially to its diagnosis and prognosis.

Sebum is a biofluid secreted by the sebaceous glands and is rich in lipids. A sebum sample can be collected easily and non-invasively via a gentle swab of skin areas rich in sebum (for example the face, neck or back), and characteristic features have been identified from sebum for illnesses such as Parkinson's Disease 19 and Type 1 Diabetes Mellitus. 20 In this work, we explore differences in sebum lipid profiles for patients with and without COVID-19, with a view to exploring the future use of sebum as a non-invasive sampling medium for testing and prognosis.

In May 2020 several UK bodies announced their intention to pool resources and form the COVID-19 International Mass Spectrometry (MS) Coalition. 
21 This consortium has the proximal goal of providing molecular level information on SARS-CoV-2 in infected humans, with the distal goal of understanding the impact of the novel coronavirus on metabolic pathways in order to better diagnose and treat cases of COVID-19 infection. This work took place as part of the COVID-19 MS Coalition and all data will be stored and fully accessible on the MS Coalition open repository. The study population analysed in this work included 67 participants, comprising 30 participants presenting with COVID-19 clinical symptoms (and an associated positive COVID-19 RT-PCR test) and 37 participants presenting without. A summary of the metadata is shown in Table 1. Age distributions for COVID-19 positive and negative cohorts were almost identical (mean age of 64.7 years and 65.0 years respectively). Comorbidities are associated with both hospitalisation and also more severe outcomes for COVID-19 infection, but will also alter the metabolome of participants, representing both a causative and confounding factor. The impact on classification accuracy of these comorbidities was tested by splitting participant data by variable and retesting modelling of COVID-19 positive and negative participants to see if separation improved; this process is described in the following sections. In this pilot study, comorbidities were less well represented in the cohort of COVID-19 positive participants than in the cohort of COVID-19 negative participants. In terms of diagnostic indicators, levels of C-Reactive Protein (CRP) were significantly higher for COVID-19 positive participants. Univariate analysis of individual lipid features showed no significant differences, but aggregated lipid classes did show differentiation; aggregate triglyceride (n=82), diglyceride (n=51) and monoglyceride (n=12) levels were all depressed for participants with both a positive COVID-19 diagnosis and PCR result. Boxplots for these lipid classes are shown in Figure 2. 
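To make the idea of an "aggregate" lipid class level concrete, the following is a minimal sketch of how per-feature intensities might be summed within each annotated class for a single participant. The feature names, class labels and intensities are invented for illustration, and the function `aggregate_by_class` is hypothetical; the source does not state how the aggregation was implemented.

```python
from collections import defaultdict


def aggregate_by_class(feature_intensities, feature_class):
    """Sum feature intensities within each lipid class for one participant.

    feature_intensities: dict of feature id -> intensity (None if missing).
    feature_class: dict of feature id -> lipid class label (assumed annotation).
    Features without a class annotation or without a value are skipped.
    """
    totals = defaultdict(float)
    for feature, intensity in feature_intensities.items():
        lipid_class = feature_class.get(feature)
        if lipid_class is not None and intensity is not None:
            totals[lipid_class] += intensity
    return dict(totals)


# Invented toy data: two triglycerides, one diglyceride, one monoglyceride.
toy_classes = {"f1": "TG", "f2": "TG", "f3": "DG", "f4": "MG"}
toy_participant = {"f1": 10.0, "f2": 5.0, "f3": 2.0, "f4": None, "f5": 7.0}
toy_totals = aggregate_by_class(toy_participant, toy_classes)
```

The resulting per-class totals, one set per participant, are the kind of quantity that could then be compared between the COVID-19 positive and negative groups.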
Other work has found evidence of dyslipidemia in plasma from COVID-19 positive patients, 17 although the evidence on whether upregulation or downregulation is dominant for these lipid classes is mixed. Triglyceride (TAG) levels have been found to be elevated in blood plasma for mild cases of COVID-19, but TAG levels in plasma may also decline as the severity of COVID-19 increases. 24 It should be remembered, however, that the primary role of skin is barrier function, and lipid expression in the stratum corneum depends on de novo lipogenesis; in fact, non-skin sources such as plasma provide only a minor contribution to sebum lipids. 25

No clustering was identifiable at the total population level by PCA, i.e. by unsupervised analysis (Figure S1, Supplementary Information). OPLS-DA performed on the same data set still revealed limited separation (Figure 3). R2Y was 0.72 and Q2Y was -0.04, showing that the model was able to achieve some separation of the two groups (COVID-19 positive and negative), but that the model did not have predictive power, possibly indicating overfitting. Given the wide range of comorbidities and the lack of age-matching, this is not unexpected.

To test whether separation would improve in smaller, more homogeneous groups, separate OPLS-DA models were built for each split of the population by comorbidity. If model performance improved (measured by goodness of fit, R2Y, and predictive power, Q2Y) then this could indicate that sebum profiling would perform better if models were constructed based on stratified and matched datasets. Table S1 shows the results for these metrics across the different modelled subsets. Separation generally improved as the data were binned more finely, but for most subpopulations there was no improvement in the modelled predictive power. Four subsets did however show more interesting improvements in model performance. 
These were the subsets with a specific comorbidity that was being treated by medication (high cholesterol, T2DM and IHD) and the subset undergoing treatment with statins. These are discussed in more detail in the following sections.

OPLS-DA modelling of the subset of participants under medication for high cholesterol showed both good separation (R2Y of 1.00) and also better predictive power (Q2Y of 0.53). This subgroup was treated with lipid-lowering agents, specifically statins, with one exception (due to allergic reaction). The subgroup comprising participants undergoing treatment for ischemic heart disease (IHD) also showed much better separation (R2Y of 1.00) plus some indication of improved predictive power (Q2Y of 0.52). This subgroup received varied medication, but participants presenting with IHD were also being prescribed statins.

One possibility for the improved separation and model scores is that the sub-populations are more homogeneous for confounding factors. Certainly, comorbidity as a confounding factor has been reduced by grouping according to whether participants were treated for said comorbidities. The ranges of ages in the comorbidity subgroups are somewhat reduced for those treated for hypertension and T2DM (generally skewing older) and more markedly reduced for those treated for high cholesterol (Figure S2, supporting material), although, as discussed above, age itself appears not to be a direct predictor of dyslipidemia. Additionally, as shown in Figures S3 to S5, it is possible to achieve separation on the basis of gender. This raises the possibility that a larger dataset, with the potential for matching and stratifying the participant population more rigorously, could yield classification models with greater predictive power. Alternatively, confounding factors might be reduced by medication, providing a more similar "baseline" against which to measure perturbation of the lipidome by COVID-19. 
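For readers less familiar with the R2Y and Q2Y scores quoted above, the sketch below shows one common way such metrics are defined: R2Y compares fitted values with the true class labels for the samples used to train the model, whereas Q2Y uses predictions made for samples held out during cross-validation. The label vectors and predictions are invented toy values, and the exact formulae used by the software in this study are an assumption rather than something stated in the text.

```python
def r2y(y, y_fitted):
    """Goodness of fit: 1 - RSS/TSS, using fitted values for training samples."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fitted))
    return 1.0 - rss / tss


def q2y(y, y_cross_validated):
    """Predictive ability: 1 - PRESS/TSS, using held-out (cross-validated) predictions."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, y_cross_validated))
    return 1.0 - press / tss


# Invented toy example: class labels coded 0 (negative) and 1 (positive).
y_true = [0, 0, 0, 1, 1, 1]
fitted = [0.1, 0.2, 0.1, 0.9, 0.8, 0.9]           # close to the labels
cross_validated = [0.6, 0.4, 0.7, 0.5, 0.3, 0.6]  # poor held-out predictions

r2 = r2y(y_true, fitted)
q2 = q2y(y_true, cross_validated)
```

In this toy case R2Y is high while Q2Y is negative, because the held-out predictions are worse than simply predicting the mean label. This is the pattern that suggests a model fits its training data without generalising.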
The subset of participants taking statins (which includes both participants treated for high cholesterol and also participants with poor diabetic control or history of ischaemic heart disease, where statins are routinely added prophylactically to improve long-term outcomes) also shows improved separation and predictive power by OPLS-DA modelling (Figure 9), with R2Y of 0.74 and Q2Y of 0.39. Furthermore, these findings suggest that better matching of participants could yield a clearer separation of positive and negative COVID-19 participants by their lipidomic profile. Of course, it cannot be ruled out that the lower n values for the smaller subsets could lead to apparently better R2Y and Q2Y scores only by chance. Overfitting is a risk in any pilot study with small n; this risk can only be reduced through both a larger training set of data and subsequent testing of the models on validation sets of data.

Another point to note is a possible lack of confounders in the participant population from seasonal respiratory viruses. Whilst the COVID-negative patients included patients with respiratory illnesses (e.g. COPD, asthma) and COVID-like symptoms, samples were collected between May and July, when the incidence of respiratory viruses is generally low. Both the common cold and influenza have some symptom overlap with COVID-19 and may possibly lead to alterations to lipid metabolism that could interfere with the identification of features related to COVID-19 infection. Such viruses within the UK are more prevalent in autumn and winter. 27 Whilst it seems unlikely that seasonal respiratory viruses were a major confounding factor in this work, this is a factor that will need to be taken into account in future studies, and may also allow the opportunity to test sebum's selectivity and specificity with regard to other respiratory viruses. 
A final limitation of this study is inconsistency in the timeline between onset of symptoms, hospital admission, PCR test and sebum sampling, which was an inevitable consequence of collecting samples in a pandemic situation. Patients were sampled immediately upon recruitment to the study. This means that the time between symptom onset and sebum sampling ranged from 1 day to > 1 month. Future work should explore longitudinal sampling of patients, to establish how quickly (or whether at all) the sebum lipidome returns to normal, and whether it has prognostic power. This will help to inform the practical utility of sebum in clinical or mass testing.

At the aggregate level, analysis of the metadata for the participants in this study illustrates the challenges involved in constructing a well-designed sample set during a pandemic. Age ranges of participants were large, and a wide range of comorbidities were present, leading to many confounding factors. We provide evidence that COVID-19 infection leads to dyslipidemia in the stratum corneum, with participants in this study with symptoms and a positive clinical COVID-19 diagnosis presenting with depressed levels of some lipid classes. We further find that the sebum lipidomics profiles of COVID positive and negative patients can be separated using the multivariate analysis method OPLS-DA, with the separation improving when the patients are segmented in accordance with certain comorbidities. In addition to these promising findings, sebum samples can be provided quickly and painlessly and can be transported and stored at room temperature. 
We conclude that sebum is worthy of future consideration for clinical sampling for COVID-19 infection.

The materials and solvents utilised in this study were as follows: gauze swabs (Reliance Medical, UK),

Collection of the samples was performed by researchers from the University of Surrey at Frimley Park NHS Foundation Trust hospitals. Participants were identified by clinical staff and were categorised by the hospital as either "query COVID" (meaning there was clinical suspicion of COVID-19 infection) or "COVID positive" (meaning that a positive COVID test result had been recorded during their admission). Each participant was swabbed on the right side of the upper back, using 15 cm by 7.5 cm gauzes that had each been folded twice to create a four-ply swab. The surface area of sampling was approximately 5 cm x 5 cm; pressure was applied uniformly whilst moving the swab across the upper back for ten seconds. The gauzes were placed into Sterilin polystyrene 30 mL universal containers. Samples were transferred from the hospital to the University of Surrey by courier within 4 hours of collection, whereupon the samples were quarantined at room temperature for seven days. Finally, the vials were transferred to minus 80 °C storage until required.

Alongside sebum collection, metadata for all participants was also collected covering inter alia gender, age, comorbidities (based on whether the participant was receiving treatment), the results and dates of COVID PCR (polymerase chain reaction) tests, bilateral chest X-ray changes, smoking status, and whether the participant presented with clinical symptoms of COVID. Values for lymphocytes, CRP and eosinophils were also taken; here the most extreme values during the hospital admission period were recorded. These were not collected concomitantly with the sebum samples.

The analysis of the obtained samples was adapted from Sinclair et al. 
28 To extract analytes from the sample gauzes, the Sterilin vials and contents were allowed to equilibrate to room temperature, after which 9 mL methanol was added, followed by vortex-mixing for 10 s. The solution with gauze was then sonicated for 30 min at ambient temperature. The metabolite-rich methanol was then filtered through a 0.2 μm filter to yield three equal 2 mL aliquots in 2 mL Eppendorf tubes, with a 0.2 mL aliquot reserved to create a pooled QC in a separate 10 mL scintillation vial. Each 2 mL sample was then dried under nitrogen for 3 hours, leaving a lipid pellet, and frozen at minus 80 °C until the day of analysis. To reconstitute the samples, the Eppendorf tubes and contents were allowed to equilibrate to room temperature. Each day's run was completed with pooled QC injections (n=2) and solvent blanks (n=3). A triplicate injection of a field blank was also obtained.

Analysis of samples was carried out using a Dionex Ultimate 3000 HPLC module equipped with a binary solvent manager, column compartment and autosampler, coupled to an Orbitrap Q-Exactive Plus mass spectrometer (Thermo Fisher Scientific, UK) at the University of Surrey's Ion Beam Centre. Chromatographic separation was performed on a Waters ACQUITY UPLC BEH C18 column (1.7 µm, 2.1 mm x 100 mm) operated at 55 °C with a flow rate of 0.3 mL/min. The mobile phases were as follows: mobile phase A was acetonitrile:water (60:40, v/v) with 0.1% formic acid (v/v), whilst mobile phase B was 2-propanol:acetonitrile (90:10, v/v) with 0.1% formic acid (v/v). An injection volume of 5 µL was used. The initial solvent mixture was 40% B, increasing to 50% B over 1 minute, then to 69% B at 3.6 minutes, with a final ramp to 88% B at 12 minutes. The gradient was reduced back to 40% B and held for 2 minutes to allow for column equilibration. Analysis on the Q-Exactive Plus mass spectrometer was performed in split-scan mode with an overall scan range of 150 m/z to 2000 m/z, and 5 ppm mass accuracy.
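As a small worked illustration of what a 5 ppm mass accuracy means across the stated scan range, the sketch below converts a parts-per-million tolerance into an absolute m/z window. The function names `ppm_window` and `within_ppm` are hypothetical helpers written for this example and are not part of any instrument or vendor software.

```python
def ppm_window(mz, ppm=5.0):
    """Return the (lower, upper) m/z bounds for a tolerance given in ppm."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def within_ppm(observed_mz, theoretical_mz, ppm=5.0):
    """True if the observed m/z lies within the ppm tolerance of the theoretical m/z."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= ppm


# Tolerance windows at the two ends of the stated scan range.
low_end = ppm_window(150.0)    # half-width of 0.00075 m/z
high_end = ppm_window(2000.0)  # half-width of 0.01 m/z
```

Because the tolerance is relative, the absolute window widens with m/z: the same 5 ppm corresponds to a roughly thirteen-fold larger window at m/z 2000 than at m/z 150.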
Split scan was chosen to cover the full range of 150 m/z to 2000 m/z whilst maximising the number of features identified. 29, 30 Operating conditions are summarised in the table below.

Data processing

LC-MS outputs (.raw files) were pre-processed for alignment, normalisation and peak identification using Progenesis QI (Non-Linear Dynamics, Waters, Wilmslow, UK), a platform-independent small molecule discovery analysis software for LC-MS data. Peak picking (mass tolerance ±5 ppm), alignment (RT window ±15 s) and area normalisation were carried out with reference to the pooled QC samples. Features were annotated using accurate mass match with LipidBlast in Progenesis QI, cross-checked against LipidSearch (Thermo Fisher Scientific, UK). This process yielded a peak table with 14,160 features. All features with a coefficient of variation across all pooled QCs above 20% were removed, as were those that were not present in at least 90% of pooled QC injections. The remaining features were then field blank adjusted: all features with a signal-to-noise ratio below 3 were also rejected. The remaining set of 998 features were deemed to be robust, reproducible and suitably distinct from those found in the field blank.

Inclusion criteria were also applied to participant data, requiring both full completion of metadata and also agreement between the result of the PCR COVID-19 test (Y/N) and the clinical diagnosis for COVID-19 (Y/N). Whilst these inclusion criteria reduced the total number of participants from n=87 to n=67, this was considered worthwhile given the potential for misdiagnosis to confound the development of statistical models. Data processing and analysis of the peak area matrix was conducted through a combination of (a) user-written scripts in the statistical programming language R, using the RStudio graphical user interface package, and (b) the online metabolomics suite of tools contained within MetaboAnalyst™. 
31 Both PCA and OPLS-DA were performed for classification and prediction of data. A leave-one-out approach was used for OPLS-DA model validation. The data were Pareto-scaled in RStudio as part of all statistical analyses, without replacement of missing values.

The authors would like to acknowledge funding from the EPSRC Impact Acceleration Account for this work.

WHO advice for international travel and trade in relation to the outbreak of pneumonia caused by a new coronavirus in China
Diamond Princess passenger dies, bringing ship's death toll to seven
Simulation of the effects of COVID-19 testing rates on hospitalizations
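As a supplementary illustration of two steps described under Data processing, the sketch below shows (a) a pooled-QC filter that keeps a feature only if it is detected in at least 90% of QC injections with a coefficient of variation of at most 20%, and (b) Pareto scaling that leaves missing values in place. It is written in Python rather than the R used by the authors. The function names, the use of the sample standard deviation, and the treatment of `None` or zero as "not detected" are all assumptions made for this example.

```python
import math
import statistics


def passes_qc(qc_intensities, max_cv=0.20, min_presence=0.90):
    """Decide whether one feature is reproducible across pooled QC injections.

    qc_intensities: intensities for a single feature, one per QC injection;
    None or 0 is treated as "not detected" (an assumption of this sketch).
    """
    detected = [v for v in qc_intensities if v is not None and v > 0]
    if len(detected) / len(qc_intensities) < min_presence:
        return False
    if len(detected) < 2:
        return False  # a coefficient of variation needs at least two values
    cv = statistics.stdev(detected) / statistics.mean(detected)
    return cv <= max_cv


def pareto_scale(values):
    """Mean-centre and divide by the square root of the standard deviation.

    Missing values (None) are ignored when computing the mean and standard
    deviation and are returned unchanged, i.e. they are not replaced.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    if sd == 0:
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mean) / math.sqrt(sd) for v in values]


# Invented toy QC data for three features across ten pooled QC injections.
stable = [100, 105, 95, 102, 98, 101, 99, 103, 97, 100]
noisy = [100, 200, 50, 150, 80, 120, 60, 180, 90, 110]
sparse = [100, 100, None, None, 100, 100, 100, 100, 100, 100]

scaled = pareto_scale([2.0, 4.0, None, 6.0])
```

Pareto scaling sits between no scaling and unit-variance scaling: it reduces the dominance of high-intensity features while preserving more of the original data structure than dividing by the full standard deviation.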