key: cord-0264497-rh5npuv6 authors: Storey, Emily E.; Wu, Duxuan; Helmy, Amr S. title: Enhanced Sensitivity for Quantifying Disease Markers via Raman and Machine-Learning of Circulating Biofluids in Optofluidic Chips date: 2021-05-26 journal: nan DOI: 10.1109/jlt.2021.3084471 sha: b1b9b0578be9b8a298251ff4b0687e3b820336b2 doc_id: 264497 cord_uid: rh5npuv6

We demonstrate novel instrumentation for spontaneous Raman spectroscopy in biofluids, enabling development of a portable, automated, reliable diagnostics technique requiring minimal operator expertise to quantify disease markers. Label-free Raman analysis of biofluids at physiologically-relevant sensitivities is achieved using a microfluidic-embedded liquid-core-waveguide augmented with a unique circulation approach: thermal damage and spectrum variance are minimized, eliminating conventional limits on integration time for excellent signal-to-noise ratio and temporal stability. Machine-learning then optimizes spectrum processing, yielding quantitative results independent of end-user proficiency. Sub-mM accuracy is achieved in solutions of both high and low turbidity, surpassing the sensitivity of previous techniques for analytes with a small scattering cross-section, such as glucose. We attain a new record for label-free glucose measurements in an artificial whole-blood, achieving an accuracy up to 0.14 mM, well exceeding the 0.78 mM accuracy required for diabetic monitoring, establishing our technique's potential to significantly facilitate portable Raman for complex biofluid analysis.

On-site rapid-turnaround health monitoring is increasingly in demand to maximize efficiency and minimize patient stress. Biofluids are a promising means for noninvasive diagnostics, facilitating routine monitoring and replacing invasive investigative procedures which may entail complications, thus enhancing patient quality of life.
To this end we seek an analytic system which offers enhancement in sensitivities reaching physiologically relevant values; when developed, this system has diagnostic potential in a range of biofluids which each offer unique insight into a patient's state of health. Four factors will play a key role in the success of a rapid-turnaround health monitoring system: portability, non-invasive sample collection, no specimen pre-treatment, and reliable automated diagnostic interpretation of the results. Diabetes management excellently illustrates the need for a system with these four features. Regular and accurate monitoring of blood glucose concentration is key to successful management, yet many find finger pricking, the traditional standard, to be painful. Non-invasive biofluids, such as tears, can serve as an alternate monitoring fluid, increasing compliance and thereby decreasing mortality [1]. These four factors are equally essential for patients dealing with a sudden illness onset: rapid diagnosis can have a drastic influence on survivability, both to the individual and to the community at large depending on pathogenicity. This situation is demonstrated during the Covid-19 pandemic: as under-funded health systems struggle to maintain high testing rates, the virus surges where there is significant delay between disease contraction and diagnosis [2]. Several techniques have been proposed in recent years as alternate biofluid sensing platforms [3]. Paper-based devices, such as lateral-flow assays, have become widespread in part due to their ease of use and cost-effectiveness. These devices are highly specific, requiring multiple tests to detect multiple analytes, an option that can be complicated if further sample is unavailable [4]. A second alternate technique is a flexible biosensor for tear glucose measurement, which shows promising correlation between output current and glucose levels [5].
Devices of this type can arguably satisfy the four factors we desire, but there is high potential for interference by other electroactive species. The aforementioned devices must be tailored to each particular analyte. While the sample itself does not require pre-treatment, the measurement system must be reconfigured to detect different targets, reducing applicability towards a range of diseases. Optical methods do not suffer this drawback. Non-invasive biofluid analysis is readily accomplished by Raman spectroscopy due to its excellent chemical specificity. Spontaneous Raman is notorious for a faint scattering signal, however, and thus does not on its own serve our desire for a pre-treatment-free system. Amplification solutions such as high-powered laser sources are damaging to biological specimens, while pretreatments such as Surface-Enhanced Raman Spectroscopy (SERS) or Drop-Coating Deposition Raman Spectroscopy (DCDRS) require specially-prepared substrates or functionalization which is dependent on the reagents involved. It is prohibitively difficult to obtain consistent spectra without highly-trained personnel, making these methods unsuitable for straightforward diagnostic interpretation. Spontaneous Raman with Liquid Core Waveguide (LCW) optofluidics enhances the collection efficiency without these specimen modifications. This technique has been shown to provide superior sensitivity, enabling native-state analysis of compounds which would not otherwise be possible using Raman.

©2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
We have previously demonstrated the use of LCW Raman as a probe to monitor growth of ZnO nanoparticles, and in the world-first detection and characterization of thiol-capped CdTe Quantum Dots, with zero impact on their molecular properties [6], [7]. However, these prior optofluidic Raman demonstrations, alongside more recent related literature, are not practical for implementation outside of a laboratory environment: they require that the analyte is contained in a bulky apparatus, or is exposed to the environment, posing a risk of contamination and exposure to pathogens [8], [9]. We demonstrate here a system which eliminates these hurdles, using optofluidic Raman to achieve comparable signal enhancement to that obtained by SERS in a manner which makes portable implementation and straightforward pretreatment-free diagnostics a realizable goal. The complete fluid-based monitoring system which we demonstrate is ready to integrate with portable Raman spectroscopy for analysis of non-invasively-collected biofluids at physiologically-relevant sensitivity of detection. We achieve such sensitivities using continuous-flow optofluidics, minimizing thermal damage to the sample and thereby eliminating limits on integration time. The hardware which we demonstrate serves three of the key factors we desire: portability, non-invasive sample collection, and no specimen pre-treatment. The final factor, reliable automated diagnostic interpretation, is addressed through machine-learning. Variation in biological samples is inherent even from one individual: these variations are reflected in Raman spectra, and it is crucial to compensate for them with a high degree of reproducibility in order to accurately assess one patient relative to a normal standard [10], [11].
Spectrum preprocessing techniques are common to mitigate these variations, but the decision of which specific technique is most suitable requires manual intervention, and thus a predictive model's robustness will vary with the operator [12]. Thus we introduce a machine-learning algorithm for preprocessing technique selection, achieving user-independent spectrum optimization. The different components of this system, once integrated, provide previously untenable performance from a single device, for robust automated diagnostics capability. We demonstrate excellent chemical specificity, readily adaptable to quantitatively analyze a range of fluids with accuracies at or exceeding the µM level.

All spectra were collected on a Horiba HR800 spectrometer equipped with three excitation laser wavelengths: 488.2 nm (3.5 mW at the sample), 632.8 nm (10 mW), and 785.0 nm (30 mW).

1) Microfluidics: Fluidic sample circulation to facilitate portable and pre-treatment-free sample analysis is accomplished via an integrated microfluidic waveguide-on-chip. This design builds upon previous work in which the waveguide facet is exposed to air and fluid is introduced to the waveguide core by capillary action. Containing the sample and LCW in a microfluidic device eliminates sample evaporation at the facet, maintaining optimal optical coupling.

Fig. 1. Integrated-waveguide microfluidic design. Microfluidic chip consists of a layer of Polydimethylsiloxane (PDMS) patterned with fluidic channels and irreversibly bonded to a glass coverslip. Fluidic sample is depicted as dark yellow in tubing and microfluidic channel; inside the waveguide it is depicted in red, illustrating the co-location of sample and exciting laser source. Direct tubing connections allow a syringe pump to supply differential pressure between two syringes, such that the biofluid never leaves the closed-loop system and can be collected after measurement, eliminating environmental exposure and sample wastage.
A dual-syringe system, one syringe to dispense and one to collect via differential pressure, interfaces with the microfluidic device to circulate sample through the waveguide during a measurement. This continuous flow eliminates previous limits on integration time for delicate biological samples which may crystallize or denature in the presence of high-intensity laser sources, as the volume of sample which resides in the focal volume is continually refreshed, thus dispersing the thermal load. This microfluidic and syringe system additionally eliminates waveguide length limitations. Within the waveguide, laser and sample are co-located throughout the full length: increasing the total volume of sample which interacts with the laser increases the number of scattering events and thereby the quality of the spectrum [13]. Figure 1 details this configuration. Microfluidic devices consist of Polydimethylsiloxane (PDMS) (Sylgard-184) adhered to a glass coverslip. Biopsy punches are used to bore cylinders in cured PDMS for press-fit LCW and tubing connections. Two different LCWs have been tested in our device: Teflon Capillary Tubes (TCTs) (AF 2400, Biogeneral, Inc.) and Hollow-Core Photonic Crystal Fibers (HC-PCFs) (HC-800 and HC-1060, NKT Photonics). TCTs provide Raman enhancement solely through Total Internal Reflection (TIR). HC-PCFs additionally enhance the Raman collection efficiency via the photonic bandgap effect [14], resulting in an enhancement upwards of two orders of magnitude relative to the signal obtained from a TCT, and three orders of magnitude relative to the Raman spectrum of a droplet of fluid exposed to air, as shown in Supporting Information, figure ? ?. Panel B of the same figure contrasts the enhancement provided by each waveguide for a single glucose Raman mode, located at 1127 cm⁻¹, as a function of concentration, normalized to exposure time.
It is not until optofluidics is merged with continuous circulation that we are able to integrate indefinitely and obtain spectra of excellent signal-to-noise ratio for optimized concentration determination. Optofluidic integration of HC-PCFs requires that the fluid is restricted from entering the microstructured cladding of both fiber facets in a manner which does not compromise coupling to the fiber core. This is accomplished in a two-step process, dubbed Photonic Adhesive Tip Tamping (PATTi). A thin layer of ultraviolet-curable adhesive is applied across the tip of the fiber, while a burst of air clears adhesive from the central core. Specialized equipment is minimal: a micromanipulator, microscope, syringe pump, and ultraviolet source. Full procedure is included in Supporting Information, figure ? ?. PATTi has a yield over 80% and allows us to routinely achieve adhesive-sealed lengths below 20 µm at both ends of the fiber. Length of LCWs in all experiments is 25 mm. Exposure times ranged from 3 to 480 seconds per detector window, for a total spectrum collection time up to 2 hours. Species which are highly Raman active, suspended in a low-scattering medium and excited with a high-energy laser required very short exposure times. When the opposite situation occurs, Raman spectra of similarly excellent quality can likewise be collected by extending exposure times to compensate. The samples described in this experiment were circulated at rates between 0.05 and 0.33 µL/min. For TCTs, this means the fluid within the waveguide core is refreshed every 2 to 15 minutes, and for HC-PCFs, every 1.5 to 10 minutes. The lower limit for flow rate was selected such that the fluid within the waveguide core is being refreshed on a reasonable time scale within which thermal damage is unlikely to occur. The upper limit was selected to mitigate pressure drop within the device which may cause press-fit tubing connections to disconnect.
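The relationship between flow rate and refresh interval quoted above can be sketched as a simple volume-over-flow calculation. This is a minimal illustration, not the authors' procedure: it assumes that a fixed fluid volume is replaced once per refresh interval, and that the slowest flow rate pairs with the longest quoted interval. The function names are hypothetical.

```python
def refresh_time_min(volume_ul: float, flow_rate_ul_per_min: float) -> float:
    """Minutes needed to replace a fixed fluid volume once at a given flow rate."""
    if flow_rate_ul_per_min <= 0:
        raise ValueError("flow rate must be positive")
    return volume_ul / flow_rate_ul_per_min


def implied_volume_ul(flow_rate_ul_per_min: float, refresh_min: float) -> float:
    """Fluid volume implied by a quoted flow rate and refresh interval."""
    return flow_rate_ul_per_min * refresh_min


# Assumption: 0.05 uL/min pairs with the longest interval, 0.33 uL/min with the shortest.
tct_volumes = [implied_volume_ul(0.05, 15.0), implied_volume_ul(0.33, 2.0)]
hcpcf_volumes = [implied_volume_ul(0.05, 10.0), implied_volume_ul(0.33, 1.5)]

print("TCT implied volume (uL):", tct_volumes)
print("HC-PCF implied volume (uL):", hcpcf_volumes)
```

Under these assumptions both ends of each quoted range imply roughly the same refreshed volume per waveguide type, which is what one would expect if the refreshed volume is fixed by the device geometry.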
Within the quoted range of flow rates, the fluid velocity within the waveguide core did not present a noticeable influence on the quality or intensity of a Raman spectrum.

2) Artificial Biofluids: Two artificial biofluids are prepared for this study: one to mimic human tear fluid and one which emulates the high-scattering whole human blood. Samples to mimic human tears were composed of lysozyme and glucose dissolved in Deionized Water (DIW), each at physiologically relevant levels. D-(+)-Glucose (G8270) and Lysozyme from chicken egg white (62970) were purchased from Sigma-Aldrich. Forty unique samples were prepared with analyte distribution primarily concentrated within the range of diagnostic relevance, down to a minimum sensitivity of 0.17 mM lysozyme and 1.4 mM glucose [10], [15], [16]. Maximum analyte concentrations in our samples exceeded the upper bounds for diagnostic relevance. These elevated concentrations allow us to confirm the presence of known Raman modes for each analyte, particularly glucose due to its small scattering cross-section. This combination of components was selected to provide a spectrum which allows us to discern the concentration of the desired analyte under varying background conditions. This is a necessary first step upon which future work will build, as the Raman spectra of human biofluids will contain other components which complicate the baseline spectrum against which we seek to measure the analyte of interest. A 20% Intralipid fat emulsion (Sigma-Aldrich I141) acts in place of whole human blood; its scattering coefficient is a good match to that of whole blood and it requires no special handling or disposal [17], [18]. These samples were prepared to test the limitations of detecting a species with weak Raman signature in a highly scattering medium. Fifteen binary solutions were prepared between 0 mM and 138.8 mM glucose, primarily concentrated in the range 278 µM to 2.78 mM.
To achieve reliable automated diagnostics from Raman spectra, we have previously introduced a machine-learning algorithm to optimize spectrum processing for analysis [19] . Here we build upon those methods: the parameter which selects a solution for constituent concentration determination is tested against alternates, to determine which parameter optimizes the solution and minimizes machine learning error. Principal Component Regression (PCR) is used for predictive analysis. As the highest-order Principal Components (PCs) inevitably represent noise and are of no value to a predictive model, the number of PCs in the model must be carefully chosen not to exclude subtle spectrum details in relatively high-order PCs which positively contribute to prediction accuracy. We accomplish this by performing cross-validation on a training data set for preprocessing methods under consideration. Results are then stored in a Predicted Residual Sum of Squares (PRESS) matrix for variance analysis. An F-test (significance level α = 0.05) assesses the variability in this matrix to determine if improvement in prediction is statistically significant or due to sampling. This process rejects preprocessing methods which demonstrate chance correlation and is presented in figure 2 [19] . Utilizing only those preprocessing methods which display statistical significance provides a numerical measure of assurance that the model is robust and will perform similarly well when applied to new spectra. Throughout this discussion we shall refer to this as the current method, as it has been successfully utilized in previous work [19] . However, a model constructed in complete absence of user input will only be as robust as the selection of optimal PC allows. To build upon the work in [19] we aim to statistically optimize the indicating factor which identifies optimal PC, comparing the current method to 10 other indicators. 
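The cross-validation, PRESS, and F-test steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the algorithm of [19]: it uses leave-one-out cross-validation, a single PRESS vector rather than a PRESS matrix over preprocessing methods, and a common chemometrics convention for the F-test (pick the smallest number of PCs whose PRESS is not significantly larger than the minimum PRESS). The spectra are synthetic, and all function and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist


def pcr_fit_predict(X_train, y_train, X_test, n_pcs):
    """Principal Component Regression: regress y on the first n_pcs PC scores."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    # Rows of Vt are the principal component loadings of the centered spectra.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_pcs].T
    scores = Xc @ V
    coef, *_ = np.linalg.lstsq(scores, y_train - y_mean, rcond=None)
    return (X_test - x_mean) @ V @ coef + y_mean


def press_by_pcs(X, y, max_pcs):
    """Leave-one-out PRESS for models built with 1..max_pcs components."""
    n = len(y)
    press = np.zeros(max_pcs)
    for i in range(n):
        keep = np.arange(n) != i
        for k in range(1, max_pcs + 1):
            pred = pcr_fit_predict(X[keep], y[keep], X[i:i + 1], k)[0]
            press[k - 1] += (y[i] - pred) ** 2
    return press


def select_n_pcs(press, n_samples, alpha=0.05):
    """Smallest number of PCs whose PRESS is not significantly above the minimum."""
    k_min = int(np.argmin(press))
    f_crit = f_dist.ppf(1.0 - alpha, n_samples, n_samples)
    for k in range(k_min + 1):
        if press[k] / press[k_min] < f_crit:
            return k + 1
    return k_min + 1


# Synthetic two-component "spectra": analyte A is the target, B is an interferent.
rng = np.random.default_rng(0)
shift = np.linspace(0.0, 1.0, 60)
peak_a = np.exp(-((shift - 0.3) / 0.03) ** 2)
peak_b = 3.0 * np.exp(-((shift - 0.7) / 0.05) ** 2)
conc_a = rng.uniform(0.0, 10.0, 20)
conc_b = rng.uniform(0.0, 10.0, 20)
spectra = (np.outer(conc_a, peak_a) + np.outer(conc_b, peak_b)
           + rng.normal(0.0, 0.01, (20, 60)))

press = press_by_pcs(spectra, conc_a, max_pcs=5)
n_pcs = select_n_pcs(press, n_samples=20)
print("PRESS per number of PCs:", press)
print("selected number of PCs:", n_pcs)
```

In this synthetic case a single PC mostly tracks the stronger interferent, so the criterion should need at least two PCs before the target analyte is predicted well. The same comparison could be repeated per preprocessing method to decide which methods show a statistically meaningful improvement.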
To do this we form a linear regression model between the indicating variables and Residual Sum of Squares (RSS). A linear fit is performed and R² and median error values from this fit are used to guide our assessment of the indicator(s) which optimize PC selection. Full description of indicator variables is included in Supporting Information table ??.

We demonstrate in this section the design of a label-free user-independent quantitative analysis system for biofluids, providing proof of concept for reliable detection and quantification of disease markers in a portable automated setting. This is accomplished with a microfluidic device with embedded waveguide for closed-system continuous-flow and circulation of the solute under test. Microfluidics lends itself readily to portability, easily realizing a small-scale device with flexible configuration in which the sample is circulated for measurement. This closed-system continuous-flow system, illustrated in figure 1, provides temporal stability to the Raman spectrum on two critical aspects. First, if the fluid reservoir is not sufficiently maintained, the sample disappears from the HC-PCF facet in a matter of seconds. This case is illustrated in figure 3 panel A in which a vial of sample (DIW) is removed, causing signal intensity to drop by more than 80%. Second, we directly inhibit structural or chemical changes to the biofluid over extended measurement periods by dispersing the thermal load. These modifications alter the resultant spectrum and thus our ability to reliably quantify. A sample case is illustrated in figure 3, panels B and C, in which the Raman spectrum of a static solution of glucose dissolved in DIW (2.8 mM) is measured using an HC-PCF as described in [14]. Spectra presented are relative to the spectrum at time 0.
Extended integration times are necessary to sufficiently resolve Raman modes of trace analytes, but spectra are not temporally stable in this configuration: glucose modes in the vicinity of 1127 cm −1 and 2900 cm −1 become increasingly prevalent. The result is an unacceptable ambiguity in interpretation for diagnostics. To demonstrate the claims of reliable automated diagnostics with no sample pre-treatment, two artificial biofluids are constructed to emulate the optical properties of tears and whole blood, representing examples of non-invasive and invasive bio-analysis, respectively. The accuracy and performance we achieve lays the groundwork to apply similar methods to a much wider array of fluids and diagnostically-relevant analytes beyond what is presented. These alternate biofluids include urine and saliva, where their potential will be explored in the Discussion. An essential component to accurate composition prediction for diagnostics is the selection of appropriate preprocessing treatment for raw data. We automate this selection, marrying PCR with a measure of statistical significance to indicate that the model's predictive capabilities are not due to chance correlation with training data. A solution of 0.5% (m/v) glucose in deionized water is measured over several hours; glucose modes become increasingly prevalent relative to the spectrum at time 0 due to analyte deposition and crystallization, making the solution appear more saturated than its true concentration. Accurate measures of analyte concentration with respect to reference value are paramount, in particular for chemometrics relating to physiological metrics, to ensure that predictions will lead to appropriate treatment methods. At best, failure to accurately quantify may result in prolonged discomfort for a patient seeking to rectify an ailment. At worst, fatal failure to detect and enact treatment may occur. 
We present in figure 4 the estimated versus actual concentration of each analyte under test using our methods. Cases in which we have used machine-learning to select the optimal PC but applied no preprocessing are shown with open symbols. A dashed line indicates perfect correlation. The significance of these results is presented with four metrics of accuracy in table I: 1) error in prediction with no preprocessing or machine-learning, 2) when a preprocessing method is not found to have statistical significance, 3) when a preprocessing method is statistically significant, and 4) optimal error in prediction amongst all statistically significant preprocessing methods. For optimal predictions, the number of PCs used to form the predictive model is selected automatically, and the best result is then selected manually from amongst all preprocessing techniques which display significance. No user intervention is required to determine statistical significance, and these methods repeatedly yield improved accuracy over those that do not display significance. Artificial tear solutions designed for this study are composed of DIW with solutes glucose and lysozyme; here we shall discuss their respective potential as a means for non-invasive diabetes monitoring and as a tear-protein biomarker for Herpes Simplex Virus (HSV). While the bulk of this paper is focused on applicability to glucose quantification for diabetes management, we include lysozyme and HSV determination for context of applicability towards other ailments. Optimal PCR predictions for each analyte and sample, measured by both TCT and HC-PCF, are shown in figure 4 and table I. A glucose measurement device must predict concentrations within 20% of reference values to be considered clinically accurate and free from potentially-fatal disease mistreatment [20]. Here we have achieved a median accuracy of up to 298 µM, a 21% error on the minimum sensitivity of our experiment (1.4 mM).
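The 20% criterion and the 21% figure quoted above follow from a simple ratio, sketched below. The numbers are taken from the text; the function names are hypothetical and the check is only an arithmetic illustration, not a clinical-accuracy assessment.

```python
def relative_error(abs_error_mM: float, reference_mM: float) -> float:
    """Absolute error expressed as a fraction of the reference concentration."""
    if reference_mM <= 0:
        raise ValueError("reference concentration must be positive")
    return abs_error_mM / reference_mM


def within_tolerance(abs_error_mM: float, reference_mM: float, tol: float = 0.20) -> bool:
    """True if the error is within the stated fractional tolerance (20% by default)."""
    return relative_error(abs_error_mM, reference_mM) <= tol


# Values quoted in the text: 298 uM median error at the 1.4 mM minimum sensitivity.
tear_rel = relative_error(0.298, 1.4)
tear_ok = within_tolerance(0.298, 1.4)
max_abs_error_mM = 0.20 * 1.4  # largest error that still satisfies the 20% criterion

print(f"relative error: {tear_rel:.1%}, within 20%: {tear_ok}")
print(f"maximum permissible error at 1.4 mM: {max_abs_error_mM:.2f} mM")
```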
While this value does not meet the threshold for clinical accuracy, we shall later demonstrate that our selection algorithm can be optimized to meet and exceed 20% error. HSV is highly contagious and cannot be cured; as such, effective detection is paramount to preventing its spread, ensuring appropriate preventative measures are taken. Mean lysozyme level in tears of patients with HSV is 1.0 mM and without is 2.1 mM [16]. When measured using our devices, artificial tear solutions were detected with a minimum lysozyme sensitivity of 0.17 mM. This sensitivity of detection exceeds the minimum level at which lysozyme is expressed in human tears as a consequence of HSV. We observe a median accuracy as low as 42 µM amongst optimal methods processed using our statistical algorithm, well in excess of the accuracy required for HSV determination.

2) Glucose Predictions in Simulated Whole Blood: To establish proof of concept and test accuracy of detection for an analyte with a small Raman scattering cross-section within a high-scattering opaque artificial biofluid, we present results from samples composed of an intralipid fat emulsion and powdered glucose. Haemoglobin in whole blood is an excellent light scatterer and samples are typically centrifuged prior to analysis; these results demonstrate a facile method to compensate for traditional measurement difficulties, an essential precursor to pre-treatment-free blood diagnostics using Raman. Minimum glucose sensitivity in these artificial whole-blood samples is 280 µM glucose. All spectra were collected using an HC-PCF waveguide and excitation wavelengths of 633 nm and 785 nm. These wavelengths were chosen strategically such that an HC-PCF (HC-800, NKT Photonics) guides their respective Raman shifts via photonic bandgap effect (50-1500 cm⁻¹ for 785 nm, and 2500-3750 cm⁻¹ for 633 nm). A third excitation wavelength, 488 nm, was used for a selection of low-concentration samples, providing Raman collection enhancement by TIR.
The combination of these three wavelengths is appropriate for the sample at hand (intralipid), but the same combination does not extend to a human whole-blood sample due to strong fluorescence in that region. Appropriate pairing of sample and Raman wavelength, as we have done here for intralipid, is a necessary precursor to characterization of biological samples in order to mitigate fluorescence. HC-PCFs assist to this end, as their narrow photonic bandgap restricts the spectrum propagating through the sample to those wavelengths within the bandgap-guided range, thus suppressing the propagation of frequencies which promote fluorescence. Resulting optimal predictions at each excitation wavelength are presented in table I; optimal predicted versus actual concentrations for glucose, excited at 488 nm, are depicted in figure 4 panel C. Solutions in this set were constructed with diabetes detection in mind and, as such, predictions must fall within 20% of reference values above 3.89 mM (70 mg/dL) in order to be considered clinically accurate [20]. Thus, we define here the minimum permissible accuracy to be 0.778 mM. We exceed this accuracy by a wide margin when samples are measured using 488 nm, achieving 0.14 mM. When exciting the sample with 633 nm or 785 nm, however, accuracy falls short of this goal.

Compatibility with our microfluidic system necessitates that the waveguide maintains light-guiding once immersed in fluid. TCTs immediately satisfy this requirement, suiting them for use if the facilities to prepare a waveguide for immersion are not available. Their sub-optimal scattering collection efficiency is easily compensated for with circulation and exposure times in excess of several hours. Over the same time scale in a static solution, laser heating evaporates the sample from the focal volume, compromises coupling, and introduces thermal effects such that the signal we observe is not representative of the bulk solution (figure 3).
These thermal effects are mitigated with our closed-system continuous-flow optofluidics, as the fluid under investigation within the waveguide core is continually being refreshed and the thermal load is dispersed. HC-PCFs outperform TCTs with regard to accuracy; however, we demonstrate that this difference in performance is largely mitigated using our algorithm for quantification, rather than peak fitting, the traditional standard. We present three different R² metrics to validate this claim: 1) correlation between concentration and peak height at the 1127 cm⁻¹ glucose Raman mode, 2) correlation between true versus predicted concentrations of glucose in all cases where statistical significance was identified by our algorithm, and 3) correlation between true and predicted concentrations of glucose in optimal statistically significant preprocessing methods. Results are presented in table II. Using the traditional peak fit, an R² difference of 0.11 is observed. This falls to a mere 0.02 once our algorithm is introduced.

The machine-learning methods which are utilized in this paper are an extension of those in which we propose that an F-test may be used to indicate whether a preprocessing method will yield a robust model, and to simultaneously select the optimal PC [19]. Furthering those efforts, the effect of the indicating parameter which selects the optimal PCs for a predictive model cannot be neglected. In this section we statistically optimize the indicating parameter for PC selection relative to the current indicating parameter, as it is used in previous work. Ten indicative parameters which may act as markers to select the optimal PC were identified; a list of these parameters is included in table ??. Forming a linear model between predictor (value associated with the indicating parameter) and response (RSS), we use the model's R² value and median error to select candidates for optimization.
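The screening of candidate indicating parameters described above can be sketched as a per-indicator linear fit against RSS. This is a minimal sketch under assumptions: "median error" is taken here to mean the median absolute residual of the fit, the ranking rule (highest R² first, ties broken by lowest median error) is illustrative, and the indicator names and values are hypothetical placeholders rather than the parameters of the supplementary table.

```python
import numpy as np


def linear_fit_quality(indicator, rss):
    """Fit rss = slope * indicator + intercept; return (R^2, median absolute residual)."""
    indicator = np.asarray(indicator, dtype=float)
    rss = np.asarray(rss, dtype=float)
    slope, intercept = np.polyfit(indicator, rss, 1)
    residuals = rss - (slope * indicator + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((rss - rss.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return r2, float(np.median(np.abs(residuals)))


def product_indicator(*indicators):
    """Composite indicator formed as the element-wise product of several indicators."""
    return np.prod(np.vstack(indicators), axis=0)


def rank_indicators(candidates, rss):
    """Sort candidate indicators by descending R^2, then ascending median error."""
    scored = {name: linear_fit_quality(values, rss) for name, values in candidates.items()}
    return sorted(scored.items(), key=lambda item: (-item[1][0], item[1][1]))


# Hypothetical example: RSS of five candidate models and three candidate indicators.
rss = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
candidates = {
    "good": 2.0 * rss + 1.0,                          # perfectly linear in RSS
    "noisy": np.array([5.0, 1.0, 4.0, 2.0, 3.0]),     # weakly related to RSS
}
candidates["product"] = product_indicator(candidates["good"], candidates["noisy"])

ranked = rank_indicators(candidates, rss)
for name, (r2, med_err) in ranked:
    print(f"{name}: R^2 = {r2:.2f}, median error = {med_err:.2f}")
```

A product of indicators is included only to show how a composite, non-linear parameter can be scored with the same linear-fit machinery.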
Amongst all linear models of potential factors we observe R² = 0.23 ± 0.10 (median ± standard deviation) and median error 1.61 ± 0.37. The indicating parameter which simultaneously maximized R² and minimized median error for all data sets included in this paper was the product of five different factors, indicating that a non-linear solution may be optimal. Using the numbering convention listed in supplementary table ??, the optimal parameter is 3×4×5×7×8. Performance of the optimal parameter relative to the original indicating parameter (used in [19] and the data presented in figure 4 and table I) is included in table III, where it is apparent that the optimized parameter has greatly increased R² and reduced the median error. This optimized indicating parameter is validated using the samples from this study which represent the closest approxi-

Non-invasive diagnostics via biofluid analysis in healthcare applications has undergone steady progress in recent years. Paper-based devices, smartphone-compatible microscopy systems, and wearable biosensors all offer rapid results outside of a laboratory setting, making healthcare more accessible by reducing reliance on specialized facilities and personnel [3]. These devices are highly specific, however, requiring reconfiguration for each detected analyte. Even label-free optical techniques, such as Surface-Plasmon Resonance or Mach-Zehnder Interferometers, require functionalized sensor surfaces which are specific to the target [21]. Although some of these methods demonstrate sensitivity greater than that which we have shown here (a wearable smartphone-based biosensor has demonstrated a limit of detection of 35 µM glucose in sweat, for example), their utility is limited by the need to reconfigure [22]. This is the barrier which we address here with Raman spectroscopy and machine-learning.
The use of Raman-based chemometrics for biofluid-based healthcare applications offers an alternative for label-free analysis with zero functionalization, and great strides have been made in recent years to increase reliability and robustness of predictive measurements with machine-learning. With regard to glucose detection in solution via Raman, we present a summary of historical related literature in table V, where we define a Figure of Merit (FOM) as the product of laser power and glucose sensitivity. We wish to minimize both of these parameters: low laser power decreases damage to biological samples and increases the likelihood that the technology can reasonably be implemented, and a smaller minimum detectable concentration increases accuracy for confident diagnostics. We achieve here a reduction in the necessary laser power by two orders of magnitude for non-SERS characterization, and a heightened sensitivity by two orders of magnitude for glucose quantification in solutions of high turbidity.

TABLE IV. MACHINE-LEARNING PREDICTION ERROR FOR STATISTICALLY SIGNIFICANT PREPROCESSING METHODS, ACTING ON A SOLUTION OF GLUCOSE IN INTRALIPID, IN WHICH THE OPTIMIZED INDICATING PARAMETER IS USED TO DETERMINE THE PRINCIPAL COMPONENT. RESULTS MAY BE DIRECTLY COMPARED TO TABLE I IN WHICH THE ORIGINAL INDICATING

The analysis system described here provides an enhanced Raman signal collection and vastly improved algorithms for portable, pre-treatment-free, reliable diagnostics in biofluids, without the need for SERS. This has the potential to streamline diagnostics using Raman without the need to carry out any sample preparation, particularly for whole blood. Here we shall discuss the novelty and impact which each component in our system provides. The microfluidic device ensures that a given biofluid does not interact with potential environmental contaminants and does not experience thermal effects which may alter or obscure the Raman spectrum, thereby affecting diagnostic efficacy.
The microfluidic configuration also eliminates evaporation, which ensures that optimal optical coupling is maintained during prolonged measurements. Environmental confinement also contributes to operator safety if the sample may be pathogenic or toxic. To the best of our knowledge this is the first instance of a fully-contained microfluidic system with continuous circulation and an integrated waveguide for enhanced Raman spectroscopy. LCWs allow us to achieve enhanced Raman collection efficiency which permits label-free non-invasive detection and monitoring. Traditional Raman is limited to collecting a scattering signal from the laser pump's focal volume; the captured signal can be enhanced via larger Numerical Aperture (NA) objectives to gather from a greater solid angle, but this comes at the expense of depth of focus and thereby the number of molecules in the focal volume. Confining both laser pump and sample in the core of a waveguide bypasses this NA tradeoff, as molecules now interact with the pump along the entire waveguide length, increasing the number of scattering events and thereby the Raman signal [14]. The enhanced collection efficiency which we achieve due to fully-contained LCWs allows us to integrate near-indefinitely, providing sensitivity beyond that needed to simply detect the presence of relevant Raman modes and surpassing the levels of sensitivity previously achieved using bare LCWs [23], [14]. At the same time, circulation inhibits crystallization and analyte deposition during prolonged integration, ensuring that the composition which we observe is true to the sample's chemical makeup. With these novel modifications we can ensure accurate monitoring of minute changes between specimens. These enhanced Raman collection methods achieve both sensitivity and accuracy for compounds which are difficult to characterize with traditional Raman spectroscopy, such as pretreatment-free quantification of glucose in tears and whole blood [26].
In a human tear phantom we achieve 0.274 mM versus the clinical accuracy limitation of 0.278 mM; in a human whole-blood phantom we achieve 0.14 mM versus the clinical accuracy threshold of 0.778 mM [15]. Accurate noninvasive glucose concentration measurements using photonic methods have, in the past, required significant sample and reagent preparation, such as SERS. This is not feasible where specialized facilities are impractical to implement. In this study we have exceeded aqueous glucose errors-of-detection previously reported using SERS (1.8 mM) by a significant margin, achieving 0.14 mM [26]. Our spectrum collection methods demonstrate stability and reproducibility during lengthy acquisition times, further advocating the use of LCW Raman over SERS in this application. The reproducibility provided by our microfluidic design does not compensate for inherent signal variance between biological samples; this issue is addressed with our analytical techniques. Our user-independent machine-learning algorithm identifies optimal preprocessing and regression methods to compensate for variance and yield reliable, accurate diagnostics for a range of biomarkers, such as lysozyme in artificial human tear samples. The accuracies we achieve, assisted by the modified indicating parameter presented here, could readily aid diagnostics for HSV, Sjögren's syndrome, and kerato-conjunctivitis [16]. Existing published studies which quantify lysozyme in aqueous fluids via DCDRS have achieved errors-of-detection on the order of 28.6 µM or lower, similar in magnitude to our own [28], [29]. DCDRS has been shown to be advantageous over SERS in terms of reproducibility and stability, but not in magnitude of enhancement [29]. SERS can provide an enhancement upwards of 9 orders of magnitude versus unassisted Raman spectroscopy of aqueous samples, but the sample must be altered with nanostructures, which renders it unusable for future study [14].
Our methods achieve the reproducibility of DCDRS and spectrum enhancement of SERS without the sample processing requirements. While this study has focused on two particular biofluids, performance can reasonably be extrapolated to other biological fluids which possess similar light-scattering properties, such as urine. Glucose levels in human urine between healthy and diabetic patients may vary from 2.78 mM to over 5.55 mM; by reasoning that tears and urine have similar turbidity, we can extrapolate our results from a human tear phantom to estimate that glucose in urine could be measured with a sensitivity of 0.2 mM [30]. Glucose concentrations range from 0.01 mM to 5 mM in other biofluids such as sweat, saliva, and ocular fluids. Further refinement is necessary for our system to achieve confident glucose predictions on this scale; potential solutions include further extending exposure times. When considering alternate biofluids for constituent analysis, it is crucial to be aware of differences in light-matter interactions. Certain wavelengths may prompt a strong fluorescent response, but an appropriate pairing of excitation wavelength and photonic bandgap LCW (to both of which our microfluidic device can readily adapt) promotes optimum characterization. To extend this device into field-deployable applications, two particular limitations must be addressed: applicability to human fluids, and the true limit of analyte sensitivity. All artificial human biofluids in experiments described here are prepared from common stock solutions with a limited variety of constituents. In addition, we have normalized spectrum collection conditions across each data set. As such, the variances in our spectra do not reflect the range which we should expect from human biological samples.
These experiments are a necessary foundation prior to testing with human biofluids; future implementations of this device will require testing on larger spectrum sets which do not come from the same data distribution, and which have a greater degree of variation in both measurement conditions and presence of other substances which may complicate the Raman spectrum of the analytes of interest. With regard to sensitivity, we argue that this does not reflect a limitation of our device. Artificial tear glucose sensitivity in our experiments is 1.4 mM; expected glucose concentrations in human tears vary substantially as a function of collection method and our experiments were designed to comply with one reported range. Enhanced sensitivities for alternate expected glucose ranges may be obtained by extending exposure times beyond those which we report. We have demonstrated system stability for acquisition times between three minutes and two hours on a tabletop vibration-isolated Raman system, with no changes in coupling into the LCW due to sample pumping around the fiber. Unwanted background and environmental interference is prevalent in portable Raman systems; our configuration is ideal for this scenario [31]. We expect a straightforward extension to interface the device described here with a portable spectrometer, satisfying the desire for portability and pretreatment-free spectrum collection. Reliable and reproducible diagnostics are achieved via our algorithm, successfully automating selection of the optimal number of PCs for a model and eliminating unsuitable preprocessing treatments. This achieves a tremendous reduction in the number of decisions which require human intervention. Our artificial tears data set contains 40 spectra, 1 of which acts as the unknown spectrum and 39 of which form a cross-validation training data set. This set forms a model containing up to 37 PCs for each possible preprocessing method.
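To make the bookkeeping in this passage concrete, the following minimal sketch enumerates the candidate models for the artificial tears data set, assuming 28 preprocessing treatments are assessed. The treatment labels, the function names, and the choice of which treatment is rejected are hypothetical placeholders; only the counts (40 spectra, 1 unknown, up to 37 PCs, 28 treatments) come from the text, and the sketch does not reproduce the algorithm itself.

```python
from itertools import product

# Counts taken from the text; every name below is an illustrative assumption.
N_SPECTRA = 40              # spectra in the artificial tears data set
N_TRAIN = N_SPECTRA - 1     # one spectrum is held out as the unknown
MAX_PCS = 37                # upper limit on principal components per model

# Hypothetical labels standing in for the 28 preprocessing treatments.
treatments = [f"treatment_{i:02d}" for i in range(1, 29)]


def candidate_models(treatments, max_pcs):
    """Every (treatment, number of PCs) pair a user could otherwise pick by hand."""
    return list(product(treatments, range(1, max_pcs + 1)))


def remaining_choices(treatments, unsuitable):
    """Treatments left once PC selection is automated and unsuitable ones are dropped."""
    return [t for t in treatments if t not in unsuitable]


models = candidate_models(treatments, MAX_PCS)
# Assume, purely for illustration, that one treatment is flagged as unsuitable.
survivors = remaining_choices(treatments, {"treatment_07"})
fraction_removed = 1 - len(survivors) / len(models)

print(f"training spectra: {N_TRAIN}")
print(f"candidate models: {len(models)}")
print(f"remaining manual choices: {len(survivors)}")
print(f"share of decisions removed: {fraction_removed:.0%}")
```

Automating the PC count collapses each treatment's 37 options into a single choice, which is why the remaining decisions scale with the number of surviving treatments rather than with the full grid of models.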
If we assess 28 different treatments, this results in 1036 possible unique predictive models. If even one preprocessing method is identified as unsuitable we reduce the number of decisions which require manual intervention from 1036 to just 27, eliminating 97.3% of cases. The architecture of the algorithm developed here lends itself to complete automation; continuous improvements in accuracy and sensitivity will follow as larger datasets for machine-learning are made available, thereby bringing the manual intervention to zero for fully automated diagnostics. Additional figures and tables are included in a separate supplementary information document.
REFERENCES
[1] Tear Glucose Dynamics in Diabetes Mellitus
[2] All things equal? Heterogeneity in policy effectiveness against COVID-19 spread in Chile
[3] From Diagnosis to Treatment: Recent Advances in Patient-Friendly Biosensors and Implantable Devices
[4] The Application of Lateral Flow Immunoassay in Point of Care Testing: A Review
[5] A flexible and wearable biosensor for tear glucose measurement
[6] Raman Spectroscopy of Nanoparticles Using Hollow-Core Photonic Crystal Fibers
[7] Photonic Crystal Fiber for Efficient Raman Scattering of CdTe Quantum Dots in Aqueous Solution
[8] Optofluidic jet waveguide enhanced Raman spectroscopy
[9] Highly Sensitive Broadband Raman Sensing of Antibiotics in Step-Index Hollow-Core Photonic Crystal Fibers
[10] Tear glucose levels in normal people and in diabetic patients
[11] Saliva specimen: A new laboratory tool for diagnostic and basic investigation
[12] Robustness of models developed by multivariate calibration. Part II: The influence of preprocessing methods
[13] A comparative study of Raman enhancement in capillaries
[14] Recent developments in optofluidic-assisted Raman spectroscopy
[15] Collection method dependant concentrations of some metabolites in human tear fluid, with special reference to glucose in hyperglycaemic conditions
[16] Lysozyme tear level in patients with herpes simplex virus eye infection
[17] A literature review and novel theoretical approach on the optical properties of whole blood
[18] Optical properties of fat emulsions
[19] Optimized preprocessing and machine learning for quantitative Raman spectroscopy in biology
[20] Evaluating Clinical Accuracy of Systems for Self-Monitoring of Blood Glucose
[21] Last Advances in Silicon-Based Optical Biosensors
[22] A wearable, cotton thread/paper-based microfluidic device coupled with smartphone for sweat glucose sensing
[23] Chemical concentration measurement in blood serum and urine samples using liquid-core optical fiber Raman spectroscopy
[24] Feasibility of measuring blood glucose concentration by near-infrared Raman spectroscopy
[25] Multicomponent blood analysis by near-infrared Raman spectroscopy
[26] Toward a Glucose Biosensor Based on Surface-Enhanced Raman Scattering
[27] Glucose determination in human aqueous humor with Raman spectroscopy
[28] Drop coating deposition Raman spectroscopy of protein mixtures
[29] Raman Detection of Proteomic Analytes
[30] Glucose Sensing for Diabetes Monitoring: Recent Developments
[31] Recent developments in handheld and portable optosensing - A review
ACKNOWLEDGMENT
This work was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant.