key: cord-0680547-2l02uqfr
authors: Pan, Liangrui; Pipitsunthonsan, Pronthep; Daengngam, Chalongrat; Chongcheawchamnan, Mitchai
title: Identification of complex mixtures for Raman spectroscopy using a novel scheme based on a new multi-label deep neural network
date: 2020-10-29
journal: nan
DOI: nan
sha: b98434c59d4161653e30e18180870f9788deddeb
doc_id: 680547
cord_uid: 2l02uqfr

Owing to the noisy environment caused by fluorescence and additive white noise, as well as complicated spectral fingerprints, the identification of complex mixture materials remains a major challenge in Raman spectroscopy applications. In this paper, we propose a new scheme based on the continuous wavelet transform (CWT) and a deep network for classifying complex mixtures. The scheme first transforms the noisy Raman spectrum into a two-dimensional scale map using CWT. A multi-label deep neural network (MDNN) model is then applied to classify the materials. The proposed model accelerates feature extraction and expands the feature graph using a global average pooling layer. The Sigmoid function is implemented in the last layer of the model. The MDNN model was trained, validated and tested with data collected from samples prepared from substances in palm oil. During training and validation, data augmentation is applied to overcome data imbalance and enrich the diversity of the Raman spectra. The test results show that the MDNN model outperforms previously proposed deep neural network models in terms of Hamming loss, one error, coverage, ranking loss, average precision, F1 macro averaging and F1 micro averaging. The average detection time obtained from our model is 5.31 s, which is much faster than that of the previously proposed models.

Raman spectroscopy is a fast, non-invasive, label-free technology that requires no pretreatment and displays molecular fingerprints based on vibrational information [1]. Because Raman spectroscopy is insensitive to water, it has been widely used in fields such as chemistry [2], materials [3], physics [4], polymers [5], biology [6], medicine [7] and geology [8]. Identification of organic chemicals using Raman spectroscopy is achieved through the interaction of the molecular structure with the infrared spectrum. The spectrum characteristics, namely the magnitude of the Raman shift, the peak intensity and the peak shape, are the vital basis for identifying chemical bonds and functional groups. There have been several research works on Raman spectroscopy technology, for example surface-enhanced Raman spectroscopy [9], high-temperature Raman spectroscopy [10], resonance Raman spectroscopy [11], confocal micro-Raman spectroscopy [12] and Fourier transform Raman spectroscopy [13], to name a few. These techniques promote the application of Raman spectroscopy in various fields.

One of the disadvantages of Raman spectroscopy is that it is easily disturbed by fluorescence noise. Once the sample under test responds with fluorescence, the Raman spectrum is swamped by the wideband spectrum of the fluorescence noise, which makes the desired Raman spectrum hard to detect. Secondly, the sensitivity of Raman spectroscopy is low. Several kinds of noise, such as shot noise, dark current noise and readout noise [14], [15], are unavoidable in a Raman detector implemented with charge-coupled devices (CCD) and semiconductor devices. Therefore, before Raman spectroscopy is used, a preprocessing algorithm such as baseline correction is needed to reduce the interference of these noises and highlight the molecular peak characteristics. Two baseline correction approaches, based on hardware and software designs, have been proposed [16]-[18].

The hardware design approach requires modifying the instrument and is therefore unpopular. The software design approach, on the other hand, is based on signal processing techniques. Because it requires no additional hardware installation or modification, it is the low-cost approach and has therefore attracted more interest.

In recent decades, some researchers proposed [38] .

In [39] , rapid recognition of mixtures in complex environments was realized by establishing a fast 

The data collection process is shown in Fig. 1. We set up the experiment for collecting spectra in a temperature-controlled room at 28 ℃. A dark room was set up so that no light interfered with our measurements. A BIM-6002a Raman spectrometer was used to collect the Raman spectra. According to the specifications of the spectrometer, the signal-to-noise ratio (SNR) of the channel is 600:1 and the laser wavelength is 785 nm.

Several complex mixture samples were prepared. Each sample was then placed in the Raman spectrometer, and the spectra were measured and collected.

To denoise a noisy Raman signal and highlight the molecular spectrum, researchers commonly use baseline correction to remove the fluorescence noise. Although many baseline correction algorithms have been proposed, they can denoise only to a certain level [16]-[18], [40]. In this paper, we overcome this by

Theoretically, the Wigner-Ville distribution (WVD) provides the best energy concentration and has many ideal mathematical characteristics [42]. CWT, on the other hand, expands a function $f(t)$ of the $L^2(\mathbb{R})$ space under the wavelet basis, which is defined as:

After feature extraction, all features are reshaped into vectors by the fully connected layer. The vectors are then multiplied to reduce their dimensions and passed to the Softmax layer, which produces the output [48]. This method not only changes the network parameters but also causes overfitting. The global pooling layer, by contrast, is a newer technique that replaces the fully connected layer; it greatly reduces the number of parameters and the risk of overfitting. In the experiment, global average pooling is used instead of the fully connected layer: it directly averages each entire feature map and feeds the result into the Sigmoid layer to obtain the probability of the tags and mappings [48]. By replacing the black-box operation of the fully connected layer, the network parameters are significantly reduced, which helps avoid overfitting.
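The pooling-then-Sigmoid step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the last convolutional stage emits one feature map per tag and that the averaged value is passed straight to the Sigmoid; the function names, the toy feature maps and the 0.5 threshold are hypothetical and are not taken from the MDNN implementation.

```python
import math


def global_average_pool(feature_maps):
    """Reduce each 2-D feature map (a list of rows) to its mean value."""
    pooled = []
    for fmap in feature_maps:
        total = sum(sum(row) for row in fmap)
        count = sum(len(row) for row in fmap)
        pooled.append(total / count)
    return pooled


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_tags(feature_maps, threshold=0.5):
    """Return per-tag probabilities and the thresholded tag vector.

    Each tag gets an independent probability, so several tags can be
    active at once (unlike Softmax, whose outputs compete and sum to one).
    """
    pooled = global_average_pool(feature_maps)
    probs = [sigmoid(v) for v in pooled]
    tags = [int(p >= threshold) for p in probs]
    return probs, tags


# Toy input (assumption): three 2x2 feature maps, one per substance.
toy_maps = [
    [[2.0, 2.0], [2.0, 2.0]],
    [[-1.0, -3.0], [-2.0, -2.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
probs, tags = predict_tags(toy_maps)
```

Because the pooling step has no trainable weights, it adds no parameters of its own, which is the property the text relies on when it contrasts global average pooling with the fully connected layer.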

Multi-label learning associates each instance with a set of tags. Suppose

The task of multi-label learning is to learn a function

(1) Macro-averaging:

(2) Micro-averaging:
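As an illustration of the two averaging schemes, the sketch below computes the F1 score both ways from binary tag matrices. It follows the standard definitions (macro-averaging computes F1 per tag and then averages; micro-averaging pools the true-positive, false-positive and false-negative counts over all tags before computing a single F1). The function names and the toy data are assumptions made for this example.

```python
def label_counts(y_true, y_pred, label):
    """Count true positives, false positives and false negatives for one tag."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        t, p = true_row[label], pred_row[label]
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Average of the per-tag F1 scores (every tag weighs the same)."""
    n_labels = len(y_true[0])
    scores = [f1_from_counts(*label_counts(y_true, y_pred, j))
              for j in range(n_labels)]
    return sum(scores) / n_labels


def micro_f1(y_true, y_pred):
    """F1 of the counts pooled over all tags (every decision weighs the same)."""
    n_labels = len(y_true[0])
    tp = fp = fn = 0
    for j in range(n_labels):
        a, b, c = label_counts(y_true, y_pred, j)
        tp, fp, fn = tp + a, fp + b, fn + c
    return f1_from_counts(tp, fp, fn)


# Toy data (assumption): 4 instances, 3 tags, rows are binary tag vectors.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
macro = macro_f1(y_true, y_pred)  # per-tag F1 values are 1, 2/3 and 1/2
micro = micro_f1(y_true, y_pred)  # pooled counts: TP=4, FP=1, FN=2
```

The two values differ whenever tags are unevenly represented, which is why both are usually reported together.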

In the case-based indicators, four classification measures can be defined as follows:

(1) Hamming loss evaluates how many times an instance-tag pair is misclassified. Both kinds of error are counted: a tag that does not belong to the instance is predicted, or a tag that belongs to the instance is not predicted.

where $\Delta$ denotes the symmetric difference between two sets. Note that when $|Y_i| = 1$ for all instances, a multi-label system is in fact a multi-class single-label system, and the Hamming loss is $2/Q$ times the usual classification error.
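A short sketch can make the Hamming loss and the $2/Q$ remark concrete. It assumes the usual definition, in which the size of the symmetric difference between the predicted and true tag sets is averaged over all instances and divided by the number of tags $Q$; the function name and the toy tag sets are hypothetical.

```python
def hamming_loss(true_sets, pred_sets, n_labels):
    """Mean size of the symmetric difference, normalised by the tag count Q."""
    mismatches = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return mismatches / (len(true_sets) * n_labels)


Q = 3  # number of tags in the toy examples below

# Multi-label toy case (assumption): one wrong tag in each of three instances.
multi_true = [{0, 2}, {1}, {0, 1}, {2}]
multi_pred = [{0}, {1, 2}, {0}, {2}]
multi_loss = hamming_loss(multi_true, multi_pred, Q)  # 3 / (4 * 3) = 0.25

# Single-label toy case: every set holds exactly one tag, so each wrong
# prediction contributes two mismatches (one missed tag, one spurious tag).
single_true = [{0}, {1}, {2}, {0}]
single_pred = [{0}, {2}, {2}, {1}]
single_loss = hamming_loss(single_true, single_pred, Q)
error_rate = sum(t != p for t, p in zip(single_true, single_pred)) / len(single_true)
```

In the single-label case two of the four predictions are wrong, so the classification error is 0.5 and the Hamming loss equals $2/Q$ times that value.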

(2) One-error calculates the proportion of instances whose top-ranked tag is not in the set of relevant tags. One-error can be interpreted as the score of evaluating the reverse tag pair.
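One-error can be sketched in a few lines, assuming the model returns one real-valued score per tag for each instance and that the top-ranked tag is the one with the highest score. The function name, the scores and the tag sets below are illustrative assumptions.

```python
def one_error(true_sets, scores):
    """Fraction of instances whose highest-scoring tag is not a relevant tag."""
    misses = 0
    for relevant, row in zip(true_sets, scores):
        top_tag = max(range(len(row)), key=row.__getitem__)
        if top_tag not in relevant:
            misses += 1
    return misses / len(true_sets)


# Toy data (assumption): 4 instances, 3 tags.
true_sets = [{0, 2}, {1}, {0, 1}, {2}]
scores = [
    [0.9, 0.1, 0.4],  # top tag 0, relevant
    [0.2, 0.6, 0.7],  # top tag 2, not relevant -> counted as an error
    [0.8, 0.3, 0.1],  # top tag 0, relevant
    [0.1, 0.2, 0.9],  # top tag 2, relevant
]
result = one_error(true_sets, scores)  # 1 miss out of 4 instances = 0.25
```

Only the single best-ranked tag matters here, so the measure ignores how the remaining tags are ordered.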

In this experiment, DNN models that have won image recognition competitions in recent years, mainly VGG16, VGG19, ResNet50, MobileNetV2, DenseNet121 and InceptionResNetV2, were investigated and compared with our model. Experiments have shown that these models perform well in transfer-learning classification tasks. However, because the networks differ in depth and structure, the trained models also differ in how well they classify the Raman spectrum scale maps of mixtures, so it is necessary to compare and discuss them. After each epoch, the data and tags are shuffled again. Secondly, we use the early-stopping function to terminate the training and save the trained model when the loss value does not change over two epochs. In this paper, all models were trained and tested on TensorFlow 2.3 (GPU).
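The two training safeguards mentioned above, reshuffling after each epoch and early stopping, can be sketched framework-free as follows. This is not the authors' training code: the helper names, the `min_delta` parameter and the loss values are assumptions, and a patience of two epochs is taken from the description in the text. In TensorFlow the same behaviour is normally obtained with the built-in `EarlyStopping` callback rather than hand-written logic.

```python
import random


def shuffle_together(samples, tags, rng):
    """Shuffle samples and tags with one permutation so pairs stay aligned."""
    order = list(range(len(samples)))
    rng.shuffle(order)
    return [samples[i] for i in order], [tags[i] for i in order]


def epoch_of_stop(losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training stops.

    Training stops once the loss has failed to improve on the best value
    by more than `min_delta` for `patience` consecutive epochs; if that
    never happens, the last epoch is returned.
    """
    best = float("inf")
    stale_epochs = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > min_delta:
            best = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return epoch
    return len(losses)


# Toy usage (assumption): the loss plateaus at 0.6 from epoch 3 onwards.
rng = random.Random(0)
maps, labels = shuffle_together(["a", "b", "c", "d"], [0, 1, 2, 3], rng)
stop_epoch = epoch_of_stop([0.9, 0.7, 0.6, 0.6, 0.6, 0.5])  # stops at epoch 5
```

Shuffling with a shared permutation matters because shuffling data and tags independently would silently break the pairing between each scale map and its tag vector.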

Seven trained DNN models are investigated in the experiment. We fed the test data set (700 Raman spectrum scale maps in total) into the models. The evaluation parameters of each algorithm are plotted in Fig. 4

The ROC curve is a comprehensive indicator of sensitivity and specificity as continuous variables, and it reveals the relationship between the two graphically [49]. It is obtained by calculating a series of sensitivities and specificities at different thresholds of the continuous variable [49]. The greater the area under the curve, the higher the diagnostic accuracy. On the ROC curve, the point closest to the upper-left corner of the plot gives the critical value of sensitivity and specificity. The false-positive rate (FPR) is on the horizontal axis: the larger the FPR, the more actual negative instances are predicted as positive. The true positive rate (TPR) is on the vertical axis: the larger the TPR, the more actual positive instances are predicted as positive.
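The threshold sweep behind an ROC curve can be sketched as follows for a single tag. The code assumes binary ground truth and real-valued scores, takes every distinct score as a threshold, and integrates the curve with the trapezoidal rule; the function names and the toy scores are hypothetical and do not reproduce the reported measurements.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) pairs sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (assumption): 3 positive and 3 negative instances for one tag.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
auc = area_under_curve(curve)  # 8 of the 9 positive/negative pairs are ranked correctly
```

For multi-label outputs, one such curve can be drawn per tag, which matches the per-substance comparison discussed in the text.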

We discuss the relationship between the models in terms of sensitivity and specificity. From the results, we find that the MDNN model is much better than the other existing models in the actual test.

In this subsection, the detection times of the proposed model and the compared models are reported. We prepared 700 different kinds of noisy Raman spectra with SNRs of 20-30 dB. It is shown in Fig. 6

loss, one error, coverage, ranking loss, average precision, F1 macro averaging and F1 micro averaging. In the ROC index, the measurement value of the MDNN model in detecting the first and third kinds of substances is broadly similar to that of the other models, but its measurement value in detecting the second kind of substance is significantly higher. Therefore, our model is better than the other models. In terms of detection time, our proposed model predicts the Raman spectra of 700 mixtures in 5.3132 seconds, which is much faster than the other models. This scheme is of great significance for the detection of mixtures of classified chemicals and paves the way for the combination of Raman spectroscopy and artificial intelligence technology.

Rapid and non-invasive screening of high renin hypertension using Raman spectroscopy and different classification algorithms

Biological imaging of chemical bonds by stimulated Raman scattering microscopy

Ratiometric Surface Enhanced Raman Scattering Immunosorbent Assay of Allergenic Proteins via Covalent Organic Framework Composite Material Based Nanozyme Tag Triggered Raman Signal 'Turn-on' and Amplification

Physical Layer Performance of Multi-Band Optical Line Systems Using Raman Amplification

Intrinsic Raman signal of polymer matrix induced quantitative multiphase SERS analysis based on stretched PDMS film with anchored Ag nanoparticles/Au nanowires

Resolving the individual contribution of key microbial populations to enhanced biological phosphorus removal with Raman-FISH

Inpatient Use of Ambulatory Telemetry Monitors for COVID-19 Patients Treated With Hydroxychloroquine and/or Azithromycin

Raman spectroscopy as a tool to determine the thermal maturity of organic matter: Application to sedimentary, metamorphic and structural geology

Surface-Enhanced Raman Spectroscopy for Bioanalysis: Reliability and Challenges

Continuous cell sorting in a flow based on single cell resonance Raman spectra

Fast Confocal Raman Imaging Using a 2-D Multifocal Array for

Dietary Fiber-Induced Changes in the Structure and Thermal Properties of Gluten Proteins Studied by Fourier Transform-Raman Spectroscopy and Thermogravimetry

Noise and background removal in Raman spectra of ancient pigments using wavelet transform

A Novel Pre-Processing Algorithm Based on the Wavelet Transform for Raman Spectrum

A novel baseline-correction method for standard addition based derivative spectra and its application to quantitative analysis of benzo(a)pyrene in vegetable oil samples

Baseline correction for Raman spectra using an improved asymmetric least squares method

Baseline correction using asymmetrically reweighted penalized least squares smoothing

A Fast Classification Scheme in Raman Spectroscopy for the Identification of Mineral Mixtures Using a Large Database With Correlated Predictors

Classification of Hazardous Chemicals with Raman Spectrum by Convolution Neural Network, in 2020 13th International Conference on Human System Interaction (HSI)

Rapid and Low-Cost Detection of Thyroid Dysfunction Using Raman Spectroscopy and an Improved Support Vector Machine

Confocal Raman Sensing Based on a Support Vector Machine for Detecting Lung Adenocarcinoma Cells

Identification of Listeria Species Using a Low-Cost Surface-Enhanced Raman Scattering System With Wavelet-Based Signal Processing

Malicious Software Classification Using VGG16 Deep Neural Network's Bottleneck Features, in Information Technology - New Generations

Exposing Computer Generated Images by Eye's Region Classification via Transfer Learning of VGG19, in 16th IEEE International Conference on Machine Learning and Applications (ICMLA)

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

Fd-Mobilenet: Improved Mobilenet with a Fast Downsampling Strategy, in 2018 25th IEEE International Conference on Image Processing (ICIP)

Implementing Efficient ConvNet Descriptor Pyramids

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

A Deep Multi-Modal CNN for

Label Co-Occurrence Learning With Graph Convolutional Networks for Multi-Label Chest X-Ray Image Classification

Multilabel Aerial Image Classification

Multi-Label Remote Sensing Scene Classification Using Multi-Bag Integration

Inducing Hierarchical Multi-label Classification rules with Genetic Algorithms

SVM based multi-label learning with missing labels for image annotation, Pattern Recognition

Noisy multi-label semi-supervised dimensionality reduction

Improving multi-label classification with missing labels by learning label-specific features

Deep learning-based component identification for the Raman spectra of mixtures

Recognition of big data mixed Raman spectra based on deep learning with smartphone as Raman analyzer, Electrophoresis, p. elps

Joint Baseline-Correction and Denoising for Raman Spectra

Time-frequency feature representation using energy concentration: An overview of recent advances

Cross-terms reduction in the Wigner-Ville distribution using tunable-Q wavelet transform, Signal Processing

Synthetic Minority Over-sampling Technique, JAIR

Taking the Mystery out of the Infamous Formula, 'SNR = 6.02N + 1.76dB,' and Why You Should Care

Error bounds for approximations with deep ReLU networks, Neural Networks

Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks

From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

Classifier chains for multi-label classification

A lazy learning approach to multi-label learning

A Review on Multi-Label Learning Algorithms

A novel approach for learning label correlation with application to feature selection of multi-label data

Liangrui Pan was born in Anhui, China

He is pursuing a master's degree in electrical engineering at Prince of Songkla University in Thailand in 2019 and is a Member of IEEE

His research interests are machine learning, deep learning, and pattern recognition

Pronthep Pipitsunthonsan received a bachelor's degree from Prince of Songkla University in 2010 and a master's degree in 2017. He is currently pursuing a doctorate in computer engineering

UK in 2006, and a Ph.D. in Physics from Virginia Tech

Currently, he is working as an assistant professor in the Department of Physics, Faculty of Science, Prince of Songkla University. His research interests involve nonlinear optical properties of nanomaterials, photonics, and standoff Raman spectroscopy

SM'98) was born in

He received a B.Eng. degree in telecommunication from the King Mongkut's Institute of Technology Ladkrabang, Bangkok, in 1992, a M.Sc. degree in communication and signal processing from Imperial College

an Associate Professor. His current research interests include deep learning algorithms and big data applied to agricultural applications and smart cities