key: cord-1052318-bys0migd authors: Ma, Lijuan; Liu, Daihan; Du, Chenzhao; Lin, Ling; Zhu, Jinyuan; Huang, Xingguo; Liao, Yuan; Wu, Zhisheng title: Novel NIR modeling design and assignment in process quality control of Honeysuckle flower by QbD date: 2020-07-19 journal: Spectrochim Acta A Mol Biomol Spectrosc DOI: 10.1016/j.saa.2020.118740 sha: 5f1287287fcf51557000dad9e5104f73073ba078 doc_id: 1052318 cord_uid: bys0migd
Honeysuckle flower is a common edible-medicinal food with significant anti-inflammatory efficacy. Process quality control of its ethanol precipitation is a topical issue in the pharmaceutical field. Near infrared (NIR) spectroscopy is commonly used for process quality analysis. However, establishing a robust and reliable quantitative model of a complex process remains a challenge in industrial applications of NIR. In this paper, modeling design based on the quality by design (QbD) concept was implemented for the ethanol precipitation process quality control of Honeysuckle flower. According to the performance of 56 models and 25 contour plots, the quadratic model was the best, with $R^2_{adj}$ increasing from 0.1395 to 0.9085, indicating strong interaction among spectral pre-processing methods, variable selection methods, and latent factors. SG9 and CARS were an appropriate combination for modeling. Furthermore, a spectral assignment method was creatively introduced for variable selection. Another 56 models and 25 contour plots were established. Compared with the chemometric variable selection method, spectral assignment combined with the QbD concept gave a higher $R^2_{pre}$ and a lower RMSEP. When the number of PLS latent factors was small, the $R^2_{pre}$ of the model built by spectral assignment increased from 0.9605 to 0.9916 and the RMSEP decreased from 0.1555 mg/mL to 0.07134 mg/mL. This result suggests that the variables selected by spectral assignment are more representative and precise. This provides a novel modeling guideline for process quality control in PAT.
Honeysuckle flower is a common edible-medicinal food with significant anti-inflammatory efficacy [1]. It not only has a specific detoxifying efficacy but can also be used as a heat-clearing drink. It has even been developed into products, such as [...] Ethanol precipitation is a characteristic and significant process in Honeysuckle flower production, which calls for a precise quality control method. Off-line quality control methods suffer from a time lag, which leads to uncertain and unpredictable product quality [2]. To solve this issue, process analytical technology (PAT) based on chemometrics has been proposed for quality control, and it is especially applicable to complex processes [3]. Currently, NIR spectroscopy is the most commonly used PAT process analyser in pharmaceutical technology because it allows non-destructive measurement and real-time in-process monitoring [4, 5]. It is especially suitable for complex production that requires process quality control [6, 7]. Wu et al. used NIR spectroscopy to monitor the concentration distribution of amino acids in the hydrolysis of Cornu Bubali [2]. Xu et al. proposed a multi-phase and multivariate statistical process control strategy for alcohol precipitation of Honeysuckle flower [8]. Laub-Ekgreen et al. applied NIR spectroscopy to rapid and non-destructive monitoring of salt concentration in the marinating process of herring [9]. Oxidative damage of pork myofibrils during frozen storage has been monitored by NIR hyperspectral imaging [10]. In the application of NIR to process quality control, the quantitative model is an essential factor. To establish an accurate NIR model, the most important part is the [...] selection [14] is another CMP to extract useful information for modeling. Bi et al. proved that, compared with the full spectra, the NIR model established with optimal spectra achieved better performance [15]. Yuan et al. 
indicated that the discriminant models were improved and simplified significantly by variable selection [16]. In addition, a suitable number of latent factors is also a CMP for avoiding over-fitting and under-fitting [17]. In classical modeling, the CMPs were optimized step by step. The genetic algorithm is a commonly used method for optimizing the spectral pre-processing method or the variable selection method [18]. Rosas et al. compared three spectral pre-processing methods for NIR process optimization of a multicomponent formulation [19]. Wu et al. used a novel method to optimize the model performance of partial least squares (PLS), interval PLS (iPLS), backward interval PLS (BiPLS) and moving window PLS (MWPLS), and pointed out that the optimal method differs depending on the evaluation indicator [20]. Pan et al. found that BiPLS, rather than synergy iPLS (SiPLS), was the appropriate variable selection method for establishing the particle size model [21]. Nevertheless, models optimized step by step ignore the interaction among modeling parameters and are not optimal overall. An integrated approach based on a genetic algorithm was introduced to optimize several modeling parameters simultaneously [22, 23]. Similarly, a systematic modeling method was proposed that uses a processing trajectory to select modeling parameters [24-26]. Although more effective than earlier approaches, this method still requires many models to be established laboriously and cannot demonstrate the interaction among the parameters. Hence, modeling design is needed here to simplify the process and establish an overall optimal model. To implement modeling design, the Quality by Design (QbD) concept, which was introduced in chemical manufacturing control in 2004, is a good choice [27]. 
In the ICH Q8 guideline, QbD is defined as a systematic approach to development that begins with predefined objectives and emphasizes product and process understanding, as well as [...] analytical method for Huanglian [31]. Similarly, it could also be applied to optimize NIR CMPs through a design of modeling evaluation procedures. However, chemometric variable selection cannot directly discern specific components in samples. Lee et al. argued that different variable selection methods vary widely in their ability to identify a consistent subset of variables [32]. Du et al. also demonstrated that different chemometric selection methods led to distinct characteristic wavelengths and bands [33]. NIR spectral assignment based on the interrelation between spectra and structure is effective for improving model performance and interpretation [34, 35]. Chlorogenic acid is the main medicinal component of honeysuckle [36, 37]. It is also used as the quality control [...] Therefore, a design of NIR modeling evaluation procedures was implemented by the D-optimal design method according to the QbD concept. Furthermore, having obtained the characteristic band of chlorogenic acid [38], the specific component of Honeysuckle flower, by spectral assignment, this paper creatively combined this characteristic band with the modeling CMPs designed by D-optimal design to establish a more precise and reliable model. These results also provide a reference method for modeling design and for establishing globally optimal models in PAT of edible-medicinal food.
The ethanol precipitation process of Honeysuckle flower was carried out according to the specific production process parameters of a certain enterprise. It was performed in a 3 L glass reactor using an agitator at a constant speed of 500 rpm. Ethanol was pumped into the reactor from the ethanol tank at a flow rate of 75 mL/min. 
Samples were collected during the alcohol precipitation process at 5 min intervals. In total, 60 samples were collected. A 1.5 mL sample was drawn with a pipette each time. The NIR spectrum was recorded immediately after sample collection to ensure that the collected spectrum was consistent with the obtained sample. The on-line NIR spectra of the alcohol precipitation were collected in transmission mode with 16 scans per sample, with the resolving power set to 2500 μm and the scanning range to 1.0 μm - 2.5 μm. Quantitative determination of chlorogenic acid in Honeysuckle flower by high performance liquid chromatography (HPLC) was carried out immediately after the on-line NIR measurement. A Waters 2695 HPLC system was used with an auto-sampler, a column temperature controller, and a diode-array detector (DAD) (SHIMADZU Corporation, Japan). Samples were separated on a Diamonsil C18 column (250 mm × 4.6 mm; 5 μm particles; Dikma) using acetonitrile and water containing 0.4% phosphoric acid (13:87, v/v) as the mobile phase. The separation parameters were set as follows: column temperature, 30 ℃; detection wavelength, 327 nm; flow rate, 1.0 mL/min; injection volume, 10 μL. The spectral pre-processing methods, variable selection methods, latent factors of variable selection, and latent factors of the PLS model were determined as the CMPs for the D-optimal design. The spectral pre-processing method was taken as a categorical variable, including raw, standard normal variate (SNV), Savitzky-Golay smoothing with 9 points (SG9), SG9 combined with first derivative spectra (SG9+1D), and SG9 combined with second derivative spectra (SG9+2D). Similarly, the variable selection method was a categorical variable consisting of variable importance in projection (VIP), uninformative variable elimination (UVE), selectivity ratio (SR), moving window partial least squares (MWPLS), and competitive adaptive reweighted sampling (CARS). 
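As a rough illustration of the five pre-processing levels listed above, the following minimal sketch applies SNV and 9-point Savitzky-Golay filters to a single spectrum given as a list of absorbance values. It is not the authors' implementation (the paper used commercial software). The second-order polynomial fit behind the tabulated weights, the dropping of the four edge points on each side, and all function and variable names are assumptions made for this example. Derivatives are expressed per sampling step.

```python
from statistics import mean, stdev

# Tabulated 9-point Savitzky-Golay weights for a second-order polynomial fit.
# The polynomial order is an assumption; the text only specifies "9 points".
SG9_SMOOTH = [w / 231 for w in (-21, 14, 39, 54, 59, 54, 39, 14, -21)]
SG9_D1 = [w / 60 for w in (-4, -3, -2, -1, 0, 1, 2, 3, 4)]
SG9_D2 = [w / 462 for w in (28, 7, -8, -17, -20, -17, -8, 7, 28)]


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def sg9(spectrum, weights=SG9_SMOOTH):
    """Apply a 9-point Savitzky-Golay filter; the 4 points at each edge are dropped."""
    half = len(weights) // 2
    return [
        sum(w * spectrum[i - half + k] for k, w in enumerate(weights))
        for i in range(half, len(spectrum) - half)
    ]


# The five pre-processing levels named in the text, as callables on one spectrum.
PREPROCESSING = {
    "raw": lambda s: list(s),
    "SNV": snv,
    "SG9": sg9,
    "SG9+1D": lambda s: sg9(s, SG9_D1),
    "SG9+2D": lambda s: sg9(s, SG9_D2),
}
```

Calling `PREPROCESSING["SG9+2D"](spectrum)` returns the smoothed second derivative of one spectrum. A production workflow would more likely rely on an established library routine and handle the spectrum edges explicitly.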
Moreover, in order to avoid over-fitting and under-fitting, the latent factors of variable selection and the latent factors of PLS were both set as discrete numerical variables with five levels from 3 to 11. For the optimization of the NIR model, the critical quality attributes (CQAs) were determined as the coefficient of determination of the prediction set ($R^2_{pre}$) and the root mean square error of prediction (RMSEP). In practical applications, the lower the RMSEP value, the more robust and accurate the model, while the opposite holds for $R^2_{pre}$.

$$R^2_{pre} = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$RMSEP = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N}}$$

where $N$ is the number of samples in the validation set, $y_i$ is the reference value of sample $i$, $\hat{y}_i$ is the predicted value of sample $i$, and $\bar{y}$ is the mean of the reference values of the validation set. The Kennard-Stone (K-S) (PCA-Score) method was used to divide the sample set into a calibration set and a validation set at a ratio of 4:1. D-optimal design was implemented with Design Expert 8.0. The D-optimal design in this research contains four factors: two nominal factors and two discrete factors. Each factor has five levels. These parameters are shown in Table S [...] Multifactor constraints like that pictured above must be entered as an equation taking the form of:

$$\beta_L \le \beta_1 A + \beta_2 B + \dots \le \beta_U$$

where $\beta_L$ and $\beta_U$ are lower and upper limits, respectively. In the previous study, using DMSO [...] Design-Expert (Stat-Ease, USA). The NIR models were established with ChemDataSolution (Dalian ChemDataSolution Information Technology Co. Ltd, China). All figures were drawn with SigmaPlot 12.5 (Systat Software, USA). The on-line NIR raw spectra [...] Table 1 indicated that the influences of the four CMPs on modeling were all significant. All these results demonstrated that D-optimal design was an appropriate method for NIR modeling design, and that the quadratic model was suitable for evaluating the models by these two CQAs. Furthermore, 25 contour plots developed from the combinations of five pre-processing methods and five variable selection methods are shown in Fig. 2. 
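The two CQAs and the Kennard-Stone split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The standard definitions of RMSEP and of the prediction-set coefficient of determination are used, the Kennard-Stone rule is applied to feature vectors that are assumed to be PCA scores computed elsewhere, and all function names are hypothetical.

```python
import math


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction over the validation set."""
    n = len(y_ref)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_ref, y_pred)) / n)


def r2_pre(y_ref, y_pred):
    """Coefficient of determination of the prediction (validation) set."""
    y_bar = sum(y_ref) / len(y_ref)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_ref, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_ref)
    return 1.0 - ss_res / ss_tot


def kennard_stone(points, n_select):
    """Return the indices of n_select points chosen by the Kennard-Stone rule.

    points: list of equal-length tuples (assumed here to be PCA scores).
    """
    def dist(a, b):
        return math.dist(points[a], points[b])

    n = len(points)
    # Start with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    # Repeatedly add the sample farthest from its nearest selected neighbour.
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda i: min(dist(i, s) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected
```

With 60 samples and a 4:1 ratio, `kennard_stone(scores, 48)` would give the calibration indices and the remaining 12 samples would form the validation set on which `rmsep` and `r2_pre` are evaluated.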
The darker the color in the figure, the closer the value is to the set maximum; the lighter the color, the closer it is to the set minimum. As seen, the model performance in the upper-left corner (Fig. 2 (a1)) was the best, while that in the bottom-right corner (Fig. 2 (e5)) was the worst. As shown in Fig. 2, model performance improved as the complexity of the variable selection method increased, from SR to CARS. Oddly, the trend was reversed for the raw spectra. This was because raw spectra contain a lot of noise, which is often used as residuals for modeling by complex variable selection methods, so for the raw spectra model performance worsened as the complexity of the variable selection method increased. According to the 25 contour plots, the spectral pre-processing method and the variable selection method had a significant interactive influence on model performance. For example, the contour in Fig. 2 (a3) demonstrated that the spectral pre-processing method was an important factor for modeling while the variable selection method was insignificant. However, the result shown in Fig. 2 (e3) was just the opposite. Moreover, when CARS was used as the variable selection method, a better model could be obtained with fewer latent factors regardless of the pre-processing method chosen, especially for SG9. This means that SG9 and CARS were an appropriate combination for modeling.
*P refers to the spectral pre-processing method, VS refers to the variable selection method.
The premise of developing a design space is a suitable target range for the CQAs. According to the established models, the ranges were set as $R^2_{pre}$ > 0.990 and RMSEP < [...] Table S.3. All CQAs of the points inside the space were better than those outside the space, which indicated that the established spaces were reliable. This again shows that there was interaction among the modeling parameters and that the modeling path was not unique. 
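The idea of a design space as the set of parameter combinations whose predicted CQAs meet both targets can be sketched as below. This is only an illustration. The evenly spaced latent-factor levels, the function names, the toy CQA values, and the RMSEP limit passed in the usage lines are all assumptions, since the RMSEP target is not recoverable from the text. In the study itself the CQAs came from the fitted quadratic response model, not from a hand-written dictionary.

```python
from itertools import product

PREPROCESSING = ("raw", "SNV", "SG9", "SG9+1D", "SG9+2D")
VARIABLE_SELECTION = ("VIP", "UVE", "SR", "MWPLS", "CARS")
# Five levels from 3 to 11; the even spacing is an assumption.
LATENT_FACTORS = (3, 5, 7, 9, 11)


def candidate_grid():
    """All (P, VS, latent factors of VS, latent factors of PLS) combinations."""
    return list(
        product(PREPROCESSING, VARIABLE_SELECTION, LATENT_FACTORS, LATENT_FACTORS)
    )


def design_space(predicted_cqas, r2_min, rmsep_max):
    """Keep the combinations whose predicted CQAs meet both targets.

    predicted_cqas maps a combination to a (R2_pre, RMSEP) pair.
    """
    return [
        combo
        for combo, (r2_pre, rmsep) in predicted_cqas.items()
        if r2_pre > r2_min and rmsep < rmsep_max
    ]


# Toy, made-up predictions for three combinations (not results from the paper).
toy = {
    ("SG9", "CARS", 3, 3): (0.993, 0.07),
    ("raw", "CARS", 11, 11): (0.950, 0.20),
    ("SNV", "VIP", 5, 7): (0.991, 0.15),
}
# rmsep_max=0.10 is a hypothetical limit chosen only for this example.
inside = design_space(toy, r2_min=0.990, rmsep_max=0.10)
```

The full grid holds 625 candidate combinations, from which a D-optimal design selects a much smaller set of runs (56 models in this study) to fit the response surface.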
Twenty-five design spaces developed by P (raw, SNV, SG(9), 1D+SG(9), 2D+SG(9)) and variable selection method (VIP, UVE, SR, MWPLS, CARS). *P refers to the spectral pre-processing method, VS refers to the variable selection method.
In order to establish a pertinent PLS model, spectral assignment, instead of a chemometric variable selection method, was introduced to select variables for PLS modeling. Chlorogenic acid is the main active ingredient of Honeysuckle flower. In our previous research, the characteristic band of chlorogenic acid was found to be 1650-1800 nm [36]. This band was selected to establish PLS models combined with the spectral pre-processing methods (Fig. 4). As is well known, the fewer the latent factors of PLS, the better the applicability of the model. Therefore, the variables selected by spectral assignment were more precise and representative, so that the model established by spectral assignment combined with the QbD concept was more applicable and robust. According to the performance of the models established by the spectral assignment method combined with the QbD concept, the optimal combination of parameters was SG(9)+2D as the pre-processing method and 3 latent factors for PLS. The PLS model was then established and is shown in Fig. 5 (a). The $R^2_{pre}$ of this model was 0.9916 and the RMSEP was 0.07134 mg/mL. In contrast, the model based on the chemometric variable selection method performed worse than the spectral assignment model under the same combination of modeling parameters (Fig. 5 (b)). Specifically, the $R^2_{pre}$ [...]
The authors declare no competing interests. 
NIR spectroscopy as a process analytical technology (PAT) tool for monitoring and understanding of a hydrolysis process
Overview of PAT process analysers applicable in monitoring of film coating unit operations for manufacturing of solid oral dosage forms
Research and Application Progress of Near Infrared Spectroscopy Analytical Technology in China in the Past Five Years
Process Analytical Chemistry
A Study on Model Performance for Ethanol Precipitation Process of Lonicera japonica by NIR Based on Bagging-PLS and Boosting-PLS algorithm
Application of Online Near Infrared for Process Understanding of Spray-Drying Solution Preparation
NIR analysis for batch process of ethanol precipitation coupled with a new calibration model updating strategy
Nondestructive measurement of salt using NIR spectroscopy in the herring marinating process
Heterospectral two-dimensional correlation analysis with near-infrared hyperspectral imaging for monitoring oxidative damage of pork myofibrils during frozen storage
Progress and application of spectral data pretreatment and wavelength selection methods in NIR analytical technique
Influence of data preprocessing on the quantitative determination of the ash content and lipids in roasted coffee by near infrared spectroscopy
Review of the most common preprocessing techniques for near-infrared spectra
Variables selection methods in near-infrared spectroscopy
A local pre-processing method for near-infrared spectra, combined with spectral segmentation and standard normal variate transformation
Application of variable selection in the origin discrimination of Wolfiporia cocos (FA Wolf) Ryvarden & Gilb. based on near infrared spectroscopy
Preventing over-fitting in PLS calibration models of near-infrared (NIR) spectroscopy data using regression coefficients
Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data
NIR spectroscopy for the in-line monitoring of a multicomponent formulation during the entire freeze-drying process
A novel model selection strategy using total error concept
Near infrared spectroscopy model development and variable importance in projection assignment of particle size and lobetyolin content of Codonopsis radix
An integrated approach to the simultaneous selection of variables, mathematical pre-processing and calibration samples in partial least-squares multivariate calibration
Parallel genetic algorithm co-optimization of spectral preprocessing and wavelength selection for PLS regression
Optimization of Parameter Selection for Partial Least Squares Model Development
Rapid prediction of total petroleum hydrocarbons concentration in contaminated soil using vis-NIR spectroscopy and regression techniques
An industrial perspective on the design and development of medicines for older patients
Nanosystem trends in drug delivery using quality-by-design concept
Establishment and reliability evaluation of the design space for HPLC analysis of six alkaloids in Coptis chinensis (Huanglian) using Bayesian approach
Reproducibility, complementary measure of predictability for robustness improvement of multivariate calibration models via variable selections
Research on modeling method to analyze Lonicerae Japonicae Flos extraction process with online MEMS-NIR based on two types of error detection theory
Subtractive-FTIR spectroscopy to characterize organic matter in lignite samples from different depths
A review of band assignments in near infrared spectra of wood and wood components
Salinity Stress Is Beneficial to the Accumulation of Chlorogenic Acids in Honeysuckle (Lonicera japonica Thunb