key: cord-0060954-e7aqpg7r
authors: Yan, Xingwei; Bi, Shuhui; Shen, Tao; Ma, Liyao
title: Prediction Analysis of Soluble Solids Content in Apples Based on Wavelet Packet Analysis and BP Neural Network
date: 2020-06-13
journal: Multimedia Technology and Enhanced Learning
DOI: 10.1007/978-3-030-51103-6_31
sha: 5eb93f769f6526070753d9b3208cd068def1d928
doc_id: 60954
cord_uid: e7aqpg7r

Taking the Fuji apple as the subject, this paper studies the relationship between the near-infrared spectrum and the soluble solids content (SSC), one of the important indexes of the internal quality of apples. To reduce computational complexity and improve modeling accuracy, the wavelet packet threshold denoising method is adopted for spectral preprocessing, and wavelet packet analysis (WPA) is used to screen the characteristic wavelengths of the spectrum. A prediction model of SSC is then proposed based on a BP neural network, chosen for its resistance to noise and interference, its strong nonlinear conversion ability, and its capacity to handle nonlinear measured data with uncertain causality. Finally, the simulation results show that wavelet packet analysis not only reduces the number of modeling variables but also improves the accuracy of the BP neural network model. The proposed method gives a better prediction of the SSC of apples.

China is a major fruit-producing country. After years of development, China's apple industry ranks among the leaders of the world's fruit industry. China's apple output, output value and planting area rank among the top in the world, and the apple industry has developed into one of the pillar industries of China's agriculture.
Against the background of fierce competition in the domestic market, China's apple industry still has problems such as relative market oversupply, weak market competitiveness and low foreign exchange earnings from exports (Shu 2018). The main reason lies in the lag of postharvest quality inspection and safe commercialization of apples. Therefore, improving the detection of the internal quality of Chinese apples can not only promote the research and development of Chinese apple sorting equipment, but also help improve the international competitiveness of Chinese apples.

Near-infrared spectroscopy (NIRS) detection is rapid, nondestructive and environmentally friendly, and has become the fastest developing and most widely used modern food analysis and detection technology (Chen 2019). Shang Jing used near-infrared spectroscopy to identify apple varieties; the results show that rapid and nondestructive identification of apple varieties can be achieved with this technique (Shang 2019). Guo Zhiming et al. corrected the intensity of hyperspectral images and predicted the sugar content and its distribution in apples quickly and nondestructively (Guo 2015). Sanaz Jarolmasjed et al. used spectroscopy to classify healthy apples and apples with bitter pit, and showed that near-infrared spectroscopy can be used as an indicator of bitter pit development in apples (Sanaz 2017).

The near-infrared spectrum obtained by instrument scanning inevitably contains noise, which affects the accuracy of spectral analysis. This paper studies the optimization of SSC modeling for Red Fuji apples. Wavelet analysis is used to pretreat the spectrum and to screen characteristic wavelengths, and a neural network model is established to improve the robustness and applicability of the near-infrared prediction model.

Experimental Materials.
In this study, 200 Red Fuji apples from Qixia, Yantai, free of defects and damage, were selected. Of these, 130 were randomly chosen as the calibration set and the remaining 70 were used as the prediction set. The apples were stored in cold storage at 0 °C. Before the experiment, they were taken out of cold storage two hours in advance so that the sample temperature matched the laboratory temperature and stayed essentially unchanged during the experiment.

The near-infrared spectra of the samples were acquired with an Antaris II Fourier transform near-infrared spectrometer. As shown in Fig. 1, an InGaAs detector was used with the integrating sphere diffuse reflection acquisition mode. Each sample was scanned 32 times at a resolution of 8 cm⁻¹ over a collection range of 4000-10000 cm⁻¹, giving 1557 variables per spectrum. For each apple, spectra were collected at three different positions on the equator, and their average was taken as the experimental spectral data.

After spectrum acquisition, the soluble solids content was measured at the positions where the spectra had been collected. The apple was peeled and the pulp crushed, and 1-2 drops of juice were placed at the center of the prism plane of a Japanese Atago Brix meter for reading. The average of the soluble solids content at the three spectral collection sites was used as the reference value for each apple sample.

The near-infrared spectrum acquisition process may be affected by various factors, such as the state of the spectrometer and the detection conditions, which introduce noise into the spectrum and reduce modeling accuracy. It is therefore necessary to preprocess the acquired spectrum. In engineering applications, noise is usually a high-frequency signal, while the relatively stable low-frequency component is the real signal.
Therefore, the denoising process mainly has the following steps:
(1) Import the raw spectral data.
(2) Perform wavelet decomposition of the acquired spectral signal. Specifically: (a) select the wavelet function and its order; (b) determine the wavelet decomposition scale; (c) select reasonable parameters for the wavelet packet transform.
(3) Select the corresponding threshold quantization method for the high- and low-frequency coefficients of the wavelet decomposition.
(4) Perform wavelet reconstruction from the decomposition coefficients of the optimal wavelet packet basis and the wavelet packet coefficients after threshold quantization.

As an improvement on wavelet analysis, wavelet packet analysis uses a multi-scale pyramid algorithm to decompose every frequency band of the signal. Each decomposition level covers all frequencies of the signal, so the original signal can be completely reconstructed (Xiong 2005). The wavelet packet decomposition tree is shown in Fig. 2.

Two measures are commonly used to evaluate signal denoising:

(1) Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_0(i) - \tilde{X}(i)\right)^2} \tag{1}$$

where $N$ is the size of the signal, $X_0$ is the original signal, and $\tilde{X}$ is the signal after wavelet packet denoising. The smaller the RMSE, the closer the denoised signal is to the original signal and the better the effect.

(2) Signal-to-noise ratio (SNR):

$$\mathrm{SNR} = 10 \log_{10}\left(\frac{\mathrm{power}_{\mathrm{signal}}}{\mathrm{power}_{\mathrm{noise}}}\right) \tag{2}$$

where $\mathrm{power}_{\mathrm{signal}}$ is the energy of the real data and $\mathrm{power}_{\mathrm{noise}}$ is the energy of the noise. The higher the signal-to-noise ratio, the better the denoising effect.

The premise of wavelet analysis is the choice of an appropriate wavelet basis function. After experimental comparison, this paper adopts the db4 wavelet, which is widely used in engineering, and performs a three-level decomposition. The choice of threshold quantization is also key to the denoising quality.
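The two denoising measures described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, RMSE is taken in its usual form (square root of the mean squared difference between the original and denoised signals), and the SNR computation assumes that the signal power is the energy of the original spectrum and the noise power is the energy of the residual between the original and denoised spectra.

```python
import math


def rmse(original, denoised):
    """Root mean square error between the original and the denoised signal."""
    if len(original) != len(denoised) or not original:
        raise ValueError("signals must be non-empty and of equal length")
    squared_error = sum((x0 - x) ** 2 for x0, x in zip(original, denoised))
    return math.sqrt(squared_error / len(original))


def snr_db(original, denoised):
    """Signal-to-noise ratio in dB.

    Assumption (not stated in the paper): signal power is the energy of the
    original signal, noise power is the energy of (original - denoised).
    """
    if len(original) != len(denoised) or not original:
        raise ValueError("signals must be non-empty and of equal length")
    power_signal = sum(x0 ** 2 for x0 in original)
    power_noise = sum((x0 - x) ** 2 for x0, x in zip(original, denoised))
    if power_noise == 0:
        return math.inf  # identical signals: no residual noise
    return 10 * math.log10(power_signal / power_noise)


# Toy data, for illustration only (not measured spectra).
raw = [1.0, 2.0, 3.0, 4.0]
smooth = [1.0, 2.0, 3.0, 2.0]
print(rmse(raw, smooth), snr_db(raw, smooth))
```

Under these assumptions, a lower `rmse` and a higher `snr_db` for the same original spectrum both indicate that the denoised spectrum stays closer to the original.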
Spectral denoising was performed with each of the four threshold selection rules sqtwolog, minimaxi, rigrsure and heursure. Table 1 shows the denoising effect of each. In general, the larger the SNR and the smaller the RMSE, the better the denoising effect. Table 1 shows that sqtwolog denoises better than the other three threshold selection rules, so the sqtwolog rule is used in this paper. The spectra before and after preprocessing are compared in Fig. 3. A partial view of one of the spectra is shown in Fig. 4, where the blue curve is the original spectrum and the red curve is the spectrum after denoising.

In full-spectrum modeling, some spectral regions lack correlation with the sample properties, and the modeling results are poor. The group of variables with the lowest information redundancy is therefore selected from the full-band spectral information so that the selected feature bands have the least collinearity, which greatly reduces the complexity of model building.

The wavelet transform is used to screen the characteristic wavelengths of the sample spectra. After wavelet decomposition, the coefficients of each node contain some details of the signal. In this paper, the near-infrared spectral signals of the apples are processed with the Matlab wavelet packet analysis toolbox. Each signal is decomposed into three levels and the spectral signals are analyzed in each frequency band. The standard deviation of the coefficients in each band is calculated at each wavelength point. The larger the standard deviation, the larger the dispersion, and the wavelength at the position where the coefficient dispersion of each band is largest is taken as a characteristic wavelength. Figure 5 shows the wavelengths screened by wavelet packet analysis. Using the 24 selected characteristic wavelengths for modeling greatly reduces the complexity of modeling.
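The screening rule described above (pick, in each frequency band, the position where the coefficients are most dispersed across samples) can be sketched as follows. This is only an illustration under stated assumptions: the wavelet packet coefficients are taken as given, each band is assumed to be laid out as one coefficient list per sample with positions that map directly onto indices of the wavenumber axis, and only one position per band is kept. The text does not say how many points per band are retained to arrive at 24 wavelengths, and the function names are hypothetical.

```python
from statistics import pstdev


def most_dispersed_position(band_coeffs):
    """Return (index, std) of the position whose coefficients vary most across samples.

    band_coeffs: one list of coefficients per sample for a single frequency
    band, all of equal length (hypothetical layout, not taken from the paper).
    """
    n_positions = len(band_coeffs[0])
    if any(len(row) != n_positions for row in band_coeffs):
        raise ValueError("every sample must have the same number of coefficients")
    # Standard deviation across samples at each position. Population vs. sample
    # standard deviation does not change which position is largest.
    stds = [pstdev([row[j] for row in band_coeffs]) for j in range(n_positions)]
    best = max(range(n_positions), key=stds.__getitem__)
    return best, stds[best]


def select_wavenumbers(bands, wavenumbers):
    """Pick one characteristic wavenumber per band (simplifying assumption)."""
    return [wavenumbers[most_dispersed_position(band)[0]] for band in bands]


# Toy coefficients for two bands and three samples, for illustration only.
band_a = [[0.1, 1.0, 0.5], [0.1, 3.0, 0.6], [0.1, 5.0, 0.4]]
band_b = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.1], [0.0, 0.0, 0.0]]
axis = [4000.0, 4004.0, 4008.0]
print(select_wavenumbers([band_a, band_b], axis))
```

Keeping several positions per band instead of one would only require sorting `stds` and taking the top few indices.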
An artificial neural network is suitable for processing nonlinear measurement data with uncertain cause-and-effect relationships. It builds a model by simulating the working principle of neurons, and it is resistant to noise and interference and has strong nonlinear conversion ability. As shown in Fig. 6, the BP neural network consists of three parts: the input layer, the hidden layer, and the output layer. It can realize an arbitrary nonlinear mapping between input and output, and has good nonlinear approximation and predictive ability (Li 2019).

The model is evaluated by the correlation coefficient R between the measured and predicted values of the soluble solids content and by the root mean square error of prediction (RMSEP). The calculation formulas are given in Eqs. (3) and (4):

$$R = \sqrt{1 - \frac{\sum_{i}\left(\hat{y}_i - y_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2}} \tag{3}$$

where $y_i$ and $\hat{y}_i$ are the measured and predicted values of the $i$-th sample in the sample set (including the training set and the prediction set), respectively, and $\bar{y}$ is the average of the measured values of all samples in the sample set.

$$\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{4}$$

where $y_i$ and $\hat{y}_i$ are the measured and predicted values of the $i$-th sample in the prediction set, respectively, and $n$ is the number of samples in the prediction set.

The collected apple spectra have a total of 1557 wavelength points. Using all of them to build a model greatly increases the amount of calculation, and the lack of correlation between some spectral bands and the sample properties leads to poor modeling results. Figure 7 shows the prediction results of full-spectrum modeling.

In theory, increasing the number of layers in the network can reduce the model error and improve accuracy, but it complicates the network structure, lengthens training and reduces training efficiency. Generally, a network with a single hidden layer is considered first.
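As a hedged illustration of the two evaluation measures used in this section, the sketch below computes RMSEP and R from lists of measured and predicted SSC values. The function names are hypothetical. RMSEP is taken in its usual form, and R is assumed to be the square root of one minus the ratio of the squared prediction error to the squared deviation of the measured values from their mean, which is the form consistent with the symbols defined in the text; a Pearson correlation coefficient would be another reasonable reading of "correlation coefficient".

```python
import math


def rmsep(measured, predicted):
    """Root mean square error of prediction over a set of samples."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    sse = sum((p - y) ** 2 for y, p in zip(measured, predicted))
    return math.sqrt(sse / len(measured))


def correlation_r(measured, predicted):
    """R = sqrt(1 - SSE / SST), an assumed form (see the lead-in above).

    SSE is the squared prediction error and SST the squared deviation of the
    measured values from their mean.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(measured) / len(measured)
    sse = sum((p - y) ** 2 for y, p in zip(measured, predicted))
    sst = sum((y - mean_y) ** 2 for y in measured)
    if sst == 0:
        raise ValueError("measured values are constant; R is undefined")
    ratio = 1 - sse / sst
    if ratio < 0:
        raise ValueError("prediction error exceeds total variation; R is undefined")
    return math.sqrt(ratio)


# Toy values, for illustration only (not data from the paper).
y_true = [10.0, 12.0, 14.0]
y_pred = [11.0, 12.0, 13.0]
print(correlation_r(y_true, y_pred), rmsep(y_true, y_pred))
```

A model is better under these measures when R is closer to 1 and RMSEP is closer to 0.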
When higher accuracy is needed, it can be achieved by increasing the number of nodes in the hidden layer. Therefore, this paper uses one hidden layer and builds a three-layer BP network model.

The number of nodes in the hidden layer has a great influence on the modeling effect. Too many nodes make the learning time too long and reduce the generalization ability. Too few nodes shorten the learning time but reduce the fault tolerance of the network. In practice, networks with different numbers of hidden layer nodes are trained on the sample set, and the smaller the error, the better the network model. The number of hidden layer nodes is generally determined from the empirical formula (5):

$$h = \sqrt{m + n} + a \tag{5}$$

where $h$ is the number of hidden layer nodes, $m$ is the number of input layer nodes, $n$ is the number of output layer nodes, and $a$ is a constant between 1 and 10.

A three-layer BP neural network was selected to establish a prediction model for the soluble solids content in apples. The 24 wavelength points obtained by wavelet packet analysis were used as the input of the BP neural network. Through experimental tests, the parameters of the BP neural network were selected as follows: the maximum number of training iterations is 2000, the number of hidden nodes is 30, the learning rate is 0.3, and the target error is 0.04. The network structure is more stable when the hidden layer uses the tansig activation function and the output layer uses the purelin activation function. The model prediction results after screening the characteristic wavelengths are shown in Fig. 8.

Table 2 shows that using wavelet packet analysis to screen the spectral characteristic wavelengths not only improves modeling efficiency, but also gives a better model than the one built on the full spectrum.

This paper takes Red Fuji apples as the research object and studies the relationship between the near-infrared spectrum and the internal quality of apples.
The original spectra were pretreated and the characteristic wavelengths of the spectra were screened by wavelet packet analysis. The selected wavelengths were used in a BP neural network to establish a quantitative analysis model of apple soluble solids content, with R and RMSEP of 0.936 and 0.471, respectively. Wavelet packet analysis combined with BP neural network modeling eliminates a large number of bands that are not related to soluble solids, which greatly reduces the complexity of the model and improves its prediction accuracy and stability. The results show that it is feasible to use wavelet packet analysis and a BP neural network to predict the soluble solids content of apples.

Shu (2018): The current task of the development of fruits industry in China
Chen (2019): Research and application of infrared spectroscopy technology in food safety testing
Shang (2019): Nondestructive identification of apple varieties by VIS/NIR spectroscopy
Guo (2015): Intensity correction of visualized prediction for sugar content in apple using hyperspectral imaging
Sanaz (2017): Near infrared spectroscopy to predict bitter pit development in different varieties of apples
Xiong (2005): Study on spectral identification based on wavelet packet analysis
Li (2019): Soil total iron content hyperspectral inversion based on BP neural network