key: cord-0030159-ro87xcu1
authors: Yan, Wenjing; Zhang, Zesheng; Zhang, Qingchuan; Zhang, Ganggang; Hua, Qiaozhi; Li, Qiao
title: Deep Data Analysis-Based Agricultural Products Management for Smart Public Healthcare
date: 2022-04-07
journal: Front Public Health
DOI: 10.3389/fpubh.2022.847252
sha: ca84775e42df638198ddd7a3d960b782b8b70122
doc_id: 30159
cord_uid: ro87xcu1

Agriculture is an indispensable industry for public health at all times, and managing it intelligently is of great significance. Because substantial technical advances rely on long-term effort and continuous progress, reasonable scheduling of the distribution of agricultural products is a key aspect of smart public healthcare. The most direct factor affecting the distribution of agricultural products is their dynamic price. Forecasting price fluctuations in advance can optimize distribution and pave the way to smart public healthcare. Most researchers study the prices of various agricultural products separately, without considering how different agricultural products interact over time. This study introduces a typical deep learning model, the graph neural network (GNN), for this purpose and proposes deep data analysis-based agricultural products management for smart public healthcare (GNN-APM for short). The highlight of GNN-APM is that it takes latent correlations among multiple types of agricultural products into account when modeling how price sequences evolve. A case study is set up using real-world data from the agricultural products market. Simulation results show that GNN-APM achieves lower prediction error than the baseline methods.

Since ancient times, agriculture has been a vital industry for human survival, closely tied to the most basic necessities of life. At present, food shortage is one of the most important problems faced by many regions in the world (1). This phenomenon is generally reflected in two aspects (2). For one thing, there is still room for improvement in current agricultural technology, so grain yields fail to meet expectations (3). For another, owing to the lack of scientific management and scheduling strategies, the production of agricultural products is not planned reasonably (4). The exploration of advanced agricultural technology has lasted for at least 100 years, and technological breakthroughs have been made in some key fields (5, 6). However, such progress is gradual and must accumulate over a long period of time (7). Using advanced computing technology to manage the agricultural products market can improve the distribution and dispatch efficiency of global agricultural products to a certain extent (8), and thereby alleviate the problem of food shortage (9). The key to improving management efficiency is to forecast the market conditions of several major types of agricultural products (10). Data-driven methods are the most direct way to realize this goal (11, 12).

Rationalizing the distribution of agricultural products is an important aspect of smart public healthcare. Predicting price fluctuations in advance can optimize the layout of agricultural products and pave the way for smart public healthcare. Many scholars have studied the agricultural products market in recent years (13-32). For example, Fan et al. (14) analyzed the value-added mechanism of the agricultural products circulation value chain and put forward three methods for optimizing the organization mode of agricultural products. Such studies mainly rely on the research methods of the social sciences. Their analyses focus on the mechanisms of social development and evolution but have to handle large-scale computational tasks manually. Without the assistance of intelligent computing, these methods face inherent limitations. Other scholars use intelligent computing methods to solve this problem. They treat forecasting of the future agricultural products market as a time series forecasting problem based on historical data. However, most of them treat different agricultural products as independent categories and build a separate time series prediction model for each. In this way, potential relationships between categories are ignored, which reduces the precision of the modeling process.

To address the above challenges, a graph neural network (GNN) can be used to model this time series prediction problem (33). Unlike traditional models that deal with grid-structured data, GNN deals with data that has a topological structure. GNN is based on deep learning (34) and is widely used in various fields because of its good performance and interpretability (35). By modeling graph-structured data, it can capture the relationships between entities. Vertices and the edges connecting them together constitute a graph structure. The vertices are object entities, and the edges are the specific relationships between the entities. Specifically, several common agricultural products can be regarded as entities, and the correlations between them can be regarded as edges, which together constitute a graph network. By introducing a neural computing structure, a GNN model for time series prediction can be constructed. Therefore, this study designs a graph neural network-based smart management method for the agricultural products market (GNN-APM). The main highlights of this article can be summed up as follows:

• The internal complexity of agricultural products systems is investigated for further management.
• The GNN is employed to construct a time-series price prediction method for agricultural products systems.
• A case study is carried out to evaluate the performance of the proposed method on real-world scenes.

The left part of Figure 1 illustrates the learning and training process inside GNN-APM. First, the feature space of the different types of agricultural products is encoded into vectorized representations. Then a prediction model is obtained by learning from the data samples. As necessities of daily life, agricultural products of different categories are strongly correlated, and these correlations are meaningful and worth exploring further. The complex interrelations between categories affect demand to a great extent. The GNN model mines the correlations between different categories of agricultural products and builds a graph structure with agricultural products as the entities. Through this analysis, the market conditions of agricultural products can be predicted, which supports more intelligent management of public health.

Generalized to the problem scenario in this study, there are several types of agricultural products whose market conditions need to be forecasted. The learning algorithm of GNN-APM is shown in Algorithm 1. Types of agricultural products are viewed as the set of nodes, and their internal relations are regarded as the set of edges. In this research, the market condition of an agricultural product refers to its average market price, which is updated over time. Each update of the market conditions defines a timestamp t, which ranges from 1 to M. At each timestamp, the market condition of each agricultural product type is denoted as $V_i^{(t)}$, where i is the index of the agricultural product type. Given the market condition data of M timestamps as input, the main goal is to predict the unknown market conditions at the following timestamps. Internal relations among nodes are likely to influence the tendency of market conditions. Thus, a relationship-aware sequential forecasting problem is formulated, and the GNN model is adopted to deal with it.
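The formulation above can be illustrated with a minimal sketch that turns a price matrix of M timestamps and N product types into (history, target) pairs for one-step-ahead forecasting. The function name `make_windows`, the `window` parameter, and the toy values are assumptions made for illustration; the paper does not state how many previous timestamps are fed to the model.

```python
import numpy as np


def make_windows(prices, window):
    """Split an (M, N) price matrix into (history, target) pairs.

    prices[t, i] plays the role of V_i^(t): the market condition of
    product type i at timestamp t. `window` (an assumed hyperparameter)
    is the number of previous timestamps used to predict the next one.
    """
    prices = np.asarray(prices, dtype=float)
    M, _ = prices.shape
    if not 0 < window < M:
        raise ValueError("window must be between 1 and M - 1")
    # history[k] covers timestamps k .. k + window - 1;
    # target[k] is the market condition at timestamp k + window.
    history = np.stack([prices[k:k + window] for k in range(M - window)])
    target = prices[window:]
    return history, target


# Toy data: M = 6 timestamps, N = 2 product types (values are made up).
toy = np.arange(12, dtype=float).reshape(6, 2)
X, y = make_windows(toy, window=3)
print(X.shape, y.shape)
```

Each `X[k]` is a short price history for all product types, and `y[k]` is the next observation that a forecasting model would be trained to predict.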

Algorithm 1: Learning algorithm of GNN-APM.
Input: The market condition dataset: D; node set: V; number of agricultural products: N; total timestamp: M; learning rate: l; parameter set: Θ; penalty parameter: p;
1: initial iter = 1
2: repeat
3: for t ∈ [1, M] do
4: for i ∈ [1, N] do
5: compute …
6: …
7: compute the gradient of Θ according to loss;
8: update model parameters according to their gradients and learning rate l;
9: end for
10: end for
11: iter = iter + 1;
12: until convergence
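The control flow of Algorithm 1 can be sketched in runnable form as follows. This is only a stand-in under stated assumptions: the GCN forward pass is replaced by a simple linear one-step predictor, plain gradient descent is used for the update, and the names `train_linear_stand_in`, `tol`, and `max_iter` are hypothetical. The sketch shows the repeat-until-convergence loop over timestamps and nodes, not the paper's model.

```python
import numpy as np


def train_linear_stand_in(prices, lr=0.05, p=1e-4, tol=1e-6, max_iter=200):
    """Loop structure of Algorithm 1 with a linear predictor as a stand-in.

    Assumption: node i at timestamp t + 1 is predicted as prices[t] @ W[:, i].
    """
    prices = np.asarray(prices, dtype=float)
    M, N = prices.shape
    W = np.zeros((N, N))            # stand-in for the parameter set
    history = []                    # loss accumulated during each pass
    it = 1                          # line 1: initial iter = 1
    while True:                     # line 2: repeat
        epoch_loss = 0.0
        for t in range(M - 1):      # line 3: loop over timestamps
            x = prices[t]
            for i in range(N):      # line 4: loop over nodes
                err = x @ W[:, i] - prices[t + 1, i]
                epoch_loss += err ** 2 + p * np.sum(W[:, i] ** 2)
                # line 7: gradient of the penalized squared error
                grad = 2.0 * err * x + 2.0 * p * W[:, i]
                # line 8: update with learning rate lr
                W[:, i] -= lr * grad
        history.append(epoch_loss)
        it += 1                     # line 11: iter = iter + 1
        converged = len(history) > 1 and abs(history[-2] - history[-1]) < tol
        if converged or it > max_iter:   # line 12: until convergence
            break
    return W, history


# Toy prices around 1.0 with small noise (made-up data, 30 timestamps, 3 types).
rng = np.random.default_rng(0)
toy_prices = 1.0 + 0.1 * rng.standard_normal((30, 3))
W, history = train_linear_stand_in(toy_prices)
print(len(history), history[0], history[-1])
```

The `max_iter` cap is a practical safeguard added here; Algorithm 1 itself only states "until convergence".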

The graph convolutional network (GCN), a typical GNN model, is utilized here to model the correlated sample space. The GCN extends the convolution operation to non-Euclidean data with graph structure and is a deep learning method for graph-structured data. Graph data can naturally represent real-life structures such as traffic networks, communication networks, and social networks. Unlike image and text data, graph data has a different local structure at each node. This is because the nodes in the graph represent the different entities in the network, and the edges that connect the nodes represent the relationships between the entities.

As shown in Figure 2, GCN takes a graph structure as input and obtains a new representation of each node by applying the graph convolution operation to its neighbor nodes. Then, all nodes are pooled to obtain the representation of the whole graph. In particular, an undirected graph with N nodes is defined as G = (V, E), where V is the set of nodes and E is the set of edges between pairs of nodes. Enumerating i from 1 to N, the nodes $v_i$ constitute the node set V. Letting j denote the index of a node different from node i, the edges $e_{ij}$ constitute the edge set E. Additionally, the edge states inside graph G make up an adjacency matrix A.
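As a small illustration of these definitions, the sketch below builds the adjacency matrix A of an undirected weighted graph from an edge list and derives the corresponding degree matrix. The helper names `build_adjacency` and `degree_matrix` and the three-node example are hypothetical and are not taken from the paper.

```python
import numpy as np


def build_adjacency(num_nodes, edges):
    """Build a symmetric adjacency matrix from (i, j, weight) triples."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j, w in edges:
        if i == j:
            raise ValueError("self-loops are not expected in the edge list")
        # The graph is undirected, so each edge is stored in both directions.
        A[i, j] = w
        A[j, i] = w
    return A


def degree_matrix(A):
    """Diagonal matrix whose i-th entry is the sum of row i of A."""
    return np.diag(A.sum(axis=1))


# Three nodes connected as a path: 0 - 1 - 2 (unit weights, made up).
A = build_adjacency(3, [(0, 1, 1.0), (1, 2, 1.0)])
D = degree_matrix(A)
print(A)
print(D)
```

Because the graph is undirected, A is symmetric, which is the property the spectral method relies on when it decomposes the graph Laplacian.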

There are two kinds of GCN methods: spectral method and spatial method. Spectral CNN is the first method to construct a convolutional neural network on the graph. This method uses the convolution theorem on the graph to define graph convolution from the spectral domain. Specifically, it uses the convolution theorem to define the graph convolution operator in each layer. Under the guidance of the loss function, it learns the convolution kernel by gradient backpropagation and builds neural networks by stacking multiple layers. The spectral method is more general in most time-series prediction problems and is selected as the main technique of this study.
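The spectral idea can be sketched numerically: decompose the graph Laplacian, project a node signal onto its eigenvectors, scale each spectral component, and transform back. The sketch assumes the combinatorial Laplacian L = D − A (the paper may use a normalized variant), and the function names and the low-pass filter are illustrative choices only.

```python
import numpy as np


def graph_fourier_basis(A):
    """Eigen-decompose the combinatorial Laplacian L = D - A (an assumption)."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    # eigh is appropriate because L is symmetric for an undirected graph.
    eigvals, U = np.linalg.eigh(L)
    return eigvals, U


def spectral_conv(x, U, g_hat):
    """Filter a node signal x in the spectral domain.

    U.T @ x is the graph Fourier transform, g_hat scales each spectral
    component, and U @ (...) is the inverse transform.
    """
    x_hat = U.T @ x
    return U @ (g_hat * x_hat)


# Path graph 0 - 1 - 2 with a made-up signal on its nodes.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])
eigvals, U = graph_fourier_basis(A)

# Hypothetical low-pass filter: damp components with large eigenvalues.
g_hat = 1.0 / (1.0 + eigvals)
print(spectral_conv(x, U, g_hat))
```

With `g_hat` set to all ones, the filter leaves the signal unchanged, which is a quick sanity check that the forward and inverse transforms are consistent.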

The spectral GCN method derives from Fourier transform (FT) theory, which transforms signals from the time domain into the frequency domain. After this transformation, complicated convolution operations in the time domain can be carried out as multiplication operations in the frequency domain. The FT and inverse Fourier transform (IFT) are defined as follows:

where U is the matrix of eigenvectors of the graph Laplacian, which reduces the FT to a matrix computation. Thus, the graph convolution operation is defined as follows:

where x is the input, $\otimes_G$ is the graph convolution operator, g is the convolution kernel, and ⊙ is the Hadamard product operator. Introducing the Laplacian eigenvectors as basis functions, the input signal can be expanded as:

. . .

Expanding g with matrix forms and then substituting Equation (4) into Equation (3), the following formula can be deduced:

Letting $\mathrm{Chev}(\lambda_1)$ denote the first-order Chebyshev polynomial of $\lambda_1$, the convolution kernel g can be approximated as:

Here, $\mathrm{Chev}(\lambda_\gamma)$ can be represented as:

Hence, Equation (5) can be rewritten as the following formula:

where E is the degree matrix, and $\alpha_0$ and $\alpha_1$ are parameters to be learned. For simplicity, we set $\alpha_0 = \alpha_1 = -\theta$. Therefore, the above equation can be rewritten as:

In order to facilitate searching for optimum, renormalization operation is conducted on the above formula:

where $\tilde{E}_{ii}$ is the degree-matrix entry of the i-th node, and $\sum_j \tilde{A}_{ij}$ is the number of edges between the i-th node and other nodes. The final expression of the graph convolution operation can be represented as:

At the t-th timestamp, the main input is related to the outputs of the previous several timestamps and the relation status among nodes. For the i-th node, its information state at the t-th timestamp can be represented as the following formula:

where the first vector, associated with the i-th node, records the output values at the previous several timestamps, $a_i$ is a vector that records the relation status between the i-th node and other nodes, β is a tuning parameter that adjusts the weight of the two parts in Equation (14), and $W_{S1}$ and $W_{S2}$ are parameters to be learned. The core of GCN is an information propagation process, as GCN emphasizes the modeling of various dynamic or static relations. Accordingly, the representative vectors of node status can also be propagated to the following timestamps, which can be expressed as the following formula:

where $W_{S3}$ and $b_{S1}$ are parameters, and $\sigma_1(\cdot)$ is the rectified linear unit (ReLU) activation function, represented as follows:

Hence, the prediction result at the t-th timestamp can be calculated as the following formula:

where $W_{S4}$ and $b_{S2}$ are parameters. As for training, the following optimization objective can be formulated to search for the optimal parameters:

where $\hat{V}^{(t)}$ is the predicted result at the t-th timestamp, $V^{(t)}$ is the real result at the t-th timestamp, Θ is the parameter set, p is the penalty parameter, and $\|\cdot\|_F^2$ is the squared Frobenius norm. Finally, the Adam optimization algorithm can be utilized to search for the optimal solution of Equation (18).
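The building blocks named in this derivation can be sketched with NumPy. The sketch assumes the widely used renormalization with self-loops (adjacency plus identity, then symmetric degree normalization), a generic one-layer propagation with a rectified linear unit, and a squared-Frobenius loss with penalty weight p. It is not a reconstruction of Equations (14) to (18); the function names and the weights `W` and `b` are hypothetical.

```python
import numpy as np


def renormalized_adjacency(A):
    """Add self-loops, then apply symmetric degree normalization."""
    A_tilde = A + np.eye(A.shape[0])       # adjacency with self-loops
    d = A_tilde.sum(axis=1)                # degrees (>= 1 for non-negative A)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Element-wise form of D^{-1/2} A_tilde D^{-1/2}.
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(x):
    """Rectified linear unit: max(x, 0) element-wise."""
    return np.maximum(x, 0.0)


def gcn_layer(A_norm, X, W, b):
    """One generic graph convolution layer: relu(A_norm X W + b)."""
    return relu(A_norm @ X @ W + b)


def penalized_loss(V_pred, V_true, params, p):
    """Squared Frobenius error plus p times the squared norms of the parameters."""
    fit = np.sum((V_pred - V_true) ** 2)
    penalty = sum(np.sum(W ** 2) for W in params)
    return fit + p * penalty


# Path graph 0 - 1 - 2 with one-hot node features (made-up example).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_norm = renormalized_adjacency(A)
X = np.eye(3)
W = np.ones((3, 2))
b = np.zeros(2)
H = gcn_layer(A_norm, X, W, b)
print(H.shape)
```

In practice the loss would be minimized with an optimizer such as Adam from a deep learning framework; the update rule is omitted here to keep the sketch short.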

To evaluate the GNN-APM designed in this study, real-world data is used to set up a simulation scenario. The real-world data was crawled from the official website of the Ministry of Agriculture of China and includes the market data of several key agricultural products from April 2019 to March 2021. The data records the average wholesale market price of agricultural products and covers five types of agricultural products. Market condition data for these agricultural products are updated once a week, giving 96 weeks of data in total for the five types. The five types form 10 pairwise node combinations, corresponding to 10 edges among the nodes. In other words, the adjacency matrix in this situation has five rows and five columns, representing the relation status between every pair of nodes.

During each round of simulations, these 10 relations are randomly generated from a Gaussian distribution with a mean of 0.5 and a variance of 0.05. The ratio between training data and testing data is mainly set to 7:3 and 6:4. To quantify the error between prediction results and real results, two typical metrics are selected: mean absolute error (MAE) and root mean squared error (RMSE). In addition, two general prediction models are employed as baseline methods for comparison. Unlike the GNN-APM designed in this study, the two baseline methods do not take internal correlations among nodes into consideration. They are the long short-term memory (LSTM) model and the multi-layer perceptron (MLP) model.
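The experimental setup described above can be sketched as follows: sampling the 10 pairwise relations from a Gaussian with mean 0.5 and variance 0.05, splitting the weekly series chronologically, and computing MAE and RMSE. The function names and the use of a chronological (rather than shuffled) split are assumptions; the paper does not spell out these details.

```python
import itertools

import numpy as np


def random_relation_matrix(num_nodes, mean=0.5, variance=0.05, seed=None):
    """Sample one weight per node pair and place it symmetrically in A."""
    rng = np.random.default_rng(seed)
    pairs = list(itertools.combinations(range(num_nodes), 2))
    # rng.normal expects a standard deviation, so take the square root.
    weights = rng.normal(loc=mean, scale=np.sqrt(variance), size=len(pairs))
    A = np.zeros((num_nodes, num_nodes))
    for (i, j), w in zip(pairs, weights):
        A[i, j] = A[j, i] = w
    return A


def chronological_split(series, train_ratio):
    """Use the first train_ratio share of timestamps for training (assumed)."""
    n_train = int(len(series) * train_ratio)
    return series[:n_train], series[n_train:]


def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


A = random_relation_matrix(5, seed=0)          # 5 product types, 10 pairs
train, test = chronological_split(np.arange(96), 0.7)
print(A.shape, len(train), len(test))
print(mae([1, 2, 3], [2, 2, 5]), rmse([1, 2, 3], [2, 2, 5]))
```

Note that a Gaussian with these parameters can occasionally produce negative weights; whether the authors clip such values is not stated, so none are clipped here.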

In this study, the simulation experiments are composed of three parts. First, the fluctuation tendency of the five agricultural product types involved is visualized using a curve diagram. Second, the prediction performance of GNN-APM on the five objects is compared with the two baseline methods. Third, the robustness of GNN-APM is tested by changing several parameter combinations.

Tables 1 and 2: the bold values correspond to results of the proposed GNN-APM method.

Figure 3 visualizes the overall tendency of market conditions for the five types of agricultural products. The five curves, from bottom to top, correspond to eggs, chicken, pork, mutton, and beef. Over a period of nearly 2 years, eggs and chicken remain relatively stable, mutton and beef show an ascending tendency, and pork fluctuates frequently. These five types of agricultural products have their own fluctuation tendencies and satisfy the assumption of diversity. It can also be seen that the fluctuation tendency of pork has some effect on the other four types of agricultural products. Thus, the assumption that correlations exist among these types of agricultural products is reasonable.

Tables 1 and 2 give the MAE results and RMSE results, respectively, of the experimental methods when the proportion of training data ranges from 50 to 70% and the learning rate ranges from 0.001 to 0.002. Each table has five rows and seven columns. The first two rows list the experimental settings, and the remaining rows present the results of the three methods. The first column lists the three methods, the second to fourth columns present results under a learning rate of 0.001, and the fifth to seventh columns present results under a learning rate of 0.002. The two tables show that the MAE and RMSE results of GNN-APM are below those of the two baseline methods, regardless of the proportion of training data and the learning rate. This demonstrates that GNN-APM performs better than the baseline methods. Figures 4 and 5 illustrate prediction performance using the two metrics, MAE and RMSE. Because five types of agricultural products are involved, the MAE and RMSE values are the means of the prediction errors over the five types. Figure 4 is a curve diagram of the MAE results, and Figure 5 is a bar diagram of the RMSE results. In Figure 4, the X-axis shows three training sizes and the Y-axis shows the MAE values. In Figure 5, only the two most typical training sizes are used, so it has two clusters of bars, corresponding to RMSE results under the two training sizes. GNN-APM consistently yields lower prediction error than the baseline methods. To sum up, this group of simulation experiments demonstrates the good performance of the designed GNN-APM.

To visualize the tendency of the MAE and RMSE results under different experimental settings, some of the results are illustrated using curve diagrams or bar diagrams. Figure 4 illustrates the MAE results of the three methods under two learning rate values, 0.001 and 0.002. It has two subfigures, one for each learning rate value. In each subfigure, the X-axis denotes the proportion of training data, ranging from 50 to 70%, and the Y-axis denotes the MAE values. Figure 5 illustrates the RMSE results of the three methods when the training data size is set to 60 and 70%, because the previous experiments show that results under these two training data sizes are relatively better. It also has two subfigures, one for each learning rate value. In each subfigure, the X-axis denotes the two training sizes, and the Y-axis denotes the RMSE values. These figures show that the values of GNN-APM are clearly below those of the other two methods and that the values decrease as the proportion of training data increases. Such results reflect how the methods improve when they are trained on more data. Overall, the figures show that GNN-APM performs better than the two baseline methods.

In addition, the parameter sensitivity of GNN-APM is explored, and the relevant simulation results are illustrated in Figure 6. In this group of experiments, GNN-APM is not compared with the baseline methods; only its own performance is examined. Figure 6 has two subfigures, corresponding to sensitivity results under the two metrics, MAE and RMSE. Inside each subfigure, the X-axis denotes the learning rate, and the Y-axis denotes the training size. The two subfigures are heatmaps, so the color depth inside the square area indicates the value of the evaluation metric, and a gentle color change indicates that the performance of GNN-APM does not fluctuate heavily. The color fluctuation in both subfigures is quite gentle, revealing that GNN-APM is not sensitive to changes in these parameters. In other words, GNN-APM remains stable across the tested settings of learning rate and training size. This group of simulation results indicates that GNN-APM is robust.

Agriculture has been viewed as the most fundamental industry since ancient times. Nowadays, E-commerce is an important sales channel for agricultural products. To better manage and schedule the supply of agricultural products, dynamic price prediction for agricultural products in the E-commerce market is of great significance. To overcome the shortcomings of existing studies, this article proposes a deep learning-based price prediction model for agricultural products in the E-commerce market. In particular, the typical GCN is utilized to establish a time-series prediction model for the dynamic price of agricultural products. The simulation experiments are composed of three parts. First, the fluctuation tendency of the five agricultural product types involved is visualized using a curve diagram. Second, the prediction performance of GNN-APM on the five objects is compared with two baseline methods. Third, the robustness of GNN-APM is tested by changing several parameter combinations.

Nowadays, data mining and data management in many industries increasingly rely on the Internet of Things (IoT), giving rise to mobile IoT (36, 37), financial IoT, medical IoT (38), cloud-assisted IoT (39), vehicular IoT (40), and industrial IoT (41, 42). The IoT is an effective tool or platform for integrating multidomain data and scheduling business flows. To realize more effective scheduling management of the agricultural product market, an integrated microservice IoT platform embedded with robust artificial intelligence algorithms (43) is urgently needed to deal with the many difficult issues that arise in various industries. Thus, as future work, the authors plan to investigate optimal scheduling and management schemes for the agricultural product market using novel IoT-related technologies.

Publicly available datasets were analyzed in this study. This data can be found at: http://www.moa.gov.cn/.

WY contributed significantly to theoretical analysis and manuscript preparation. ZZ performed the experiments and handled funding details. QZ contributed to the conception of the study and model formulation. GZ performed the data analyses and data visualization. QH provided many promising insights during the revision process. QL helped perform the analysis with constructive discussions. All authors contributed to the article and approved the submitted version.

On the design of blockchain-based ECDSA with fault-tolerant batch verification protocol for blockchain-enabled IoMT

A blockchain-based Shamir's threshold cryptography scheme for data protection in industrial internet of things settings

Blockchain and PUF-based lightweight authentication protocol for wireless medical sensor networks

Efficient and privacy-preserving medical research support platform against COVID-19: a blockchain-based approach

Data-driven management for fuzzy sewage treatment processes using hybrid neural computing

Secure-Enhanced federated learning for ai-empowered electric vehicle energy prediction

A deep graph neural network-based mechanism for social recommendations

Deep learning-embedded social internet of things for ambiguity-aware social recommendations

An efficient ensemble VTOPES multi-criteria decision-making model for sustainable sugarcane farms

Data-driven peer-to-peer blockchain framework for water consumption management. Peer-to-Peer Netw Appl

A data-driven intelligent planning model for UAVs routing networks in mobile internet of things

Forecasting yield by integrating agrarian factors and machine learning models: a survey

Theory of planned behavior to predict consumer behavior in using products irrigated with purified wastewater in Iran consumer

Value added mechanism and organisational model optimisation of agricultural products circulation value chain from the perspective of game theory

An examination of the role of price insurance products in stimulating investment in agriculture supply chains for sustained productivity

Agricultural supply chain risk management under price and demand uncertainty

The calendar effect of price-reduction auction of online agricultural products

Futures price prediction of agricultural products based on machine learning

Forecasting agricultural commodity prices using model selection framework with time series features and forecast horizons

The evolutionary analysis of agricultural production transaction under the price subsidy policy

Improvements in spoken query system to access the agricultural commodity prices and weather information in Kannada language/dialects

The research on agricultural product price forecasting service based on combination model

Improving lives of indebted farmers using deep learning: predicting agricultural produce prices using convolutional neural networks

Research on the price analysis and prediction method of agricultural products based on logistics information

An analysis of price vs. Revenue protection: government subsidies in the agriculture industry

Price forecasting & anomaly detection for agricultural commodities in India

An early warning method for agricultural products price spike based on artificial neural networks prediction

Hadoop + Spark platform based on big data system design of agricultural product price analysis and prediction by Holt-Winters

Seasonal ARIMA to forecast fruits and vegetable agricultural prices

Price prediction of agricultural products based on wavelet analysis-LSTM

Seasonal forecasting of agricultural commodity price using a hybrid STL and ELM method: evidence from the vegetable market in China

Time series prediction of agricultural products price based on time alignment of recurrent neural networks

A fuzzy detection system for rumors through explainable adaptive learning

Deep-Learning-Empowered breast cancer auxiliary diagnosis for 5GB remote e-health

Graph embedding-based intelligent industrial decision for complex sewage treatment processes

Graph neural networks-driven traffic forecasting for connected internet of vehicles

Secure artificial intelligence of things for implicit group recommendations

PMRSS: privacy-preserving medical record searching scheme for intelligent diagnosis in IoT healthcare

A privacy-enhanced retrieval technology for the cloud-assisted internet of things

Attribute-based encryption with parallel outsourced decryption for edge intelligent IoV

An efficient ciphertext-policy weighted attribute-based encryption for the internet of health things

Blockchain-Based reliable and efficient certificateless signature for IIoT devices

Towards secure and privacy-preserving data sharing for COVID-19 medical records: a blockchain-empowered approach

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.