key: cord-0916693-6lxshlmk
authors: Shams, M.Y.; Elzeki, O.M.; Abouelmagd, Lobna M.; Hassanien, Aboul Ella; Elfattah, Mohamed Abd; Salem, Hanaa
title: HANA: A Healthy Artificial Nutrition Analysis model during COVID-19 Pandemic
date: 2021-06-30
journal: Comput Biol Med
DOI: 10.1016/j.compbiomed.2021.104606
sha: 3bc54fa489bc8de0e3432eac7f7ee1d91cffe61b
doc_id: 916693
cord_uid: 6lxshlmk

BACKGROUND AND OBJECTIVE: The impact of diet on COVID-19 patients has been a global concern since the pandemic began. Food choices affect people's mental and physical health, and persistent consumption of certain types of food, together with frequent eating, may increase the likelihood of death. In this paper, a regression system is employed to predict death status based on food categories. METHODS: A Healthy Artificial Nutrition Analysis (HANA) model is proposed. The proposed model is used to generate a food recommendation system and track individual habits during the COVID-19 pandemic to ensure healthy foods are recommended. To collect information about the different types of foods that most of the world's population eat, the COVID-19 Healthy Diet Dataset was used. This dataset includes different types of foods from 170 countries around the world as well as obesity, undernutrition, death, and COVID-19 data as percentages of the total population. The dataset was used to predict the status of death using different machine learning regression models, i.e., linear regression (simple linear regression, ridge regression, and elastic net regression) and AdaBoost models. RESULTS: The death status was predicted with high accuracy, and the food categories related to death were identified with promising accuracy. The Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R² metrics and 20-fold cross-validation were used to evaluate the accuracy of the prediction models for the COVID-19 Healthy Diet Dataset. The evaluations demonstrated that elastic net regression was the most efficient prediction model. Based on an in-depth analysis of recent nutrition recommendations by WHO, we confirm the advice already given in the WHO report(1).
Overall, the outcomes also indicate that recovery from COVID-19 was most favourable among people who eat more vegetal products, oilcrops, grains, beverages, and cereals (excluding beer). Moreover, people consuming more animal products, animal fats, meat, milk (including butter), sugar and sweetened foods, and sugar crops were associated with a higher number of deaths and fewer patient recoveries. Sugar consumption had an important effect, and the rates of death and recovery were influenced by obesity. CONCLUSIONS: Based on evaluation metrics, the proposed HANA model may outperform other algorithms used to predict death status. The results of this study may direct patients to eat particular types of food to reduce the possibility of becoming infected with the COVID-19 virus.

In many countries throughout the world, the current coronavirus (COVID-19) pandemic has led to general lockdowns that have resulted in the closure of all but essential services, such as grocery stores and pharmacies. Such closures have had an immediate and predictable effect on food obtainability and selection. The pandemic has restricted food selections, which may impact mealtimes and diet, as well as having a general effect on both physical and psychological health [1] .

Food systems have a direct and indirect impact on human health, and now it is more important than ever that they should become sustainable. In 2015, the United Nations' 2030 Plan for Sustainable Development issued an immediate call for action, including 17 sustainable development goals, by developed and emerging countries in global collaboration [2] .

Consequently, diagnostic tools for food prediction and for saving food in surrounding environments after the lockdown are needed. In addition, food and industry supply chains need to be monitored to determine whether they have contributed to the spread of COVID-19. This is done by examining how COVID-19 spreads through surfaces, the food supply chain, and surrounding environments [3] . For example, Mishra and Rampal [4] presented a study of the effect of the COVID-19 pandemic on food insecurity in India. They started by tracking the general status of food insecurity and hunger around the world, focusing on lower- and middle-income countries. They found significant relationships between economic growth, joblessness, and starvation resulting from food shortages during pandemic lockdowns.

More precisely, Laborde et al [5] reported that no shortage of staple foods has appeared recently. Nevertheless, food markets and agriculture face disruptions because of labor shortages caused by restrictions on individuals' activities, changes in food demand following the closure of restaurants and schools, and shortfalls in people's income. Nowadays, machine learning (ML) plays a vital role in diagnosis and prognosis, especially in tracking diseases in medical applications using image recognition systems and in preventing and treating the spread of specific diseases, particularly when dealing with imbalanced data [6] - [8] . ML-based diagnosis and classification of COVID-19 from chest X-ray and CT images is a key tool in fighting the spread of the COVID-19 virus. Moreover, ML-based prediction systems are used to forecast the effect of the current pandemic in different areas, specifically in disease diagnosis and healthcare systems [9] - [13] .

When purchasing food, the number of choices is too large for a consumer to consider them all [14] . Individuals have different dietary needs and habits and perceive flavor differently. Consequently, the only way to learn their requirements is to consult the individual. A recommendation system may assist a mildly hungry consumer, a cooking enthusiast, a health-conscious person, a dieter, or someone seeking to improve his/her medical condition, which improves the final selection. Furthermore, food must be available in time to make the customer more satisfied and happier. A significant aspect of building these systems is the source of the collected data and the customers' habits. These can be collected from users' feedback on certain posts, such as by tracking customers' likes or dislikes, as well as from the ratings recorded for publicly watched videos and/or images on social media. The success of a food recommendation system depends on its capability to track user preferences, maximize the share of fresh and healthy food, and minimize or avoid unhealthy food.

With so many people currently getting sick from COVID-19, unhealthy diets contribute to pre-existing medical conditions that make them more vulnerable to the virus [15] . In many parts of the world, getting sick means losing income. Hence, the pandemic has increased the risks faced by consumers, producers, and policymakers around the world [16] . What is required to get a portion of proper healthy food? The answer to this question is more urgent and necessary than ever. There is a great deal of ambiguity regarding the components of a healthy diet and appropriate policy interventions. However, there is a growing body of evidence and analysis that points to actions that will save lives, or at least improve, even slightly, the well-being of billions of people.

Excessive metabolic risk factors (cholesterol, blood pressure, body mass index, blood sugar) are among the most important risk factors for infection and death. There are more than two billion people infected with death or dying out of 70% of them [17]. Unsafe foods caused 600 million illnesses and 420,000 deaths in 2010, undermining human health and food security. Emerging evidence suggests that people with pre-existing diet-related medical conditions, such as obesity, heart disease, and diabetes, suffer more serious consequences from infection with the coronavirus, such as more severe illness and an increased need for intensive health care, such as respirators. Therefore, a good nutrition system during the recent pandemic is recommended to help individuals make suitable decisions and avoid the side effects of poor diet habits [18] . Artificial intelligence tools offer effective and promising methods to predict, plan, and support suitable decisions for decision-makers in the field of diet and nutrition [19] .

In this paper, a Healthy Artificial Nutrition Analysis (HANA) approach is proposed. HANA uses ML algorithms on publicly available data to generate a food recommendation system and to track individual habits during the COVID-19 pandemic to ensure healthy food choices. The primary contributions of this paper are as follows.

• We propose the HANA model, which highlights nutrition styles associated with higher mortality from COVID-19.
• We use ML-based analysis to seek food-compatible nutrition styles during COVID-19 incubation.
• We predict the number of deaths resulting from poor food-related habits during the COVID-19 pandemic.
• We design a light and fast learning model as a healthy food recommendation system.
• We use ReliefF and Stochastic Gradient Descent (SGD) for optimal feature reduction based on PCA.
• The HANA model outcomes confirm WHO nutrition advice for COVID-19 and nutrition studies.

The remainder of this paper is organized as follows. In Section II, focusing on models that are relevant to the recent COVID-19 pandemic, studies related to food prediction models are discussed. Section III describes the system architecture of the proposed food recommendation system. Experimental and evaluation results are discussed in Section IV. Conclusions and suggestions for future work are presented in Section V.

In general, three food trends depend on Artificial Intelligence (AI) that are considered when dealing with food problems. Industrial food is commercially regulated according to the stages of manufacture to improve and facilitate the consumption process. Moreover, it was introduced to provide most of the food consumed by the world's population. In agriculture, an important issue related to AI is to help farmers eliminate diseases and pests that affect plants, which in turn affects the quantity and quality of the crop, and consequently affects the volume of food. We identified many studies that help to identify plant diseases. Food is being used in the fight against poverty by developing a recommended AI-based diet to track and monitor the nutritional level in developing countries.

Patients with certain diseases, such as diabetes, heart disease, high blood pressure, and insulin resistance, are most vulnerable to COVID-19. Therefore, these patients should monitor and reduce poor eating habits, especially the consumption of foods that raise insulin levels. A diet that is low in carbohydrates and moderate in protein and fat is required to maintain normal insulin levels in the patient's body. Food containing zinc is an efficient way to improve the performance of the human immune system; oysters, shellfish, red meat, and cheese are rich in zinc. Vitamin D is also required and is found in cod liver oil and salmon. Vitamin C is additionally very important for decreasing the presence of the COVID-19 virus. Foods rich in vitamin C include leafy greens, sauerkraut, and berries [15] - [18] .

For industrial food, Shen et al [19] proposed an application that measures food attributes to help people balance their diet; it detects and recognizes food items in an image. The application uses a Convolutional Neural Network (CNN) for food recognition and evaluates food properties by retrieving data from the internet. They used the Inception-v3 and Inception-v4 models, which are based on CNNs, and reported that the results obtained are more reliable.

Furthermore, Onu et al [20] utilized AI models to predict low moisture content in drying potatoes. They used three different models: the Response Surface Methodology (RSM), Adaptive Neuro-Fuzzy Inference Systems (ANFIS), and an Artificial Neural Network (ANN). They found that all three models gave good predictions of the experimental data, yet RSM and ANFIS gave better results than ANN. In food processing, three case studies that combine machine learning with expert interaction are presented in [21] :

• In the first, experts were engaged to design the structure of a dynamic Bayesian network for a Camembert ripening model, including variables from the micro-scale (presence of bacteria and chemical components) to the macro-scale (perceptual assessments).
• In the second, a model, also a Bayesian network, was built to assist winemakers in assessing when to harvest grapes depending on weather conditions.
• In the third, a graphical model based on symbolic regression was used to assist specialists in building a model for bacterial production and stabilization.

An approach based on k-cluster segmentation and color detection is presented by [22] for grading, sorting fruits and vegetables, and the extracted features are calculated such as entropy, mean, and standard deviation.

In [23] , the researchers developed a system that uses image processing with an SVM classifier to distinguish healthy rice plants from diseased ones. The system achieved an accuracy of over 90%. Furthermore, the researchers in [24] proposed a CNN-based network structure for classifying potato leaf diseases. The suggested architecture consists of 14 layers, and the average overall test accuracy is 98%. The authors of [25] identify apple leaf diseases using a CNN based on the pre-trained AlexNet network; their experiments give an accuracy of about 97.62%. To increase crop production, the researchers in [26] suggested a framework for date fruit harvesting robots. The framework includes three classification models that classify images of date fruits in real time according to their type, ripeness, and harvest decision, because traditional methods may delay the date production cycle and account for more than 45% of the date production cost. They used CNNs with fine-tuning and transfer learning on pre-trained models. The proposed models achieve 99.01%, 97.25%, and 98.59% accuracy.

The researchers in [27] also work on date fruit; they offer a new and more accurate way to distinguish between healthy and damaged date fruits. They used a deep CNN, which can also predict the maturity stage of healthy dates. The CNN model achieved an overall classification accuracy of 96.98%. Furthermore, researchers have attempted to detect food defects, especially in fruits such as apples. In [28] , a modified AlexNet model with an eleven-layer structure was used, and a comparison study was performed against three well-known algorithms: back-propagation neural networks (BPNN), Particle Swarm Optimization (PSO), and SVM. The proposed CNN model for apple detection achieves a recognition rate of 92.50%, which is higher than that of the other commonly used algorithms.

Another application of AI is fighting poverty, as presented in [29] , [30] . The researchers in [29] studied data collected from five African countries: Tanzania, Nigeria, Uganda, Malawi, and Rwanda. They demonstrated that a CNN can be trained to identify image features that explain up to 75% of the variance in local economic outcomes. Their method could change efforts to track and target poverty in developing nations. In [30] , the researchers present a more accurate approach for predicting the essential dimensions of poverty, namely health, education, and standard of living (Pearson correlation 0.84 - 0.86). They used Gaussian Process regression, a Bayesian learning technique that provides the uncertainty associated with predictions. The model is built with elastic net regularization to prevent overfitting. The results show that accuracy is highest when disparate data are used, with the Pearson correlation reaching 0.91.

O'Hara and Toussaint [31] examine food insecurity in Washington, DC. They identified new opportunities in urban agriculture and food production, together with sustainable food access, to tackle local food insecurity and the required infrastructure.

Ordás et al [32] study the eating habits of individuals. Data from 170 countries were analysed to discover the relationships between these habits and the death rates caused by COVID-19 using ML approaches, taking into consideration the distribution of energy, fat, and protein across twenty-three different types of food. The results indicate that 95% of cases were predicted correctly using a regression model based on Principal Component Analysis (PCA). Moreover, the course of treatment of SARS-CoV-2 patients was used to estimate death cases using ML and Deep Learning (DL) approaches, as investigated by Kivrak et al [33] .

Shams et al [34] proposed a regression model based on Support Vector Machine (SVM) and Deep Learning (DL) approaches, using a dataset that contains both confirmed deaths and recovered cases. The results indicate that the SVM with the Radial Basis Function (RBF) kernel achieves an RMSE of 0.27, the SVM with a linear kernel achieves an RMSE of 0.18, and the deep regression model achieves an RMSE of 0.29.

We can conclude that the general structure of food systems during the COVID-19 pandemic is reflected in the AI-based food directions described above. We observed four important variables or parameters used in the food systems applied during the COVID-19 pandemic: food security outcomes of individual closures, food safety as affected by the recent pandemic, the individual public health support system, and food sustainability [2] . Patients with severe pneumonia have been identified as vulnerable to protein-energy deficiency, which significantly damages respiratory muscle contractility and the immune defense system [40] , [41] . According to [41] , severely and critically ill SARS-CoV-2-infected individuals are at nutritional risk. A study assessing the dietary risks and treatment of severe and critical COVID-19 patients [42] was presented in 2021. A total of 523 people were enrolled from four hospitals in Wuhan, China. The window for inclusion was between 2 January 2020 and 15 February. Computerized medical records, nursing records, and associated examinations were used to obtain clinical features and laboratory data, which demonstrates the power of data science. The study concluded that the risk of malnutrition is high in critical and severe patients with COVID-19, that low BMI and low protein concentration were substantially related to adverse outcomes, and that early nutrition assessment and treatment are required in individuals with COVID-19.

The COVID-19 pandemic has had significant impacts on food systems around the world through the vulnerabilities it has revealed in food supply chains, food demand, and the purchasing power of consumers [35] . Death rates are lower in some European countries than in others, and food habits are a major reason, as reported by Bousquet et al [36] . Moreover, some foods, such as cabbage and fermented vegetables, have been considered as candidate mitigating factors that may explain the heterogeneity in mortality between countries [37] .

The USDA center for Nutrition Policy and Promotion recommends a balanced diet comprising 10% fruits, 20% protein, 30% grains, and 40% vegetables. However, most people do not follow these recommendations. The impact of an unbalanced diet is more significant during a global pandemic. In this paper, we merge the world's overweight population, starvation, and types of food as regression features and the death rate due to COVID-19 as expected values to learn more about how healthy eating can assist in fighting the disease.

To learn more about how a healthy eating style could help combat the coronavirus, we propose an enhanced and optimized feature reduction algorithm and regression model. Based on the ReliefF algorithm, the feature reduction algorithm selects the most relevant features to predict the probability of death of a COVID-19-infected person given the diet style followed.

As shown in Figure 1 , the proposed HANA model consists of four stages. The first stage is data pre-processing, which includes all processes involved in collecting and managing the data. The second stage consists of feature enhancement, selection, and dimension reduction steps. The third stage utilizes different regression and prediction models. Finally, the evaluation metrics stage is used to evaluate the applied regression models, as shown in Figure 2 . The primary contributions of this proposed approach are as follows.

• First, we proposed a hybrid feature reduction algorithm based on SGD and the ReliefF algorithm. The proposed hybrid algorithm should obtain the optimal threshold values used in the reduction process. The selected threshold is used by the ReliefF algorithm to select the most relevant features from the relevant vector.

• Second, we built the regression ML model using linear regression (i.e., elastic net regression). The regression MSE, RMSE, MAE, and R² are determined and evaluated. Furthermore, the experiments show that transforming the features using PCA improves the regression metrics.
• Third, to the best of our knowledge, this study is the first to report hybrid feature reduction based on PCA, ReliefF, and SGD algorithms to predict the death rate of COVID-19 cases correlated to diet.

Dealing with data correctly is crucial to achieving a highly accurate prediction model. Problems related to the dataset, e.g., missing values, imbalance, etc. must be addressed. Here, missing values are completed based on the following benchmarks.

The missing values in the dataset are imputed using the mean or the most frequent value. That is, any missing value is replaced with the mean or the most frequent value of that variable over all other cases. The major advantage of this approach is that the sample mean of the variable is unchanged. The value $\bar{y}_h$ is the sample mean of the respondent data within some class $h$. In addition, mean imputation can be performed within classes and expressed as $\hat{y}_{hi} = \bar{y}_h$, where $\hat{y}_{hi}$ is the imputed value.
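The imputation rule described above can be illustrated with a minimal sketch. The function name `impute_column`, the use of `None` to mark missing entries, and the sample values are assumptions made for illustration; they are not taken from the paper or the dataset.

```python
from collections import Counter
from statistics import mean


def impute_column(values, strategy="mean"):
    """Replace None entries with the mean or the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "most_frequent":
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical column with one missing value.
column = [20.0, None, 30.0, 25.0]
filled = impute_column(column)
```

Because the fill value is the mean of the observed entries, the mean of the completed column equals the mean of the observed entries, which is the property the paragraph above refers to.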

Feature normalization is an important step to ensure a highly accurate model. The features are normalized to the interval [0, 1] to produce a feature vector template to be applied in the next feature processing stage.
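A minimal sketch of scaling one feature to the interval [0, 1] is shown below. Min-max scaling is assumed here because it is the usual way to reach that interval; the function name and the handling of constant features are illustrative choices.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0 (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([2.0, 4.0, 6.0])
```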

Generally, the proposed approach depends on using PCA to extract the most significant features through a hybrid algorithm based on Relief features and the SGD optimizer. Then, the AdaBoost and other regression models are presented to forecast death cases resulting from the consumption of unhealthy food during the COVID-19 pandemic. The process to output an ML regression model and the expected death rate is given in Algorithm 1, an Intelligent Healthy Food Regression System that describes the steps of the COVID-19 Pandemic Roadmap.

PCA is among the most commonly used unsupervised dimensional reduction techniques. The purpose of PCA is to locate the space that represents the high variance path of the data. The PCA space consists of major orthogonal elements, i.e., axes or vectors. The PCs are determined either by resolving the covariance matrix or by using singular value decomposition [38] .

In this work, we would like to show that the PCA-based feature enhancement approach is not limited to a specific subspace learning method but also reflects the main PCA concept, i.e., applying a mapping $\phi(y)$ to the data. If the input is mapped to a higher-dimensional space, the subspace can be approximated more easily. Although the dimension of that space may become very large, the main idea is to avoid computing $\phi$ explicitly and to work with $\langle \phi(x), \phi(y) \rangle = \phi(x)^{T}\phi(y)$. The covariance matrix C is determined as in Equation (1) [39] .

$$C = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i)\,\phi(x_i)^{T} \quad (1)$$

where n is the number of instances for all enrolled features i, and T is the transformed function inside the orthogonal domain. The main objective of using PCA in this paper is to modify and enhance the features that help achieve a higher accuracy prediction model.
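As a rough illustration of the covariance-based route to the principal components, the following sketch computes a covariance matrix and approximates its leading eigenvector by power iteration. It covers only the plain linear case (no mapping to a higher-dimensional space); the function names, the 1/n normalisation, and the toy data are assumptions for illustration and are not the authors' implementation.

```python
import math


def covariance_matrix(rows):
    """Covariance of mean-centred rows, using a 1/n normalisation (assumption)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
            for a in range(d)]


def leading_component(cov, iterations=200):
    """Approximate the first principal axis by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # start vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Toy data lying exactly on the line y = 2x, so the first axis is along (1, 2).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
axis = leading_component(covariance_matrix(data))
```

Projecting the centred data onto the returned axis gives the first principal component scores; further axes would require deflation or a full eigendecomposition.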

Selecting the most important features from a whole dataset is considered an open challenge [40] . In this paper, to extract the most important features two algorithms, i.e., the ReliefF algorithm and SGD, are combined in this step. The steps of the hybrid algorithm are summarized in Algorithm (1).

In practice, the original Relief algorithm is no longer used [41] and has been replaced by ReliefF [42] , one of the best and most commonly used Relief-based algorithms (RBAs). The "F" indicates that it is the sixth variation (A to F) of the algorithm proposed by Kononenko [43] . ReliefF adopts a filter approach to feature selection; classification accuracy is not used directly as an evaluation metric.

Assume that $x$ is the training dataset and that the $i$-th sample of the training dataset is denoted by $x^{(i)}$. To measure the similarity between two samples, Euclidean distance is applied to determine the nearest neighbour (NN) of $x^{(i)}$ from the same class, $NN(x^{(i)})$, and the nearest neighbour from a different class, $\overline{NN}(x^{(i)})$.

Further, the differences of the $n$-th feature between $x^{(i)}$ and $NN(x^{(i)})$, and between $x^{(i)}$ and $\overline{NN}(x^{(i)})$, are determined by Equation (3). The score of the $n$-th feature is then determined using Equation (4), where $S(n)$ is the score of the $n$-th feature and $N$ represents the total number of training samples. Features with a higher $S(n)$ are more discriminatory. Thus, unlike the contribution rate in PCA, $S(n)$ is an intuitive value used to assess the classification capability of the features. The ReliefF approach will identify many more discriminatory features than PCA. However, even this greater number of significant features is kept and shaped as a low-dimensional subspace.

Algorithm 1
Input: a training dataset.
4. Set the threshold value for selecting the feature subset using the Relief method.
5. Transform the selected features using PCA.
6. Apply ML regression prediction models based on: I. the linear regression model; II. the AdaBoost regression model.
7. Use K-fold cross-validation to determine the evaluation metrics.
Output: ML regression model; expected death rate in the population given the food style.
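The nearest-neighbour scoring and thresholding idea can be sketched as follows. This is the basic two-class Relief scheme with absolute feature differences, which is one common textbook form and is assumed here; it is not the full ReliefF variant and not necessarily the exact scoring used in the paper. The function names, toy data, and threshold are hypothetical, and features are assumed to be already scaled to [0, 1].

```python
import math


def relief_scores(X, y):
    """Basic Relief: reward features that differ on the nearest miss and
    agree on the nearest hit. Each class needs at least two samples."""
    n_samples, n_features = len(X), len(X[0])
    scores = [0.0] * n_features

    def nearest(i, same_class):
        # Nearest other sample (Euclidean) from the same or a different class.
        candidates = [j for j in range(n_samples)
                      if j != i and (y[j] == y[i]) == same_class]
        return min(candidates, key=lambda j: math.dist(X[i], X[j]))

    for i in range(n_samples):
        hit, miss = nearest(i, True), nearest(i, False)
        for f in range(n_features):
            gain = abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])
            scores[f] += gain / n_samples
    return scores


def select_features(scores, threshold):
    """Keep the indices of features whose score exceeds the threshold."""
    return [f for f, s in enumerate(scores) if s > threshold]


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
scores = relief_scores(X, y)
kept = select_features(scores, threshold=0.0)
```

In the hybrid scheme described in the text, the threshold would be tuned rather than fixed by hand; the fixed value above is only a placeholder.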

SGD is an effective approach that scales well and performs well on sparse data. Gradient descent means following the gradient of the curve down to its lowest point, and the update is applied iteratively until a minimum is reached. There are three forms of gradient descent: stochastic, full-batch, and mini-batch. The three variants differ in the number of samples taken for each iteration. Further, SGD operates on a random basis: rather than using the entire dataset, random samples are selected from the dataset for each iteration. These samples are called batch samples. Conventional gradient descent, in contrast, is applied to the entire dataset simultaneously for a precise prediction, which is difficult when the dataset is extremely large [44] .

As a result, the SGD algorithm has been developed. The SGD algorithm attempts to address the problem associated with large datasets by selecting subsets of the dataset at random in every iteration. SGD is a discriminative learning approach that models and classifies observed data. SGD considers a batch to be a single sample in each iteration; thus, in each iteration, the gradient of the cost function is calculated for a single sample. Consequently, SGD is faster than traditional gradient descent approaches because the parameters are updated immediately after every sample has been processed. The general steps of the SGD algorithm are as follows [45] .

1. Take the derivatives of the loss function with respect to the enrolled features, where the loss $L$ is computed from the predicted value $\check{y}$ and the actual value $y$ relative to $x$.
2. Calculate the gradient $\nabla L$ of the loss function resulting from step 1.
3. Select initial random values of the enrolled features to start. At every step, the features are selected and randomized.

The Evaluation_Measures value, which acts as the step size, has a significant effect on the algorithm. It is better if the Evaluation_Measures value is larger at the beginning of the iteration process, as it makes the algorithm take big steps. The Evaluation_Measures value should be reduced as the algorithm approaches the minimum, to avoid overshooting the minimum point.
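The per-sample update described above can be sketched for a one-feature linear model. The squared-error loss, the constant step size, the zero initialisation, and the toy data are assumptions chosen to keep the example short; the paper's own loss and step-size schedule are not reproduced here.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=500, seed=0):
    """Fit y ~ intercept + slope * x by updating after every single sample."""
    rng = random.Random(seed)
    intercept, slope = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order
        for i in indices:
            error = (intercept + slope * xs[i]) - ys[i]
            # Gradient of the squared error (error ** 2) for this one sample.
            intercept -= learning_rate * 2.0 * error
            slope -= learning_rate * 2.0 * error * xs[i]
    return intercept, slope


# Noise-free toy data generated from y = 1 + 2x.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0 + 2.0 * x for x in xs]
intercept, slope = sgd_linear_regression(xs, ys)
```

A decaying step size, as the text suggests, would replace the constant `learning_rate` with a value that shrinks over the epochs.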

The proposed HANA model approach was used to predict the status of death using two prediction models, i.e., linear regression (standard linear regression, ridge regression, and elastic net regression), and AdaBoost models.

Simple linear regression models the relationship between two variables such that one variable is used to determine the value of the other. This relationship is expressed as a mathematical equation that is fitted to the observed values of the two variables under certain assumptions. Simple linear regression can be expressed as follows:

$$Y = \beta_1 + \beta_2 X + \epsilon \quad (5)$$

where X is the independent variable, Y is the dependent variable, and ϵ is the error. β1 and β2 are the intercept and the slope of the regression model, respectively. For given parameters β1 and β2, the cost function $J(\beta_1, \beta_2)$ can be determined as follows.

The main objective of simple regression is to minimize the cost function, i.e., $\min_{\beta_1, \beta_2} J(\beta_1, \beta_2)$ [46] .

Regularization and feature selection tasks are done using the ridge regression method. Even when the variables are highly correlated, this regression method determines the feature subset that is significant to the classification and prediction problems. Ridge regression also helps to handle missing values in the variables. A specific type of estimator for coefficient shrinkage, i.e., a ridge estimator, is used in the ridge regression method [47] . This type of regression corresponds to the regularization form of L2 in which a penalty called the L2 penalty is applied. The L2 penalty is determined as the square of the coefficient's magnitude. By simply summing the penalty values, the cost function in the ridge regression method is adjusted. The cost function used in ridge regression is presented in Equation (7).

where n is the number of features, m is the total number of instances in the dataset, and λ is the penalty value. Note that ridge regression applies constraints on the coefficients.
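For reference, the least-squares and ridge cost functions are often written in the following textbook form. This is a sketch under assumptions: the symbol $J$, the $1/(2m)$ normalisation, the reading of $m$ as the number of samples, and the notation $\hat{y}_i$ for the model prediction on sample $i$ are not taken from the paper, whose own Equations (6) and (7) may differ in scaling.

```latex
J(\beta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2,
\qquad
J_{\mathrm{ridge}}(\beta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2
  + \lambda\sum_{j=1}^{n}\beta_j^{2}
```

With $\lambda = 0$ the ridge cost reduces to the least-squares cost, and larger $\lambda$ shrinks the coefficients toward zero.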

An elastic net regression model is a standard type of regularized linear regression that associates two penalties, L1 and L2 penalty functions. Further, during the training process, elastic net regression can regularize and develop a linear regression model by adding penalties to the loss function. The new instance data can be evaluated automatically to make the final prediction based on a grid search strategy. To avoid the shortages initiated in lasso regularization, the elastic net includes a quadratic expression in the penalty. Moreover, isolation is performed based on ridge regression. The elastic net method can regularize variable selection instantaneously. In addition, it fits the dimensional data to be more than the number of samples used. The assemblages and selection of variables are crucial processes in the elastic net method. Assume that the given data are (y, X) such that the penalty parameters are (l1, l2). For any static non-negative λ1 and λ2, the naïve elastic net used to solve the lasso problem is determined as follows [48] .

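where $|\beta|^2 = \sum_{j=1}^{p} \beta_j^2$ and $|\beta|_1 = \sum_{j=1}^{p} |\beta_j|$, with $\beta$ the coefficient vector and $p$ the number of predictors.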

The elastic net estimator, denoted $\hat{\beta}$, is determined as the minimizer of Equation (6) and is calculated as follows:
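For reference, the naive elastic net criterion and its estimator can be written as shown in the block that follows. This is a reconstruction based on the formulation in [48], with $\beta$ denoting the coefficient vector; the notation is an assumption and may differ from the equations in the published version.

```latex
L(\lambda_1, \lambda_2, \beta) = |y - X\beta|^2 + \lambda_2 |\beta|^2 + \lambda_1 |\beta|_1,
\qquad
\hat{\beta} = \arg\min_{\beta} \, L(\lambda_1, \lambda_2, \beta)
```

Setting $\lambda_1 = 0$ reduces the criterion to ridge regression, and setting $\lambda_2 = 0$ reduces it to the lasso.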

In the proposed approach, the AdaBoost model [49] was used for regression. Normally, two main parameters have a direct effect on AdaBoost: the number of iterations (T) and the weights of the training patterns (w), which are initialized to be equal [50]. We are given a training set $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ such that each instance $x_i$ belongs to a domain $X$ and each label $y_i \in \{-1, +1\}$. For regression and classification problems, the aim is to match the sign of $f_\lambda(x_i)$ to $y_i$, i.e., to minimize the number of mismatches given by Equation (10), such that

where ⟦π⟧ is 1 if the predicate π holds, and 0 otherwise. Minimizing the number of classification or regression errors directly is one option; in practice, however, some other non-negative loss function is usually minimized instead. For example, the AdaBoost algorithm can be expressed as the minimization of a loss function, as in Equation (11).
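To illustrate the relationship between the error count and a non-negative surrogate loss, the sketch below compares the number of sign mismatches with the exponential loss commonly associated with AdaBoost. The exponential form, the sum over i of exp(−y_i f(x_i)), is the standard one from the boosting literature and is assumed here because the equation itself is not reproduced in this text; the scores and labels are hypothetical.

```python
import math


def sign_error_count(scores, labels):
    """Count instances whose score sign disagrees with the label (zero counts as an error)."""
    return sum(1 for f, y in zip(scores, labels) if f * y <= 0)


def exponential_loss(scores, labels):
    """Sum of exp(-y * f(x)); each misclassified instance contributes at least 1."""
    return sum(math.exp(-y * f) for f, y in zip(scores, labels))


# Hypothetical real-valued scores f(x_i) and labels y_i in {-1, +1}.
scores = [2.0, -0.5, 0.3, -1.2]
labels = [+1, +1, -1, -1]

errors = sign_error_count(scores, labels)
loss = exponential_loss(scores, labels)
```

Because every mismatch adds at least 1 to the exponential loss, that loss is an upper bound on the error count, which is why driving it down also reduces the number of errors.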


To evaluate the proposed prediction system, four well-known metrics are used, i.e., MSE, RMSE, MAE, and $R^2$.

MSE is one of the most commonly used metrics for regression tasks. It is the average of the squared differences between the target values and the values predicted by the regression model. Because the discrepancies are squared, large errors are penalized heavily, which can overstate how poor the model is, as shown in Equation (12).

where $y_i$ is the actual expected output, $\hat{y}_i$ is the model's prediction, and n is the number of samples.

RMSE is calculated as follows:

where n and s denote the number of data points and the forecast value, respectively.

MAE is used to measure prediction model accuracy, and it is computed by Equation (14).

where $y_i$ is the actual expected output, $\hat{y}_i$ is the model's prediction, and n is the number of samples.

The main objective of $R^2$ is to measure the correlation between forecast and measured data. A dataset has n values denoted $y_1, \ldots, y_n$ (collectively identified as $y_i$ or as a vector $y = [y_1, \ldots, y_n]^T$), each related to a predicted value $f_1, \ldots, f_n$. The total sum of squares and the residual sum of squares are given by Equations (15) and (16), respectively, where $\bar{y}$ is the mean of the observed values.

Total sum of squares:

$$S_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad (15)$$

The sum of residual squares, also referred to as the residual sum of squares:

$$S_{res} = \sum_{i=1}^{n} (y_i - f_i)^2 \qquad (16)$$

The most common expression of the coefficient of determination is given in Equation (17).

$$R^2 = 1 - \frac{S_{res}}{S_{tot}} \qquad (17)$$

Low MSE, RMSE, and MAE values indicate the best result, and high $R^2$ values indicate high accuracy.
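The four metrics can be computed directly from paired actual and predicted values. The sketch below uses the standard textbook definitions, which are assumed to match the equations referenced above; the function name `regression_metrics` and the sample values are illustrative and are not taken from the dataset.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, and R^2 for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


actual = [1.0, 2.0, 3.0, 4.0]
good = regression_metrics(actual, [1.0, 2.0, 3.0, 5.0])  # one prediction off by 1
poor = regression_metrics(actual, [4.0, 3.0, 2.0, 1.0])  # trend reversed
```

The second call yields a negative $R^2$: when the predictions are worse than simply predicting the mean of the observed values, the residual sum of squares exceeds the total sum of squares.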

This section describes the experiments conducted to determine the efficiency of the proposed HANA model. All experiments were performed using Python 3.0 on a machine with a Core i7 processor, 16 GB of RAM, and an NVIDIA 4G-GT 740m GPU. The following subsections present the dataset description, the statistical analysis of the dataset, the use of PCA to enhance features, feature selection, dimension reduction, and the performance validation of the proposed methodology.

In the proposed HANA model, the COVID-19 Healthy Diet Dataset 2 was used. The dataset contains percentages of fats in different foods in 170 countries around the world. For comparison, the dataset also includes obesity, undernourishment, and COVID-19 cases as percentages of the total population. The COVID-19 Healthy Diet Dataset includes information about the following categories of food: alcoholic beverages, animal products, animal fats, aquatic products, cereals excluding beer, oil crops, eggs, seafood, sugar and sweeteners, fruits, meat, miscellaneous, milk excluding butter, offal, spices, starchy roots, pulses, stimulants, sugar crops, tree nuts, vegetable oils, vegetal products, and vegetables. It consists of five files in comma-separated values format.

As mentioned previously, the database covers 170 countries and 25 types of food. In the proposed HANA, scatter graphs are used to observe and represent the relationship between food types (features). The points in a scatter plot reveal patterns in the entire dataset as well as individual data values. This information can be used to determine correlation relationships [48].

In other words, scatter plot diagrams are used to visualize the data in the dataset, with one plot for each pair of features. The strongest and weakest relationships can be easily deduced from these diagrams, which in turn enables us to explain the connection between each pair of features. Example scatter plot diagrams for the COVID-19 Healthy Diet Dataset are shown in Figure 3. The scatter plots were used to correlate food types with death, where death is the dependent variable and the different types of food are the independent variables. As shown, the example graphs illustrate a three-way relationship between two different types of food in the COVID-19 Healthy Diet Dataset and death.

Pearson correlation is one of the most common methods used with numeric variables. Its value ranges from −1 to 1, where 1 indicates a perfect positive correlation, −1 a perfect negative correlation, and 0 no linear correlation [51], [52]. The proposed framework uses the Pearson correlation coefficient to measure the correlation between the different types of food in the COVID-19 Healthy Diet Dataset and death, and to determine whether each correlation is positive or negative. The top six categories of food that are correlated with death are listed in Table 1. The proposed HANA model also analyzes the distribution of the COVID-19 Healthy Diet Dataset, and the results show that the food categories have a logarithmic relationship to death. The mean μ and variance σ² for each food category are shown in Figure 4.
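The Pearson coefficient used for this ranking can be sketched in a few lines of Python. The function `pearson_r` and the sample values are hypothetical and serve only to show the two extreme cases; they are not values from the COVID-19 Healthy Diet Dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical food share per country and two hypothetical death-rate columns.
food_share = [1.0, 2.0, 3.0, 4.0, 5.0]
deaths_up = [2.0, 4.0, 6.0, 8.0, 10.0]
deaths_down = [10.0, 8.0, 6.0, 4.0, 2.0]

r_positive = pearson_r(food_share, deaths_up)
r_negative = pearson_r(food_share, deaths_down)
```

Applying such a function to each food category against the death column and sorting by the result would give a ranking of the kind reported in Table 1.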

In this study, the goal of PCA was to extract the main data representative of the typical features from the COVID-19 Healthy Diet Dataset and present it as a new set of independent parameters, the principal components. PCA transforms correlated variables into uncorrelated variables, called principal components, by applying an orthogonal transformation. The transformation is designed such that the first component represents the greatest amount of data variance, and the orthogonal axes are sorted in descending order of variance [53], [54]. PCA is calculated by eigendecomposition of the covariance matrix produced from a dataset or by singular value decomposition of the dataset matrix [55]. PCA can be applied to extract and classify features to identify target samples in a dataset. Thus, in this paper, PCA is used to enhance and improve the features in the dataset. As shown in Figure 5 and Figure 6, the values analyzed at PC 20 increased as the component variance decreased from 0.007 to 0.04 and the cumulative variance increased from 0.22 to 0.8, thus increasing the value of the features analyzed, which, in turn, improved the features.
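The notions of component variance and cumulative variance can be illustrated on a small example. The sketch below is restricted to two features so that the eigenvalues of the covariance matrix have a closed form; it is an assumption-laden illustration with hypothetical data, and an analysis of all features would normally rely on a numerical library instead.

```python
import math


def pca_2d_explained_variance(points):
    """Explained-variance ratios of the two principal components of 2-D data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centered data.
    a = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    centre = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    eigenvalues = [centre + radius, centre - radius]
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Hypothetical, strongly correlated pair of features.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
ratios = pca_2d_explained_variance(points)
cumulative = [ratios[0], ratios[0] + ratios[1]]
```

For strongly correlated features, the first component carries almost all of the variance, so the cumulative curve rises quickly and later components contribute little.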

PCA is also used to control the correlation among features. Next, in the new space, the ReliefF algorithm is used to discover more discriminative features. Table 2 shows the result for each dimension of the features after the PCA transformation. Compared with the scores obtained by the ReliefF method, it is evident that higher scores can be achieved using the feature dimensions.

Feature reduction is considered an optimization problem that can be solved by SGD. The SGD threshold determines the dimensionality of the subspace. The main purpose of SGD is to limit the features used in the prediction methods. Given a prepared dataset (the values of the features' PCs resulting from ReliefF), SGD is a computationally very costly procedure. We executed linear regression using SGD, as shown in Equation (18) [56], [57].

Here, the left-hand side of Equation (18) represents the estimated features, and $y$ is the present value of the PC features resulting from ReliefF. Commonly, the objective is an error function $E(y)$; by following the gradient direction in the value space of $y$, we move toward the $y$ that reduces the error. SGD finds the best $y$ by minimizing $E(y)$. More concretely, with either perceptron classification or linear regression, $y$ corresponds to the model's weight parameters, and $E(y)$ is the model's error. Regular gradient descent is expressed as follows:

where the objective of the error is calculated as follows.

Based on the evaluation metrics used to determine the best prediction method, SGD adjusts the set of selected features according to the threshold. The resulting features amounted to 80% of the output of the ReliefF feature selection, as shown in Table 1.
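The update rule behind SGD for linear regression can be sketched as follows. This is a generic illustration under stated assumptions (squared-error objective, one sample per update, fixed learning rate, hypothetical noiseless data); it does not reproduce the threshold-based feature reduction or the exact form of Equation (18).

```python
import random


def sgd_linear_regression(X, y, learning_rate=0.1, epochs=500, seed=0):
    """Fit y ~ w.x + b by updating on one sample at a time in shuffled order."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    bias = 0.0
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for i in indices:
            prediction = sum(w * x for w, x in zip(weights, X[i])) + bias
            error = prediction - y[i]
            # Gradient step for the per-sample squared error 0.5 * error**2.
            weights = [w - learning_rate * error * x for w, x in zip(weights, X[i])]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical noiseless data generated from y = 2*x1 - 1*x2 + 0.5.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.5, 2.0], [1.5, 0.5]]
y = [2.0 * a - 1.0 * b + 0.5 for a, b in X]

weights, bias = sgd_linear_regression(X, y)
```

Unlike regular gradient descent, which sums the gradient over all samples before each step, every update here uses a single sample, which keeps the cost per step independent of the dataset size.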

The MSE, RMSE, MAE, and $R^2$ values for the proposed framework's regression prediction models are compared in Table 4.

As can be seen, death was predicted with high accuracy, and the values were predicted with acceptable accuracy. The MSE, RMSE, MAE, and $R^2$ metrics serve to evaluate the accuracy of the models for the COVID-19 Healthy Diet Dataset, as shown in Table 4. The most efficient regression prediction model was elastic net regression. As shown in Table 4, the MSE, MAE, and RMSE for the elastic net regression model were significantly lower than those of the ridge regression, simple linear regression, and AdaBoost models. MSE, MAE, and RMSE are frequently used to determine model accuracy, and in this study they indicated the best fit when predicting death. Because $R^2$ compares the fit of the chosen model with that of a horizontal straight line (the null hypothesis), $R^2$ is negative when the chosen model fits worse than the horizontal line. $R^2$ is a performance metric that is not always the square of a quantity, so it can take negative values without violating any mathematical rule; a negative $R^2$ denotes that the chosen model does not follow the trend of the data. Conversely, the $R^2$ value for the elastic net regression model was significantly higher than that of the other models. The features are now ready to be fitted to a model, but which one? We selected four models (simple linear regression, ridge regression, elastic net regression, and AdaBoost) and performed K-fold cross-validation to determine which one is best. We used 20-fold cross-validation to compare the output of the four models and conclude that elastic net regression has a slightly higher probability of providing better prediction accuracy.
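The K-fold procedure can be sketched without any library support. In the example below, the folds are contiguous and unshuffled, the model is a single-feature least-squares line, and the data are synthetic; all three are simplifying assumptions for illustration, and the helper names are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous, nearly equal test folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_mse(xs, ys, k):
    """Average held-out MSE of the line fit over k folds."""
    fold_mses = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_mses.append(sum(squared) / len(squared))
    return sum(fold_mses) / len(fold_mses)


# Hypothetical data: 40 points on y = 3x + 1, evaluated with 20 folds.
xs = [float(i) for i in range(40)]
ys = [3.0 * x + 1.0 for x in xs]
folds = k_fold_indices(len(xs), 20)
cv_mse = cross_validated_mse(xs, ys, 20)
```

Each sample is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting; repeating the loop for each candidate model gives a comparison of the kind reported in Table 4.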

To evaluate the performance of the SGD method, we compared the results obtained with and without SGD for feature reduction. Table 5 shows the probabilities before and after feature reduction. In Table 5, each entry is the probability that the score for the model in the row is higher than the score for the model in the column. Small numbers indicate the probability that the difference is negligible. Table 5 thus summarizes the improvement in the results before and after feature transformation with respect to the MSE, RMSE, MAE, and $R^2$ metrics. The proposed method obtains better results for all metrics using simple linear regression compared to regression using regularization with the ridge function. The MSE, RMSE, and MAE values improve with simple linear regression and regression using elastic net compared with standard linear regression, regression using the elastic net, and AdaBoost. AdaBoost increases the $R^2$ value compared to regression using ridge and elastic net regularization. Overall, the results support the following points regarding hybrid PCA-Relief feature selection with SGD parameter optimization.
a. The reduced and enhanced COVID-19 Healthy Diet Dataset has higher predictive accuracy than the original COVID-19 Healthy Diet Dataset due to a lower number of features.
b. The elastic net regression model has higher accuracy (lower MSE, RMSE, and MAE, and higher $R^2$) than the other models with the reduced COVID-19 Healthy Diet Dataset.
c. Results show that deaths increase with increasing consumption of animal fat, animal products, eggs, and milk (excluding butter).
d. Results show that deaths decrease with increased consumption of vegetal products, spices, pulses, oil crops, cereals (excluding beer), and starchy roots.

In this section, we compare the proposed regression model with recent regression models that study the effects of COVID-19 on human food habits during lockdown. To the best of our knowledge, investigations into the direct relationship between food and COVID-19, and into the prediction of deaths resulting from poor dietary habits during the COVID-19 pandemic, are limited. Therefore, we compare the proposed regression model with the studies performed by Ordás et al. [32] and Shams et al. [34]. The comparative study demonstrated the superiority of the proposed regression model over the other models in terms of accuracy and MSE.

The problem is refined to be more concise and clearer in order to assist nutrition experts and subjects infected with COVID-19. The results address the intercorrelation between different types of foods as well as the correlation between food categories and COVID-19 mortality. The analysis is supported by recent literature and research in both nutrition analysis and artificial intelligence. The proposed HANA model predicts COVID-19 mortality based on the dietary pattern of a given population and uses machine learning to distinguish surviving infected cases from the others. The following items illustrate these insights in more detail. First, Table 1 highlights the foods that are most highly correlated with death from COVID-19. Based on an in-depth analysis of recent nutrition recommendations by WHO, we confirm the same advice already introduced in the WHO report. During the COVID-19 pandemic, the WHO report recommends avoiding eating out, using less salt and sugar, and eating moderate amounts of i) oils such as avocado, nuts, sunflower, olive oil, soy, canola, and corn oils; ii) fats, including animal fats; iii) animal products such as fish and fatty meat; and iv) the saturated fats found in products such as cream, coconut oil, and cheese. Second, nutrition experts emphasize that a nutritious diet can enhance the immune system overall and make people less vulnerable to COVID-19, and they recommend foods rich in vitamins and fresh vegetables to strengthen the immune system [66]. One of the current nutrition approaches is the Mediterranean diet for COVID-19 [67], which discusses the direct effects of type II diabetes and cardiovascular disease on subjects infected with COVID-19. It is noteworthy that these contributions confirm the outcome of the HANA model, namely, to eat less salt and sugar and to eat fresh and unprocessed foods every day.
Finally, comparing the evaluation results of the proposed HANA model in determining the success rate of infected COVID-19 cases with the review presented by Mechanick et al. [68], we confirm that there is a large gap between clinical nutrition applied to COVID-19 and unsafe food infrastructure. They put forward many questions, including the use of nutritional supplements and nutrients in COVID-19 patients with diabetes, and nutritional interventions to prevent severe COVID-19 disease in patients who are older than 65 years or who are frail.

Root Cause (RC) analysis is the procedure for determining the root causes of problems in order to find suitable solutions. RC analysis assumes that systematically preventing and solving underlying problems is far more effective than treating ad hoc symptoms and putting out fires. As shown in Figure 7, the results of the proposed HANA model were analyzed together with the RC analysis. According to the root cause analysis, eating less salt and sugar has a low potential impact and low actionability, whereas avoiding eating out has the highest potential impact with low actionability. The recommendations with high actionability are eating a moderate amount of fats and oil, which has a low potential impact, and eating fresh and unprocessed foods every day, which has a high potential impact.

The COVID-19 Healthy Diet Dataset covers 170 countries around the world with different types of food, and increased consumption of some food categories may be associated with an increased death rate since the start of the COVID-19 pandemic. The world has had to change its eating habits to improve health and reduce the death rate. In this work, a Healthy Artificial Nutrition Analysis (HANA) was conducted on the death status of people infected with the COVID-19 virus, taking into account the type of food. For this purpose, 25 features related to different food groups were used. The HANA model is presented as an algorithm based on ML and data analysis that provides an effective decision tool for nutrition experts to predict and analyze suitable diet and nutrition during the current pandemic. Furthermore, we performed a root cause analysis to determine which dietary habits are recommended based on potential impact and actionability. A statistical analysis of the COVID-19 Healthy Diet Dataset was carried out to clarify the correlation of different types of food with the death rate, to examine their distribution, and to calculate the means and standard deviations. Furthermore, the HANA model yields improved prediction results as measured by the well-known MSE, MAE, and RMSE metrics.

PCA was used to extract from the COVID-19 Healthy Diet Dataset the main data representative of the typical features and to present it as a new set of independent parameters, the principal components. PCA was therefore used in this paper to enhance and improve the features in the dataset. Relief was then used for feature selection, after which feature reduction was formulated as an optimization problem solved by SGD.

Death was forecasted using two families of regression prediction models, namely linear regression (ridge regression, no regularization, and elastic net regression) and AdaBoost. The most efficient regression prediction model is elastic net regression. The MSE, MAE, and RMSE for the elastic net regression model were significantly lower than those of the ridge regression, simple linear regression, and AdaBoost models. On the other hand, the $R^2$ value for the elastic net regression model was significantly higher than that of the other models. In future work, a complete diet will be developed, either for specific diseases such as obesity or as a model for a healthy life. In future development, smart health care is increasingly expected to adopt smart city principles. The HANA model can serve as an IoT-based application component for such cases in the period following the COVID-19 pandemic.

Food & meal decision making in lockdown: How and who has Covid-19 affected?

The food systems in the era of the coronavirus (COVID-19) pandemic crisis

Safety of foods, food supply chain and environment within the COVID-19 pandemic

The COVID-19 pandemic and food insecurity: A viewpoint on India

COVID-19 risks to global food security

Intelligent diagnosis of Alzheimer's disease based on internet of things monitoring system and deep learning classification method

An automatic detection system of diabetic retinopathy using a hybrid inductive machine learning algorithm

A novel perceptual two layer image fusion using deep learning for imbalanced COVID-19 dataset

COVID-19: a new deep learning computer-aided model for classification

Building Resilience against COVID-19 Pandemic Using Artificial Intelligence, Machine Learning, and IoT: A Survey of Recent Progress

Deep learning and medical image processing for coronavirus (COVID-19) pandemic: A survey

Why Are Generative Adversarial Networks Vital for Deep Neural Networks? A Case Study on COVID-19 Chest X-Ray Images

Machine and Deep Learning towards COVID-19 Diagnosis and Treatment: Survey, Challenges, and Future Directions

A Food Recommendation System Based on BMI, BMR, k-NN Algorithm, and a BPNN

COVID-19: the inflammation link and the role of nutrition in potential mitigation

Challenges, opportunities, and innovations for effective solid waste management during and post COVID-19 pandemic

Endocrine and metabolic aspects of the COVID-19 pandemic

Patients with severe obesity during the COVID-19 pandemic: how to maintain an adequate multidisciplinary nutritional rehabilitation program?

Artificial Intelligence in the design of transition to Sustainable Food Systems

Patterns of Change in dietary habits and physical activity during lockdown in Spain due to the COVID-19 pandemic

Food choice motives and the nutritional quality of diet during the COVID-19 lockdown in France

Food safety and evaluation of intention to practice safe eating out measures during COVID-19: Cross sectional study in Indonesia and Malaysia

Changes in the Food-Related Behaviour of Italian Consumers during the COVID-19 Pandemic

Machine learning based approach on food recognition and nutrition estimation

Evaluation of optimization techniques in predicting optimum moisture content reduction in drying potato slices

Interactive Machine Learning for Applications in Food Science

Automated Sorting and Grading of Vegetables Using Image Processing

Development of classification system of rice disease using artificial intelligence

Artificial Intelligence in Potato Leaf Disease Classification: A Deep Learning Approach

Identification of apple leaf diseases based on deep convolutional neural networks

Date fruit classification for robotic harvesting in a natural environment using deep learning

Image-based deep learning automated sorting of date fruit

Detection of apple defect using laser-induced light backscattering imaging and convolutional neural network

Combining satellite imagery and machine learning to predict poverty

Combining disparate data sources for improved poverty prediction and mapping

Food access in crisis: Food security and COVID-19

Evaluation of Country Dietary Habits Using Machine Learning Techniques in Relation to Deaths from COVID-19

Prediction of death status on the course of treatment in SARS-COV-2 patients with deep learning and machine learning methods

Impact of COVID-19 Pandemic on Diet Prediction and Patient Health Based on Support Vector Machine

Nutritional status change and activities of daily living in elderly pneumonia patients admitted to acute care hospital: a retrospective cohort study from the Japan Rehabilitation Nutrition Database

Evaluation of nutrition risk and its association with mortality risk in severely and critically ill COVID-19 patients

Nutritional risk and therapy for severe and critical COVID-19 patients: a multicenter retrospective observational study

COVID-19 and small enterprises in the food supply chain: Early impacts and implications for longer-term food system resilience in low-and middle-income countries

Is diet partly responsible for differences in COVID-19 death rates between and within countries?

Cabbage and fermented vegetables: From death rate heterogeneity in countries to candidates for mitigation strategies of severe COVID-19

Principal component analysis-a tutorial

PCA enhanced training data for adaboost

Relief-based feature selection: Introduction and review

The feature selection problem: Traditional methods and a new algorithm

A practical approach to feature selection

Estimating attributes: Analysis and extensions of RELIEF

Deepview: View synthesis with learned gradient descent

Large-scale matrix factorization with distributed stochastic gradient descent

Simple linear regression

On regularisation methods for analysis of high dimensional data

Regularization and variable selection via the elastic net

Logistic regression, AdaBoost and Bregman distances

Modest AdaBoost-teaching AdaBoost to generalize better

Pearson correlation coefficient, in Noise reduction in speech processing

Quantifying colocalization by correlation: the Pearson correlation coefficient is superior to the Mander's overlap coefficient

Data reduction and regression using principal component analysis in qualitative spatial reasoning and health informatics

Principal component analysis: a review and recent developments

Principal component analysis

On hyperparameter optimization of machine learning algorithms: Theory and practice

State-of-the-Art CNN Optimizer for Brain Tumor Segmentation in Magnetic Resonance Images

Choice of food: A preventive measure during Covid-19 outbreak

Mediterranean diet as a nutritional approach for COVID-19

Clinical Nutrition Research and the COVID-19 Pandemic: A Scoping Review of the ASPEN COVID-19 Task Force on Nutrition Research

The authors thank Smart Science Lab, Mansoura, Egypt (UPID Number: 260201-2021) for technical support.

The authors declare that they have no conflicts of interest regarding the publication of this paper.

The data are available at https://www.kaggle.com/mariaren/covid19-healthy-dietdataset.
