key: cord-0023276-iodwvha8
authors: Mishra, Pradeep; Yonar, Aynur; Yonar, Harun; Kumari, Binita; Abotaleb, Mostafa; Das, Soumitra Sankar; Patil, S.G.
title: State of the art in total pulse production in major states of India using ARIMA techniques
date: 2021-10-28
journal: Curr Res Food Sci
DOI: 10.1016/j.crfs.2021.10.009
sha: f763c1f68b484638e61595f9f42504239f3e795a
doc_id: 23276
cord_uid: iodwvha8

Pulses are a staple protein-rich food for Indian vegetarians, and India is one of the largest producers in the world. The present investigation studies the trend in the production of total pulses in India using the autoregressive integrated moving average (ARIMA) method. For stochastic trend estimation, yearly data were used for the period from 1961 to 2019. The most suitable ARIMA model for capturing the trend of pulse production was chosen on the basis of several goodness-of-fit criteria. Forecasts were made for the 10 years from 2020 to 2029, and India's forecast reaches its highest value (31.03302 million tonnes) in 2029. This study will play an important role in determining the future gap between the production of and demand for pulses.

Pulses are leguminous edible dry seeds rich in protein, minerals, and fiber. They play a diverse role in agriculture as a food crop, provender, cash crop, and rotation crop or intercrop. Many of the Sustainable Development Goals set for 2030 cannot be met without the inclusion of pulses in our consumption and production basket (Rawal and Navarro, 2019). Pulses as a commodity group fit all the Feed the Future Initiative themes, which aim at sustainable reduction of poverty and hunger and at better nutrition and health alongside protection of the environment (Maredia, 2012).
Total pulse production worldwide was recorded as 92.28 million tonnes in 2018 (FAO, 2018), of which the major pulses were dry beans (32.98%), chickpeas (18.63%), peas (13.53%), cowpeas (7.83%), lentils (6.86%), and pigeon peas (6.45%). India is the world's main producer of pulses, accounting for 25% of worldwide production. It is also the leading consumer of pulses, with 27% of global consumption (Srivastava et al., 2010). Although India is the largest producer of pulses (23020 thousand tonnes in 2019), domestic production is not sufficient to meet internal demand, and the country has to import 3 million to 5 million tonnes (15% of global imports) of pulses every year, making it the top pulse importer worldwide (Suresh and Reddy, 2016). Despite the imports, in 2019 the consumption of pulses in India amounted to 48 g per capita per day, slightly less than the 50 g per capita per day recommended by the Indian Council of Medical Research. One of the major hurdles to self-sufficiency in pulses is the set of policies that promote staple crop production, such as subsidies for fertilizers and credit and irrigation facilities, which discourage the production of pulses and other legumes.

In India, pulses are grown mostly under rain-fed conditions. Besides other external factors, erratic rainfall has a serious impact on the production of pulses (Reddy, 2009). There is already a gap between demand and supply for pulses in the country, and the uncertainty caused by the vagaries of rainfall widens it further. Therefore, forecasting production, productivity, and prices is important for effective planning and decision-making related to the production of pulses. The time-series approach to forecasting is the most reliable one. A very common method for forecasting a time series on the basis of the past pattern in the data is the autoregressive integrated moving average (ARIMA) method (Ray and Bhattacharyya, 2020).

In a study by Vishwajith et al. (2018) on forecasting mung production, ARIMA(4,1,4) fitted better than ARIMAX and generalized autoregressive conditional heteroscedasticity (GARCH) models. Mishra et al. (2021) considered ARIMA models for forecasting sugarcane production by major states for 2025. In contrast, Ray and Bhattacharyya (2020) found the ARIMAX(1,1,1) model better suited to pulse production than the ARIMA model. Price forecasting is an important tool for framing policies for sustained production and remunerative prices (Darekar and Reddy, 2017). Savadatti (2017) applied the ARIMA model for projection purposes and observed stagnancy in the area under pulses but a rise in pulse production and productivity. Many other studies have used the ARIMA model for forecasting, for example, for sugarcane production (Muhammad et al., 1992) and for sugarcane and cotton production and yield (Ali et al., 2015) in Pakistan. In Tamil Nadu, ARIMA models were used to forecast area, production, and productivity for various crops (Balanagammel et al., 2000) and sugarcane yield (Suresh and Krishna Priya, 2011). Conditional variances are taken into account by use of the GARCH model. Yaziz et al. (2011) used both the GARCH model and the ARIMA model to predict crude oil prices, and they concluded that the GARCH model was superior to the ARIMA model. However, Vishwajith et al. (2014) could not establish the superiority of either the GARCH model or the ARIMA model in modeling data for pulses in India.

In the present investigation, the data relate to total pulse production for five major producing states and India from 1950 to 2019. To build the model, 80% of the data were used for training; to validate it, the remaining 20% were held out for testing. The statistical software package R was used for model building.
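As an illustration of the 80%/20% division described above, the following minimal sketch assumes that the split is chronological, with the earliest observations used for training and the most recent ones for testing; the paper does not state the boundary years. The analysis itself was done in R; Python is used here only for illustration, and the function name `chronological_split` and the list of years are hypothetical stand-ins for the production series.

```python
def chronological_split(series, train_percent=80):
    """Split a time-ordered sequence into a leading training part and a trailing test part."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    # Integer arithmetic avoids floating-point surprises at the cut point.
    cut = len(series) * train_percent // 100
    # No shuffling: the test set must consist of the most recent observations.
    return series[:cut], series[cut:]


# Hypothetical stand-in for the yearly observations from 1950 to 2019.
years = list(range(1950, 2020))
train_years, test_years = chronological_split(years)

print(f"training: {train_years[0]}-{train_years[-1]} ({len(train_years)} years)")
print(f"test:     {test_years[0]}-{test_years[-1]} ({len(test_years)} years)")
```

With 70 yearly observations, this rule would place 56 years in the training set and 14 in the test set. Keeping the order intact matters because a shuffled split would let the model see future values during fitting.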
Box and Jenkins (1976) introduced the ARIMA model, which is therefore also known in the literature as the Box-Jenkins method. The ARMA model combines an autoregressive (AR) model and a moving average (MA) model. While these models are suitable for stationary series, the ARIMA model is applied to nonstationary series. The easiest way to make a time series stationary is to difference it: the difference operation subtracts from each value of the series the value observed a fixed number of periods earlier. For nonstationary data, the ARMA(p, q) model becomes the ARIMA(p, d, q) model when a d-order difference operation is performed to make the data stationary. In the ARIMA(p, d, q) model, p represents the order of the AR model, q represents the order of the MA model, and d represents the number of differences needed to make the data stationary (Ray et al., 2021; Mishra et al., 2021). The equation for the ARIMA(p, d, q) model is as follows:

$$Y_t = \varphi_1 Y_{t-1} + \cdots + \varphi_p Y_{t-p} + \alpha_t + \theta_1 \alpha_{t-1} + \cdots + \theta_q \alpha_{t-q}$$

where $\varphi_1, \ldots, \varphi_p$ are the parameters of the AR operator, $\alpha_t$ is the error term, $\theta_1, \ldots, \theta_q$ are the parameters of the MA operator, and $Y_t$ is the dth difference of the original data (Brockwell et al., 2016; Gujarati and Porter, 2012).

The following steps can be applied for fitting time-series data to an ARIMA model (Hyndman and Khandakar, 2007).

Step 1. Plot the data, detect any unusual observations, and transform the data to stabilize the variance if necessary.
Step 2. Determine the values of p and q by analyzing the autocorrelation function (ACF) and the partial ACF (PACF), and identify the best ARIMA model among the candidates by using Akaike's information criterion (AIC) with correction (AICc).
Step 3. For the best model, check the residuals by plotting their ACF and PACF. Modify the model if the plotted ACF and PACF do not look like those of white noise.
Step 4. If the residuals look like white noise, calculate the forecasts.

For model selection, the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), AIC, AICc, and Bayesian information criterion (BIC) are statistical measures that evaluate how well the forecasting model fits. Goodness of fit for a time series is based on the residuals, so forecast accuracy should be checked with several metrics together. The model with the best forecasting ability has the smallest value of the error criterion.

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} e_t^{2}, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \lvert e_t \rvert, \qquad \mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n} \left\lvert \frac{e_t}{y_t} \right\rvert$$

$$\mathrm{AIC} = 2k - 2\ln L, \qquad \mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad \mathrm{BIC} = k\ln n - 2\ln L$$

where $e_t$ is the error term, $y_t$ is the observation, and $\tilde{y}_t$ is the forecast, with $e_t = y_t - \tilde{y}_t$. Here, k is the number of estimated parameters in the model, L is the maximum value of the likelihood function for the model, and n is the sample size.

In this study, statistical analysis was performed with the statistical software package R. The auto.arima() function in R was used to determine the best ARIMA model. This function uses the Hyndman-Khandakar algorithm (Hyndman and Khandakar, 2007) to obtain an ARIMA model by combining a unit root test with minimization of the AICc. The Hyndman-Khandakar algorithm covers only Steps 2 and 3 given above; therefore, we conducted the other steps ourselves. Furthermore, the Arima() function in R (version 4.0.3, https://stat.ethz.ch/pipermail/r-announce/2020/000662.html) was used to test other models that we thought might be suitable on the basis of the ACF and PACF graphs.

Fig. 1. Autocorrelation function (ACF) and partial ACF (PACF) graphs of the first differences of the pulse production data for major states of India.

Descriptive statistics for total pulse production between 1950 and 2019 are presented in Table 1 for Karnataka, Madhya Pradesh, Maharashtra, Rajasthan, Uttar Pradesh, and India. Between 1950 and 2019, total pulse production in India had a minimum value of 8347 and a maximum value of 25416.22.
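The accuracy measures and information criteria named in the model-selection step can be computed directly from their standard definitions. The sketch below is an illustration in Python, not the authors' R code; the function names and all numbers (observations, forecasts, log-likelihood, parameter count) are made up for the example.

```python
import math


def accuracy_measures(observed, forecast):
    """Return MSE, RMSE, MAE, and MAPE for paired observations and forecasts."""
    errors = [y - f for y, f in zip(observed, forecast)]  # e_t = y_t - forecast_t
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined if any observation equals zero.
    mape = 100.0 / n * sum(abs(e / y) for e, y in zip(errors, observed))
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


def information_criteria(log_likelihood, k, n):
    """Return AIC, AICc, and BIC from the maximized log-likelihood ln L,
    the number of estimated parameters k, and the sample size n."""
    aic = 2 * k - 2 * log_likelihood
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Made-up values, used only to exercise the functions.
observed = [100.0, 110.0, 120.0, 130.0]
forecast = [98.0, 113.0, 119.0, 134.0]
measures = accuracy_measures(observed, forecast)
criteria = information_criteria(log_likelihood=-50.0, k=2, n=56)

for name, value in {**measures, **criteria}.items():
    print(f"{name}: {value:.4f}")
```

When candidate models are compared on the same series, the one with the smaller criterion value is preferred. The AICc approaches the AIC as n grows, which is why the correction matters mainly for short yearly series such as the one studied here.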
Between the minimum (8347) and the maximum (25416.22), total pulse production in India has increased by approximately 205% since 1950. Moreover, in Karnataka, Madhya Pradesh, Maharashtra, Rajasthan, and Uttar Pradesh, production increased by approximately 716%, 830%, 735%, 1530%, and 230%, respectively. The mean total pulse production for India is 13110.88. The state with the highest average pulse production is Madhya Pradesh, with 2803.88, and the state with the lowest is Karnataka, with 697.94. The highest standard deviation of pulse production is seen for India, with 3531.55 (Table 1).

Normality was assessed from the descriptive statistics by dividing the skewness coefficient by its standard error (Das et al., 2017). Skewness and kurtosis should be within the range from −2 to +2 (a few authors also use the more lenient −3 to +3). With use of this rule of thumb, and as the data size is large, it can be concluded that the data sets are normally distributed. The positively skewed and platykurtic nature of the data for the states of Karnataka, Maharashtra, and Rajasthan indicates that there was a marginal change of the area in favor of pulse production during the early period and that it remained almost the same over the study period (Vishwajith et al., 2018). The leptokurtic and positively skewed nature of the data for Madhya Pradesh and India indicates a very marginal change of the area during the early period. In Uttar Pradesh, the platykurtic and negatively skewed nature of the data indicates a marginal change of the area during the late period, with the area remaining almost the same during the rest of the study period.

ACF and PACF graphs of the first differences of the series are presented in Fig. 1.
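To make the quantities plotted in such graphs concrete, the following sketch computes first differences and the sample ACF of the differenced series, together with the approximate 95% white-noise bounds of ±1.96/√n that are commonly drawn on ACF plots. It is a plain-Python illustration under the usual textbook definitions; the short series is hypothetical and is not the pulse production data.

```python
import math


def difference(series, lag=1):
    """Return the lag-`lag` differences y_t - y_{t-lag}."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Return the sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


# Hypothetical trending series (not the paper's data).
series = [10.0, 12.0, 15.0, 14.0, 18.0, 21.0, 20.0, 24.0, 27.0, 26.0]
diffs = difference(series)
acf = sample_acf(diffs, max_lag=3)
bound = 1.96 / math.sqrt(len(diffs))  # approximate 95% bounds under white noise

print("first differences:", diffs)
for lag, r in enumerate(acf, start=1):
    print(f"lag {lag}: r = {r:+.3f} (bounds ±{bound:.3f})")
```

Autocorrelations that stay inside the bounds at most lags are consistent with a stationary, weakly correlated series; the same check applied to model residuals is the white-noise diagnostic of Step 3.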
According to the goodness-of-fit criteria, especially the AICc, the best models given in Table 2 were selected for the five major states and India, and the best model was used to determine the forecast values for the 10 years from 2020 to 2029. It is clearly seen from Fig. 1 and Table 2 that the data became stationary at first differences. As seen in Table 2, the goodness of fit of the ARIMA models was also assessed by various criteria, such as the mean error, RMSE, MAE, mean percentage error, MAPE, mean absolute scaled error, likelihood, AIC, BIC, and the AICc used in the Hyndman-Khandakar algorithm. The residuals from the fitted models were checked by means of the ACF and PACF graphs of the residuals given in Fig. 2. A correct result is obtained if all autocorrelations lie within the threshold limits and the residuals look like white noise.

Forecasting models mainly use time series to assess future quantities on the basis of recent information (Das et al., 2019). The present investigation aimed to establish the importance of ARIMA models and attempted to make short-term predictions for pulses in India and Indian states. The forecast values were generated for the years 2020 to 2029. Tables 3-8 show point forecasts and 80% and 95% prediction intervals obtained through the ARIMA models in Table 2 for the five major states and India, respectively. Lo80 and Hi80 are the lower and upper bounds of the prediction interval for significance level α = 0.20, and Lo95 and Hi95 are the lower and upper bounds of the prediction interval for significance level α = 0.05. From these tables, total pulse production is expected to increase continuously in Karnataka, Madhya Pradesh, Rajasthan, and India over the forecast period and to decrease continuously in Uttar Pradesh (depicted in Fig. 3).
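The relation between a point forecast and the Lo80/Hi80 and Lo95/Hi95 bounds can be sketched as follows. This illustration assumes normally distributed forecast errors and, purely for simplicity, a random walk with drift, that is, ARIMA(0,1,0) with a constant, whose h-step forecast standard error is σ√h when parameter uncertainty is ignored. The models actually fitted are those in Table 2, so the function names and all numbers here are hypothetical.

```python
from statistics import NormalDist


def prediction_interval(point, std_error, level):
    """Return the (lower, upper) bounds of a normal prediction interval.

    `level` is the coverage, e.g. 0.80 corresponds to alpha = 0.20.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return point - z * std_error, point + z * std_error


def random_walk_drift_forecast(last_value, drift, sigma, horizon):
    """Forecast h = 1..horizon steps ahead for a random walk with drift.

    Simplifying assumption: the drift and sigma are treated as known,
    so the h-step standard error is sigma * sqrt(h).
    """
    rows = []
    for h in range(1, horizon + 1):
        point = last_value + drift * h
        std_error = sigma * h ** 0.5
        lo80, hi80 = prediction_interval(point, std_error, 0.80)
        lo95, hi95 = prediction_interval(point, std_error, 0.95)
        rows.append({"h": h, "Forecast": point, "Lo80": lo80,
                     "Hi80": hi80, "Lo95": lo95, "Hi95": hi95})
    return rows


# Hypothetical inputs, not values from the paper's tables.
rows = random_walk_drift_forecast(last_value=100.0, drift=2.0, sigma=5.0, horizon=3)
for row in rows:
    print(" ".join(f"{key}={value:.2f}" for key, value in row.items()))
```

The 95% interval always contains the 80% interval, and both widen as the horizon grows, which is the fan shape typical of forecast plots with prediction intervals.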
It was estimated that in 2029, pulse production will reach 2090.95 in Karnataka (Table 3), 8705.09 in Madhya Pradesh (Table 4), 3846.63 in Rajasthan (Table 6), and 31033.02 in India (Table 8). In Maharashtra, however, the pulse production trend fluctuates strongly (Table 5), which is also clearly visible in Fig. 3. In Uttar Pradesh (Table 7), production is forecast to decrease, which is also supported by Fig. 3. Agricultural funding, price support programs, better management practices, research workers, and similar long-term measures will be the major factors in sustaining this trend.

Pulses are an important part of a healthy, well-balanced diet, and they are particularly prevalent in the Indian diet. In agriculture, pulses are grown both as a mixed crop and as an intercrop. Forecasts of pulse production help determine whether demand will be met in the foreseeable future. The results of this study indicate that India will have its highest predicted value in 2029. Total pulse production is declining in Uttar Pradesh, whereas it is increasing in Karnataka, Madhya Pradesh, and Rajasthan. Agricultural funding, price support programs, improved management practices, research staff, and other measures that contribute to long-term output will be the most important factors in maintaining this trend. Studies of this type support policy implementation and long-term planning for a particular crop.

AIC, Akaike's information criterion; AICc, Akaike's information criterion with correction; ARIMA, autoregressive integrated moving average; BIC, Bayesian information criterion; LL, likelihood; MAE, mean absolute error; ME, mean error; MAPE, mean absolute percentage error; MASE, mean absolute scaled error; MPE, mean percentage error; RMSE, root mean squared error.
Fig. 2. Autocorrelation function (ACF) and partial ACF (PACF) of the residuals of the fitted ARIMA models for pulse production in major states of India.

Preparation of original manuscript: PM, AY, and HY. Data compilation: BK, MA, SSD, and SGP. Coding and analysis: BK, SSD, and SGP. Results and discussion: PM, AY, and HY. Finalization of the manuscript: PM, AY, HY, and MA. All authors have read and approved the final manuscript.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Lo80, lower bound of the prediction interval for significance level α = 0.20; Hi80, upper bound of the prediction interval for significance level α = 0.20; Lo95, lower bound of the prediction interval for significance level α = 0.05; Hi95, upper bound of the prediction interval for significance level α = 0.05.

Table 6. Total pulse production forecasting for Rajasthan (millions of kilograms).
Table 7. Total pulse production forecasting for Uttar Pradesh (millions of kilograms).

Fig. 3. Point forecasts and 80% and 95% prediction intervals obtained with autoregressive integrated moving average models for pulse production in major states of India (2020-2029).

Forecasting production and yield of sugarcane and cotton crops of Pakistan for 2013-2030
Forecasting of agricultural scenario in Tamil Nadu: a time series analysis
Time series analysis, control, and forecasting
Introduction to Time Series and Forecasting
Price forecasting of pulses: the case of pigeon pea
Different methods for judging the normality assumption for univariate and bivariate data and its remedial measure
Statistical study on modeling and forecasting of jute production in West Bengal
The State of Food Security and Nutrition in the World 2020. Transforming Food Systems for Affordable Healthy Diets. FAO
Basic Econometrics. Tata McGraw-Hill Education
Automatic Time Series for Forecasting: the Forecast Package for R (no. 6/07)
Global pulse production and consumption trends: the potential of pulses to achieve 'Feed the Future' food and nutritional security
Forecasting sugarcane production in Pakistan using ARIMA models
Modeling and forecasting of sugarcane production in India. Sugar Tech
Modeling and forecasting of milk production in the SAARC countries and China. Modeling Earth Systems and Environment
Time series modelling and forecasting of pulses production behaviour of India
Time series SARIMA Modelling and forecasting of monthly rainfall and temperature in the south Asian countries
Pulses production technology: status and way forward
Trend and forecasting analysis of area, production and productivity of total pulses in India
Diagnosis of pulses performance of India
Forecasting sugarcane yield of Tamilnadu using ARIMA models
Total factor productivity of major pulse crops in India: implications for technology policy and nutritional security
Analyzing COVID-19 outbreak for Turkey and eight country with curve estimation models, Box-Jenkins (ARIMA), Brown linear exponential smoothing method, autoregressive distributed lag (ARDL) and SEIR models
Time series modelling and forecasting of pulses production in India
Modelling and forecasting of Arhar production in India
A comparative study on Box-Jenkins and GARCH models in forecasting crude oil prices
Modeling and forecasting for the number of cases of the COVID-19 pandemic with the curve estimation models, the Box-Jenkins and exponential smoothing methods

The work was supported by Act 211 Government of the Russian Federation, contract No. 02.A03.21.0011. The work was supported by the Ministry of Science and Higher Education of the Russian Federation (government order FENU-2020-0022).