key: cord-0976529-k5uacc6s authors: Chen, Ying; Guo, Jifeng; Huang, Junqin; Lin, Bin title: A novel method for financial distress prediction based on sparse neural networks with L1/2 regularization date: 2022-04-27 journal: Int J Mach Learn Cybern DOI: 10.1007/s13042-022-01566-y sha: 4b7c59255ae6ed5f94d24f26cf72cf6d1c130f52 doc_id: 976529 cord_uid: k5uacc6s

Corporate financial distress concerns the interests of the enterprise and its stakeholders. Therefore, its accurate prediction is of great significance for avoiding huge losses. Despite significant effort and progress in this field, existing prediction methods are either limited in the number of input variables they can handle or restricted to financial predictors. To alleviate these issues, both financial and non-financial variables are screened from existing accounting and finance theory and used as financial distress predictors. In addition, a novel method for financial distress prediction (FDP) based on sparse neural networks is proposed, namely FDP-SNN, in which the weights of the hidden layer are constrained with L1/2 regularization to achieve sparsity, so as to select relevant and important predictors and improve prediction accuracy. The sparsity also supports the interpretability of the model. The results show that non-financial variables, such as investor protection and governance structure, play a more important role in financial distress prediction than financial ones, especially as the forecast period grows longer. Compared with classic models proposed by prominent researchers in accounting and finance, the proposed model performs better in terms of accuracy, precision, and AUC.

Corporate financial distress is an important international research issue. Both theoretical research and practical experience show that failures and bankruptcy filings are a result of financial or economic distress. Even if firms survive corporate failure, financial distress can still impose significant direct and indirect costs on them and their stakeholders [3, 5, 25, 35]. Therefore, detecting and preventing financial distress in a timely manner is of great value to firms, regulators, investors, and other interested parties. Researchers have devoted great effort to finding efficient and effective methods for corporate financial distress prediction, which mainly fall into two research streams. One is the expansion and supplementation of predictive factors based on classic statistical methods, e.g., multivariate discriminant analysis (MDA), the logit regression model (Logit) and the probit regression model (Probit). In this stream, various financial and non-financial variables have been explored [2, 7, 11, 19, 52]. However, these classic statistical models are restricted by strict assumptions, such as normally distributed variables, equal variance-covariance matrices across treatment and control groups, and the absence of multicollinearity [6, 48]. These assumptions greatly limit the number of predictors in the models, making it hard to deal with a large number of predictors and to improve accuracy. The other stream in financial distress prediction is the choice and innovation of the method.
To overcome the limitations of classic statistical methods, some researchers started to apply machine learning methods to FDP, among which support vector machines (SVM), decision trees (DT) and neural networks (NN) have been widely used [53, 62, 71]. Compared with classic statistical methods, machine learning methods allow more variables in the models, enabling feature selection and accuracy improvement. In particular, neural networks have been proven able to approximate arbitrary nonlinear functions and have been successfully applied to FDP with excellent performance [12, 20]. However, existing FDP methods usually utilize only financial variables as predictors, ignoring non-financial variables. Researchers have found that financial variables are a reflection of the corporate financial situation, while non-financial variables, including strategy and governance structure, actually determine it [38, 44]. Therefore, non-financial variables may be more powerful than financial variables in FDP. Considering the above issues, this paper adopts a broader set of predictors, including both financial and non-financial variables, and proposes a novel feature selection and prediction method for financial distress using sparse neural networks with L1/2 regularization, in which the weights connecting the input and hidden layers are sparsely coded, achieving feature selection. Meanwhile, based on the simplified features, the recognition network achieves better classification performance. The contributions of this paper are as follows:

• A novel prediction method for financial distress is proposed, which adopts sparse neural networks with L1/2 regularization to simplify features and further improve recognition accuracy. It is also highly beneficial to the interpretability of the model.

• This paper considers an extensive set of predictors, including financial and non-financial variables, which greatly improves the accuracy of the FDP model. The results also show that non-financial variables are more important in FDP.

The organization of the paper is as follows. Section 2 reviews related work. Section 3 introduces the underlying techniques and the proposed method. Section 4 presents and analyses the experiments. Section 5 concludes and discusses future work.

This section reviews related work on financial distress prediction and on neural networks, especially techniques for sparsity in neural networks. FDP has been an extensively researched area since the late 1960s. Various statistical and intelligent techniques have been used in this area. However, the most widely used methods are still the classic statistical methods, e.g., MDA, Logit and Probit, because the most important issue for finance and accounting researchers is to explore new financial distress predictors to build and verify FDP theory, and those methods are adequate for this task. Since Altman [2] innovatively used financial variables to predict bankruptcy, researchers have worked on expanding and supplementing financial predictors [9, 19, 47]. Thereafter, accounting and finance researchers found that, in addition to financial variables, non-financial variables such as governance structure [11, 57], information disclosure [34, 52], investor protection [8] and strategy [23] can be used to predict financial distress.
While exploring new factors, classic statistical methods were widely used. However, those statistical models were questioned and criticized for their strict assumptions, such as homogeneity of variance [48], which make them sensitive to multicollinearity and limited in the number of predictors. Kumar and Ravi [43] found that the maximum number of significant predictors in those models is 20 variables, and more predictors do not improve prediction performance any further. That is, the limit on the number of variables restricts the information content of prediction models. As a result, it has been hard to select key features from a large number of predictors and improve accuracy. With the development of statistical methods and computer technology, machine learning methods started to be applied to FDP, including support vector machines, decision trees and neural networks. For example, Min and Lee [53] constructed an FDP model based on SVM with 38 financial variables as predictors. Sun and Li [62] designed an FDP model based on DT with 35 financial variables as predictors. Chen and Du [12] tested an FDP model based on NN with 37 financial variables as predictors. Zhou et al. [71] combined multiple machine learning approaches to select 20 features from 338 financial variables for FDP. Compared with classic statistical methods, machine learning methods have the advantage of modeling complex relationships between independent and dependent features without strong model assumptions, which makes it possible to put more predictors into a model so as to select features and improve accuracy. However, existing machine learning-based methods for FDP still use only financial variables without considering non-financial variables. A detailed comparison between classic statistical methods and machine learning methods for FDP is shown in Table 1.

Machine learning methods, especially neural networks, have been proven to fit linear and nonlinear relationships [31] and have been applied in various fields [1, 13, 30]. Kiran et al. [42] applied artificial neural networks (ANN) to predict the number of students taking make-up examinations. Singh [61] used neural networks to determine the length of intervals in fuzzy time series (FTS) forecasting. Singh et al. [65] adopted the backpropagation neural network (BP-NN) to reconstruct missing color-channel data. Namasudra, Dhamodharavadhani, and Rathipriya [54] proposed a neural network-based tool to predict the confirmed, recovered, and death cases of COVID-19. Goel, Murugan, Mirjalili, and Chakrabartty [29] also achieved automatic diagnosis of COVID-19 using a convolutional neural network. In particular, Chen and Du [12] applied data mining techniques in the form of neural networks to build and test financial distress prediction models, and demonstrated their feasibility and validity. Thereafter, many researchers supported the neural network approach and found that neural networks performed better in financial distress prediction than decision trees and other alternative approaches such as SVM [26]. Despite their excellent performance, neural networks still face great challenges in dealing with high-dimensional data. Redundant information in high-dimensional data [46] can seriously degrade the performance of classifiers, especially methods without feature extraction or selection [69].
Sparsity provides an effective way to reduce features and improve performance, and plays an increasingly important role in fields such as machine learning and image processing [55]. The sparsity approach removes a large number of redundant variables and retains only the explanatory variables most relevant to the response variables, simplifying the model and effectively solving many problems in modeling high-dimensional datasets [45, 66]. It also offers better explanatory power, facilitates data visualization, and reduces computational effort and transmission and storage costs. L0 regularization is the first sparse regularization method applied to variable selection and extraction; it gives the optimal variable selection under a constraint on the number of parameters, but requires solving a difficult combinatorial optimization problem. The L1 regularization proposed by Tibshirani [63] provides a powerful tool that only requires solving a quadratic programming problem, but its sparsity is lower than that of L0. L1/2 regularization, which lies between them, has been shown to have better feature selection and compressed-representation ability than L1, and has a wide range of applications [70]. M. Chen, Mi, He, Deng, and Wei [14] replaced L1 regularization with L1/2 regularization in the reconstruction of CT images, achieving good unbiasedness and acceleration. Liu et al. [50] proved its effectiveness in variable selection. Wu et al. [68] investigated gene selection in cancer classification using L1/2-regularized logistic regression, which outperforms other sparse methods.

This section describes the basic methods, namely neural networks and L1/2 regularization, and then the proposed FDP-SNN. A neural network is a mathematical or computational model that mimics the structure and function of a biological neural network. It consists of a large number of neurons linked together for computation. In most cases, neural networks can change their internal structure on the basis of external information and are adaptive systems [21]. Neural networks are a nonlinear statistical data modeling tool, often used to model complex relationships between inputs and outputs or to explore patterns in data. Figure 1 illustrates an example of a neural network with one hidden layer. For the input data X ∈ ℝ^(N×M) with actual output Y, N is the number of samples and M is the number of features. The value of the hidden layer is computed by Eq. (1), H = f1(X W1 + b1), where W1 is the weight connecting the input layer and the hidden layer, b1 is the corresponding bias, and f1(·) is the activation function. Based on the value of the hidden layer H, the output of the neural network is computed by Eq. (2), Z = f2(H W2 + b2), where W2 is the weight connecting the hidden layer and the output layer, b2 is the corresponding bias, and f2(·) is again an activation function. After obtaining the predicted output, the weights and biases are trained using gradient descent optimization algorithms [59]. For the predicted output Z, the loss function L is established using the cross-entropy between Y and Z, as represented by Eq. (3). The gradients of the loss function L with respect to W and b are calculated by Eqs. (4) and (5), respectively. Then the weight Wj is updated iteratively by Eq. (6), where the learning rate lies in (0, 1), while the bias b is updated iteratively by Eq. (7).
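To make these steps concrete, the following is a minimal NumPy sketch of the forward pass, cross-entropy loss and gradient-descent update described by Eqs. (1)-(7). The sigmoid activations and the exact cross-entropy form are illustrative assumptions rather than the paper's specification, and all function and variable names are ours.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(X, W1, b1, W2, b2):
    # Eq. (1): hidden layer H = f1(X W1 + b1); Eq. (2): output Z = f2(H W2 + b2)
    H = sigmoid(X @ W1 + b1)
    Z = sigmoid(H @ W2 + b2)
    return H, Z

def cross_entropy(Y, Z, eps=1e-12):
    # Eq. (3): cross-entropy between labels Y and predictions Z, averaged over samples
    Z = np.clip(Z, eps, 1.0 - eps)
    return -np.mean(np.sum(Y * np.log(Z) + (1 - Y) * np.log(1 - Z), axis=1))

def gradient_step(X, Y, W1, b1, W2, b2, lr=0.4):
    # One plain gradient-descent iteration (Eqs. (4)-(7)); lr is the learning rate in (0, 1)
    N = X.shape[0]
    H, Z = forward(X, W1, b1, W2, b2)
    dZ = (Z - Y) / N                      # output-layer gradient for sigmoid + cross-entropy
    dW2, db2 = H.T @ dZ, dZ.sum(axis=0)
    dH = (dZ @ W2.T) * H * (1.0 - H)      # backpropagate through the hidden sigmoid
    dW1, db1 = X.T @ dH, dH.sum(axis=0)
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2
```

With W1 of shape (M, q), W2 of shape (q, 2) and one-hot labels Y, repeatedly calling gradient_step realizes the iterative updates of Eqs. (6) and (7).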
During the optimization, an iteration termination condition is set: the recursion terminates either when the error falls below a certain value or when a preset number of iterations is reached. When optimization finishes, a neural network with optimal parameters is obtained.

Variable selection and feature extraction are basic problems when processing high-dimensional, massive data. When there are redundant variables in the data, identifying the real variables while eliminating the redundant ones is called the sparse problem. Since L1/2 regularization produces a sparser solution than L1 regularization and is easier to solve than L0 regularization, it has been widely applied to sparse problems [45]. For the data {X, Y}, assume there is an unknown but definite dependency f*(x). Based on the training data, variable selection aimed at prediction accuracy can be achieved by minimizing the expected risk, as in Eq. (8), which yields the final parameter estimate. Since the distribution function of {X, Y} is unknown, the expected risk is replaced by the empirical risk, which is minimized instead, as in Eq. (9). Generally, overfitting occurs when solving Eq. (9) directly. To avoid this issue, constraints are imposed on Eq. (9), as in Eq. (10), where P(·) is the sparse regularization term and its coefficient controls the strength of the penalty. When L1/2 regularization is adopted, the L1/2 parameter estimates can be calculated using Eq. (11).

Fig. 1 An example of a neural network with one hidden layer

The framework of the proposed FDP-SNN is shown in Fig. 2. First, the original data are input into the neural network with sparse regularization for optimization. After learning, the neural network can predict whether a company is facing financial distress. Meanwhile, the importance of the variables can be ranked by their weights in the sparse coding, achieving variable selection. The details of the method are described in the following subsections. Since the financial distress data in this work are high-dimensional, including both financial and non-financial variables, it is necessary to reduce redundant feature information. Given the mutual correlation and nonlinearity among the features of financial data, the advantages of neural networks in solving nonlinear problems are combined with a sparsity norm to address feature selection and classification. The L1/2 norm is adopted to sparsify the weights in the neural network because it yields better sparsity than the L1 and L2 norms, or even their combination. As shown in Fig. 2, the original financial distress data {X, Y} are fed into a single-hidden-layer neural network with p input nodes, q hidden nodes and 2 output nodes. The initial weights of the layers are W1 ∈ ℝ^(p×q) and W2 ∈ ℝ^(q×2). The transfer function from the hidden layer to the output layer is f: ℝ → ℝ; in particular, a sigmoid function is adopted as an example. The final predicted output Z is calculated by Eq. (12). To achieve feature selection, the weights W1 between the input and hidden layers are restricted by L1/2 regularization, and the loss function of the neural network is modified to Eq. (13). Then, in the process of backpropagation, the gradients are represented by Eqs. (14) and (15). The weight W1 is updated iteratively by Eq. (16), and the weight W2 by Eq. (17).
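As a hedged sketch of how the L1/2 penalty of Eq. (13) could enter the training loop, the snippet below adds the penalty to the data loss and folds its (sub)gradient into the update of W1; a small epsilon guards the non-differentiability at zero, and the paper's actual solver (e.g. a thresholding scheme) may differ from this plain subgradient step. The final helper illustrates the weight-based importance ranking mentioned above; its exact form is an assumption.

```python
import numpy as np

def l_half_penalty(W1, lam):
    # L1/2 penalty: lam * sum_i |w_i|^(1/2), applied to W1 only
    return lam * np.sum(np.sqrt(np.abs(W1)))

def l_half_grad(W1, lam, eps=1e-8):
    # d/dw [ lam * |w|^(1/2) ] = lam * sign(w) / (2 * sqrt(|w|)), smoothed by eps near zero
    return lam * np.sign(W1) / (2.0 * np.sqrt(np.abs(W1)) + eps)

def sparse_loss(data_loss, W1, lam):
    # Eq. (13): cross-entropy loss plus the sparsity penalty on the input-to-hidden weights
    return data_loss + l_half_penalty(W1, lam)

def update_W1(W1, dW1_data, lam, lr=0.4):
    # Eq. (16): gradient step using the data gradient plus the penalty (sub)gradient
    return W1 - lr * (dW1_data + l_half_grad(W1, lam))

def predictive_power(W1):
    # Importance of each input variable, taken here as the summed absolute outgoing
    # weights of its input node (one way to realize the weight-based ranking of Fig. 2)
    return np.sum(np.abs(W1), axis=1)
```

Here lam corresponds to the regularization coefficient analysed in Sect. 4 (0.0001 in the reported setting) and lr to the learning rate of 0.4.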
The algorithm runs IteraMax iterations to obtain the optimal classification model. In addition, another purpose of the method is to select the influential features so that the model can be further explained and analyzed. Based on the obtained weights W1 connecting the input layer and the hidden layer, the feature selection process can be implemented. Their absolute values are used as the ranking basis to obtain a feature ordering for financial distress, which is called the predictive power weight Wp. For a variable x_i, its predictive power weight Wp_i can be computed by Eq. (18). For all variables, a sorted list can be obtained, as represented by Eq. (19); from this list, the top N_f features can be selected and analyzed to explain the model. The entire process is described in Algorithm 1:

1: Randomly initialize all connection weights and biases in the network within the range (0, 1);
2: for i < IteraMax do
3: Calculate the output via forward propagation (Eq. (12));
4: Calculate the loss value using Eq. (13);
5: Calculate the gradients of W1 and W2 by Eq. (14) and Eq. (15);
6: Update W1 and W2 by Eq. (16) and Eq. (17);
7: end for

Fig. 2 The framework of FDP-SNN; weights drawn in dotted lines have values less than 1E-03

In this section, the FDP data, evaluation standard, parameter analysis, model performance and its interpretability are introduced. Previous research has set many criteria to define and distinguish whether a company is or will be in financial distress, among which firm bankruptcy [2, 32, 52, 56], debt restructuring [4, 11, 27], and debt default [10, 28, 33] are the most widely used. In China, the sample size of listed firms filing for bankruptcy is extremely small [8, 15], so we use the other two criteria to define corporate financial distress: (1) whether the firm is experiencing a debt restructuring in a given year (debt restructuring), and (2) whether the firm has a debt default in a given year (debt default). In the Chinese stock market, listed companies generally disclose their financial statements for the previous fiscal year around April [71]. To predict financial distress for a company, we therefore use predictor data obtained 2 and 3 years before the company met the financial distress criteria. Detailed definitions of the financial distress variables are shown in Table 2. To identify the predictors, we investigate financial distress prediction papers in top accounting and finance journals both at home and abroad, and finally collect 199 predictor variables. These predictor variables include both financial and non-financial variables measuring various aspects of an enterprise, such as capital structure. Table 4 (below) reports the sample distribution of financially distressed firms (FSMs) by year: the absolute number of FSMs increases over time with a transitory decline in 2010-2011, while the proportion of FSMs decreases year by year before 2015. Figure 3 describes the distribution of predictor variables by category. We can see that the number of financial predictor variables is much larger than that of non-financial ones, showing that previous researchers have focused on financial variables to predict financial distress. Among those predictor variables, capital structure, profitability and variability are the most commonly used financial categories, while governance structure is the most frequently used non-financial category.
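As an illustration of how such a sample could be assembled, the following is a hypothetical pandas sketch that pairs the distress label observed in year t with predictors observed 2 or 3 years earlier; the column names (stkcd, year, ds, df) are ours and do not correspond to actual CSMAR or CNRDS field names.

```python
import pandas as pd

def build_sample(distress: pd.DataFrame, predictors: pd.DataFrame, horizon: int) -> pd.DataFrame:
    """distress: columns [stkcd, year, ds, df] with 0/1 labels per firm-year.
    predictors: columns [stkcd, year, <predictor columns>].
    horizon: 2 or 3, the number of years between predictor observation and distress year."""
    lagged = predictors.copy()
    lagged["year"] = lagged["year"] + horizon   # predictors from year t-horizon explain year t
    sample = distress.merge(lagged, on=["stkcd", "year"], how="inner")
    return sample.dropna()

# Example usage with a random 80/20 split, as in the experiments below
# sample_f2 = build_sample(distress, predictors, horizon=2)
# train = sample_f2.sample(frac=0.8, random_state=0)
# test = sample_f2.drop(train.index)
```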
In the experiments, the samples consist of 10,533 firm-year observations for predicting financial distress in the next 2 years and 9,189 firm-year observations for predicting financial distress in the next 3 years (Table 3). For each sample, 80% of the observations are randomly selected as the training set, and the remaining 20% as the test set. We collect data on the financial distress variables and predictor variables from two commonly used databases: the China Stock Market and Accounting Research Database (CSMAR) and the Chinese Research Data Services Platform (CNRDS). The sample period is from 2007 to 2019, and there are 10,731 company-year observations. Predictor variables whose missing values account for more than 10% of the total company-year observations are excluded. After that, there are 163 financial variables measuring capital structure, cash management, development capability, liquidity, profitability, shareholder benefit, size, turnover and variability, and 36 non-financial variables measuring governance structure, information disclosure, investor protection and strategy. Table 4 shows the sample distribution of financially distressed firms (FSMs).

In the experiments, the performance of the models is measured in terms of accuracy and precision, which are calculated by Eqs. (20) and (21), i.e., accuracy = (TP + TN) / (TP + FN + FP + TN) and precision = TP / (TP + FP) [18], where the variables, e.g. TP, are defined in Table 5.

Table 5 The definitions of variables in Eqs. (20) and (21):
TP (true positive): an instance is a positive class and is also judged to be positive
FN (false negative): an instance is a positive class but is judged to be negative
FP (false positive): an instance is a negative class but is judged to be positive
TN (true negative): an instance is a negative class and is also judged to be negative

Besides, since financial distress data are inherently highly imbalanced, we also adopt the area under the ROC curve (AUC) to measure model performance. A ROC graph is a two-dimensional graph in which sensitivity is plotted on the Y axis and 1-specificity on the X axis; it depicts the relative trade-off between benefits (true positives) and costs (false positives). AUC is a good performance measure, especially for highly imbalanced data [22].

In order to capture the optimal performance of the model, the parameters of FDP-SNN are analyzed and shown in Fig. 4. For the regularization coefficient, values that are either too small or too large are not conducive to model performance, yielding an accuracy of only about 50%; when the coefficient is 0.0001, the accuracy reaches its highest value, 86.48%. With regard to hidden nodes, accuracy fluctuates up and down within a small range; relatively speaking, more hidden nodes can improve the accuracy of the model. Finally, 45 hidden nodes are adopted, with an accuracy of 86.48%. The number of iterations is also analyzed: as it increases, the training accuracy keeps rising, while the test accuracy drops sharply at 50,000 iterations, because a model trained for too many iterations overfits, leading to a significant reduction in test accuracy. For the learning rate, despite large fluctuations, the overall trend also rises first and then drops; when the learning rate is set to 0.4, FDP-SNN achieves its highest accuracy. Based on the above parameter analysis, the parameter settings of all experiments are determined and displayed in Table 6.
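Before turning to the results, the evaluation measures above can be illustrated with a short sketch; it assumes 0/1 labels (1 = financially distressed) and predicted probabilities for the positive class, computes accuracy and precision from the Table 5 counts as in Eqs. (20) and (21), and uses scikit-learn for AUC. The 0.5 threshold is an illustrative choice.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(y_true, y_prob, threshold=0.5):
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (20)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0  # Eq. (21)
    auc = roc_auc_score(y_true, y_prob)                  # threshold-free measure for imbalanced data
    return {"accuracy": accuracy, "precision": precision, "auc": auc}
```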
In this paper, L1/2 regularization is adopted to sparsify the weights of the neural network so as to select effective features and help the model make correct decisions, improving performance. To verify its effectiveness, a verification experiment is set up, comparing the accuracy of two predictors (neural networks) with and without sparse regularization. As shown in Fig. 5 (the verification of the effectiveness of sparse regularization), the predictor with sparse regularization has a clear advantage: across all prediction targets, its accuracy is higher than that of the predictor without regularization, and the maximum improvement reaches 6.53% on the F3-ds. These results show that selecting more effective features using sparse regularization indeed helps improve the recognition ability of the model.

Table 7 compares the test accuracy of the proposed sparse neural networks and other intelligent methods on the four indexes. It can be observed that the sparse neural networks are superior to the other methods in accuracy. Across all four indexes, Naive Bayes performs worst because it assumes that attributes are independent of each other, while both the number and the correlation of the attributes in this work are large, which degrades its classification performance. In contrast to the other methods, the neural networks clearly improve classification, with accuracy on the F2-ds reaching 82.20%. Importantly, the values improve further when sparse regularization is introduced, reaching 87.30% on the F3-ds, which again shows that selecting valid features is helpful for classification. Additionally, to verify the performance improvement of the proposed model, it is also compared with models proposed by prominent researchers, which employ classic statistical methods. The Z-Score proposed by Altman [2] was the first study to use MDA to predict corporate financial distress. The O-Score proposed by Ohlson [56] was the first work to adopt Logit to predict the financial distress of listed companies. Campbell et al. [10] proposed a simplified financial distress prediction model combining traditional financial variables with non-financial (stock market) variables, which is also supposed to be more predictive than the Z-Score and O-Score. The comparisons are detailed in Table 8. As shown in Tables 9 and 10, the proposed model outperforms the benchmark models on all of the performance measures. Notably, the precision of our model is almost twice that of the benchmark models; we attribute this improvement to the effectiveness of the chosen variables and of the feature selection in the sparse neural networks. On the one hand, our model allows more input variables, whereas traditional models, limited in the number of input variables, use only financial variables and ignore non-financial variables with greater predictive power, so their performance is weaker. On the other hand, sparse neural networks can focus on more effective features, helping the model make correct decisions and improving performance. The experimental results confirm that sparse neural networks are effective for feature selection and prediction.

Based on the obtained FDP-SNN, each predictor variable can be given a predictive power weight, which helps explain the model to some extent. As shown in Table 11, non-financial variables have greater predictive power than financial variables.
Besides, this difference is more pronounced when our model predicts financial distress in the next 3 years rather than the next 2 years, which indicates that non-financial variables become more important as the forecast period grows longer. The average weights of the strategy predictors are unstable across the financial distress groups: they have greater prediction weights for F2-ds and F3-df, but are relatively lower for F3-ds and F2-df. One possible reason is that there are too few variables in the strategy group. The investor protection group, by contrast, is consistently and highly predictive across all financial distress groups. This result is consistent with the theory of law and finance, which suggests that investor protection is a key factor affecting corporate finance [58, 60].

Tables 12 and 13 present the top 10 features with the largest weights extracted from the models predicting debt restructuring and debt default; the definitions of these features are summarized in Table 15. Consistent with our finding that non-financial variables have greater predictive power, the features with the largest weight for F2-ds, F3-ds, F2-df and F3-df are all non-financial variables. Specifically, the feature Develop, a dummy variable indicating whether the company is registered in a developed province, has the greatest predictive power for F2-ds and F2-df. This feature was proposed by Hu and Jin [36], who argued that the theory of political tournaments implies local governments have a strong incentive to internalize social burdens in listed companies in their jurisdictions. Thus, it is reasonable to anticipate that the level of development of the region where a company is located affects the firm's financial position. The feature SOE, a dummy variable indicating whether the firm is a state-owned enterprise, has the greatest power to predict F3-ds. Consistent with Wu and Wu [67], we believe that state-owned enterprises have strong government support and are less likely to run into financial distress. The feature HPAINV, a dummy variable indicating whether the company is investment-oriented, has the greatest power to predict F3-df. This feature was put forward by Wang et al. [64], who suggested that, compared with operation-oriented companies, investment-oriented companies suffer less financial risk. Moreover, non-financial predictor variables become more important as the forecast period grows longer. As shown in Table 12, non-financial variables occupy the top 2 features when predicting F2-ds and the top 3 when predicting F3-ds. Most importantly, when we use debt default to proxy financial distress (results are shown in Table 13), there are only 3 non-financial variables in the top 10 when predicting F2-df; however, when predicting F3-df, the number of non-financial variables rises to 6, occupying the top positions.

Based on the comparison between the weights of financial and non-financial predictor variables, we may conclude that non-financial predictor variables are more important in predicting financial distress. However, the number of financial predictor variables is 163, while the number of non-financial predictor variables is 36, which means that the difference in weights may be driven by the difference in the number of variables. Thus, we carry out a robustness test. Specifically, the 36 financial indicators with the highest weights are selected and put into the model together with the 36 non-financial indicators to re-compare the weights.
As shown in Table 14 , except for predicting F2-ds, nonfinancial indicators have strong predictive ability in predicting other financial distress variables, including higher average weight and higher proportion in the ten variables with the highest weight. This suggests that financial variables are only better at predicting short-term debt restructurings. As the scope of financial distress expands and the predicting period becomes longer, the predictive power of non-financial variables increases. Those results are consistent with the new finding that non-financial variables are more powerful than financial ones in explaining and predicting corporate financial conditions. There are two main reasons. Firstly, non-financial variables are usually the determinants of the corporate financial situation, while financial variables are its reflection. The current financial situation of a company is the result of its past operation and governance, and the future financial situation depends on its current management model ( [44] ). In summary, financial variables measure what a company "has done," while non-financial variables measure what a company "is doing." Thus, non-financial variables, measuring a company's operation and governance, are more future-orientated. Secondly, financial variables are generated by the financial accounting information disclosed by the company, which is easily manipulated by the management [38] . It has the following three attributes: (1) the production process is complicated, (2) can be affected by management accounting policies, and (3) often used to evaluate management performance, and thus management has the motivation and ability to manipulate financial accounting information. In contrast, non-financial accounting information does not possess these attributes. To a certain extent, non-financial variables are also more reliable than financial variables. In this study, a novel prediction method for financial distress is proposed, which is based on sparse neural networks and whose hidden layer with L 1∕2 regularization can select the efficient feature, so as to improve the performance on prediction. Based on the existing accounting and finance theory, we identify 163 financial variables and 36 non-financial variables that might affect financial distress and then select the top 10 predictors with the largest weights. The empirical results show that non-financial predictor variables are more important in financial distress prediction especially when the forecast period grows longer. Besides, the performance of FDP-SNN is assessed by comparing it with three benchmark models and find that FDP-SNN outperforms these benchmark models in accuracy, precision, and AUC performance by a large margin. From this study, we can get the following inspirations: First, sparse neural networks with L 1∕2 regularization can be used to select features and build a better model for predicting financial distress. Second, the neural networks model enables us to consider the financial distress predictors from multiple aspects comprehensively without considering the limitation of input variables. Finally, some future-oriented non-financial variables play a key role in the prediction of Trend breaks in (net quick assets/inventory) Blum [9] financial distress, which is consistent with accounting and finance theory. 
Future research might include other variables that have not been mentioned in existing papers but are important in finance and accounting, such as managers' personalities, corporate culture, and organizational identification. Besides, whether the correlations among these predictors are determinant factors in financial distress prediction will also be part of our future research.

Automatic electroencephalographic information classifier based on recurrent neural networks
Financial ratios, discriminant analysis and the prediction of corporate bankruptcy
How costly is financial (not economic) distress? Evidence from highly leveraged transactions that became distressed
Evaluating financial distress resolution using prior audit opinions
Destructive creation at work: how financial distress spurs entrepreneurship
35 years of studies on business failure: an overview of the classic statistical methodologies and their related problems
Have financial statements become less informative? Evidence from the ability of financial ratios to predict bankruptcy
Financial distress of Chinese firms: microeconomic, macroeconomic and institutional influences
Failing company discriminant analysis
In search of distress risk
Discriminating between reorganized and liquidated firms in bankruptcy
Using neural networks and data mining techniques for the financial distress prediction model
Pothole detection using location-aware convolutional neural networks
A CT reconstruction algorithm based on L1/2 regularization
Predicting financial distress in Chinese listed firms
Over-speed growth, financial crisis and risk forecasting
Ratio stability and corporate failure
The relationship between precision-recall and ROC curves
A discriminant analysis of predictors of business failure
A comparative analysis of artificial neural networks using financial distress prediction
Automatic selection of hidden neurons and weights in neural networks using grey wolf optimizer based on a hybrid encoding scheme
ROC graphs: notes and practical considerations for researchers
Measuring distress risk: the effect of R&D intensity
Introducing recursive partitioning for financial classification: the case of financial distress
Financial distress and competitors' investment
Prediction of financial distress: an empirical study of listed Chinese companies using data mining
Classifying bankrupt firms with funds flow components
Troubled debt restructurings: an empirical study of private reorganization of firms in default
OptCoNet: an optimized convolutional neural network for an automatic diagnosis of COVID-19
An efficient inspection system based on broad learning: nondestructively estimating cement compressive strength with internal factors
An efficient model for predicting setting time of cement based on broad learning system
Assessing the probability of bankruptcy
Credit ratings and credit risk: is one measure enough?
A test of the incremental explanatory power of opinions qualified for consistency and uncertainty
Indirect costs of financial distress in durable goods industries: the case of auto manufacturers
Social burden and the dynamics of corporate financial distress: an investigation based on the ST rules
Predicting corporate financial distress based on integration of support vector machine and logistic regression
The association between non-financial performance measures in executive compensation contracts and earnings management
Governance weakening and financial distress: a forecasting model
Managerial overconfidence, firm expansion and financial distress
Predicting firm financial distress: a mixed logit model
Prediction of the number of students taking make-up examinations using artificial neural networks
Bankruptcy prediction in banks and firms via statistical and intelligent techniques: a review
The end of accounting and the path forward for investors and managers
Hierarchical extreme learning machine with L21-norm loss and regularization
A novel feature learning framework for high-dimensional data classification
The use of simulated decision makers in information evaluation
Predicting business failure using classification and regression tree: an empirical comparison with popular classical statistical methods and top classification mining methods
Research on operation failure warning of listed companies based on artificial neural network method
The L1/2 regularization method for variable selection in the Cox model
Early warning of bank failure: a logit regression approach
MD&A disclosure and the firm's ability to continue as a going concern
Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters
Nonlinear neural network based forecasting model for predicting COVID-19 cases
Financial ratios and the probabilistic prediction of bankruptcy
The merger/bankruptcy alternative
Law and finance
An overview of gradient descent optimization algorithms
Investor protection and equity markets
Rainfall and financial forecasting using fuzzy time series and neural networks based model
Data mining method for listed companies' financial distress prediction
Regression shrinkage and selection via the lasso
Comparative study of the effect of financial distress pre-warning model between the consolidated and parent financial statement
Reconstruction of missing color-channel data using a three-step back propagation neural network
An adaptive kernel sparse representation-based classification
A study on prediction model for changes of financial status based on value-creation and corporate governance
Gene selection in cancer classification using sparse logistic regression with L1/2 regularization
Research on classification method of high-dimensional class-imbalanced datasets based on SVM
L1/2 regularization
The performance of corporate financial distress prediction models with features selection guided by domain knowledge and data mining approaches
Methodological issues related to the estimation of financial distress prediction models

The datasets generated and analysed during this study are available in the CSMAR repository: https://www.gtarsc.com/. The authors have no relevant financial or non-financial interests to disclose.

This appendix table provides supplementary information that is not an essential part of the text itself but may be helpful in providing a more comprehensive understanding of the research. All variables are from the related references; some variables have multiple references, but only the earliest ones are retained here.