Data Reduction Influence on the Accuracy of Credit Risk Estimation Models
Keywords:
artificial neural networks, credit risk, discriminant analysis, factor analysis, logistic regression.
Abstract
Bank credits carry the risk of default. The main purpose of credit risk estimation in banks is to determine a company's ability to fulfil its financial obligations in the future. A proper instrument for estimating credit risk is very important for banks because it reduces potential losses by helping direct credit to reliable clients. Banks develop internal credit risk estimation models, and various data analysis methods can be applied for this purpose. Statistical predictive analytic techniques and artificial intelligence can be used to determine default risk levels. To do so, banks need data about their clients' past activity, so they usually collect information about borrowers. Financial ratios remain the primary variables for predicting corporate financial distress. The principal financial ratios used as variables in the analysis are indicators of a company's financial structure, solvency, profitability and cash flow. Credit risk estimation models are based on the analysis of these data. With such models it becomes possible to predict the likelihood of default of new clients.
Credit risk estimation models in banks differ significantly in architecture and operating design. The main reason for these differences is that banks' models are developed by bank personnel and are usually not revealed to outsiders. The object of this research is credit risk estimation models. The purpose of the research is to develop credit risk estimation models and to evaluate the influence of input data reduction on their accuracy. Two methods were applied: an analysis of scientific publications on credit risk estimation and an analysis of the performance of the credit risk estimation models developed in this research.
Using financial data of Lithuanian companies, three initial credit risk estimation models were developed with the following data analysis methods: discriminant analysis (DA), logistic regression (LR) and artificial neural networks (ANN), specifically a multilayer perceptron. Sixty financial ratios of the companies were analyzed. They were calculated from the financial reports of Lithuanian companies for 3 years. Variables for the DA model were selected using analysis of variance (ANOVA) and the Kolmogorov-Smirnov test. Variables for the LR model were selected using ANOVA. The variables used in the ANN model were selected by the network according to their importance ranks. The classification accuracy of the models was evaluated by the correct classification rate (CCR). The highest classification accuracy was reached by the LR model, which classified 97% of companies correctly. The ANN model correctly classified 95.5% of companies and the DA model 84%.
Next, the 60 initial variables were reduced by factor analysis and the resulting changes in the classification accuracy of the models were estimated. The number of factors to retain was determined by the Kaiser criterion and the scree test. After the factor analysis, six new credit risk estimation models were developed with the same data analysis methods: DA, LR and ANN. Each method was applied to 15 and to 6 new variables obtained from the factor analysis. The research has shown that the 15 new variables account for 89.37% of the variance of the initial variables. With these variables, the percentage of correctly classified companies decreased most in the ANN model (-14.6%). The classification accuracy of the other models decreased by 2.0% to 7.1%. If an analyst includes only 6 new variables in the credit risk estimation, which account for 63.92% of the variance of the initial variables, the largest decrease in classification accuracy is again in the ANN model (-15.7%).
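The two quantities that drive the comparison above, the number of factors retained under the Kaiser criterion and the correct classification rate, can be illustrated with a minimal Python sketch. It is not the procedure used in this research: the function names `kaiser_retained` and `correct_classification_rate` are hypothetical, the data are synthetic, and the share of variance is taken from the eigenvalues of the correlation matrix (a principal-component style extraction), which is an assumption because the extraction method is not stated here.

```python
import numpy as np


def kaiser_retained(X):
    """Apply the Kaiser criterion to a (companies x ratios) data matrix.

    Returns the number of factors whose correlation-matrix eigenvalue
    exceeds 1, the share of total variance those factors account for,
    and all eigenvalues in descending order (the scree plot values).
    """
    corr = np.corrcoef(X, rowvar=False)        # correlation between ratios
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh is ascending; reverse
    n_keep = int(np.sum(eigvals > 1.0))        # Kaiser: keep eigenvalues > 1
    explained = float(eigvals[:n_keep].sum() / eigvals.sum())
    return n_keep, explained, eigvals


def correct_classification_rate(y_true, y_pred):
    """Share of companies whose predicted class equals the actual class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# Synthetic illustration only (assumed data, not the Lithuanian sample):
# 200 companies, 6 ratios generated from 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.3 * rng.normal(size=(200, 6))

n_keep, explained, eigvals = kaiser_retained(X)
print("factors retained:", n_keep)
print("share of variance:", round(explained, 4))

# CCR for four companies, one of which is misclassified.
print("CCR:", correct_classification_rate([1, 0, 1, 1], [1, 0, 0, 1]))
```

Plotting `eigvals` against their rank gives the scree plot; the scree test looks for the point where the curve flattens, which may suggest a different number of factors than the Kaiser criterion.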