WIT Press

Comparative Performance of Credit Scoring Models Using Aggregated Predictors


Free (open access)


S Caiazza & S Borra


S. Caiazza, S. Borra
Faculty of Economics, University of Rome “Tor Vergata”, Italy

Abstract

The aim of this paper is to evaluate the results, in terms of misclassification rate, of two classification models, Logit and Classification Trees (Cart), in a credit scoring context. Because the results depend on the choice of input variables, we take this aspect into account when evaluating prediction performance. To improve the prediction capability of these two models, we have also applied two statistical techniques, bagging and boosting, to evaluate whether these aggregated predictors achieve better classification results. Our results indicate that Cart has a better classification capability than Logit, and that the error rate of both models can be further reduced using aggregated predictors. Furthermore, Cart avoids the variable selection problem.

1 Introduction

Credit risk assessment has recently attracted great interest, owing to the high impact of unsound credits on bank balance sheets and to the Basel Committee's proposal to modify the minimum regulatory capital requirements [2]. The possibility of adopting an internal rating system has led many banks to study the performance of classification models that can be used to implement a rating system. The first step is to exploit their large databases of past borrower behaviour in order to identify hidden patterns and predict the soundness of clients. A solution to this requirement can be found in the statistical tools for knowledge discovery developed within the area of data mining.

In searching for credit prediction models, analysts can follow a parametric approach, based on specific hypotheses about the parameters, or a non-parametric approach based on machine learning algorithms. Many analysts prefer parametric models because their parameters are easier to interpret and analysts are more confident working with them. The most widely used models are Linear Discriminant Analysis