Document Type: Research Paper
Credit scoring is a classification problem, and numerous techniques have been introduced to address it, such as support vector machines, neural networks and rule-based classifiers. Rule bases are given top priority in credit decision making because of their ability to explicitly distinguish between good and bad applicants.
In a credit-scoring context, imbalanced data sets frequently occur because the number of good loans in a portfolio is usually much higher than the number of loans that default. This paper explores the suitability of RIPPER, OneR, Decision Table, PART and C4.5 for extracting loan default prediction rules.
A real database of export loans from an Iranian bank is used, and class imbalance is investigated in this loan database by randomly oversampling the minority class of defaulters, together with three samplings of the majority class of non-defaulters. The performance criteria chosen to measure this effect are the area under the receiver operating characteristic curve (AUC), accuracy and the number of rules. Friedman’s statistic is used to test for significant differences between techniques and datasets. The results show that PART is the best classifier on all of the balanced and imbalanced datasets.
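As an illustration of two of the ingredients named above, the following is a minimal sketch of random oversampling of a minority class and of a rank-based AUC computation. It is not the paper's implementation: the function names `random_oversample` and `auc`, the choice to oversample until the two classes are equal in size, and the toy data are all assumptions made for this example.

```python
import random


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size.

    Balancing to a 1:1 ratio is an assumption of this sketch; other target
    ratios are equally possible.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no rows carry the minority label")
    n_extra = max(0, len(majority) - len(minority))
    # Sample with replacement from the minority indices.
    extra = [rng.choice(minority) for _ in range(n_extra)]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def auc(scores, labels, positive_label):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive_label]
    neg = [s for s, y in zip(scores, labels) if y != positive_label]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: label 1 marks a defaulter (minority class).
    rows = [[i] for i in range(10)]
    labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    balanced_rows, balanced_labels = random_oversample(rows, labels, minority_label=1)
    print(balanced_labels.count(1), balanced_labels.count(0))
    print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0], positive_label=1))
```

In practice, oversampling would be applied to the training portion only, so that duplicated defaulters do not leak into the data on which AUC is measured.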