WIT Press

A Study Of Re-sampling Methods With Regression Modeling

M A Hossain & R L Woodburn


An overwhelming number of data mining applications result in the use of regression models: for example, predicting the propensity of a customer to default on a credit card, or the likelihood that a prospect will respond to a direct marketing campaign. Unfortunately, the implementation constraints of many such applications restrict the predictive method to simple linear or logistic regression. While more sophisticated techniques (e.g. neural nets [1]) have built-in processes that make the resulting model the most predictive and robust, developing a robust linear or logistic regression model requires great care and an experienced hand. In business settings, most predictive models are built on a modeling dataset and independently validated on a validation dataset. Often, the modeling and validation datasets differ in ways that cause the modeler to question whether the model will perform well in the future. This paper explores the use of resampling methods in the model-building steps to help build an optimal model that not only fits both the modeling and validation samples well, but also holds up robustly. Resampling allows many more sample datasets to be considered and eliminates overfitting to the modeling sample.

1 Business application and outline of paper

The business application of this research is derived from the prediction of a 0/1 indicator, as would be generated to reflect response to a marketing campaign, or risk on a financial vehicle such as an auto loan or credit card. Although we have kept the variable distributions normal and simple, the correlation structure with