WIT Press

Hard Hats For Data Miners: Myths And Pitfalls Of Data Mining


Free (open access)

T Khabaza


SPSS Advanced Data Mining Group

Abstract

The intrepid data miner runs many risks, such as being buried under mountains of data or vanishing along with the “mysterious disappearing terabyte”. This paper debunks some myths and sketches some “hard hats for data miners”.

1 Introduction

Data mining is a business process: finding patterns in your data that you can use to run your business better. Through data mining we gain insight into a business problem. This insight may be useful in itself, but it also helps us gain the other benefits of data mining, such as a predictive capability.

This paper is about the practice of data mining. It is not a research paper; instead, it reports lessons learned through solving practical business problems and through contact with many data mining users and potential users.

There are many myths and misconceptions about data mining, and holding them exposes data mining users to specific risks. The first half of this paper lists some common misconceptions about data mining, corrects them, and describes the risks to which they can lead. The second half lists other common problems or pitfalls of data mining, with their symptoms and cures.

2 Myths and misconceptions about data mining

2.1 Myth #1: Data mining is all about algorithms

The ordinary business-person, attending a typical data mining conference, reading its proceedings, or even reading only the contents page of such a