WIT Press


Implementing Data Mining Algorithms With Microsoft® SQL Server

Price

Free (open access)

Paper DOI

10.2495/DATA020081

Volume

28

Pages

Published

2002

Size

738 kb

Author(s)

C L Curotto & N F F Ebecken

Abstract

C. L. Curotto (1) & N. F. F. Ebecken (2)
(1) CESEC/UFPR, Civil Engineering Graduate Program, Brazil
(2) COPPE/UFRJ, Civil Engineering Graduate Program, Brazil

The OLE DB for DM specification (Microsoft's object-based technology for sharing information and services across process and machine boundaries, focused on database mining applications) provides an industry standard for implementing data mining algorithms aggregated with Microsoft SQL Server 2000. The Simple Naive Bayes classifier is implemented using the OLE DB for DM Resource Kit. Numeric input attributes, multiple prediction trees and incremental classification are considered. All the steps needed to implement this algorithm are explained and discussed. Some results are shown to illustrate the capabilities of the implementation.

1 Introduction

Nowadays, database management systems such as MS (Microsoft) SQL (Structured Query Language) Server [1] are available, with resources for manipulating terabytes of data and processing queries in parallel (on multiprocessor servers) using microcomputers [2]. This situation suggests integrating DM technology with database managers to enlarge the scope of this technology at a low cost. This integration approach, achieved by tightly coupling DM and OLAP (On-Line Analytical Processing) techniques in database application development environments, is a matter of current interest. It has been discussed at conferences such as ICDM'98 (First International Conference on DM), held in September 1998 in Rio de Janeiro, Brazil, ICDM'00 (Second International Conference on DM), held in July 2000 in Cambridge, UK, and more recent ones. Agrawal [3] presented a methodology for tightly coupling DM applications to a relational database system, IBM DB2/CS, based on utilization of user

Keywords