Multi-relational Data Mining In Microsoft SQL Server 2005
Free (open access)
C. L. Curotto, N. F. F. Ebecken & H. Blockeel
Most real-life data are relational by nature. Integrating data mining with database systems is an essential goal to be achieved. Microsoft SQL Server (MSSQL) seems to provide an interesting and promising environment for developing aggregated multi-relational data mining algorithms by means of nested tables and the plug-in algorithm approach. However, it is currently unclear how these nested tables can best be used by data mining algorithms. In this paper we look at how the Microsoft Decision Trees (MSDT) algorithm handles multi-relational data, and we compare it with the multi-relational decision tree learner TILDE. In our experiments, MSDT achieves predictive accuracy as good as that of TILDE, but the trees it produces either ignore the relational information or use it in a way that yields non-interpretable trees. As such, one could say that its explanatory power is reduced compared with that of a multi-relational decision tree learner. We conclude that it may be worthwhile to integrate a multi-relational decision tree learner into MSSQL.

Keywords: multi-relational, data mining, algorithm, decision trees, databases, sql server, nested tables.

1 Introduction

To achieve tight coupling of Data Mining (DM) techniques with Database Management Systems (DBMS) technology, a number of approaches have been developed in recent years. These approaches include solutions provided by both companies and academic research groups. Toward this objective, the Microsoft (MS) Object Linking and Embedding Database for DM (OLE DB DM) technology provides an industry standard for