WIT Press

Integration Of Text And Data Mining














B Drewes


Components and features of a text mining solution are presented, illustrating the process from accessing a document through the various processing steps. These steps ultimately transform the texts into a statistical/numerical representation, enabling their use in an integrated text mining/data mining environment. An example of such use is presented, together with some comparative performance data showing the benefit of supplementing traditional data mining with text information.

1 Introduction

The major part of this paper focuses on the text mining solution, including its features and functionality. The integration of this solution with an existing data mining environment is presented by means of an example, which shows the added benefit of using both data and text for mining purposes. The text mining solution presented here provides facilities for the creation of documents, the extraction of features, the generation of essential concepts (data reduction), the topical clustering of texts, and an interactive workbench environment to optimize the interpretation of a text collection.

2 Creation of documents

Documents can come in a variety of formats, such as ASCII, PDF, HTML, Excel, Lotus, and PowerPoint. These documents can reside in a directory on a user’s desktop or be located on the Internet. All such documents are collected and stored in a single data set. Once this data set has been created, it becomes the input to a “Text Mining Node”, which carries out the text processing, consisting of feature extraction and