A New Tool For Textual Database Analysis And Management
Free (open access)
This paper presents a new application of data mining algorithms to information systems management, based on a clustering strategy for data sets given as textual archives. A typical case is that of Internet archives, whose management depends strictly on search engines, which impose serious constraints on the actual retrieval of textual data. The proposed information system manager, called TOM, can be regarded as a search engine manager that works with the usual search engines. In the same way, TOM can be applied to searching online encyclopaedias and many other information systems whose content describes a large set of items without the help of a general directory, such as descriptions of the books in a library or of the machine spare parts held in a warehouse. To handle textual data, we propose a new natural language treatment, a new mining strategy, and a new clustering algorithm for textual data. A mixture model arises from the need to model the probability of the overall correlation between the n statements of the textual archive and the user's query, so that the stopping rule for clustering the assertions can be applied through statistical hypothesis testing. In this way, so-called Web intelligence becomes an automatic answering machine based on a mining strategy applied to the Web.