Performance Of Information Retrieval Models Using Term Co-occurrences

G. Desjardins 1; R. Godin 1; R. Proulx 2

doi:10.2495/DATA070181

WIT Press

Price

Free (open access)

Transaction

WIT Transactions on Information and Communication Technologies

Published

2007

Size

386 kb

Abstract

Many advanced models have been developed for information retrieval in recent years. These models are built on various artificial intelligence paradigms to improve the precision of retrieval. Most of them exploit some form of term co-occurrence to improve retrieval quality. In this paper, we compare the retrieval performance of five of these models: the Extended Boolean model, the Generalized Vector Space model, the Frequent Set model, the Rough Set model and a Genetic-Based model. These models are tested on three sub-collections from TREC (Text REtrieval Conference). We analyze the specificity of the models regarding the form of co-occurrences introduced and report on the retrieval performance and the scalability of each model.

1 Introduction

Term co-occurrences embed major correlation information among the documents of a collection. This information can be used to improve precision at the core level of retrieval engines. Many models try to capture this information and incorporate it into their output representation in order to increase the effectiveness of the retrieval engine. For this research, we have selected five retrieval models that exploit term co-occurrences: the Extended Boolean model, the Generalized Vector Space model, the Frequent Set model, the Rough Set model and a Genetic-Based model [1–5]. The next section reviews the principles of each model. Section 3 describes the

Keywords

text mining, information retrieval, co-occurrences, extended Boolean, generalized vector space, frequent set, rough set, genetic algorithm.

WIT Press, Ashurst Lodge, Ashurst, Southampton SO40 7AA, UK. Registered in England as a limited company No. 4741634