SEQUEST: Mining Frequent Subsequences
Using DMA-Strips

H. Tan; T. S. Dillon; F. Hadzic; E. Chang

doi:10.2495/DATA060321

WIT Press

SEQUEST: Mining Frequent Subsequences Using DMA-Strips

Price

Free (open access)

Transaction

WIT Transactions on Information and Communication Technologies

Volume

Pages

Published

2006

Size

452 kb

Paper DOI

10.2495/DATA060321

WIT Press

Author(s)

H. Tan, T. S. Dillon, F. Hadzic & E. Chang

Abstract

Sequential patterns exist in data such as DNA string databases, occurrences of recurrent illness, etc. In this study, we present an algorithm, SEQUEST, to mine frequent subsequences from sequential patterns. The challenges of mining a very large database of sequences is computationally expensive and require large memory space. SEQUEST uses a Direct Memory Access Strips (DMA-Strips) structure to efficiently generate candidate subsequences. DMA-Strips structure provides direct access to each item to be manipulated and thus is optimized for speed and space performance. In addition, the proposed technique uses a hybrid principle of frequency counting by the vertical join approach and candidate generation by structure guided method. The structure guided method is adapted from the TMG approach used for enumerating subtrees in our previous work [8]. Experiments utilizing very large databases of sequences which compare our technique with the existing technique, PLWAP [4], demonstrate the effectiveness of our proposed technique. Keywords: mining frequent subsequences, sequence, phylogenic tree, sequential strips. 1 Introduction Data mining strives to find frequent patterns in data [1–4, 12]. A task in data mining is characterized by the type of data, structures and patterns to be found [8, 9]. Advancements in data collection technologies have contributed to the high volumes of sales data. Sales data typically consists of a transaction timestamp and the list of items bought by customers. If one is interested in inter-transactional patterns, sales data is a good example of a sequential pattern. Other

Keywords

mining frequent subsequences, sequence, phylogenic tree, sequential strips.

Keep me updated

View Book

WIT Press, Ashurst Lodge, Ashurst, Southampton SO40 7AA, UK. Registered in England as a limited company No. 4741634

Connect with WIT Press: