Classification Algorithms And Analyzing The Functionality Of Protein Families
L. Gao & D. K. Y. Chiu
The rapid growth of bio-sequence data has created an increasing demand for reliable algorithms that group proteins in a meaningful way. Many traditional classification and clustering algorithms have been adapted to, or directly applied to, protein sequences or structures. To capture protein functionality, new algorithms that specifically aim to incorporate protein function have recently been proposed. In this paper, we review some of the classification and clustering algorithms for proteins. We divide the algorithms into four categories according to what they rely on: a dissimilarity measure, density characteristics, a computational model, or information from evidence. The algorithm based on information from evidence analyzes biomolecular sequences as discrete-valued n-tuples, so that individual discrete values, rather than the variables that take them, are selected as evidence for the final groupings. The advantage of this approach is that the configuration of the final clusters does not depend on a reliable distance measure, a predefined computational model of the clusters, the reliability of an adaptive learning method, or an estimate of the density function. Finally, we review the methods with respect to how well they reflect the functionality of the protein family.

Keywords: protein families, protein functionality, classification algorithm, clustering algorithm.

1 Introduction

Proteins are building blocks of organisms and fundamental substances of life; they play an important role in executing and regulating many biological processes.
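To make the evidence-based category from the abstract more concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes pre-aligned, equal-length sequences, each with a known initial class label (the toy data are hypothetical), and treats every alignment column as a discrete variable whose values are residues. Haberman's adjusted standardized residual and the threshold of 1.96 are assumptions chosen for illustration; they decide which individual (position, residue) values count as evidence. Scoring a sequence by summing the residuals of its matched evidence values is a deliberate simplification, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy alignment: each aligned, equal-length string is a
# discrete-valued n-tuple (one variable per column, residues as values).
SEQS = ["ACDA", "ACDG", "ACEA", "GCKW", "GTKW", "GTKA"]
LABELS = ["f1", "f1", "f1", "f2", "f2", "f2"]


def adjusted_residual(obs, n_v, n_c, total):
    """Haberman's adjusted standardized residual for one contingency-table cell."""
    expected = n_v * n_c / total
    var = expected * (1 - n_v / total) * (1 - n_c / total)
    if var == 0:  # the value or class covers every sequence: no information
        return 0.0
    return (obs - expected) / math.sqrt(var)


def select_evidence(seqs, labels, z=1.96):
    """Return {(position, residue): (class, residual)} for over-represented values.

    Selection is per value, not per position: a column may contribute one
    residue as evidence while its other residues are ignored.
    """
    total = len(seqs)
    class_count = Counter(labels)
    evidence = {}
    for i in range(len(seqs[0])):
        value_count = Counter(s[i] for s in seqs)
        cell = Counter((s[i], c) for s, c in zip(seqs, labels))
        for v, n_v in value_count.items():
            for c, n_c in class_count.items():
                r = adjusted_residual(cell[(v, c)], n_v, n_c, total)
                # Keep the value only if it is significantly over-represented,
                # attributing it to the class with the largest residual.
                if r > z and ((i, v) not in evidence or r > evidence[(i, v)][1]):
                    evidence[(i, v)] = (c, r)
    return evidence


def classify(seq, evidence):
    """Score each class by the summed residuals of the evidence values seq contains."""
    scores = Counter()
    for i, v in enumerate(seq):
        if (i, v) in evidence:
            c, r = evidence[(i, v)]
            scores[c] += r
    return scores.most_common(1)[0][0] if scores else None
```

On the toy data, only residues A and G at position 0 and K at position 2 are selected. Position 1 contributes nothing even though its residues are unevenly split between the two classes, and at position 2 the residue K is kept while D and E are not. This is the sense in which discrete values, rather than whole variables, serve as evidence for the grouping.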