Nonlinear Dimensionality Reduction of Large Datasets for Data Exploration
Free (open access)
V. Tomenko & V. Popov
Dimensionality reduction techniques are outlined, and their strengths and limitations are discussed. A novel dimensionality reduction method is presented that combines input space approximation, nonlinear dimensionality reduction and function approximation techniques. The method is especially useful for large-scale real-world datasets, where existing methods fail because of their extreme computational expense. It can be used in exploratory data analysis and aims to create a low-dimensional data representation for better understanding of the data structure and for cluster analysis. A comparison of dimensionality reduction techniques is performed in order to justify the applicability of the proposed method.

Keywords: dimensionality reduction, self-organizing neural network, data exploration, function approximation.

1 Introduction

Advances in data collection and storage capabilities during the past decades have led to data overload in most sciences. In such diverse domains as medicine, biology, economics, astronomy and engineering, researchers have collected huge datasets with an ever larger number of observations and of high dimensionality. The dimension of the data is determined by the number of variables that are measured on each observation. Such datasets, in contrast with the smaller, more traditional ones that have been studied extensively in the past, present new challenges in data analysis, yet they must be processed and understood in order to extend our knowledge in different domains. Traditional statistical tools fail to meet the requirements of high-dimensional data processing because of the increase in the number of variables associated