Towards Best Achievable Floating Point Performance For Linear Algebra Computations On COTS Beowulf Clusters
S. Fourmanoit¹, R. Roy¹ & F. Bertrand²
¹Department of Computer Engineering
²Department of Chemical Engineering
Ecole Polytechnique de Montreal, Canada

Abstract

The solution of linear systems is required for many modern engineering applications, from computational fluid dynamics to biomedical imaging. This paper shows how a low-cost Beowulf cluster can be optimized to handle various CPU-intensive linear algebra computations using the Scalable Linear Algebra Package (ScaLAPACK). The paper focuses on the techniques and tools that we have developed to enhance the performance of a basic reference implementation. The tuning of system performance includes both partial recoding of some modules and careful engineering of the cluster design, doubling the speedup compared with the reference implementation.

1 Introduction

Many engineering applications, such as computational fluid dynamics (CFD) or biomedical imaging, require the solution of large linear systems. Automated linear algebra has always been a computationally intensive activity. Improvements in coding techniques and processor speed have led many engineers to ask for the manipulation of increasingly large systems, and this demand currently outpaces each increase in computational power. This behaviour has been amplified by a recent trend in many engineering schools to teach the computer-aided numerical solution of differential equations and to use generic solver codes, thus broadening the user base for such methods.