A multiple search vector conjugate gradient method with additive Schwarz preconditioner
Large sparse matrix systems obtained by discretizing partial differential equations with finite element methods need to be solved by iterative methods, especially Krylov subspace methods with a well-prepared preconditioner. This is because sparse direct solvers are limited to roughly 5 million degrees of freedom by their large working-memory requirements. Krylov subspace methods are based on multiplication of a sparse matrix by a search vector (SpMV) in each iteration and do not require large memory, but they are not efficient on modern CPU architectures. Almost all supercomputers attain less than 5\% efficiency relative to peak performance in the HPCG benchmark, a conjugate gradient solver with a multigrid preconditioner. This is caused by the SpMV operation, which is essentially a BLAS 2 operation with sparse data access. The additive Schwarz preconditioner with coarse grid correction is a kind of optimal preconditioner: convergence of the preconditioned Krylov subspace method does not depend strongly on the number of submatrices, where each local problem is solved by forward and backward substitution after a direct factorization of the submatrix. The main operation then becomes the triangular solution phase of the sparse direct solver, which mainly consists of TRSV on small dense matrices. Furthermore, TRSM and multiplication of a sparse matrix by a tall-skinny matrix can utilize the cache memory of modern CPUs efficiently, which is the aim of this research. An additive Schwarz preconditioner with A-orthogonal projection onto the coarse space and its complement, called ``hybrid-ASM'', cleanly separates the preconditioned Krylov subspace into two subspaces. 
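A minimal sketch of one application of an additive Schwarz preconditioner with an additive coarse grid correction may clarify the structure (the hybrid-ASM variant described above replaces the additive coarse term with A-orthogonal projections). The subdomain index sets, the aggregation-based restriction operator `R0`, and the function name `asm_apply` are illustrative assumptions, not taken from the paper:

```python
import numpy as np
import scipy.sparse as sp

def asm_apply(A, r, subdomains, R0=None):
    """Apply z = sum_i R_i^T A_i^{-1} R_i r  (+ coarse correction R0^T A0^{-1} R0 r).

    Each local solve stands in for forward/backward substitution with a
    precomputed direct factorization of the submatrix A_i.
    NOTE: illustrative sketch, not the authors' implementation.
    """
    z = np.zeros_like(r)
    for idx in subdomains:                      # idx: dof indices of one (overlapping) subdomain
        A_i = A[idx][:, idx].toarray()          # local submatrix
        z[idx] += np.linalg.solve(A_i, r[idx])  # local direct solve
    if R0 is not None:                          # additive coarse grid correction
        A0 = R0 @ (A @ R0.T)                    # Galerkin coarse operator
        z += R0.T @ np.linalg.solve(A0, R0 @ r)
    return z

# usage: 1-D Laplacian with two overlapping subdomains and a
# piecewise-constant (aggregation) coarse space
n = 8
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
subs = [np.arange(0, 5), np.arange(3, 8)]
R0 = np.zeros((2, n))
R0[0, :4] = 1.0
R0[1, 4:] = 1.0
z = asm_apply(A, np.ones(n), subs, R0)
```

In a production setting each `A_i` would be factorized once and reused, so the per-iteration cost is dominated by the triangular solves (TRSV/TRSM) mentioned above.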
One is an invariant subspace containing the coarse solution, and the other is spanned by Lanczos search vectors, starting from a single search vector set to the initial residual and generated by the local preconditioner, the global sparse matrix operation, and A-orthogonal projection onto the complement of the coarse space. By selecting the local solution together with some higher-frequency vectors of the subproblem as multiple search vectors, a block Lanczos algorithm spans the solution space, which improves the arithmetic intensity of the preconditioned CG method.
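The multiple-search-vector idea can be illustrated with a standard (unpreconditioned) block conjugate gradient iteration, in which the single SpMV of classical CG becomes a sparse-matrix times tall-skinny-matrix product, the cache-friendly BLAS 3 analogue discussed above. This is a generic block CG sketch, not the authors' hybrid-ASM algorithm; the matrix, block width, and tolerances are illustrative:

```python
import numpy as np

def block_cg(A, B, tol=1e-10, maxit=200):
    """Block CG for A X = B with a block of s search vectors (O'Leary-style).

    Illustrative sketch: no preconditioner, no handling of rank deficiency
    within the block.
    """
    X = np.zeros_like(B)
    R = B - A @ X
    P = R.copy()
    for _ in range(maxit):
        Q = A @ P                               # one block "SpMV" per iteration
        G = P.T @ Q                             # small s x s Gram matrix
        alpha = np.linalg.solve(G, P.T @ R)
        X += P @ alpha
        R -= Q @ alpha
        if np.linalg.norm(R) <= tol * np.linalg.norm(B):
            break
        beta = -np.linalg.solve(G, Q.T @ R)     # keeps successive P blocks A-orthogonal
        P = R + P @ beta
    return X

# usage: small SPD system with two search vectors per iteration
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20.0 * np.eye(20)                 # SPD test matrix
B = rng.standard_normal((20, 2))
X = block_cg(A, B)
```

With block width s, the search space grows by s directions per iteration, so the flops per byte of matrix traffic improve by roughly a factor of s compared with single-vector CG.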