ECCOMAS 2024

Fast Semi-Iterative Finite Element Poisson Solvers for Tensor Core GPUs

  • Ruda, Dustin (TU Dortmund)
  • Turek, Stefan (TU Dortmund)
  • Ribbrock, Dirk (TU Dortmund)

The overarching theme of the work presented is how accelerator hardware in the form of Tensor Core GPUs can be leveraged for PDE computing. For example, the latest representative, the Nvidia H100 GPU, promises a performance of up to 495 TFLOPS, but only if dense matrix operations are performed in single precision. This makes its use in finite element simulations of ill-conditioned Poisson problems challenging. We present novel semi-iterative, hardware-oriented finite element Poisson solvers that meet the requirements for exploiting Tensor Cores. To reduce the condition number and thus ensure sufficient accuracy, these solvers incorporate explicit preconditioning techniques, referred to as "prehandling", based on hierarchical bases in 2D or generating systems in 3D. By subsequently applying a Schur complement and exploiting the mesh structure, the large, sparse linear system is transformed into multiplications of small, primarily dense matrices. A highly performant direct variant of the solver is restricted to special cases in terms of mesh and finite element space, and, due to its storage requirements, to 2D. To extend this idea to 3D, we consider a semi-iterative variant. It consists of a direct part, complemented by an iterative part in which the conjugate gradient method solves for a small proportion of the unknowns. Most of the numerical work consists of dense matrix operations, which promise high performance on suitable GPUs. The focus is on new results concerning prehandling in 3D and the algorithmics of the semi-iterative method, including estimates of storage requirements, complexity and performance on an H100 GPU.
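The principle behind prehandling can be illustrated with a minimal 1D sketch. It assumes linear finite elements for -u'' = f on (0,1) with u(0) = u(1) = 0 and a uniform mesh of 2^L elements. The function names are hypothetical, and this is not the authors' 2D/3D construction. The sketch transforms the nodal stiffness matrix A into the hierarchical basis, A -> T^T A T, and then applies a symmetric diagonal scaling. In 1D, hierarchical hat functions are orthogonal in the energy inner product, so the prehandled matrix is exactly the identity, while the nodal matrix has a condition number growing like h^{-2}. In 2D the hierarchical basis is known to reduce the growth only to roughly (log h^{-1})^2, and in 3D other means, such as generating systems, are needed.

```python
import numpy as np

def nodal_stiffness(L):
    """P1 stiffness matrix of -u'' on (0,1), u(0)=u(1)=0, n = 2**L elements."""
    n = 2 ** L
    h = 1.0 / n
    m = n - 1  # number of interior nodes
    return (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h

def hierarchical_transform(L):
    """Hypothetical helper: column i-1 holds the nodal values of the
    hierarchical hat function centred at node i."""
    n = 2 ** L
    nodes = np.arange(1, n)
    T = np.zeros((n - 1, n - 1))
    for level in range(1, L + 1):
        H = 2 ** (L - level)          # half-width of this level's hats, in mesh widths
        for i in range(H, n, 2 * H):  # nodes first introduced on this level
            T[:, i - 1] = np.maximum(0.0, 1.0 - np.abs(nodes - i) / H)
    return T

def condition_numbers(L):
    """Condition numbers of the nodal matrix and of the prehandled matrix."""
    A = nodal_stiffness(L)
    T = hierarchical_transform(L)
    A_hb = T.T @ A @ T                # stiffness matrix in the hierarchical basis
    d = np.sqrt(np.diag(A_hb))
    A_pre = A_hb / np.outer(d, d)     # symmetric diagonal scaling
    return np.linalg.cond(A), np.linalg.cond(A_pre)

for L in (4, 6, 8):
    c_nodal, c_pre = condition_numbers(L)
    print(f"n = {2**L:4d}: cond(nodal) = {c_nodal:10.1f}, cond(prehandled) = {c_pre:.3f}")
```

The loop prints both condition numbers for a few mesh sizes. The nodal value grows by about a factor of four per refinement, while the prehandled value stays at 1 up to round-off. A well-conditioned system is what makes single-precision arithmetic viable in the first place.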
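The split into a direct and an iterative part can be sketched in 1D under strong simplifying assumptions. The model problem is -u'' = 1 on (0,1) with homogeneous Dirichlet conditions, discretized with linear elements on a uniform mesh cut into M identical macros of p elements each. The interior unknowns of every macro are eliminated with one shared, explicitly inverted dense block, so the direct part reduces to dense matrix products over all macros at once. Plain, unpreconditioned CG then solves the Schur complement system for the M - 1 interface unknowns. All function names are hypothetical and everything runs in float64. This is generic static condensation, not the authors' 2D/3D solver.

```python
import numpy as np

def build_macro_operators(p, h):
    """Dense operators shared by all macros (identical macros on a uniform mesh)."""
    m = p - 1                                  # interior nodes per macro
    A_II = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h
    B = np.zeros((m, 2))                       # interior-to-interface coupling
    B[0, 0] = -1.0 / h                         # first interior node <-> left macro boundary
    B[-1, 1] = -1.0 / h                        # last interior node <-> right macro boundary
    return np.linalg.inv(A_II), B              # explicit dense inverse of the interior block

def apply_A_IG(g_full, B):
    """Interface -> interior coupling for all macros; g_full includes the Dirichlet zeros."""
    G2 = np.stack([g_full[:-1], g_full[1:]], axis=1)  # (M, 2) boundary values per macro
    return G2 @ B.T                                   # (M, p-1)

def apply_A_GI(X, B):
    """Interior -> interface coupling (transpose of apply_A_IG); Dirichlet nodes dropped."""
    C = X @ B                                  # (M, 2) contributions per macro
    r = np.zeros(X.shape[0] + 1)
    r[:-1] += C[:, 0]                          # to the left boundary node of each macro
    r[1:] += C[:, 1]                           # to the right boundary node of each macro
    return r[1:-1]

def pad(u_G):
    """Append the homogeneous Dirichlet values at x = 0 and x = 1."""
    return np.concatenate(([0.0], u_G, [0.0]))

def schur_matvec(u_G, Ainv, B, h):
    """S u_G = A_GG u_G - A_GI A_II^{-1} A_IG u_G, with A_GG = (2/h) I."""
    X = apply_A_IG(pad(u_G), B) @ Ainv.T       # one dense product covers all macros
    return (2.0 / h) * u_G - apply_A_GI(X, B)

def cg(matvec, b, tol=1e-12, maxiter=1000):
    """Plain (unpreconditioned) conjugate gradient method."""
    x = np.zeros_like(b)
    r = b.copy()
    d = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(maxiter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x

def solve_poisson_1d(M, p):
    """-u'' = 1 on (0,1), u(0) = u(1) = 0; M macros with p >= 2 elements each."""
    n = M * p
    h = 1.0 / n
    Ainv, B = build_macro_operators(p, h)
    b_I = np.full((M, p - 1), h)               # load vector for f = 1: each hat integrates to h
    b_G = np.full(M - 1, h)
    # Direct part: condense the interior unknowns into the interface right-hand side
    rhs_G = b_G - apply_A_GI(b_I @ Ainv.T, B)
    # Iterative part: CG on the small Schur complement system (M - 1 unknowns)
    u_G = cg(lambda v: schur_matvec(v, Ainv, B, h), rhs_G)
    # Direct part again: back-substitute for the interior unknowns
    u_I = (b_I - apply_A_IG(pad(u_G), B)) @ Ainv.T
    u = np.zeros(n + 1)
    u[p:n:p] = u_G
    for k in range(M):
        u[k * p + 1:(k + 1) * p] = u_I[k]
    return u

u = solve_poisson_1d(M=8, p=16)
x = np.linspace(0.0, 1.0, u.size)
print(np.max(np.abs(u - x * (1.0 - x) / 2.0)))
```

For f = 1 the 1D linear finite element solution is nodally exact, so the printed maximum deviation from x(1-x)/2 should be at round-off level. In a GPU setting, the products with `Ainv` are the kind of batched dense matrix multiplications that Tensor Cores accelerate, while CG only touches the M - 1 interface unknowns.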