ECCOMAS 2024

Gridap.jl and PartitionedArrays.jl: Towards a fully-fledged high-performance finite element software stack in Julia

  • Verdugo, Francesc (Vrije Universiteit Amsterdam)
  • Andersen, Anton (University of Amsterdam)
  • Frazier, Jared (University of Amsterdam)
  • Meijer, Jop (University of Amsterdam)
  • Tu, Yung-sheng (Vrije Universiteit Amsterdam)

Gridap.jl is among the most advanced finite element (FE) libraries written in the Julia programming language. It provides a user interface reminiscent of FEniCS, but it is implemented in pure Julia and follows a novel software design that builds on Julia's just-in-time compiler. The key design goal of Gridap.jl is to provide an efficient FE package implemented in an easy-to-use language, allowing numerical analysts, domain scientists, and HPC experts to develop collaboratively in the same code base without inter-language barriers. This contrasts with the more traditional approach of combining Python for user productivity with C/C++/Fortran for performance. The extension of Gridap.jl to large-scale parallel computations is based on PartitionedArrays.jl. This library provides tools such as distributed vectors, distributed sparse matrices, and distributed sparse matrix-vector multiplication kernels, which can be used for the parallel assembly of FE operators and for solving the resulting linear systems with iterative solvers. The combined goal of Gridap.jl and PartitionedArrays.jl is to provide a modern high-performance FE software stack in Julia, similar in spirit to established combinations such as FEniCS plus PETSc. The key piece still missing to accomplish this objective is the implementation of parallel preconditioners based on algebraic multigrid (AMG) and multi-level domain decomposition. In this talk, we will present the latest developments, data structures, and design choices in PartitionedArrays.jl towards the implementation of such parallel methods. Our presentation will cover topics such as parallel AMG coarsening, distributed sparse matrix-matrix multiplication kernels, and latency hiding for additive multi-level methods, as well as performance comparisons against well-established C libraries.
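To make the idea of a distributed sparse matrix-vector product more concrete, here is a small sequential simulation of the communication pattern. It is a minimal sketch in Python, not the PartitionedArrays.jl API. It makes several assumptions: rows are split into contiguous blocks ("parts", which would be MPI ranks in a real run), and the matrix is stored as a dictionary of `(row, column)` entries. All helper names (`partition_rows`, `owner_of`, `split_local_matrix`, `distributed_spmv`, `laplacian_1d`) are hypothetical. Each part splits its rows into an own-own block, which needs only locally owned vector entries, and an own-ghost block, which needs values owned by other parts. In a real parallel run, the own-own product can be computed while the ghost values are still in flight. This overlap is a common way to hide communication latency.

```python
"""Sequential simulation of a row-partitioned sparse matrix-vector product.

Hypothetical sketch (not the PartitionedArrays.jl API): each part owns a
contiguous block of rows of A and the matching entries of x. Columns a part
needs but does not own are "ghosts" whose values come from their owners.
"""


def partition_rows(n, nparts):
    """Split range(n) into nparts contiguous, nearly equal blocks."""
    base, extra = divmod(n, nparts)
    ranges, start = [], 0
    for p in range(nparts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append(range(start, stop))
        start = stop
    return ranges


def owner_of(i, ranges):
    """Return the index of the part that owns global index i."""
    for p, r in enumerate(ranges):
        if i in r:
            return p
    raise IndexError(i)


def split_local_matrix(entries, own):
    """Split the rows owned by one part into own-own and own-ghost COO lists."""
    own_own, own_ghost = [], []
    for (i, j), v in entries.items():
        if i in own:
            (own_own if j in own else own_ghost).append((i, j, v))
    return own_own, own_ghost


def distributed_spmv(entries, x, nparts):
    """Compute y = A x part by part, mimicking the communication pattern."""
    ranges = partition_rows(len(x), nparts)
    # Each part stores only the entries of x that it owns.
    x_owned = [{i: x[i] for i in r} for r in ranges]
    y = [0.0] * len(x)
    for p, own in enumerate(ranges):
        own_own, own_ghost = split_local_matrix(entries, own)
        ghosts = sorted({j for _, j, _ in own_ghost})
        # In an MPI run, non-blocking receives for `ghosts` would be posted here.
        # The own-own block needs only local data, so it can be computed
        # while those messages are in flight (latency hiding).
        for i, j, v in own_own:
            y[i] += v * x_owned[p][j]
        # "Complete" the exchange: fetch each ghost value from its owner.
        x_ghost = {j: x_owned[owner_of(j, ranges)][j] for j in ghosts}
        # Finish the product with the own-ghost block.
        for i, j, v in own_ghost:
            y[i] += v * x_ghost[j]
    return y


def laplacian_1d(n):
    """Tridiagonal matrix tridiag(-1, 2, -1) as a dict of (i, j) -> value."""
    entries = {}
    for i in range(n):
        entries[(i, i)] = 2.0
        if i > 0:
            entries[(i, i - 1)] = -1.0
        if i < n - 1:
            entries[(i, i + 1)] = -1.0
    return entries


A = laplacian_1d(6)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(distributed_spmv(A, x, 3))  # [0.0, 0.0, 0.0, 0.0, 0.0, 7.0]
```

The result does not depend on the number of parts, which is a quick sanity check for any partitioned kernel. A real implementation would precompute the ghost indices and their owners once per matrix, rather than on every product, so that repeated products (for example, inside an iterative solver) only pay for the data exchange itself.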