ECCOMAS 2024

Improving the Efficiency of the Sub-Stepping Velocity Splitting Scheme within the Nektar++ Framework

Liosi, Alexandra (Imperial College London)
Sherwin, Spencer (Imperial College London)
Hoessler, Julien (McLaren Racing Ltd)
Swift, Adam (McLaren Racing Ltd)
Chatzopoulos, Athanasios (McLaren Racing Ltd)
Bottone, Francesco (McLaren Racing Ltd)
Horikoshi, Masashi (McLaren Racing Ltd)

In session: MS098C - Reconciling Physical Fidelity, Robustness and Efficiency in Computational Fluid Dynamics III


In this work, the computational performance of the Nektar++ framework for solving the incompressible Navier-Stokes (NS) equations with the spectral/hp element method is assessed and improved for industry-relevant geometries at high Reynolds numbers (Re). There is an increasing need to simulate complex geometries with greater accuracy at reduced computational cost. Such simulations involve multiple spatial and temporal scales and must model or resolve complex transient phenomena such as laminar-turbulent transition, flow separation, and the evolution of vortex systems. Standard explicit techniques require very small time steps, which are not feasible in an industrial environment; hence the need for efficient implicit time-stepping techniques. We use the implicit Sub-Stepping Velocity Splitting Scheme to solve the incompressible NS equations numerically in a segregated manner. The method is distinguished by its mixed discretization: it employs a Discontinuous Galerkin (DG) discretization to solve an unsteady advection equation in the Advection step, and a Continuous Galerkin (CG) discretization in the Pressure and Diffusion steps. The I/O operations, parallelization, and serial performance are evaluated first to determine the computational efficiency of the solver. Then, each of the Advection, Pressure, and Diffusion steps is analyzed at the level of its fundamental kernels to identify the most time-consuming and memory-intensive parts, with the aim of reducing their memory footprint and moving them closer to the compute-bound region of the roofline model. The alternatives considered include the parallelization format between the mixed discretizations, vectorization, and GPU-friendly data structures. We will present the systematic performance analysis, the challenges, and the advancements in mixed-discretization time-stepping techniques, which are relevant to any finite element code aimed at solving exascale industrial problems.
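The goal of moving kernels toward the compute-bound region of the roofline model can be made concrete with the standard roofline bound: attainable performance equals min(peak FLOP rate, arithmetic intensity × memory bandwidth), where arithmetic intensity is FLOPs per byte moved from memory. The Python sketch below evaluates this bound. The machine figures (1000 GFLOP/s peak, 100 GB/s bandwidth) and the axpy-style kernel are hypothetical illustrations, not measurements from Nektar++ or from the hardware used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical node description for a simple roofline model."""
    peak_gflops: float    # peak floating-point throughput, GFLOP/s
    bandwidth_gbs: float  # sustained memory bandwidth, GB/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) where the memory and compute roofs meet."""
        return self.peak_gflops / self.bandwidth_gbs


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte transferred to/from main memory."""
    return flops / bytes_moved


def attainable_gflops(machine: Machine, intensity: float) -> float:
    """Roofline bound: limited either by peak compute or by bandwidth * intensity."""
    return min(machine.peak_gflops, intensity * machine.bandwidth_gbs)


def regime(machine: Machine, intensity: float) -> str:
    """Classify a kernel as memory-bound or compute-bound on this machine."""
    return "compute-bound" if intensity >= machine.ridge_point else "memory-bound"


if __name__ == "__main__":
    node = Machine(peak_gflops=1000.0, bandwidth_gbs=100.0)  # hypothetical numbers

    # axpy-style kernel y = a*x + y on n doubles (illustrative only):
    # 2 FLOPs per entry; read x, read y, write y = 3 * 8 bytes per entry.
    n = 1_000_000
    ai = arithmetic_intensity(flops=2 * n, bytes_moved=24 * n)
    print(f"intensity = {ai:.4f} FLOP/byte, ridge = {node.ridge_point:.1f}")
    print(f"attainable = {attainable_gflops(node, ai):.2f} GFLOP/s ({regime(node, ai)})")
```

A kernel to the left of the ridge point (here 10 FLOP/byte) is limited by bandwidth, so reducing the bytes it moves (a smaller memory footprint) raises its attainable performance even if its FLOP count is unchanged.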