Stokes_Constrained is slow in parallel despite low KSP iterations #244

@gthyagi

Description

Summary

Stokes_Constrained works in parallel, but on 8 ranks it is roughly 15× slower than the equivalent Nitsche free-slip solve (434.92 s vs 27.93 s), even though the constrained solve takes only two KSP iterations. This suggests the bottleneck lies in setup, assembly, or field handling for the block-constrained system rather than in Krylov convergence.

This came up while validating the Zhong et al. spherical-shell internal-boundary benchmark path using SphericalShellInternalBoundary() and exterior free-slip boundaries.

Reproducer Context

External benchmark script:

cd /Users/tgol0006/uw_folder/uw3_git_gthyagi_latest/underworld3/.claude/worktrees/mantle-convection-benchmarks
pixi run -e amr-dev mpirun -np 8 python /Users/tgol0006/uw_folder/uw3-mantle-convection-benchmarks/benchmarks/005_internal_boundary_delta_probe.py

The script uses:

  • uw.meshing.SphericalShellInternalBoundary()
  • ri = 0.55, rint = 0.775, ro = 1.0
  • cellSize = 0.125 under 8 MPI ranks
  • velocity P2, pressure P1
  • internal radial natural load on the Internal boundary
  • exterior free slip on Upper and Lower

The Nitsche comparison is:

pixi run -e amr-dev mpirun -np 8 python /Users/tgol0006/uw_folder/uw3-mantle-convection-benchmarks/benchmarks/005_internal_boundary_delta_probe.py -uw_freeslip_type nitsche

Timing Evidence

Measured with /usr/bin/time -p on the same machine and mesh resolution:

| Method | Wall time (s) | KSP iterations | Result |
|---|---:|---:|---|
| Nitsche free slip | 27.93 | 1 | pass |
| Stokes_Constrained | 434.92 | 2 | pass |
| Stokes_Constrained, temporary degree-1 multiplier test | 409.62 | 2 | pass |

The temporary degree-1 multiplier test cut runtime by only about 6% (409.62 s vs 434.92 s), so multiplier polynomial degree is unlikely to be the main bottleneck.
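A quick back-of-envelope check on the timing table makes the argument explicit. If per-iteration cost were unchanged, going from 1 to 2 KSP iterations could account for at most a 2× slowdown, yet the measured gap is about 15×, so the remainder has to come from setup, assembly, or a much more expensive per-iteration/preconditioner cost, which is what the diagnosis below tries to separate. The numbers are copied from the table; the dictionary and helper names are illustrative only.

```python
# Wall times (s) and KSP iteration counts copied from the timing table.
runs = {
    "nitsche": {"wall": 27.93, "ksp_its": 1},
    "constrained": {"wall": 434.92, "ksp_its": 2},
    "constrained_deg1_multiplier": {"wall": 409.62, "ksp_its": 2},
}


def slowdown(name, baseline="nitsche"):
    """Wall-time ratio of a run relative to the baseline run."""
    return runs[name]["wall"] / runs[baseline]["wall"]


def iteration_bound(name, baseline="nitsche"):
    """Slowdown explained by extra KSP iterations alone, assuming the
    per-iteration cost is the same as in the baseline run."""
    return runs[name]["ksp_its"] / runs[baseline]["ksp_its"]


for name in ("constrained", "constrained_deg1_multiplier"):
    print(f"{name}: {slowdown(name):.1f}x slower, "
          f"at most {iteration_bound(name):.0f}x explained by KSP iterations")

# Relative saving from the temporary degree-1 multiplier test.
gain = 1 - runs["constrained_deg1_multiplier"]["wall"] / runs["constrained"]["wall"]
print(f"degree-1 multiplier saves {gain:.1%}")
```

With the table's numbers this reports 15.6× and 14.7× slowdowns against a 2× iteration bound, and a 5.8% saving for the degree-1 multiplier.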

Observed Metrics

8-rank constrained run:

{
  "cellsize": 0.125,
  "ksp_iterations": 2,
  "ksp_reason": 2,
  "l": 2,
  "max_boundary_area_relative_error": 0.009919085347124204,
  "max_y_l0_norm_error": 0.009849659575170255,
  "mpi_size": 8,
  "passed": true,
  "snes_reason": 5,
  "stokes_tolerance": 1e-05,
  "upper_characteristic_velocity": 0.012565603878091148,
  "upper_normal_velocity_rms": 1.631821482362003e-05
}

8-rank Nitsche comparison:

{
  "cellsize": 0.125,
  "ksp_iterations": 1,
  "ksp_reason": 2,
  "l": 2,
  "max_boundary_area_relative_error": 0.009919085347124204,
  "max_y_l0_norm_error": 0.009849659575170255,
  "mpi_size": 8,
  "passed": true,
  "snes_reason": 5,
  "stokes_tolerance": 1e-05,
  "upper_characteristic_velocity": 0.010110860387364697,
  "upper_normal_velocity_rms": 2.2416675430165106e-05
}
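Diffing the two metric blocks shows that the constrained run is not buying extra accuracy for its cost: every error metric matches, and only the KSP iteration count and the two upper-surface velocity diagnostics differ. The values below are copied verbatim from the JSON. Reading upper_normal_velocity_rms as residual normal flow through the free-slip boundary, and normalising it by upper_characteristic_velocity, is an illustrative assumption, not something the benchmark script is known to report.

```python
constrained = {
    "cellsize": 0.125, "ksp_iterations": 2, "ksp_reason": 2, "l": 2,
    "max_boundary_area_relative_error": 0.009919085347124204,
    "max_y_l0_norm_error": 0.009849659575170255,
    "mpi_size": 8, "passed": True, "snes_reason": 5, "stokes_tolerance": 1e-05,
    "upper_characteristic_velocity": 0.012565603878091148,
    "upper_normal_velocity_rms": 1.631821482362003e-05,
}
nitsche = {
    "cellsize": 0.125, "ksp_iterations": 1, "ksp_reason": 2, "l": 2,
    "max_boundary_area_relative_error": 0.009919085347124204,
    "max_y_l0_norm_error": 0.009849659575170255,
    "mpi_size": 8, "passed": True, "snes_reason": 5, "stokes_tolerance": 1e-05,
    "upper_characteristic_velocity": 0.010110860387364697,
    "upper_normal_velocity_rms": 2.2416675430165106e-05,
}


def normalised_leakage(m):
    """Upper-surface normal-velocity RMS relative to the characteristic speed
    (assumed interpretation: violation of the free-slip no-penetration condition)."""
    return m["upper_normal_velocity_rms"] / m["upper_characteristic_velocity"]


# Keys whose values differ between the two runs.
differing = sorted(k for k in constrained if constrained[k] != nitsche[k])
print("metrics that differ:", differing)

for name, m in (("constrained", constrained), ("nitsche", nitsche)):
    print(f"{name}: normalised Upper normal velocity = {normalised_leakage(m):.2e}")
```

The constrained run's normalised value (about 1.3e-3) is somewhat lower than Nitsche's (about 2.2e-3), but both are small and both runs pass the benchmark checks.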

Current Diagnosis

The linear solve is not the issue: constrained free slip reports only 2 KSP iterations. The expensive part is likely one or more of:

  • setup/assembly of the extra Lagrange-multiplier fields for Upper and Lower;
  • boundary residual/Jacobian registration for the multiplier coupling;
  • grouping pressure plus multipliers into the Schur block;
  • section manipulation inside _constrain_interior_multipliers_in_section() when running in parallel;
  • creation or handling of full-domain multiplier fields when only the boundary trace is physical;
  • repeated DM/section/fieldsplit setup that could be cached when mesh and constraints are unchanged.
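Telling these candidates apart needs per-phase timings. PETSc's native route is -log_view together with user-registered log stages; where wiring up PETSc logging is inconvenient, a pure-Python accumulator like the sketch below can wrap each suspected phase. The phase names and the placeholder bodies are hypothetical and are not Underworld3 API. Under MPI each rank accumulates its own totals, so reducing them with a max across ranks (for example with mpi4py's allreduce) is what exposes the slowest rank.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall time per named phase on the calling process."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the wrapped phase raises.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Largest phases first: that is where optimisation effort should go.
        for name, total in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            print(f"{name:<32s} {total:9.3f} s  ({self.counts[name]} calls)")


timer = PhaseTimer()
with timer.phase("multiplier field setup"):
    pass  # placeholder: creation of the Upper/Lower multiplier fields
with timer.phase("constrain interior multipliers"):
    time.sleep(0.01)  # placeholder standing in for the section-constraint step
with timer.phase("fieldsplit / Schur setup"):
    pass  # placeholder: grouping [p, lambda] and building the Schur block
timer.report()
```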

Candidate Fix Directions

  • Profile Stokes_Constrained setup and assembly with PETSc/UW timing (for example PETSc -log_view with per-phase log stages) to locate the exact hotspot.
  • Optimize _constrain_interior_multipliers_in_section() if Python-side set/section operations dominate.
  • Consider boundary-only or submesh multiplier fields instead of full-domain multiplier fields with interior DOFs constrained out.
  • Cache constrained section/fieldsplit setup when the mesh, fields, and constraint boundaries are unchanged.
  • Check whether the grouped [p, lambda] Schur setup rebuilds more state than necessary on each solve.
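The set-operation and caching directions combine naturally: compute the interior multiplier points with set arithmetic instead of per-point list scans, and memoise the result on a key that changes only when the mesh or the constrained boundaries do. This is a plain-Python sketch with hypothetical names (interior_multiplier_points, a mesh_version counter, integer point ranges in the style of a DMPlex chart). It is not how _constrain_interior_multipliers_in_section() is currently written, and a real cache key would have to cover whatever state the section actually depends on.

```python
from functools import lru_cache


def interior_points_slow(p_start, p_end, boundary_points):
    """Per-point membership test against a list: O(N * M) overall."""
    return [p for p in range(p_start, p_end) if p not in boundary_points]


def interior_points_fast(p_start, p_end, boundary_points):
    """Build a set once (O(M)), then O(1) lookups: O(N + M) overall."""
    boundary = set(boundary_points)
    return [p for p in range(p_start, p_end) if p not in boundary]


@lru_cache(maxsize=8)
def interior_multiplier_points(mesh_version, p_start, p_end, boundary_points):
    """Memoised interior-point tuple for one (mesh, constraint) configuration.

    mesh_version is unused in the body; it only participates in the cache
    key, so bumping it after remeshing or adaptation forces recomputation.
    boundary_points must be a tuple (hashable) for lru_cache to accept it.
    """
    return tuple(interior_points_fast(p_start, p_end, boundary_points))


# Usage with made-up numbers: a chart of 2,000 points, every 7th on a boundary.
boundary = tuple(range(0, 2_000, 7))
pts = interior_multiplier_points(0, 0, 2_000, boundary)
again = interior_multiplier_points(0, 0, 2_000, boundary)  # served from cache
assert pts is again
assert list(pts) == interior_points_slow(0, 2_000, list(boundary))
print(interior_multiplier_points.cache_info())
```

lru_cache is simply the smallest memoiser to show; inside the solver the cached artefact would more plausibly be the constrained section or the fieldsplit index sets, stored on the solver object and invalidated explicitly when the mesh or constraint boundaries change.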

Related Work

PR #242 fixes SphericalShellInternalBoundary() boundary labels so the internal-boundary benchmark can use the built-in mesh path directly. This issue is separate: after that fix, the constrained solve is correct but slow in parallel.

Metadata

Labels: enhancement (New feature or request)