Skip to content

Workaround elimination with latest amdflang drop and Removing array reshapes#1433

Merged
sbryngelson merged 15 commits into
MFlowCode:masterfrom
anandrdbz:master
May 13, 2026
Merged

Workaround elimination with latest amdflang drop and Removing array reshapes#1433
sbryngelson merged 15 commits into
MFlowCode:masterfrom
anandrdbz:master

Conversation

@anandrdbz
Copy link
Copy Markdown
Contributor

@anandrdbz anandrdbz commented May 12, 2026

Summary

This PR updates MFC's AMD GPU (OpenMP target offload) support to use the latest therock/AFAR flang-23 drop and eliminates two performance-limiting workarounds:

  1. New AMD flang drop (therock-23.1.0 / AFAR) — Updates the famd build target to use the therock-23 compiler drop. Adds a dedicated frontier_amd CI workflow and Docker container. Updates toolchain/bootstrap/modules.sh with the new OLCF_AFAR_ROOT path and library environment variables.

  2. Remove dummy variable workaround — The dummy variable was a workaround for an amdflang bug where GPU kernels using a loop-index variable (id) directly inside GPU_PARALLEL_LOOP caused incorrect code generation. The new flang-23 drop fixes this natively, so the workaround is removed. Replaced with a module-level iglob variable updated via GPU_UPDATE(device=...) before each kernel.

  3. Eliminate array reshapes in WENO/Riemann/viscous/surface-tension/THINC — Replaces temporary array reshape operations (which required GPU memory copies) with scalar extraction using the ${SF('')}$ Fypp macro pattern. This reduces GPU memory traffic and improves performance for case-optimized builds.

  4. Fix post_process PIE relocation error with LLVMFlang — flang-23/LLD defaults to building PIE executables. Static libraries (SILO, LAPACK) built without -fPIC produce R_X86_64_32 relocations that LLD rejects in PIE mode. Added -no-pie to post_process link options for LLVMFlang. simulation is unaffected (no SILO/LAPACK dependency).

  5. CMake: use direct find_library for HIP/hipfort with LLVMFlang — Replaces find_package(hipfort COMPONENTS hip CONFIG REQUIRED) with direct find_library calls using $ENV{OLCF_AFAR_ROOT}/lib, matching how CRAY_HIPFORT_LIB is handled. Avoids CMake config package dependency for the therock drop layout.

Testing

  • All Frontier CCE (gpu-acc, gpu-omp): ✅
  • All Frontier AMD flang-23 (gpu-omp [1/2], [2/2], cpu): ✅
  • All NVHPC versions 23.11–26.3 (gpu + cpu): ✅
  • GitHub Ubuntu + macOS (GNU + Intel): ✅

Checklist

  • I added or updated tests for new behavior
  • I updated documentation if user-facing behavior changed
GPU changes (expand if you modified src/simulation/)
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU (Frontier, verified with cce-acc, cce-omp, amdflang)

@anandrdbz anandrdbz requested a review from sbryngelson as a code owner May 12, 2026 17:38
@qodo-code-review
Copy link
Copy Markdown
Contributor

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@sbryngelson sbryngelson marked this pull request as draft May 12, 2026 17:44
@sbryngelson sbryngelson removed their request for review May 12, 2026 17:48
@sbryngelson
Copy link
Copy Markdown
Member

it has merge conflicts already...

@anandrdbz
Copy link
Copy Markdown
Contributor Author

It's a very minor conflict in CMake from a change yesterday, I will resolve it

@github-actions

This comment was marked as resolved.

@sbryngelson sbryngelson marked this pull request as ready for review May 12, 2026 23:39
@qodo-code-review

This comment was marked as off-topic.

sbryngelson
sbryngelson previously approved these changes May 12, 2026
…elocation error

flang-23/LLD defaults to building PIE executables. SILO and LAPACK static
libraries on Frontier are compiled without -fPIC, so their 32-bit absolute
relocations (R_X86_64_32) are rejected by LLD when linking a PIE binary.
Add -no-pie to post_process link options for LLVMFlang to allow non-PIC
system libraries. simulation is unaffected (no SILO/LAPACK dependency).
sbryngelson
sbryngelson previously approved these changes May 13, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 88.40580% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.00%. Comparing base (28863b1) to head (b186af2).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_muscl.fpp 42.85% 16 Missing ⚠️
src/simulation/m_rhs.fpp 81.25% 6 Missing and 3 partials ⚠️
src/simulation/m_viscous.fpp 76.92% 0 Missing and 3 partials ⚠️
src/simulation/m_weno.fpp 98.56% 0 Missing and 2 partials ⚠️
src/simulation/m_cbc.fpp 0.00% 0 Missing and 1 partial ⚠️
src/simulation/m_time_steppers.fpp 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1433      +/-   ##
==========================================
+ Coverage   64.95%   65.00%   +0.04%     
==========================================
  Files          72       72              
  Lines       18879    18810      -69     
  Branches     1571     1553      -18     
==========================================
- Hits        12263    12227      -36     
+ Misses       5640     5615      -25     
+ Partials      976      968       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

sbryngelson
sbryngelson previously approved these changes May 13, 2026
… limit

AMD flang case-opt compilation takes close to the 2h hackathon wall limit,
leaving no time for the run step. Split into two sequential hackathon GPU jobs:
1. Pre-Build: compiles all benchmarks via --dry-run (build only, no execution)
2. Run: skips build (binaries cached), runs and validates benchmarks

Also preserve dependency dirs in prebuild for non-Phoenix clusters (deps are
already built by the Fetch Dependencies step, so only clean staging dirs).
@sbryngelson sbryngelson merged commit 060f752 into MFlowCode:master May 13, 2026
77 of 82 checks passed
sbryngelson added a commit that referenced this pull request May 13, 2026
- m_thinc.fpp: take master's extended Fypp for-loop tuple (STENCIL_VAR,
  COORDS, X_BND/Y_BND/Z_BND), update CC_PRI x_cc/y_cc/z_cc -> x%cc/y%cc/z%cc
- m_rhs.fpp: take master's drop of 'dummy' workaround condition, keep bc%y%beg naming
- m_riemann_solvers.fpp: take master's unified Re_avg_rsx_vf indexing (j,k,l)
  for all cylindrical faces, update y_cb/y_cc -> y%cb/y%cc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants