Local Aware IBM #1378
Conversation
… are only locally aware
|
Claude Code Review
Head SHA: 9879df3
Files changed: …
Findings: Critical: Stack overflow in …
|
Codecov Report
❌ Patch coverage is … Additional details and impacted files:
@@ Coverage Diff @@
## master #1378 +/- ##
==========================================
- Coverage 64.95% 64.23% -0.72%
==========================================
Files 72 73 +1
Lines 18879 19268 +389
Branches 1571 1628 +57
==========================================
+ Hits 12263 12377 +114
- Misses 5640 5888 +248
- Partials 976 1003 +27

☔ View full report in Codecov by Sentry.
|
…s param description
Re-run ffmt with the corrected 0.4.0 build that includes the breaks.retain filter preventing line splits immediately before `%` (member accessor). Fixes three occurrences of `& %sf` in m_ibm.fpp, one `& %geometry` in m_ibm.fpp, and one `& %beg` in m_time_steppers.fpp.
num_ib_patches_max = 2050000 caused a ~2.25 GB unconditional heap allocation plus a full init loop in s_assign_default_values_to_user_inputs on every startup, crashing the CI debug build even for cases with no IBM. s_reduce_ib_patch_array likewise had a 2.25 GB stack-allocated local array that caused SIGSEGV for IBM cases.

Fix: introduce num_ib_patches_max_namelist = 50000 (restoring the pre-particle-bed budget) for the initial allocation. s_generate_particle_beds grows patch_ib to num_ib_patches_max via MOVE_ALLOC only when particle beds are actually being generated, and s_reduce_ib_patch_array now uses a heap-allocated local array sized to num_ibs instead of the full num_ib_patches_max.
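For scale, a rough back-of-envelope check shows why the old default was fatal in a debug build. The per-patch footprint below is inferred from the 2.25 GB total quoted above, not read from the actual ib_patch_parameters type:

```python
# Budget check for the patch_ib allocation. The per-patch size is an
# assumption derived from the 2.25 GB figure, not taken from the source.
OLD_MAX = 2_050_000      # num_ib_patches_max before this fix
NEW_MAX = 50_000         # num_ib_patches_max_namelist (restored budget)

total_bytes = 2.25 * 1024**3          # observed unconditional allocation
per_patch = total_bytes / OLD_MAX     # implied per-patch footprint

print(f"per-patch footprint ~ {per_patch:.0f} bytes")
print(f"namelist-sized allocation ~ {per_patch * NEW_MAX / 1024**2:.0f} MiB")
```

Under this assumption the namelist-sized allocation comes out to a few tens of MiB, which is why restoring the 50000 budget makes the startup allocation harmless again.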
When particle beds are used, rank 0 grows patch_ib to num_ib_patches_max inside s_generate_particle_beds. Non-rank-0 ranks only have the namelist-sized allocation (num_ib_patches_max_namelist). The num_ibs scalar is broadcast first, so we can check and grow before the per-patch MPI_BCAST loop accesses patch_ib(i) for i > num_ib_patches_max_namelist.
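The ordering matters: capacity must grow before any index past the namelist limit is touched. A minimal Python sketch of that guard, with the rank and broadcast machinery stubbed out (function and argument names here are illustrative, not the actual MFC routines):

```python
NAMELIST_MAX = 50_000     # num_ib_patches_max_namelist
FULL_MAX = 2_050_000      # num_ib_patches_max

def receive_patches(num_ibs, patch_ib, recv_one):
    """Grow patch_ib before the per-patch receive loop can touch
    indices beyond the namelist-sized allocation."""
    # num_ibs arrives first (scalar broadcast), so it can be checked
    # before any patch_ib(i) access with i > NAMELIST_MAX.
    if num_ibs > len(patch_ib):
        patch_ib = patch_ib + [None] * (FULL_MAX - len(patch_ib))
    for i in range(num_ibs):
        patch_ib[i] = recv_one(i)      # stand-in for the per-patch MPI_BCAST
    return patch_ib

# A non-rank-0 rank starts with the namelist-sized buffer and receives
# 60000 patches, which only works because the grow happens first.
buf = receive_patches(60_000, [None] * NAMELIST_MAX, lambda i: i)
```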
|
You had an issue that I think I fixed.
|
… limit NIB and the case_validator both now reference num_ib_patches_max_namelist (50000) instead of num_ib_patches_max (2050000). This constant is the actual namelist limit; particle beds grow patch_ib beyond it at runtime but those entries are never specified in the namelist. The fallback values match the constant, ensuring Homebrew installs (which lack m_constants.fpp) use the correct limit.
patch_ib is reallocated to num_aware_ibs slots (e.g. 54000 for 3D with the default ib_neighborhood_radius=1) before the 1-rank and no-MPI copy. Using patch_ib(1:num_gbl_ibs) crashed when num_gbl_ibs > num_aware_ibs (e.g. large particle beds on a single MPI rank). Use min() to clamp the copy, matching the original truncation behavior while avoiding the out-of-bounds write on the newly allocatable patch_ib_gbl.
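The clamp described above can be sketched as follows; array names are borrowed from the text, the direction of the copy and the sizes are illustrative:

```python
def clamped_copy(patch_ib, patch_ib_gbl):
    """Copy patch_ib(1:n) into patch_ib_gbl with n clamped to both
    capacities, mirroring min(num_gbl_ibs, num_aware_ibs): no index on
    either side can run past its allocation."""
    n = min(len(patch_ib), len(patch_ib_gbl))
    patch_ib_gbl[:n] = patch_ib[:n]
    return patch_ib_gbl

# num_gbl_ibs = 5 global patches, but only num_aware_ibs = 3 local slots:
out = clamped_copy([10, 20, 30], [0, 0, 0, 0, 0])
```

Without the clamp, a slice bound taken from the larger of the two counts would index past the smaller array, which is exactly the single-rank crash described above.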
For single-rank and no-MPI cases every IB patch is local, so shrinking patch_ib to num_aware_ibs (e.g. 54k for 3D) while keeping num_ibs at num_gbl_ibs was a latent out-of-bounds: IBM loops over num_ibs would access patch_ib beyond its capacity. MOVE_ALLOC transfers patch_ib_gbl (sized exactly num_gbl_ibs) directly into patch_ib: no copy, no truncation, and patch_ib's size matches num_ibs. The num_aware_ibs resize is still done for multi-rank cases, where each rank genuinely only needs its local neighborhood subset.
patch_ib has GPU_DECLARE(create=...) in m_global_parameters, which means OpenACC/OpenMP tracks it automatically via plain allocate/deallocate. move_alloc is not reliably intercepted by all GPU runtimes for declare-create variables, so replace all three grow-on-demand sites (m_particle_bed, m_mpi_proxy, and the 1-rank/no-MPI path of s_reduce_ib_patch_array) with the established pattern: allocate tmp, copy into it, deallocate patch_ib, re-allocate patch_ib at the new size, and copy back.
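A sketch of that grow pattern, with plain Python standing in for Fortran allocate/deallocate; the point is that the tracked array only ever sees a plain deallocate followed by a plain allocate, never an ownership transfer:

```python
def grow_tracked_array(patch_ib, new_size):
    """Grow-on-demand without move_alloc: copy into a scratch buffer,
    free the tracked array, re-allocate it at the new size, copy back.
    A GPU runtime that intercepts allocate/deallocate for declare-create
    variables sees only those two plain events on patch_ib."""
    tmp = list(patch_ib)                   # allocate tmp and copy
    del patch_ib                           # deallocate patch_ib
    patch_ib = [None] * new_size           # re-allocate at the new size
    patch_ib[:len(tmp)] = tmp              # copy contents back from tmp
    return patch_ib                        # tmp freed on scope exit

grown = grow_tracked_array([1, 2, 3], 8)
```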
|
I generated and pushed the missing golden file for … While reviewing the diff I found several other issues worth addressing:

Critical (wrong results): …
High (MPI / GPU correctness): post-process SILO writes called from all ranks (…
Medium (robustness / performance): …
Low (code quality): …

The two highest-priority fixes are the …
|
Description
The current code has an issue scaling much past 10k particles due to limitations in the MPIAllReduceSum in the IB force computation. This PR attempts to alleviate this by limiting the number of IBs any given rank can be aware of to its neighbors. This turns the AllReduce compute into an MPI neighbor computation, removing the communication bottleneck. To support this, a massive overhaul of IB ownership between ranks was required.
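The communication change can be illustrated with a toy cost model (pure Python, no actual MPI; the constants are made up for illustration): a global all-reduce makes every rank handle every IB's force contribution, while a neighbor exchange only touches IBs owned by adjacent ranks.

```python
def allreduce_cost(num_ranks, ibs_per_rank):
    # Every rank contributes to and receives the full global force array,
    # so per-rank work scales with the total IB count.
    return num_ranks * (num_ranks * ibs_per_rank)

def neighbor_cost(num_ranks, ibs_per_rank, neighbors=2):
    # Each rank only exchanges forces for IBs owned by its neighbor
    # ranks, so per-rank work is independent of the total rank count.
    return num_ranks * (neighbors * ibs_per_rank)

# With 100 ranks and 1000 IBs per rank, the all-reduce handles
# 50x more data than the neighbor exchange in this model.
ratio = allreduce_cost(100, 1000) / neighbor_cost(100, 1000)
```

In this model the all-reduce cost grows quadratically with rank count while the neighbor cost grows linearly, which is the scaling wall the PR description attributes to the force computation past 10k particles.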
Type of change
Testing
TBD
Checklist
AI code reviews
Reviews are not triggered automatically. To request a review, comment on the PR:
@coderabbitai review — incremental review (new changes only)
@coderabbitai full review — full review from scratch
/review — Qodo review
/improve — Qodo code suggestions
@claude full review — Claude full review (also triggers on PR open/reopen/ready)
claude-full-review — Claude full review via label