Migrate Benchmark #731
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master     #731      +/-   ##
==========================================
+ Coverage   74.82%   74.93%   +0.11%
==========================================
  Files          56       56
  Lines        8095     8095
  Branches     1577     1577
==========================================
+ Hits         6057     6066       +9
+ Misses       1422     1411      -11
- Partials      616      618       +2
```
Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
```python
def _store_case_result(result: dict[str, Any], comp_size: int, comp_time: float, mem_usage: float, report_mem_usage: bool) -> None:
    if len(result["comp_size"]) == 0 or result["comp_size"][-1] != comp_size:
        result["comp_size"].append(comp_size)
        result["comp_time"].append(comp_time)
        if report_mem_usage:
            result["mem_usage"].append(mem_usage)
        return

    result["comp_time"][-1] = comp_time
    if report_mem_usage:
        result["mem_usage"][-1] = mem_usage
```
Case identity key is not unique for custom scale arrays
_store_case_result identifies a grid case by nx * ny * nz (total grid points). If two entries in the scale arrays produce the same total point count but different shapes — e.g. (2, 1) / (1, 2) with z=(1, 1) both giving 2 * start_size³ — the second case's timing will overwrite the first rather than append a new row. The default scale arrays happen to produce unique products, so this is silent with the defaults, but custom arrays passed via BenchmarkOptions can trivially trigger it and silently drop a case's result.
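A straightforward fix is to key each case on the full grid shape rather than its product. The following is a sketch only; the `shape` parameter and the extra `shapes` bookkeeping list are assumptions, not code from the PR:

```python
from typing import Any

def _store_case_result(result: dict[str, Any], shape: tuple[int, int, int],
                       comp_time: float, mem_usage: float,
                       report_mem_usage: bool) -> None:
    # Identify the case by its full (nx, ny, nz) shape; keep reporting the
    # total point count in comp_size for the output JSON.
    comp_size = shape[0] * shape[1] * shape[2]
    if len(result["shapes"]) == 0 or result["shapes"][-1] != shape:
        result["shapes"].append(shape)  # assumed extra bookkeeping list
        result["comp_size"].append(comp_size)
        result["comp_time"].append(comp_time)
        if report_mem_usage:
            result["mem_usage"].append(mem_usage)
        return

    # Same case as the previous call: overwrite the rolling values in place.
    result["comp_time"][-1] = comp_time
    if report_mem_usage:
        result["mem_usage"][-1] = mem_usage
```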
```python
def _peak_memory_bytes() -> float:
    try:
        import resource
    except ImportError:
        return float("nan")

    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if platform.system().lower() == "darwin":
        return float(usage)
    return float(usage * 1024)
```
ru_maxrss reports cumulative process-lifetime peak, not per-run peak
resource.getrusage(RUSAGE_SELF).ru_maxrss returns the highest RSS the process has ever reached since it started, not the peak during the current solver call. When report_mem_usage=True and the benchmark runs multiple grid sizes, every case after the largest one seen so far will record the earlier run's peak — the values will plateau rather than reflect per-run memory. Averaging this monotonically non-decreasing value further masks the issue.
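One per-run alternative is to reset a tracker around each solver call. This is a sketch, not the PR's code, and `tracemalloc` only tracks Python-level allocations, so native buffers allocated by a compiled solver backend would not be counted:

```python
import tracemalloc

def _peak_python_alloc_bytes(fn, *args, **kwargs) -> float:
    # Hypothetical helper: start/stop tracemalloc around a single call so the
    # reported peak belongs to this run only and cannot plateau across cases.
    tracemalloc.start()
    try:
        fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return float(peak)
```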
```python
if self.number_time_points <= 0 or self.num_averages <= 0:
    raise ValueError("number_time_points and num_averages must be positive")
if self.number_sensor_points <= 1:
    raise ValueError("number_sensor_points must be greater than 1")
```
start_size is not validated
A start_size of 0 or a negative value passes __post_init__ silently. A value of 0 produces (0, 0, 0, 0) tuples from grid_sizes and a zero-division error inside build_case when computing dx = domain_size / nx; a negative value produces equally nonsensical grid dimensions.
Suggested change:

```diff
 if self.number_time_points <= 0 or self.num_averages <= 0:
     raise ValueError("number_time_points and num_averages must be positive")
+if self.start_size <= 0:
+    raise ValueError("start_size must be positive")
 if self.number_sensor_points <= 1:
     raise ValueError("number_sensor_points must be greater than 1")
```
```python
def test_run_saves_partial_results_after_solver_error(tmp_path: Path):
    options = small_options()
    output_path = tmp_path / "benchmark.json"

    def failing_solver(*args, **kwargs):
        raise RuntimeError("solver failed")

    result = run(options, max_cases=1, output_path=output_path, solver=failing_solver)

    assert result["error_reached"] is True
    assert result["error_message"] == "solver failed"
    assert output_path.exists()
    saved = json.loads(output_path.read_text())
    assert saved["error_reached"] is True
    assert saved["error_message"] == "solver failed"
```
report_mem_usage=True path has no test coverage
The mem_usage list — conditionally added to result and to the JSON payload — is never exercised by the test suite. The _peak_memory_bytes helper, the rolling-average accumulation of loop_mem_usage, and the _save_results branch that serialises mem_usage are all untested.
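A test covering this path could look like the sketch below. It assumes `small_options()` returns a mutable `BenchmarkOptions` with a `report_mem_usage` field and that `run()` accepts the same injected solver as the existing tests:

```python
def test_run_records_mem_usage_when_requested(tmp_path: Path):
    options = small_options()
    options.report_mem_usage = True  # assumed dataclass field, per the PR summary
    output_path = tmp_path / "benchmark.json"

    def fake_solver(*args, **kwargs):
        return None  # timing/memory bookkeeping happens inside run(), not here

    result = run(options, max_cases=1, output_path=output_path, solver=fake_solver)

    # One memory sample per recorded case, and it must survive serialisation.
    assert len(result["mem_usage"]) == len(result["comp_size"])
    saved = json.loads(output_path.read_text())
    assert len(saved["mem_usage"]) == len(saved["comp_size"])
```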
```python
def peak_memory_bytes() -> float:
    try:
        import resource
    except ImportError:
        return float("nan")

    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if platform.system().lower() == "darwin":
        return float(usage)
    return float(usage * 1024)
```
nan sentinel silently produces unreadable JSON on Windows
peak_memory_bytes() returns float("nan") on Windows where the resource module isn't available. That nan flows unguarded through rolling_average and store_case_result into result["mem_usage"]. Python's json.dumps serialises nan as the bare token NaN (not valid JSON), and Python's own json.loads will then reject the file with a ValueError — so any run with --report-mem-usage on Windows silently writes a file that cannot be read back.
Consider raising early (ValueError: report_mem_usage is not supported on this platform) when resource is unavailable, or filtering/skipping the nan entry in save_results.
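The filtering option might look like the sketch below. The function body is assumed, not the PR's actual implementation, and dropping non-finite samples via `math.isfinite` is an assumption about the desired behaviour:

```python
import json
import math
from pathlib import Path
from typing import Any

def save_results(path: Path, result: dict[str, Any]) -> None:
    # Drop non-finite memory samples and forbid NaN/Infinity outright so the
    # output always stays strict, round-trippable JSON.
    payload = dict(result)
    if "mem_usage" in payload:
        payload["mem_usage"] = [m for m in payload["mem_usage"] if math.isfinite(m)]
    path.write_text(json.dumps(payload, indent=2, allow_nan=False))
```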
Migrates `benchmark.m` script with some additions. `benchmarks/README.md` shows sample usage.

Example output json file
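A minimal usage sketch for reference; the import paths and default-constructed options below are assumptions, and the actual sample usage lives in `benchmarks/README.md`:

```python
from pathlib import Path

from benchmarks.benchmark import run              # module layout assumed
from benchmarks.helpers import BenchmarkOptions   # dataclass location assumed

options = BenchmarkOptions()  # defaults; fields are described in this PR
result = run(options, output_path=Path("benchmark.json"))

print(result["comp_size"])   # total grid points per case
print(result["comp_time"])   # averaged solver time per case
```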
Greptile Summary
This PR ports MATLAB's `benchmark.m` into a new `benchmarks/` package, adding a `BenchmarkOptions` dataclass, a `run()` function with injected solver/timer for testability, incremental JSON persistence, and a CLI entry point. The implementation is well-structured with good fault-tolerance design (partial saves after each average).

- `benchmarks/helpers.py` defines the full domain setup (`build_case`), result accumulation (`store_case_result`, `rolling_average`), and JSON serialization (`save_results`).
- `benchmarks/benchmark.py` wires everything into a `run()` loop and a `main()` CLI; the solver and timer are injectable, making unit tests fast and hermetic.
- `tests/test_benchmark.py` provides solid coverage of the happy path and error path, but does not exercise the `report_mem_usage=True` branch.

Confidence Score: 4/5
Safe to merge for the typical Linux/macOS benchmark use-case; the nan-serialization defect only bites when --report-mem-usage is passed on Windows.
When report_mem_usage=True is used on Windows, peak_memory_bytes() returns float("nan"), which json.dumps writes as the bare token NaN — producing a file that Python's own json.loads rejects. The rest of the benchmark logic (timing, grid construction, incremental saves, error handling) looks correct and is well-tested.
benchmarks/helpers.py — specifically the peak_memory_bytes function and its unchecked nan return value
Important Files Changed
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    CLI["main() / CLI args"] --> RUN["run(options, backend, device, ...)"]
    RUN --> GS["grid_sizes(options)"]
    GS --> LOOP["for each (nx, ny, nz, scale)"]
    LOOP --> BC["build_case(options, nx, ny, nz, scale)"]
    BC --> AVG["for loop_num in 1..num_averages"]
    AVG --> SOLVE["solver(kgrid, medium, source, sensor, ...)"]
    SOLVE --> TIME["elapsed = timer() - start"]
    TIME --> RA["rolling_average(loop_time, elapsed, loop_num)"]
    RA --> MEM{"report_mem_usage?"}
    MEM -- yes --> PMB["peak_memory_bytes() - returns nan on Windows"]
    MEM -- no --> SCR
    PMB --> SCR["store_case_result(result, nx*ny*nz, ...)"]
    SCR --> SAVE["save_results(path, result)"]
    SAVE --> AVG
    LOOP --> EXC{"Exception?"}
    EXC -- yes --> ERR["set error_reached=True, save_results, break"]
    EXC -- no --> LOOP
    RUN --> RET["return result dict"]
```

Reviews (2): Last reviewed commit: "helpers"