perf: cache renderDouble for small integers #763

Draft
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/cached-render-double

He-Pin (Contributor) commented Apr 12, 2026

Motivation:

Reduce allocation overhead in common numeric rendering paths.

Key Design Decision:

Keep this PR focused on the renderDouble optimization. The original Builtin1.apply1 / Builtin3.apply3 specialization has already landed in current master via #807, so it is intentionally no longer part of this PR diff after the rebase.

Modification:

  • Materializer delegates number stringification to RenderUtils.renderDouble.
  • RenderUtils.renderDouble reuses a small integer string cache for exact integer doubles in the range 0 to 255.
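The caching idea in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the object name `RenderDoubleSketch`, the `smallInts` table, and both fallback branches are assumptions made for the example, and the real Jsonnet number formatting rules are more involved than `Double.toString`.

```scala
object RenderDoubleSketch {
  // Hypothetical cache: decimal strings for 0..255, built once and shared.
  private val smallInts: Array[String] = Array.tabulate(256)(i => i.toString)

  /** Illustrative stand-in for a cached renderDouble. */
  def renderDouble(d: Double): String = {
    val i = d.toInt
    // i.toDouble == d is true only for exact integers in Int range;
    // NaN and fractional values fail the check. Note that -0.0 maps to "0" here.
    if (i.toDouble == d && i >= 0 && i <= 255) smallInts(i)
    // Other exact integers of moderate size: render without a trailing ".0".
    // The 1e15 bound is an arbitrary choice for this sketch.
    else if (d % 1 == 0 && math.abs(d) < 1e15) d.toLong.toString
    // Everything else: placeholder for the existing slow path.
    else d.toString
  }
}
```

With a table like this, repeated calls for the same small integer return the same `String` instance instead of allocating a new one each time, which is where the allocation savings would come from.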

Benchmark Results:

JMH throughput, normalized from ms/op to ops/ms (ops/ms = 1 / ms_per_op; higher is better):

| Benchmark | master ms/op | PR ms/op | master ops/ms | PR ops/ms | Delta ops/ms |
|:---|---:|---:|---:|---:|---:|
| large_string_join | 1.972 | 0.545 | 0.507 | 1.835 | +261.8% |
| large_string_template | 3.260 | 1.635 | 0.307 | 0.612 | +99.4% |
| realistic2 | 128.661 | 47.007 | 0.0078 | 0.0213 | +173.7% |
| parseInt | 0.057 | 0.033 | 17.54 | 30.30 | +72.7% |
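
For readers who want to re-derive the normalized columns, the conversion is plain arithmetic on the ms/op figures. The helper names below (`ThroughputSketch`, `opsPerMs`, `deltaPercent`) are made up for this sketch and are not part of the benchmark harness.

```scala
object ThroughputSketch {
  /** Convert an average time in ms/op to throughput in ops/ms. */
  def opsPerMs(msPerOp: Double): Double = 1.0 / msPerOp

  /** Relative throughput change of the PR over master, in percent. */
  def deltaPercent(masterMsPerOp: Double, prMsPerOp: Double): Double =
    (opsPerMs(prMsPerOp) / opsPerMs(masterMsPerOp) - 1.0) * 100.0
}
```

For the large_string_join row, `ThroughputSketch.deltaPercent(1.972, 0.545)` evaluates to roughly 261.8, matching the reported delta.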

Scala Native hyperfine against source-built jrsonnet (git@github.com:CertainLach/jrsonnet.git, origin/master 80cd36a; lower is better):

| Benchmark | master-native | PR-native | jrsonnet-source | Result |
|:---|---:|---:|---:|:---|
| realistic2 | 86.3 +/- 1.2 ms | 87.1 +/- 1.2 ms | 97.4 +/- 1.9 ms | Native neutral; PR is 1.01x slower than master, within noise |
| parseInt | 5.5 +/- 0.7 ms | 5.6 +/- 0.8 ms | 4.7 +/- 1.4 ms | Startup-dominated and noisy; PR is neutral vs master |

Analysis:

The JMH signal is strongly positive on numeric-heavy rendering workloads because exact small integer doubles avoid repeated string allocation through the shared renderDouble path. Scala Native is effectively neutral on these two hyperfine cases; the parseInt run reported outliers and is dominated by process startup at this input size.

Verification:

  • ./mill --no-server bench.runRegressions bench/resources/cpp_suite/large_string_join.jsonnet bench/resources/cpp_suite/large_string_template.jsonnet bench/resources/cpp_suite/realistic2.jsonnet bench/resources/go_suite/parseInt.jsonnet
  • hyperfine --warmup 10 --min-runs 50 -N
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
  • ./mill --no-server 'sjsonnet.jvm[3.3.7].test'

References:

  • PR branch: perf/cached-render-double
  • Base: master at 192bb3374b8008f8dd14e1c8f0724237c31da8e1
  • Head: eb50f7f1d831693fd1c14e5919bf467d66df5155
  • Source-built jrsonnet 0.5.0-pre98

Result:

Ready. The PR now contains only the cached renderDouble work; the Builtin1/3 apply override work is already covered by #807 on master.

@He-Pin He-Pin force-pushed the perf/cached-render-double branch from 12b870c to 4c8c5ab Compare April 12, 2026 17:32
@He-Pin He-Pin marked this pull request as ready for review April 12, 2026 17:48
@He-Pin He-Pin marked this pull request as draft April 12, 2026 18:57
@He-Pin He-Pin force-pushed the perf/cached-render-double branch 3 times, most recently from 3ae5a56 to 5e55449 Compare April 26, 2026 10:46
@He-Pin He-Pin marked this pull request as ready for review April 26, 2026 10:48
stephenamar-db (Collaborator) commented

I'm questioning the usefulness of this tweak (for numbers) - this seems quite neutral

@He-Pin He-Pin marked this pull request as draft April 28, 2026 21:00
He-Pin (Contributor, Author) commented Apr 28, 2026

It's just a small optimization for realistic2, which contains many small numbers.

stephenamar-db pushed a commit that referenced this pull request Apr 30, 2026
Split out from #763.

Motivation:

Reduce allocation and dispatch overhead when one- and three-argument
builtins are called through the dynamic `Val.Func.apply1` / `apply3`
path.

Key Design Decision:

Keep the optimization local and semantics-preserving. `Builtin2` already
has an exact-arity `apply2` override; this adds matching
`Builtin1.apply1` and `Builtin3.apply3` overrides. Exact positional
calls directly invoke the structured `evalRhs` overload and skip
constructing an intermediate `Array`. Non-exact paths still fall back to
the generic parent application path.
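
The exact-arity override described above could look roughly like the following. This is a simplified sketch under stated assumptions: `Eval`, `Strict`, `Func`, the `genericCalls` counter, and the `length` instance are hypothetical stand-ins, and the real sjsonnet signatures carry additional parameters that are omitted here. Only the names `Builtin1`, `apply1`, and `evalRhs` come from the text.

```scala
object ExactAritySketch {
  // Hypothetical stand-in for a lazily evaluated argument.
  trait Eval { def value: Any }
  final case class Strict(value: Any) extends Eval

  abstract class Func {
    var genericCalls = 0 // instrumentation for this sketch only
    // Generic application path: arguments arrive packed in an array.
    def apply(args: Array[Eval]): Any
    // Default one-argument entry point: allocates an array, then delegates.
    def apply1(a: Eval): Any = apply(Array[Eval](a))
  }

  abstract class Builtin1 extends Func {
    def evalRhs(a: Any): Any
    def apply(args: Array[Eval]): Any = {
      genericCalls += 1
      if (args.length != 1) throw new IllegalArgumentException("expected 1 argument")
      evalRhs(args(0).value)
    }
    // Exact-arity override: force the argument and call evalRhs directly,
    // skipping the intermediate array and the generic path.
    override def apply1(a: Eval): Any = evalRhs(a.value)
  }

  // Toy builtin in the spirit of std.length, restricted to strings.
  val length: Builtin1 = new Builtin1 {
    def evalRhs(a: Any): Any = a match {
      case s: String => s.length
      case other     => throw new IllegalArgumentException("unsupported: " + other)
    }
  }
}
```

Calling `ExactAritySketch.length.apply1(ExactAritySketch.Strict("abcdef"))` takes the direct path and leaves `genericCalls` at zero, while a call through `apply` with an explicit array still goes through the generic path.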

Correctness:

- The direct path matches the existing `Builtin1.apply` /
`Builtin3.apply` exact positional behavior: force the supplied `Eval`
values, then call the typed `evalRhs`.
- Named arguments, missing defaults, too many arguments, and other
non-exact calls still use the generic function application logic.
- Static `Expr.ApplyBuiltin1` / `Expr.ApplyBuiltin3` paths are
unchanged; this only helps dynamic builtin calls such as a builtin
stored in a local or returned from another function.

Modification:

- Add `Builtin1.apply1`.
- Add `Builtin3.apply3`.

Validation:

- `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test'` (`141/141, SUCCESS`)
- `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'`
- `./mill --no-server '_.jvm[_].__.test'` (`1104/1104, SUCCESS`)
- Dynamic builtin smoke checks:
  - `local f = std.length; f([1, 2, 3])` -> `3`
  - `local f = std.substr; f("abcdef", 1, 3)` -> `"bcd"`
  - `local f = std.substr; f(str="abcdef", from=1, len=3)` -> `"bcd"`

Hyperfine:

Toolchain:

- `hyperfine 1.20.0`
- `--warmup 3 --min-runs 25` for targeted dynamic builtin benchmarks
- `--warmup 3 --min-runs 20` for `realistic2`
- JVM assemblies built with `./mill --no-server show 'sjsonnet.jvm[3.3.7].assembly'`
- Base: `upstream/master` at `c04fc804`
- Branch: `2067d8b5`

Targeted `Builtin1` dynamic call benchmark:

```jsonnet
local identity(x) = x;
local len = identity(std.length);
std.foldl(
  function(acc, i) acc + len("abcdef"),
  std.range(1, 5000000),
  0
)
```

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `master builtin1_dynamic` | 649.8 +/- 48.3 | 557.4 | 726.8 | 1.00 |
| `branch builtin1_dynamic` | 661.7 +/- 41.0 | 606.1 | 747.3 | 1.02 +/- 0.10 |

Result: statistically neutral in this hyperfine run.

Targeted `Builtin3` dynamic call benchmark:

```jsonnet
local identity(x) = x;
local substr = identity(std.substr);
std.foldl(
  function(acc, i) acc + std.length(substr("abcdef", 1, 3)),
  std.range(1, 3000000),
  0
)
```

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `master builtin3_dynamic` | 742.5 +/- 156.1 | 594.1 | 1254.7 | 1.12 +/- 0.30 |
| `branch builtin3_dynamic` | 660.4 +/- 110.9 | 534.0 | 962.6 | 1.00 |

Result: branch was faster in this run, but variance is high.

End-to-end `realistic2`:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `master realistic2` | 544.9 +/- 95.3 | 414.1 | 706.5 | 1.27 +/- 0.27 |
| `branch realistic2` | 428.4 +/- 54.1 | 378.3 | 565.8 | 1.00 |

Result: branch was faster in this run; due to JVM startup and system
noise, treat this as a non-regression signal rather than a guaranteed
1.27x speedup.

Motivation:
Reduce allocation overhead in common numeric rendering paths.

Modification:
1. RenderUtils.renderDouble reuses pre-cached string representations for exact integer doubles in the range 0-255.
2. Materializer.stringify delegates number stringification to RenderUtils.renderDouble, removing its duplicate integer fast path.
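
One way to picture the delegation in item 2 is a stringifier that has no number-specific logic of its own and routes every number through a single shared renderer. The value model (`Value`, `Num`, `Str`, `Bool`) and the `renderNumber` parameter below are hypothetical and exist only for this sketch; sjsonnet's actual `Materializer` and value types are richer.

```scala
object StringifySketch {
  // Hypothetical, minimal value model for illustration.
  sealed trait Value
  final case class Num(d: Double) extends Value
  final case class Str(s: String) extends Value
  final case class Bool(b: Boolean) extends Value

  // Numbers have exactly one rendering path: the injected shared renderer.
  // No duplicate integer fast path lives in this function.
  def stringify(v: Value, renderNumber: Double => String): String = v match {
    case Num(d)  => renderNumber(d)
    case Str(s)  => s
    case Bool(b) => b.toString
  }
}
```

Keeping the fast path in one place means any later change to number rendering is picked up by every caller without touching the stringifier.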

Result:
Numeric materialization uses the shared renderDouble fast path. The Builtin1.apply1 / Builtin3.apply3 specialization from the original PR is already present in current master via databricks#807, so it is no longer part of this PR diff.
@He-Pin He-Pin force-pushed the perf/cached-render-double branch from 5e55449 to eb50f7f Compare April 30, 2026 04:34
@He-Pin He-Pin changed the title from "perf: cached renderDouble for small integers + Builtin1/3 apply overrides" to "perf: cache renderDouble for small integers" Apr 30, 2026