perf: cache renderDouble for small integers by He-Pin · Pull Request #763 · databricks/sjsonnet

He-Pin · 2026-04-12T12:21:36Z

Motivation:

Reduce allocation overhead in common numeric rendering paths.

Key Design Decision:

Keep this PR focused on the renderDouble optimization. The original Builtin1.apply1 / Builtin3.apply3 specialization has already landed in current master via #807, so it is intentionally no longer part of this PR diff after the rebase.

Modification:

Materializer delegates number stringification to RenderUtils.renderDouble.
RenderUtils.renderDouble reuses a small integer string cache for exact integer doubles in the range 0 to 255.
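
The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR diff: the object name `RenderSketch`, the `smallInts` table, and the fallback formatting are all assumptions made for the example.

```scala
object RenderSketch {
  // Hypothetical cache: decimal strings for 0..255, allocated once at class init.
  private val smallInts: Array[String] = Array.tabulate(256)(_.toString)

  def renderDouble(d: Double): String = {
    val i = d.toInt
    // Cache hit only for exact integers in [0, 255]. NaN and fractional values
    // fail the equality check; infinities saturate toInt and fall out of range.
    // Note: -0.0 compares equal to 0.0 and therefore returns the cached "0".
    if (i >= 0 && i < 256 && i.toDouble == d) smallInts(i)
    // Assumed fallback (not taken from the PR): whole numbers that are exactly
    // representable print without a fractional part, everything else via toString.
    else if (d == math.rint(d) && math.abs(d) < 9.0e15) d.toLong.toString
    else d.toString
  }
}
```

Under these assumptions, repeated calls such as `RenderSketch.renderDouble(7.0)` return the same `String` instance instead of allocating a new one each time, which is the allocation saving the PR targets.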

Benchmark Results:

JMH throughput, normalized from ms/op to ops/ms (ops/ms = 1 / ms_per_op; higher is better):

| Benchmark | master ms/op | PR ms/op | master ops/ms | PR ops/ms | Delta ops/ms |
|:---|---:|---:|---:|---:|---:|
| `large_string_join` | 1.972 | 0.545 | 0.507 | 1.835 | +261.8% |
| `large_string_template` | 3.260 | 1.635 | 0.307 | 0.612 | +99.4% |
| `realistic2` | 128.661 | 47.007 | 0.0078 | 0.0213 | +173.7% |
| `parseInt` | 0.057 | 0.033 | 17.54 | 30.30 | +72.7% |
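
For readers who want to re-derive the normalized columns, the conversion stated above (ops/ms = 1 / ms_per_op) and the relative delta can be computed as in this small sketch. The object and method names are hypothetical and exist only for this example.

```scala
object ThroughputSketch {
  // Throughput in operations per millisecond from a JMH average time in ms/op.
  def opsPerMs(msPerOp: Double): Double = 1.0 / msPerOp

  // Relative throughput change of the PR versus master, in percent.
  def deltaPercent(masterMsPerOp: Double, prMsPerOp: Double): Double =
    (opsPerMs(prMsPerOp) / opsPerMs(masterMsPerOp) - 1.0) * 100.0
}
```

For example, `ThroughputSketch.deltaPercent(1.972, 0.545)` evaluates to roughly 261.8, which agrees with the `large_string_join` row.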

Scala Native hyperfine against source-built jrsonnet (git@github.com:CertainLach/jrsonnet.git, origin/master 80cd36a; lower is better):

| Benchmark | master-native | PR-native | jrsonnet-source | Result |
|:---|---:|---:|---:|:---|
| `realistic2` | 86.3 +/- 1.2 ms | 87.1 +/- 1.2 ms | 97.4 +/- 1.9 ms | Native neutral; PR is 1.01x slower than master within noise |
| `parseInt` | 5.5 +/- 0.7 ms | 5.6 +/- 0.8 ms | 4.7 +/- 1.4 ms | Startup-dominated and noisy; PR is neutral vs master |

Analysis:

The JMH signal is strongly positive on numeric-heavy rendering workloads because exact small integer doubles avoid repeated string allocation through the shared renderDouble path. Scala Native is effectively neutral on these two hyperfine cases; the parseInt run reported outliers and is dominated by process startup at this input size.

Verification:

./mill --no-server bench.runRegressions bench/resources/cpp_suite/large_string_join.jsonnet bench/resources/cpp_suite/large_string_template.jsonnet bench/resources/cpp_suite/realistic2.jsonnet bench/resources/go_suite/parseInt.jsonnet
hyperfine --warmup 10 --min-runs 50 -N
./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'
./mill --no-server 'sjsonnet.jvm[3.3.7].test'

References:

PR branch: perf/cached-render-double
Base: master at 192bb3374b8008f8dd14e1c8f0724237c31da8e1
Head: eb50f7f1d831693fd1c14e5919bf467d66df5155
Source-built jrsonnet 0.5.0-pre98

Result:

Ready. The PR now contains only the cached renderDouble work; the Builtin1/3 apply override work is already covered by #807 on master.

stephenamar-db · 2026-04-28T20:11:49Z

I'm questioning the usefulness of this tweak (for numbers) - this seems quite neutral

He-Pin · 2026-04-28T21:01:03Z

It's just a small optimization for realistic2, where there are many small numbers.

Split out from #763.

Motivation:

Reduce allocation and dispatch overhead when one- and three-argument builtins are called through the dynamic `Val.Func.apply1` / `apply3` path.

Key Design Decision:

Keep the optimization local and semantics-preserving. `Builtin2` already has an exact-arity `apply2` override; this adds matching `Builtin1.apply1` and `Builtin3.apply3` overrides. Exact positional calls directly invoke the structured `evalRhs` overload and skip constructing an intermediate `Array`. Non-exact paths still fall back to the generic parent application path.

Correctness:

- The direct path matches the existing `Builtin1.apply` / `Builtin3.apply` exact positional behavior: force the supplied `Eval` values, then call the typed `evalRhs`.
- Named arguments, missing defaults, too many arguments, and other non-exact calls still use the generic function application logic.
- Static `Expr.ApplyBuiltin1` / `Expr.ApplyBuiltin3` paths are unchanged; this only helps dynamic builtin calls such as a builtin stored in a local or returned from another function.

Modification:

- Add `Builtin1.apply1`.
- Add `Builtin3.apply3`.
Validation:

- `./mill --no-server 'sjsonnet.jvm[3.3.7].compile'`
- `./mill --no-server 'sjsonnet.jvm[3.3.7].test'` (`141/141, SUCCESS`)
- `./mill --no-server 'sjsonnet.jvm[3.3.7].checkFormat'`
- `./mill --no-server '_.jvm[_].__.test'` (`1104/1104, SUCCESS`)
- Dynamic builtin smoke checks:
  - `local f = std.length; f([1, 2, 3])` -> `3`
  - `local f = std.substr; f("abcdef", 1, 3)` -> `"bcd"`
  - `local f = std.substr; f(str="abcdef", from=1, len=3)` -> `"bcd"`

Hyperfine:

Toolchain:

- `hyperfine 1.20.0`
- `--warmup 3 --min-runs 25` for targeted dynamic builtin benchmarks
- `--warmup 3 --min-runs 20` for `realistic2`
- JVM assemblies built with `./mill --no-server show 'sjsonnet.jvm[3.3.7].assembly'`
- Base: `upstream/master` at `c04fc804`
- Branch: `2067d8b5`

Targeted `Builtin1` dynamic call benchmark:

```jsonnet
local identity(x) = x;
local len = identity(std.length);
std.foldl(
  function(acc, i) acc + len("abcdef"),
  std.range(1, 5000000),
  0
)
```

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `master builtin1_dynamic` | 649.8 +/- 48.3 | 557.4 | 726.8 | 1.00 |
| `branch builtin1_dynamic` | 661.7 +/- 41.0 | 606.1 | 747.3 | 1.02 +/- 0.10 |

Result: statistically neutral in this hyperfine run.

Targeted `Builtin3` dynamic call benchmark:

```jsonnet
local identity(x) = x;
local substr = identity(std.substr);
std.foldl(
  function(acc, i) acc + std.length(substr("abcdef", 1, 3)),
  std.range(1, 3000000),
  0
)
```

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `master builtin3_dynamic` | 742.5 +/- 156.1 | 594.1 | 1254.7 | 1.12 +/- 0.30 |
| `branch builtin3_dynamic` | 660.4 +/- 110.9 | 534.0 | 962.6 | 1.00 |

Result: branch was faster in this run, but variance is high.
End-to-end `realistic2`:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `master realistic2` | 544.9 +/- 95.3 | 414.1 | 706.5 | 1.27 +/- 0.27 |
| `branch realistic2` | 428.4 +/- 54.1 | 378.3 | 565.8 | 1.00 |

Result: branch was faster in this run; due to JVM startup and system noise, treat this as a non-regression signal rather than a guaranteed 1.27x speedup.
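
The exact-arity override described in the #807 summary can be illustrated with a simplified model. This is only a sketch under stated assumptions: `Thunk`, `Func`, `SpecializedBuiltin1`, the `arraysAllocated` counter, and the two `Length` objects are hypothetical stand-ins, not sjsonnet's actual `Val.Func`, `Eval`, or `Builtin1` definitions.

```scala
object ApplySketch {
  // Hypothetical stand-in for a lazily evaluated argument.
  type Thunk = () => Any

  // Counts intermediate argument arrays so the two paths can be compared.
  var arraysAllocated = 0

  abstract class Func {
    // Generic path: arguments arrive packed in an array.
    def apply(args: Array[Thunk]): Any
    // Default one-argument entry point: wrap the argument and reuse the generic path.
    def apply1(a: Thunk): Any = {
      arraysAllocated += 1
      apply(Array[Thunk](a))
    }
  }

  abstract class Builtin1 extends Func {
    def evalRhs(a: Any): Any
    final def apply(args: Array[Thunk]): Any = {
      require(args.length == 1, s"expected 1 argument, got ${args.length}")
      evalRhs(args(0)()) // force the argument, then call the typed implementation
    }
  }

  // Specialized variant: force the argument and call evalRhs directly, no array.
  abstract class SpecializedBuiltin1 extends Builtin1 {
    override def apply1(a: Thunk): Any = evalRhs(a())
  }

  object GenericLength extends Builtin1 {
    def evalRhs(a: Any): Any = a.asInstanceOf[String].length
  }

  object FastLength extends SpecializedBuiltin1 {
    def evalRhs(a: Any): Any = a.asInstanceOf[String].length
  }
}
```

In this model both `GenericLength.apply1` and `FastLength.apply1` return the same value for the same argument, but only the generic one increments `arraysAllocated`. That mirrors the claim that exact positional calls skip the intermediate `Array` while other call shapes keep the generic behavior.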

Motivation:

Reduce allocation overhead in common numeric rendering paths.

Modification:

1. RenderUtils.renderDouble reuses pre-cached string representations for exact integer doubles in the range 0-255.
2. Materializer.stringify delegates number stringification to RenderUtils.renderDouble, removing its duplicate integer fast path.

Result:

Numeric materialization uses the shared renderDouble fast path. The Builtin1.apply1 / Builtin3.apply3 specialization from the original PR is already present in current master via databricks#807, so it is no longer part of this PR diff.

He-Pin force-pushed the perf/cached-render-double branch from 12b870c to 4c8c5ab April 12, 2026 17:32

He-Pin marked this pull request as ready for review April 12, 2026 17:48

He-Pin mentioned this pull request Apr 12, 2026

performance optimization #666

Open

He-Pin marked this pull request as draft April 12, 2026 18:57

He-Pin force-pushed the perf/cached-render-double branch 3 times, most recently from 3ae5a56 to 5e55449 April 26, 2026 10:46

He-Pin marked this pull request as ready for review April 26, 2026 10:48

He-Pin marked this pull request as draft April 28, 2026 21:00

He-Pin mentioned this pull request Apr 30, 2026

perf: specialize Builtin1 and Builtin3 apply paths #807

Merged

He-Pin force-pushed the perf/cached-render-double branch from 5e55449 to eb50f7f April 30, 2026 04:34

He-Pin changed the title ~~perf: cached renderDouble for small integers + Builtin1/3 apply overrides~~ perf: cache renderDouble for small integers Apr 30, 2026

He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/cached-render-double
