perf: pack format specs and avoid literal substrings #767
Open
He-Pin wants to merge 1 commit into databricks:master
He-Pin (Contributor, Author) commented Apr 12, 2026:
Maybe bit operations would be better for tagging.
f0bb14f to 7fb2010
cb41844 to 5dcc543
5dcc543 to 3584eb1
He-Pin (Contributor, Author) commented Apr 12, 2026:
Reviewed. Bit-packed/tagged dispatch may be useful as a separate evaluator-wide optimization, but I am not folding it into this PR: this branch is intentionally scoped to the simple named …
3584eb1 to 5e24587
5e24587 to f70349f
f70349f to 902c33f
Motivation:
Speed up format-heavy Jsonnet programs while keeping the formatter JIT- and GC-friendly. The original PR specialized the all-simple-named-string shape (`%(x)s`), but the generic format path still allocated per-spec objects and substrings, and repeated object lookups in common repeated-key templates.

Design:
- Represent `FormatSpec` as a packed `AnyVal` over a `Long`; store hot-loop specs in `Array[Long]`.
- Repeated same-key `%(key)s` templates: one object lookup, one stringification, then offset-based appends.
- Hoist `valuesArr`, `valuesObj`, `labels`, and `specBits` before the generic hot loop to reduce repeated type tests / field loads.

Modification:
- Base: `databricks:master` (`3a9a492899420456070fb84eaa5b89f8b7dfe1bf`).
- PR head: `902c33ff` (`perf: pack format specs and avoid literal substrings`).
- Removed the unused private `specAt` helper; Scala 2.12/2.13 CI treats that warning as fatal.
- Spec fields are packed into `FormatSpec.bits`; width/precision keep signed encoding for `*` behavior compatibility.
- `singleNamedLabel` metadata for the all-simple same-key case.
- `%s` stringification into `simpleStringValue` for the fast path cache.
- `StringBuilder` capacity for the single-label path regressed the key JMH run (`large_string_template` 1.666 ms/op, `realistic2` 45.131 ms/op).

Correctness:
- Removed the unused private `specAt` helper: it triggers an unused-member warning, and cross-version CI rejects that warning as fatal.
- `./mill "_.jvm[_].__.test"`: pass (`2.12.21`, `2.13.18`, `3.3.7`).
- `./mill "_.js[_].__.test"`: pass (`2.13.18`, `3.3.7`).
- `./mill "_.wasm[_].__.test"`: pass (`2.13.18`, `3.3.7`).
- `NO_COLOR=1 TERM=dumb ./mill "_.native[_].__.test"`: pass (`2.13.18`, `3.3.7`).
- `./mill "_.jvm[_].__.checkFormat"`: pass (`2.12.21`, `2.13.18`, `3.3.7`).
- `git diff --check`: pass.

Benchmark Setup:
- Baseline (master): `3a9a492899420456070fb84eaa5b89f8b7dfe1bf`.
- Benchmarked commit: `f70349f8`; current PR head is `902c33ff`, which only removes an unused private helper required by fatal-warning CI and does not change the benchmarked runtime path.
- `build.mill` pins Mill JVM to Zulu 21. Project compile target is Java 17.
- JVM flags: `--enable-native-access=ALL-UNNAMED -Xmx4G -XX:+UseG1GC -Xss100m`.
- Speed run: `bench.runJmh sjsonnet.bench.RegressionBenchmark.main -p path=<all 36 paths> -wi 3 -i 5 -w 1s -r 1s -f 2 -jvmArgsAppend -Xss100m -rf json`.
- Allocation run: `-wi 2 -i 3 -w 1s -r 1s -f 1 -prof gc`.
- Other JMH: `bench.runJmh 'sjsonnet.bench.(MainBenchmark|ParserBenchmark|OptimizerBenchmark|MaterializerBenchmark|MultiThreadedBenchmark).*' -wi 3 -i 5 -w 1s -r 1s -f 2 ...`.

Speed Summary (`RegressionBenchmark.main`, 36 cases):
- PR/master ratio: `0.9674` (-3.3%).
- Cases faster by at least 3%: `14/36`; cases slower by at least 3%: `0/36`.
- Highlighted cases: `large_string_template`, `realistic1`, `realistic2`, `bench.02`, `bench.03`, `bench.07`, `large_string_join`, `gen_big_object`.

Full RegressionBenchmark speed table
`assertions`, `base64`, `base64Decode`, `base64DecodeBytes`, `base64_byte_array`, `base64_stress`, `bench.01`, `bench.02`, `bench.03`, `bench.04`, `bench.06`, `bench.07`, `bench.08`, `bench.09`, `comparison`, `comparison2`, `escapeStringJson`, `foldl`, `gen_big_object`, `large_string_join`, `large_string_template`, `lstripChars`, `manifestJsonEx`, `manifestTomlEx`, `manifestYamlDoc`, `member`, `parseInt`, `realistic1`, `realistic2`, `reverse`, `rstripChars`, `setDiff`, `setInter`, `setUnion`, `stripChars`, `substr`

Allocation Summary (`gc.alloc.rate.norm`, 36 cases):
- PR/master ratio: `0.9992` (-0.1%).
- Cases with at least 1% lower allocation: `1/36`; cases with at least 1% higher allocation: `2/36`.
- The improvement is in `large_string_template`, from avoiding literal substrings / repeated named-format work. Most other cases are effectively neutral; two tiny non-format cases moved by about +2 KB/op in this one GC run and had neutral speed.
- Highlighted cases: `large_string_template`, `realistic1`, `realistic2`, `bench.02`, `bench.03`, `bench.07`, `large_string_join`, `gen_big_object`.

Full RegressionBenchmark allocation table
`assertions`, `base64`, `base64Decode`, `base64DecodeBytes`, `base64_byte_array`, `base64_stress`, `bench.01`, `bench.02`, `bench.03`, `bench.04`, `bench.06`, `bench.07`, `bench.08`, `bench.09`, `comparison`, `comparison2`, `escapeStringJson`, `foldl`, `gen_big_object`, `large_string_join`, `large_string_template`, `lstripChars`, `manifestJsonEx`, `manifestTomlEx`, `manifestYamlDoc`, `member`, `parseInt`, `realistic1`, `realistic2`, `reverse`, `rstripChars`, `setDiff`, `setInter`, `setUnion`, `stripChars`, `substr`

Other JMH Results:
- `MainBenchmark.main`
- `OptimizerBenchmark.main`
- `ParserBenchmark.main`

Known benchmark failures:
- `MaterializerBenchmark.*` fails on both master and this PR with `NoSuchElementException: None.get` at `MaterializerBenchmark.scala:43`.
- `MultiThreadedBenchmark.main` fails on both master and this PR with `std.assertEqual` / `ExecutionException`.

Result:
The updated PR keeps the original format-heavy win, improves the shape of the broader formatter hot loop, and is JIT/GC friendly: primitive spec storage, indexed arrays, offset appends, no tuple/Option allocation in the hot loop, and a single lookup/stringification path for repeated same-key templates. The full JMH comparison against current master shows speed-positive or neutral behavior, with the allocation improvement concentrated in the intended format-heavy template case.
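For readers who want a concrete picture of the two main ideas (a format spec packed into a `Long`, and literals appended by offset instead of via `substring`), here is a minimal sketch. It is not the code in this PR: the bit layout, the object name `PackedFormatSketch`, and the helpers `literalRanges` and `renderSameKey` are all assumptions made up for illustration; only the names `FormatSpec` and `bits` come from the description above.

```scala
// Hypothetical sketch: the bit layout and every name except FormatSpec/bits
// are assumptions for illustration, not sjsonnet's actual implementation.
object PackedFormatSketch {

  // Assumed layout of the Long:
  //   bits  0..7   conversion character
  //   bits  8..15  flag bits
  //   bits 16..39  width, signed 24-bit (negative values remain usable as sentinels)
  //   bits 40..63  precision, signed 24-bit
  final class FormatSpec(val bits: Long) extends AnyVal {
    def conversion: Char = (bits & 0xFFL).toChar
    def flags: Int = ((bits >>> 8) & 0xFFL).toInt
    // Shift the field to the top, then arithmetic-shift back to sign-extend it.
    def width: Int = ((bits << 24) >> 40).toInt
    def precision: Int = (bits >> 40).toInt
  }

  object FormatSpec {
    def apply(conversion: Char, flags: Int, width: Int, precision: Int): FormatSpec =
      new FormatSpec(
        (conversion.toLong & 0xFFL) |
          ((flags.toLong & 0xFFL) << 8) |
          ((width.toLong & 0xFFFFFFL) << 16) |
          ((precision.toLong & 0xFFFFFFL) << 40)
      )
  }

  // Setup step (outside the hot loop): record [start, end) offsets of the
  // literal text between occurrences of one placeholder such as "%(n)s".
  def literalRanges(template: String, placeholder: String): (Array[Int], Array[Int]) = {
    val starts = Array.newBuilder[Int]
    val ends = Array.newBuilder[Int]
    var from = 0
    var hit = template.indexOf(placeholder, from)
    while (hit >= 0) {
      starts += from
      ends += hit
      from = hit + placeholder.length
      hit = template.indexOf(placeholder, from)
    }
    starts += from
    ends += template.length
    (starts.result(), ends.result())
  }

  // Hot loop: the value was looked up and stringified once by the caller;
  // literals are appended by offset, so no substring objects are created.
  def renderSameKey(template: String, starts: Array[Int], ends: Array[Int], value: String): String = {
    val sb = new java.lang.StringBuilder
    val last = starts.length - 1
    var i = 0
    while (i <= last) {
      sb.append(template, starts(i), ends(i))
      if (i < last) sb.append(value)
      i += 1
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    // Specs live in a primitive array; wrapping one in the value class is free.
    val specs: Array[Long] = Array(FormatSpec('s', 0, -1, 3).bits)
    val spec = new FormatSpec(specs(0))
    println(s"${spec.conversion} ${spec.width} ${spec.precision}") // s -1 3

    val template = "Hi %(n)s, bye %(n)s!"
    val (starts, ends) = literalRanges(template, "%(n)s")
    println(renderSameKey(template, starts, ends, "Ann")) // Hi Ann, bye Ann!
  }
}
```

The sketch keeps width and precision as signed fields so that negative sentinel values survive a round trip, which is one plausible reading of the "signed encoding" note above. The tuple returned by `literalRanges` is allocated once during setup, not inside the rendering loop.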