[codex] Compact array and map decoding stacks #366

Draft
ruslandoga wants to merge 1 commit into master from ruslandoga-conductor/array-map-accum-counter

@ruslandoga
Collaborator

Summary

Compacts RowBinary array and map decoding by using accumulator frames with a remaining-element counter instead of pushing one type entry per element onto the decode stack.
This keeps continuation state for large arrays and maps from growing with collection length, while preserving nested decoding behavior.
Adds regression tests asserting that continuations for large arrays and maps stay compact.
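
To make the difference concrete, here is a minimal Python sketch of the two stack shapes for a large `Array(UInt8)`. It is an illustration only, not code from the Elixir implementation. The helpers `expanded_stack`, `counter_stack`, and `feed` and the frame tags `collect_array` and `array_acc` are hypothetical names, and the exact shape of the entries master pushes is an assumption.

```python
# Hypothetical sketch: two ways a resumable decoder could record
# "N more elements of type T still to decode". The stack top is the END of the list.


def expanded_stack(rest, elem_type, count):
    """Expanded style (assumed): one type entry per element, sitting above a
    marker that later gathers `count` decoded values. Grows linearly with `count`."""
    return rest + [("collect_array", count)] + [elem_type] * count


def counter_stack(rest, elem_type, count):
    """Counter style: a single accumulator frame holding the remaining count."""
    return rest + [("array_acc", elem_type, count, [])]


def feed(stack, value):
    """Hand one decoded element to the counter frame on top of the stack.

    Assumes the frame was pushed with count > 0 (an empty array can complete
    without pushing a frame). Returns the finished list, or None while more
    elements are still expected."""
    _tag, elem_type, remaining, acc = stack.pop()
    acc.append(value)
    if remaining == 1:
        return acc
    stack.append(("array_acc", elem_type, remaining - 1, acc))
    return None


if __name__ == "__main__":
    n = 100_000
    print(len(expanded_stack([], "UInt8", n)))  # 100001: one entry per element plus the marker
    print(len(counter_stack([], "UInt8", n)))   # 1: a single frame
    s = counter_stack([], "UInt8", 3)
    for v in (10, 20, 30):
        done = feed(s, v)
    print(done, s)  # [10, 20, 30] []
```

The counter frame trades a per-element stack entry for one integer decrement, so its size is independent of the collection length.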

Validation

  • MIX_ENV=test mix format --check-formatted
  • mix test test/ch/row_binary_test.exs

@ruslandoga
Collaborator Author

ruslandoga commented May 19, 2026

Updated benchmark/design notes after removing the narrow UInt8/Map(String, UInt8) special casing:

The implementation now follows the parser-state ideas from msgpax and json.erl more closely. msgpax tracks collection progress as counters (index/count) instead of pushing one operation per element; json.erl keeps continuation state as the current function/data plus a structural stack, then resumes directly into `array_push`, `object_value`, etc.

This PR now does the same for RowBinary arrays. Fixed-width scalar arrays decode through generated `decode_array_<type>_items/7` loops that carry `remaining` as a function argument and store a single compact continuation tuple only when input runs out. Variable-width and parameterized element types use a generic `decode_one` counter loop. Maps no longer expand all key/value types up front either; they use the generic compact `map_acc` frame.

Benchmarked the current branch (b3924d8) against origin/master (5c9244a) on an Apple M2 with Elixir 1.19.5 and Erlang 28.3 (JIT), using Benchee with 1 s warmup, 3 s run time, and 1 s memory time.

Original collection benchmark:

| Scenario | master avg / mem | PR avg / mem | Result |
| --- | --- | --- | --- |
| Array(UInt8) x 10_000 | 312 µs / 377 KB | 90.8 µs / 171 KB | PR faster, less allocation |
| Array(UInt8) x 100_000 | 2.85 ms / 5.70 MB | 994 µs / 2.65 MB | PR faster, less allocation |
| Map(String, UInt8) x 10_000 | 7.50 ms / 4.14 MB | 4.55 ms / 1.95 MB | PR faster, less allocation |
| Map(String, UInt8) x 100_000 | 113 ms / 43.0 MB | 74.3 ms / 44.5 MB | PR faster, similar allocation |

Additional fixed-width array benchmark:

| Scenario | master avg / mem | PR avg / mem | Result |
| --- | --- | --- | --- |
| Array(UInt8) x 100_000 | 2.41 ms / 5.70 MB | 1.20 ms / 2.65 MB | PR faster, less allocation |
| Array(UInt64) x 100_000 | 2.97 ms / 4.86 MB | 1.05 ms / 2.46 MB | PR faster, less allocation |
| Array(Int64) x 100_000 | 2.90 ms / 4.86 MB | 1.07 ms / 2.46 MB | PR faster, less allocation |
| Array(Float64) x 100_000 | 5.50 ms / 6.39 MB | 2.89 ms / 3.93 MB | PR faster, less allocation |
| Array(Date) x 100_000 | 16.8 ms / 24.0 MB | 19.3 ms / 20.5 MB | PR slower, less allocation |
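
The qualitative Result column can be turned into ratios with a little arithmetic. This snippet only re-derives numbers from the two tables above: times are normalized to ms, and each memory pair already shares one unit (KB or MB), so the memory ratio is unitless. Speedup is master time divided by PR time; the memory ratio is PR allocation divided by master allocation.

```python
# Figures copied from the two benchmark tables:
# (master time, PR time, master memory, PR memory). Times are in ms;
# each memory pair shares one unit (KB or MB), so the ratio is unitless.
RESULTS = {
    "Array(UInt8) x 10_000": (0.312, 0.0908, 377, 171),
    "Array(UInt8) x 100_000": (2.85, 0.994, 5.70, 2.65),
    "Map(String, UInt8) x 10_000": (7.50, 4.55, 4.14, 1.95),
    "Map(String, UInt8) x 100_000": (113, 74.3, 43.0, 44.5),
    "Array(UInt8) x 100_000 (fixed-width run)": (2.41, 1.20, 5.70, 2.65),
    "Array(UInt64) x 100_000": (2.97, 1.05, 4.86, 2.46),
    "Array(Int64) x 100_000": (2.90, 1.07, 4.86, 2.46),
    "Array(Float64) x 100_000": (5.50, 2.89, 6.39, 3.93),
    "Array(Date) x 100_000": (16.8, 19.3, 24.0, 20.5),
}


def ratios(master_t, pr_t, master_m, pr_m):
    """Speedup (>1 means the PR is faster) and memory ratio (<1 means the PR allocates less)."""
    return master_t / pr_t, pr_m / master_m


if __name__ == "__main__":
    for name, row in RESULTS.items():
        speedup, mem = ratios(*row)
        print(f"{name:42} speedup {speedup:5.2f}x  memory {mem:5.2f}x")
```

Array(Date) is the only row with a speedup below 1 (about 0.87x), and Map(String, UInt8) x 100_000 is the only memory ratio above 1 (about 1.03x), matching the qualitative labels.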

Length-prefix-only streaming results:

| Scenario | master avg / mem | PR avg / mem | Result |
| --- | --- | --- | --- |
| Array(UInt8) x 100_000 | 1.43 ms / 2.65 MB | 179 ns / 224 B | constant continuation state |
| Array(UInt8) x 1_000_000 | 28.7 ms / 15.5 MB | 153 ns / 224 B | constant continuation state |
| Map(String, UInt8) x 100_000 | 12.5 ms / 3.93 MB | 171 ns / 288 B | constant continuation state |
| Map(String, UInt8) x 1_000_000 | 77.2 ms / 32.8 MB | 165 ns / 288 B | constant continuation state |

The sub-microsecond continuation timings have high deviation, so the meaningful signal there is allocation shape: continuation state is constant-size instead of proportional to collection length. The complete-decode numbers are the key guardrail: this version no longer buys a streaming-only win by regressing full scalar array decoding. Array(Date) is the one measured fixed-width case that traded speed for lower allocation (19.3 ms vs 16.8 ms, roughly 15% slower, with 20.5 MB vs 24.0 MB allocated); it likely needs a separate date-specific decision if that matters.

A larger follow-up could rework all RowBinary decoding around explicit continuation frames, like json.erl: rows, arrays, maps, tuples, and variants would each be resumable states instead of being encoded as synthetic type-stack entries. That would let maps use the same counter-in-args approach generically, without generating cross-product key/value special cases.
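
The follow-up idea can be sketched briefly. The Python below is a hypothetical illustration, not a design from this PR. Every container becomes an explicit frame (`array`, `map`, `tuple`, plus a `root` frame) with its own counter or index. Scalars are decoded directly, and when input runs out, the decoder returns the frame stack plus the unconsumed bytes as the continuation. Maps reuse the counter approach generically by alternating between a key slot and a value slot. Only `UInt8` and `String` scalars are modeled. Type descriptors such as `("map", "string", ("array", "uint8"))` are an assumed encoding. Lengths use RowBinary's unsigned LEB128 prefix.

```python
# Hypothetical sketch of explicit, resumable continuation frames.
_NO_KEY = object()  # sentinel: a map frame is waiting for its next key

def read_uleb128(data, pos):
    """Unsigned LEB128 (RowBinary length prefix); None if the prefix is incomplete."""
    result = shift = 0
    while pos < len(data):
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
    return None

def decode_scalar(typ, data, pos):
    """Return (value, next_pos), or None if the value is not fully buffered yet."""
    if typ == "uint8":
        return (data[pos], pos + 1) if pos < len(data) else None
    if typ == "string":
        prefix = read_uleb128(data, pos)
        if prefix is None:
            return None
        size, start = prefix
        end = start + size
        return (bytes(data[start:end]), end) if end <= len(data) else None
    raise ValueError(f"unsupported scalar type {typ!r}")

def next_type(frame):
    """Type of the next value this frame needs, or None if the frame is complete."""
    if frame["kind"] == "array":
        return frame["elem"] if frame["left"] else None
    if frame["kind"] == "map":
        if not frame["left"]:
            return None
        return frame["k"] if frame["key"] is _NO_KEY else frame["v"]
    i = len(frame["acc"])  # "tuple" and "root" frames advance by index
    return frame["types"][i] if i < len(frame["types"]) else None

def deliver(frame, value):
    """Store a decoded child value and advance the frame's counter."""
    if frame["kind"] == "map":
        if frame["key"] is _NO_KEY:
            frame["key"] = value
            return
        frame["acc"][frame["key"]] = value
        frame["key"] = _NO_KEY
    else:
        frame["acc"].append(value)
    if frame["kind"] in ("array", "map"):
        frame["left"] -= 1

def finish(frame):
    if frame["kind"] == "tuple":
        return tuple(frame["acc"])
    if frame["kind"] == "root":
        return frame["acc"][0]
    return frame["acc"]

def run(stack, data, pos=0):
    """Return ("done", value, pos) or ("more", (stack, unconsumed_bytes)).
    Frames are mutated in place, so resume each continuation only once."""
    while True:
        frame = stack[-1]
        typ = next_type(frame)
        if typ is None:  # frame complete: pop it and hand its value to the parent
            stack.pop()
            value = finish(frame)
            if not stack:
                return ("done", value, pos)
            deliver(stack[-1], value)
        elif isinstance(typ, str):  # scalar
            decoded = decode_scalar(typ, data, pos)
            if decoded is None:
                return ("more", (stack, bytes(data[pos:])))
            value, pos = decoded
            deliver(frame, value)
        elif typ[0] == "tuple":
            stack.append({"kind": "tuple", "types": typ[1], "acc": []})
        else:  # ("array", elem) or ("map", key, value): read the length prefix first
            prefix = read_uleb128(data, pos)
            if prefix is None:
                return ("more", (stack, bytes(data[pos:])))
            count, pos = prefix
            if typ[0] == "array":
                stack.append({"kind": "array", "elem": typ[1], "left": count, "acc": []})
            else:
                stack.append({"kind": "map", "k": typ[1], "v": typ[2],
                              "left": count, "key": _NO_KEY, "acc": {}})

def decode(typ, data):
    return run([{"kind": "root", "types": [typ], "acc": []}], data)

def resume(continuation, more_data):
    stack, leftover = continuation
    return run(stack, leftover + more_data)
```

Because each step touches only the frame on top of the stack, a 100_000-element `Array(UInt8)` that has read just its length prefix suspends with two frames (root and array), regardless of its length. This mirrors the constant-size continuation state measured above.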

@ruslandoga force-pushed the ruslandoga-conductor/array-map-accum-counter branch from 229042e to 03c6cdf on May 19, 2026 06:17
@ruslandoga force-pushed the ruslandoga-conductor/array-map-accum-counter branch from 03c6cdf to b3924d8 on May 19, 2026 06:28