Skip to content

feat: add support for posexplode and posexplode_outer#4270

Open
andygrove wants to merge 2 commits intoapache:mainfrom
andygrove:feat/posexplode
Open

feat: add support for posexplode and posexplode_outer#4270
andygrove wants to merge 2 commits intoapache:mainfrom
andygrove:feat/posexplode

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #4269.

Rationale for this change

Comet already supports the explode generator for array inputs but rejects posexplode and posexplode_outer, forcing a fallback to Spark for any query that needs the position column. This PR closes that gap so workloads that use the posexplode(arr) AS (pos, col) pattern (or the SQL LATERAL VIEW posexplode(...) form) stay native.

What changes are included in this PR?

  • CometExplodeExec (Scala serde) now accepts both Explode and PosExplode Catalyst nodes and propagates a new position flag through the operator proto.
  • Explode proto message gets a bool position field (tag 4).
  • The native planner builds a parallel List<Int32> positions column from a new ListPositionsExpr and passes it to DataFusion's UnnestExec as a second ListUnnest entry. Output schema injects pos: Int32 not null before the unnested element.
  • ListPositionsExpr (native/core/src/execution/expressions/list_positions.rs) reuses the input ListArray offsets and null bitmap, populating each row with [0..len-1]. Sharing offsets with the source array is what lets UnnestExec unnest both columns in parallel without alignment drift.
  • Maps still fall back (Add support for map inputs to explode #2837) and outer=true keeps its Incompatible status (DataFusion #19053).
  • Hand-written notes in docs/source/user-guide/latest/operators.md and understanding-comet-plans.md updated.

The implement-comet-expression and audit-comet-expression skills were used to scaffold the implementation and follow-up coverage.

How are these changes tested?

CometGenerateExecSuite was extended with 12 new tests covering:

  • posexplode over arrays (simple, empty, null, strings, nullable elements)
  • posexplode_outer over a simple array (allowIncompat=true)
  • multiple projected columns alongside the generator
  • map input falls back with the expected reason
  • posexplode over an array of structs (with field projection out of the unnested struct)
  • posexplode in a SQL LATERAL VIEW
  • posexplode of a literal array
  • posexplode across batch boundaries with spark.comet.batchSize=4 to verify the parallel positions/values lists stay aligned across batches

The existing CometGenerateExecSuite continues to pass.

andygrove added 2 commits May 8, 2026 17:25
Extend `CometExplodeExec` to accept both `Explode` and `PosExplode`
Catalyst nodes. The Explode operator proto carries a new `position`
flag; when set, the native planner builds a parallel `List<Int32>`
positions column with `ListPositionsExpr` and unnests it alongside the
array column via DataFusion's `UnnestExec`.

Maps still fall back (apache#2837) and `outer=true` keeps its Incompatible
status (DataFusion #19053).

Closes apache#4269.
@andygrove andygrove marked this pull request as ready for review May 9, 2026 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for posexplode and posexplode_outer

1 participant