[CALCITE-7448] Add support for ':' field/item access syntax by tmater · Pull Request #4844 · apache/calcite

tmater · 2026-03-24T16:14:59Z

Changes Proposed

This PR adds opt-in support for : as a field/item access syntax behind a new SqlConformance#isColonFieldAccessAllowed() hook. No built-in conformance enables it, so the default parser behavior is unchanged.

The parser preserves : as a dedicated SqlColonOperator (SqlKind.COLON) in the SqlNode tree. The call has the shape (base, SqlNodeList<segments>), where each segment is an identifier (dot-notation), a string literal (bracket key) or an integer literal (bracket index). Calcite does not lower COLON to Rex — engines opt in via a custom conformance and supply their own convertlet, e.g. mapping to VARIANT_GET or any engine-specific function.

This keeps engines free to choose their own semantics for path access (typing, missing-key behavior, case sensitivity) without baking a specific lowering into Calcite's parser.
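As an illustration of what an engine-side lowering might produce, here is a minimal standalone sketch. It is not part of this PR and does not use Calcite's convertlet API: the class `VariantGetLoweringSketch`, the method `lower`, the two-argument `VARIANT_GET` call layout, and the `$["key"][index]` path format are all assumptions made for the example.

```java
import java.util.List;

/** Hypothetical lowering helper; Calcite itself ships no such convertlet. */
final class VariantGetLoweringSketch {
  /**
   * Builds a VARIANT_GET-style call string from a base column and path segments.
   * String segments are treated as keys, Integer segments as array indexes.
   * The path format is an assumption, not any engine's documented syntax.
   */
  static String lower(String base, List<Object> segments) {
    StringBuilder path = new StringBuilder("$");
    for (Object segment : segments) {
      if (segment instanceof Integer) {
        path.append('[').append(segment).append(']');
      } else {
        // Bracket-quote every key so names with spaces or dots stay unambiguous.
        String key = segment.toString().replace("\"", "\\\"");
        path.append("[\"").append(key).append("\"]");
      }
    }
    // Double single quotes so the path is a valid SQL string literal.
    return "VARIANT_GET(" + base + ", '" + path.toString().replace("'", "''") + "')";
  }
}
```

An engine with different rules for missing keys or case sensitivity would replace this mapping with its own, which is the freedom the paragraph above describes.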

Supported forms include v:field, v:field.nested, v:['field name'], v:[0], arr[1]:field, obj['x']:nested['y'], and mixed chains such as v:a.b['c'][0].
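The (base, segments) shape can be pictured with a small standalone model. This is only a sketch for readers of the description: `ColonPathSketch`, `Segment`, `Kind`, and `unparse` are hypothetical names, not classes added by this PR, and the base is simplified to a plain string.

```java
import java.util.List;

/** Hypothetical model of the (base, segments) call shape; not Calcite's SqlColonOperator. */
final class ColonPathSketch {
  /** The three segment kinds named in the description. */
  enum Kind { IDENTIFIER, STRING_KEY, INTEGER_INDEX }

  /** One path segment: an identifier, a bracketed string key, or a bracketed index. */
  static final class Segment {
    final Kind kind;
    final String text;

    private Segment(Kind kind, String text) {
      this.kind = kind;
      this.text = text;
    }

    static Segment identifier(String name) { return new Segment(Kind.IDENTIFIER, name); }
    static Segment key(String key) { return new Segment(Kind.STRING_KEY, key); }
    static Segment index(int i) { return new Segment(Kind.INTEGER_INDEX, Integer.toString(i)); }
  }

  /** Renders a base and its segments back to colon syntax. */
  static String unparse(String base, List<Segment> segments) {
    StringBuilder sb = new StringBuilder(base).append(':');
    for (int i = 0; i < segments.size(); i++) {
      Segment s = segments.get(i);
      switch (s.kind) {
      case IDENTIFIER:
        // The first identifier follows the colon directly; later ones need a dot.
        if (i > 0) {
          sb.append('.');
        }
        sb.append(s.text);
        break;
      case STRING_KEY:
        // Double any embedded single quote, as in a SQL string literal.
        sb.append("['").append(s.text.replace("'", "''")).append("']");
        break;
      default:
        sb.append('[').append(s.text).append(']');
        break;
      }
    }
    return sb.toString();
  }
}
```

With segments identifier a, identifier b, key c, and index 0 on base v, `unparse` returns v:a.b['c'][0], the mixed chain listed above.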

To avoid grammar ambiguity when colon mode is enabled, JSON_OBJECT and JSON_OBJECTAGG must use the KEY ... VALUE form; the 'k': v and 'k', v shorthands are rejected.

Reproduction

select v:field, v:['field name'], arr[1]:field, obj['x']:nested from t;

select json_object(key v:field value arr[1]:field);

select v:a.b['c'][0]::integer from t;  -- Babel :: cast on a colon path

Testing

Added parser coverage in SqlParserTest and BabelParserTest for valid and invalid colon paths, JSON constructor disambiguation, and :: cast interactions in Babel. Added validator coverage in SqlValidatorTest confirming COLON validates as nullable ANY (concrete typing is the convertlet's responsibility) and that the base expression is still resolved against the schema.

Jira Link

CALCITE-7448

caicancai · 2026-03-25T02:27:24Z

I might wait until the CI issue is resolved before reviewing this PR.

caicancai · 2026-03-25T13:09:30Z

I don't understand Calcite JavaCC code very well, I might need to learn it. If no one reviews it this week, I'll start working on it next week.

tmater · 2026-03-25T14:17:09Z

> I don't understand Calcite JavaCC code very well, I might need to learn it. If no one reviews it this week, I'll start working on it next week.

Thank you for taking a look, @caicancai. I agree this is still a fairly complex PR, and I wouldn’t claim to fully understand all of its implications yet.

That’s also why I tried to include a wide range of test variations and keep the changes gated behind a conformance config.

I’ve organized it into five commits (prior to the CI failures) to make the review easier. If it would help further, I’m happy to split it into 4–5 smaller PRs.

mihaibudiu · 2026-03-25T17:21:42Z

Is the behavior with :: needing parenthesis a new feature, or was this already present previously?

tmater · 2026-04-01T12:26:30Z

> Is the behavior with :: needing parenthesis a new feature, or was this already present previously?

@mihaibudiu, good catch! The parenthesis requirement itself is not new: on main, v::varchar[1] already means "cast to VARCHAR[1]" (the bracket binds to the type), so parentheses were always needed to subscript the cast result.

However, you're right that something was off: the bracket refactor broke parsing v::varchar[1] entirely, because InfixCast was implicitly relying on Expression2()'s loop to consume postfixes. This commit fixes that by adding an explicit AddRegularPostfixes(list) call after DataType() in InfixCast.

caicancai · 2026-04-05T06:56:51Z

+    [
+        LOOKAHEAD(2, <COLON> SimpleIdentifier(),
+            { this.conformance.isColonFieldAccessAllowed() })
+        <COLON>


Why is chained colon access not supported?

I kept : intentionally non-chainable here. After one : we are already back in the existing postfix world, so the follow-up access can be expressed with the normal operators:

a:b.c instead of a:b:c

a:['x'].y instead of a:['x']:y

(a:b)['x'] / a:b['x'] instead of a:b:['x']

So a second : would mostly be extra surface syntax, not extra expressiveness. Keeping it to a single colon also keeps the grammar narrower and avoids widening the ambiguous space around other colon uses, especially :: in Babel and JSON constructor : handling.

If we decide later that repeated : is a real dialect requirement, we can extend the current [...] to a loop, but I wanted to start with the minimal syntax that covers the targeted cases.

caicancai · 2026-04-05T06:58:38Z

                s.pos()));
        list.add(dt);
    }
+    AddRegularPostfixes(list)


I'm not sure if it will affect existing semantics.

Before the refactor, InfixCast could stop after DataType() and rely on the outer Expression2() loop to notice any following . or [...]. After the refactor, that outer postfix branch was removed and the shared postfix handling moved earlier into AddExpression2b(). But Babel InfixCast is not part of that earlier path; it is an extraBinaryExpressions hook that runs later.

So if InfixCast does not invoke AddRegularPostfixes() itself, nobody else will. That is why it has to “own” it now: not because the semantics changed, but because the control flow changed. The shared postfix parser still exists, but this is now the only place on the Babel :: path where it can be called.

caicancai · 2026-04-07T14:31:16Z

@tmater Thank you for following up. I left a question on Jira. Could you answer it when you have time?

mihaibudiu · 2026-04-13T03:44:29Z

I was on vacation, I plan to review this PR again.

mihaibudiu · 2026-04-13T23:48:39Z

+
+    // Multiple brackets bind to the type
+    sql("select v::varchar[1][2] from t")
+        .ok("SELECT `V` :: VARCHAR[1][2]\nFROM `T`");


But VARCHAR[1] is not a type. What does this mean?

This is parser coverage rather than a meaningful type assertion. The test is checking that after ::, the parser still accepts postfix syntax on the RHS and groups it there, rather than attaching the postfix to the cast result. I agree the old example made that intent unclear, so I updated the test to make the associativity point clearer.

mihaibudiu · 2026-04-14T20:15:58Z

+    // Bracket after :: binds to the type, not as subscript on the cast result
+    sql("select v::varchar array[1] from t")
+        .ok("SELECT `V` :: VARCHAR ARRAY[1]\nFROM `T`");
+    f.sql("select v::varchar array[1] from t")


I still don't understand what the parse tree of this expression is.
I think the parser should actually reject this expression.

I see your point, this is quite ambiguous. The reason I added these cases was mainly to show that we did not introduce regressions while touching the INFIX_CAST grammar.

For v::varchar array[1], the tree is something like:

INFIX_CAST( v, ITEM(SqlDataTypeSpec(VARCHAR ARRAY), 1) )

So the [1] is attaching on the type side, not as a subscript on the cast result.

Also, I backported these tests onto the base of this PR as proof: #4885. The same behavior is already present there, so this is not something introduced by the colon-field-access change.

Would you prefer fixing this ambiguity in this PR, or filing a Jira to clean it up separately? I am leaning towards a separate PR and polishing these tests a bit, maybe leaving only one ambiguous expression test just in case.

Ok, if the behavior does not change, it's fine. If we think there's a bug in the parser, where it produces a nonsensical parse tree, we should probably file an issue which can be solved separately.

mihaibudiu · 2026-04-15T17:27:51Z

+    ]
+}
+
+void AddBracketAccess(List<Object> list) :


Please add a comment describing what this is expected to parse. I know this code has been moved, but since the caller's context is missing, the comment will help maintainers.

mihaibudiu · 2026-04-15T17:54:00Z

+    [
+        LOOKAHEAD(2, <COLON> SimpleIdentifier(),
+            { this.conformance.isColonFieldAccessAllowed() })
+        <COLON>


Does this cover the Databricks and Snowflake capabilities?

tmater · 2026-04-17T07:07:23Z

Yes, for the subset this patch is targeting, it covers the common Snowflake/Databricks colon-path surface.

Snowflake is proprietary, so I could only reverse-engineer it and check the docs. Examples in the docs are src:salesperson.name, src:vehicle[0].price, and src:vehicle[0].price::NUMBER, which match the syntax this change adds. I also experimented with it, and s:payload.score::INT works too.

For Databricks, the public docs and the open Spark grammar match this implementation well. Spark treats : as semi-structured field access and :: as a separate cast operator. That matches the forms added here, such as v:field, v:['field'], arr[1]:field, v:field[1], and v:field::int.
Reference: SqlBaseParser.g4#L1316

tmater · 2026-04-17T07:07:29Z

+    ]
+}
+
+void AddBracketAccess(List<Object> list) :


tmater · 2026-04-17T12:46:05Z

+    // Bracket after :: binds to the type, not as subscript on the cast result
+    sql("select v::varchar array[1] from t")
+        .ok("SELECT `V` :: VARCHAR ARRAY[1]\nFROM `T`");
+    f.sql("select v::varchar array[1] from t")


Filed CALCITE-7475 to cover this part and narrowed down the test surface.

tmater · 2026-04-17T13:14:35Z

+        ext = RowExpressionExtension() {
+            list.add(
+                new SqlParserUtil.ToTreeListItem(
+                    SqlStdOperatorTable.DOT, getPos()));


I ran into this downstream while building on top of this work. It gets hard later to tell whether the original syntax was : or ., since : is rewritten to DOT / ITEM immediately, so I think a dedicated follow-up makes sense there. I’d still prefer to keep this PR scoped to the parser change, and I opened CALCITE-7476 for the next step.

tmater · 2026-04-17T13:48:38Z

+/** Parses postfixes that are allowed after colon field access, for example
+ * {@code v:field.nested} or {@code v:[1]['x']}. Unlike regular postfixes,
+ * this path does not allow member-style calls such as {@code v:field.func()}. */
+void AddColonPostfixes(List<Object> list) :


I added a separate postfix method for colon access because this came up during downstream review. Reusing the regular postfix path was too permissive, since it would also allow the same member-style calls as dot access, while for : we only want path continuation with . and []; Spark/Databricks blocks that broader dot-style behavior as well.
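To make the restriction concrete, here is a small hand-written recognizer for the same idea. It is only a sketch and not the Parser.jj production: the class `ColonPathRecognizer` and its methods are hypothetical, the base is limited to a simple identifier, whitespace is not handled, and string keys may not contain quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recognizer for colon paths; not the Parser.jj production. */
final class ColonPathRecognizer {
  private final String src;
  private int pos;

  private ColonPathRecognizer(String src) { this.src = src; }

  /** Returns the segments after the colon, or throws IllegalArgumentException. */
  static List<String> parse(String text) {
    ColonPathRecognizer p = new ColonPathRecognizer(text);
    List<String> segments = new ArrayList<>();
    p.identifier();                 // base, e.g. v
    p.expect(':');
    // First segment: an identifier or a bracket.
    segments.add(p.peek() == '[' ? p.bracket() : p.identifier());
    // Continuation: only ".name" and "[...]" are accepted.
    while (p.pos < p.src.length()) {
      if (p.peek() == '.') {
        p.pos++;
        segments.add(p.identifier());
      } else if (p.peek() == '[') {
        segments.add(p.bracket());
      } else {
        throw p.error();            // e.g. the '(' of a member-style call
      }
    }
    return segments;
  }

  private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

  private void expect(char c) {
    if (peek() != c) {
      throw error();
    }
    pos++;
  }

  private String identifier() {
    int start = pos;
    while (Character.isLetterOrDigit(peek()) || peek() == '_') {
      pos++;
    }
    if (start == pos || Character.isDigit(src.charAt(start))) {
      throw error();
    }
    return src.substring(start, pos);
  }

  /** Parses [123] or ['key'] and returns the bracket text as written. */
  private String bracket() {
    int start = pos;
    expect('[');
    if (peek() == '\'') {
      pos++;
      while (pos < src.length() && peek() != '\'') {
        pos++;
      }
      expect('\'');
    } else {
      int digits = pos;
      while (Character.isDigit(peek())) {
        pos++;
      }
      if (digits == pos) {
        throw error();
      }
    }
    expect(']');
    return src.substring(start, pos);
  }

  private IllegalArgumentException error() {
    return new IllegalArgumentException("Unexpected input at position " + pos + " in: " + src);
  }
}
```

In this sketch, v:a.b['c'][0] is accepted, while v:field.func() is rejected at the opening parenthesis because the continuation loop only knows about dot and bracket segments.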

mihaibudiu · 2026-04-17T15:48:08Z

Can you even parse the expression if you don't have a colon operator?
Your other issue about introducing it seems fine.

tmater · 2026-04-20T11:11:17Z

> Can you even parse the expression if you don't have a colon operator?
> Your other issue about introducing it seems fine.

Yes, it is usable as-is. The current parser implementation lowers : access into the normal DOT / ITEM AST shape, which unblocks the syntax problem without breaking anything. Downstream, I then run a post-parse rewrite pass that re-identifies colon-origin access and rewrites those calls into dedicated colon operators, so later planner stages can still tell them apart. It is not clean, but it is a good first milestone.

I chose this shape intentionally to keep the initial upstream parser change small and unblock the syntax problem first, rather than mixing it with a broader AST/operator change in the same step. I’d be happy to follow up with adding the operator as a next step.

mihaibudiu · 2026-04-21T01:31:12Z

But isn't the right solution to first introduce a colon operator and just parse into that?

tmater · 2026-04-21T06:43:51Z

> But isn't the right solution to first introduce a colon operator and just parse into that?

I think that is a reasonable follow-up direction, but this PR is intentionally parser-scoped: accept : syntax behind conformance, map it to existing DOT / ITEM, and avoid widening the patch into a broader operator change. I updated the title/description to make that boundary explicit. If you see a first-class colon operator as a prerequisite for this PR, let me know.

mihaibudiu · 2026-04-21T20:50:16Z

+        LOOKAHEAD(2) <DOT>
+        p = SimpleIdentifier() {
+            list.add(
+                new SqlParserUtil.ToTreeListItem(


So is the plan in a subsequent PR to add a Colon operator and replace this with colon?
Then I would suggest using the colon from the start. You don't want people to take a dependency on this parse tree and expect a dot. Why change something later if you can do it right from the start?

asolimando · 2026-04-22T05:33:39Z

> But isn't the right solution to first introduce a colon operator and just parse into that?

IMO it's understandable, as a new contributor, to try and keep the changes minimal (or whatever will decrease friction), but since we are happy with having the final solution from the beginning, no need for doing it in two steps.

…acket refactor

The branch moved bracket ([...]) and dot (.) postfix handling out of Expression2()'s loop and into AddRegularPostfixes (called from AddExpression2b). Babel's InfixCast production was implicitly relying on Expression2()'s loop to consume postfixes after the DataType() call, so after the refactor, expressions like v::varchar[1] would fail to parse because no grammar rule picked up the trailing [1].

Fix: call AddRegularPostfixes(list) after DataType() in InfixCast, making the postfix consumption explicit rather than relying on the caller's loop.

Also consolidate InfixCast test coverage: remove the now-redundant testInfixCastBracketAccessNeedsParentheses and extend testColonFieldAccessWithInfixCast with comprehensive cases covering bracket access, dot access, parenthesized forms, and array types, both with and without isColonFieldAccessAllowed.

Tighten Babel infix-cast parser coverage around postfix binding after '::' and rename the test to match its scope. Restrict postfix continuation after ':' to field and bracket path segments so member-style calls such as 'v:field.func()' are no longer accepted. Add parser coverage for the rejected colon cases and explanatory comments for the shared postfix helper productions in Parser.jj.

tmater · 2026-04-28T13:05:08Z

Thanks for the review and sorry for the delay, @mihaibudiu and @asolimando.

Yes, I wanted to keep the change small and build the feature iteratively. It helps me focus on small milestones, and I think it also makes the review easier.

The branch was a month old, so I figured a rebase was overdue, especially given the direction change.

The design is no longer a simple rewrite of COLON into DOT/ITEM. I reverted the parser refactors that were needed for that, added parsing for the COLON operator directly, and introduced a dedicated SqlColonOperator that engines can lower via their own convertlet (e.g. to VARIANT_GET).

PTAL when you have a moment, happy to iterate further.

sonarqubecloud · 2026-04-28T13:19:41Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
97.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

mihaibudiu

I don't see a convertlet for this operator.
Don't you plan to add one to the standardConvertletTable?

mihaibudiu · 2026-04-29T02:15:59Z

+ * character literals (bracketed string keys), or numeric literals (bracketed
+ * indexes).
+ *
+ * <p>Calcite does not lower this operator to Rex. Engines must register a


Frankly, the way it's used is not the business of the class.
The JavaDoc should describe just what this class does, not what other classes do with it. If the other classes change, this comment will become obsolete.
This comment should probably be in the convertlet table if at all.

mihaibudiu · 2026-04-29T02:18:55Z

+  }
+
+  // Returns nullable ANY: path access can yield NULL on a missing key, and
+  // the concrete type is up to the engine's convertlet.


You again assume something about the convertlet table.
This type seems overly general, can't you compute the type of a colon expression from the types of its operands?

mihaibudiu · 2026-04-29T02:19:55Z

+   * are valid. In this mode, JSON constructors must use the
+   * {@code KEY ... VALUE} form rather than {@code :} or comma-pair syntax.
+   *
+   * <p>No built-in conformance level enables this; engines that want


Again you describe in JavaDoc what other classes do.
I think you should only refer to what is happening here.

caicancai reviewed Apr 5, 2026

caicancai self-assigned this Apr 7, 2026

mihaibudiu reviewed Apr 13, 2026

mihaibudiu reviewed Apr 14, 2026

tmater mentioned this pull request Apr 15, 2026

Infix cast baseline regression test #4885

Draft

mihaibudiu reviewed Apr 15, 2026

tmater commented Apr 17, 2026

tmater changed the title ~~[CALCITE-7448] Add support for : as a field/item access operator~~ [CALCITE-7448] Add conformance-gated parser support for ':' field/item access syntax Apr 21, 2026

tmater changed the title ~~[CALCITE-7448] Add conformance-gated parser support for ':' field/item access syntax~~ [CALCITE-7448] Add parser support for ':' field/item access syntax Apr 21, 2026

mihaibudiu reviewed Apr 21, 2026

tmater added 8 commits April 28, 2026 14:46

[CALCITE-7448] Refactor expression postfix parsing

25ee6b2

[CALCITE-7448] Add conformance hook for colon field access

a321071

[CALCITE-7448] Add conformance-gated colon field access

aab8550

[CALCITE-7448] Disambiguate JSON constructors from colon field access

084c554

[CALCITE-7448] Refine parser coverage and parenthesize infix cast bra…

268ff08

…cket access

[CALCITE-7448] Clarify infix cast bracket parser coverage

59fe2a6

[CALCITE-7448] Remove redundant colon field conformance override

22ba18e

tmater added 4 commits April 28, 2026 14:46

Add blank line between methods

6f9fdef

[CALCITE-7448] Clarify Babel infix cast parser tests

77e5808

[CALCITE-7448] Preserve colon syntax via dedicated operator

ca9ada3

tmater force-pushed the colon-syntax-mode branch from 8b64deb to ca9ada3 Compare April 28, 2026 12:51

tmater changed the title ~~[CALCITE-7448] Add parser support for ':' field/item access syntax~~ [CALCITE-7448] Add support for ':' field/item access syntax Apr 28, 2026

mihaibudiu reviewed Apr 29, 2026
