Skip to content

[CALCITE-7463] UnionToFilterRule incorrectly rewrites UNION with LIMIT#4878

Open
xiedeyantu wants to merge 6 commits intoapache:mainfrom
xiedeyantu:CALCITE-7463
Open

[CALCITE-7463] UnionToFilterRule incorrectly rewrites UNION with LIMIT#4878
xiedeyantu wants to merge 6 commits intoapache:mainfrom
xiedeyantu:CALCITE-7463

Conversation

@xiedeyantu
Copy link
Copy Markdown
Member

Jira Link

CALCITE-7463

Changes Proposed

The UnionToFilterRule produces incorrect results when applied to inputs that contain LIMIT.

Specifically, the rule incorrectly collapses:

(SELECT mgr, comm FROM emp LIMIT 2)
UNION
(SELECT mgr, comm FROM emp LIMIT 2) 

into:

SELECT DISTINCT mgr, comm FROM emp LIMIT 2 

This transformation is not semantically equivalent.
Reproduction

(SELECT mgr, comm FROM emp LIMIT 2)
UNION
(SELECT mgr, comm FROM emp LIMIT 2) 

Plan Before

LogicalUnion(all=[false])
  LogicalSort(fetch=[2])
    LogicalProject(MGR=[$3], COMM=[$6])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
  LogicalSort(fetch=[2])
    LogicalProject(MGR=[$3], COMM=[$6])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]]) 

Plan After (Incorrect)

LogicalAggregate(group=[{0, 1}])
  LogicalSort(fetch=[2])
    LogicalProject(MGR=[$3], COMM=[$6])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]]) 

Expected Behavior
The transformation should NOT be applied when any input of UNION contains LogicalSort(That contains ORDER BY, LIMIT, OFFSET).

Copy link
Copy Markdown

@utafrali utafrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is correct: returning (input, null) when a Sort is detected in the input chain prevents the rule from incorrectly collapsing UNION branches that have LIMIT/OFFSET semantics. The helper method is clean and the traversal logic handles arbitrary nesting of Project/Filter above a Sort. Two minor gaps: the method name omits Filter from its description, and there is no test for UNION ALL with LIMIT.

Comment thread core/src/main/java/org/apache/calcite/rel/rules/SetOpToFilterRule.java Outdated
Comment thread core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java
@xiedeyantu
Copy link
Copy Markdown
Member Author

The fix is correct: returning (input, null) when a Sort is detected in the input chain prevents the rule from incorrectly collapsing UNION branches that have LIMIT/OFFSET semantics. The helper method is clean and the traversal logic handles arbitrary nesting of Project/Filter above a Sort. Two minor gaps: the method name omits Filter from its description, and there is no test for UNION ALL with LIMIT.

@utafrali I'm curious how you managed to review this PR in less than 5 minutes.

@utafrali
Copy link
Copy Markdown

The fix is correct: returning (input, null) when a Sort is detected in the input chain prevents the rule from incorrectly collapsing UNION branches that have LIMIT/OFFSET semantics. The helper method is clean and the traversal logic handles arbitrary nesting of Project/Filter above a Sort. Two minor gaps: the method name omits Filter from its description, and there is no test for UNION ALL with LIMIT.

@utafrali I'm curious how you managed to review this PR in less than 5 minutes.

I was already following this case closely, so I had the full context in mind. The changes were straightforward, so the review didn’t take long.

RelNode current = input.stripped();
while (true) {
if (current instanceof Sort) {
return true;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also check that there is actually a LIMIT, as you can have a Sort without LIMIT and that rewrite I think is correct

Copy link
Copy Markdown
Member Author

@xiedeyantu xiedeyantu Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It's a great idea! I agree with you. Since SqlToRelConverter#convertOrder removes the ORDER BY clause, I used RelBuilder to construct the test case.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this couldn't/shouldn't be generalized via a new metadata characterizing what makes this rewrite unsafe (and probably makes potentially other filter pushdowns unsound?

After reading the comments in Jira I am still trying to wrap my head around the problem and I think it's a more general pattern, but I couldn't spend much time thinking it through so I am just sharing my intuition hoping it helps.

I understand the LIMIT 2 unions up to a LIMIT 4 conceptually but it's an "at most 4" and non deterministic as discussed.

So what characterizes here a property of the union operands that makes the rewrite unsafe?

It's a sort+limit it seems, maybe this can become a metadata? Does this apply to other types of pushdowns like filter-sort transpose? Can they all be rewritten in terms of this new metadata?

I don't mean to block this effort until we figure this out. But I think we might have a more general fix.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, I agree with your suggestion; this likely involves the transfer of physical properties, and I will proceed to investigate that area next. I would personally be very grateful to receive any further suggestions. Thank you!

final RelNode input = inputs.get(i).stripped();
final Pair<RelNode, @Nullable RexNode> pair = extractSourceAndCond(input);
boolean isSorted = false;
final List<RelCollation> inputCollations = mq.collations(input);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asolimando I use mq to check if the input is sorted. If the input is sorted, a sub-plan that has both limit/offset and sort is safe to be rewritten. Please check if this logic matches your suggestion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested let's bring the discussion in Jira, I will try to summarize there my idea, let's see if it makes sense for everyone, thanks for the code draft @xiedeyantu!

@xiedeyantu xiedeyantu requested a review from asolimando April 12, 2026 14:02
@sonarqubecloud
Copy link
Copy Markdown

@julianhyde
Copy link
Copy Markdown
Contributor

So there is no doubt:

-1

Three of us (me and Steven in Jira, Alessandro in the PR comments) have expressed doubts that this is even a bug.

Let's not merge any fix until we decide what, if anything, needs to be fixed.

@xiedeyantu
Copy link
Copy Markdown
Member Author

So there is no doubt:

-1

Three of us (me and Steven in Jira, Alessandro in the PR comments) have expressed doubts that this is even a bug.

Let's not merge any fix until we decide what, if anything, needs to be fixed.

@julianhyde Of course, I won't merge it before receiving approval from the committer. The conversation in the JIRA hasn't progressed for several days, so I assumed everyone agreed with this conservative approach. I would also like to thank Julian for the timely reminder. Perhaps I misunderstood. However, I agree with the solution described by Alessandro. This PR can be put on hold for now. Shall we move the discussion back to the JIRA?

@xiedeyantu xiedeyantu added the discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR label Apr 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

discussion-in-jira There's open discussion in JIRA to be resolved before proceeding with the PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants