[CALCITE-7463] UnionToFilterRule incorrectly rewrites UNION with LIMIT#4878
[CALCITE-7463] UnionToFilterRule incorrectly rewrites UNION with LIMIT#4878xiedeyantu wants to merge 6 commits intoapache:mainfrom
Conversation
utafrali
left a comment
There was a problem hiding this comment.
The fix is correct: returning (input, null) when a Sort is detected in the input chain prevents the rule from incorrectly collapsing UNION branches that have LIMIT/OFFSET semantics. The helper method is clean and the traversal logic handles arbitrary nesting of Project/Filter above a Sort. Two minor gaps: the method name omits Filter from its description, and there is no test for UNION ALL with LIMIT.
@utafrali I'm curious how you managed to review this PR in less than 5 minutes. |
I was already following this case closely, so I had the full context in mind. The changes were straightforward, so the review didn’t take long. |
| RelNode current = input.stripped(); | ||
| while (true) { | ||
| if (current instanceof Sort) { | ||
| return true; |
There was a problem hiding this comment.
Shouldn't we also check that there is actually a LIMIT, as you can have a Sort without LIMIT and that rewrite I think is correct
There was a problem hiding this comment.
Thanks! It's a great idea! I agree with you. Since SqlToRelConverter#convertOrder removes the ORDER BY clause, I used RelBuilder to construct the test case.
There was a problem hiding this comment.
I wonder if this couldn't/shouldn't be generalized via a new metadata characterizing what makes this rewrite unsafe (and probably makes potentially other filter pushdowns unsound?
After reading the comments in Jira I am still trying to wrap my head around the problem and I think it's a more general pattern, but I couldn't spend much time thinking it through so I am just sharing my intuition hoping it helps.
I understand the LIMIT 2 unions up to a LIMIT 4 conceptually but it's an "at most 4" and non deterministic as discussed.
So what characterizes here a property of the union operands that makes the rewrite unsafe?
It's a sort+limit it seems, maybe this can become a metadata? Does this apply to other types of pushdowns like filter-sort transpose? Can they all be rewritten in terms of this new metadata?
I don't mean to block this effort until we figure this out. But I think we might have a more general fix.
There was a problem hiding this comment.
Of course, I agree with your suggestion; this likely involves the transfer of physical properties, and I will proceed to investigate that area next. I would personally be very grateful to receive any further suggestions. Thank you!
| final RelNode input = inputs.get(i).stripped(); | ||
| final Pair<RelNode, @Nullable RexNode> pair = extractSourceAndCond(input); | ||
| boolean isSorted = false; | ||
| final List<RelCollation> inputCollations = mq.collations(input); |
There was a problem hiding this comment.
@asolimando I use mq to check if the input is sorted. If the input is sorted, a sub-plan that has both limit/offset and sort is safe to be rewritten. Please check if this logic matches your suggestion.
There was a problem hiding this comment.
As suggested let's bring the discussion in Jira, I will try to summarize there my idea, let's see if it makes sense for everyone, thanks for the code draft @xiedeyantu!
|
|
So there is no doubt: -1 Three of us (me and Steven in Jira, Alessandro in the PR comments) have expressed doubts that this is even a bug. Let's not merge any fix until we decide what, if anything, needs to be fixed. |
@julianhyde Of course, I won't merge it before receiving approval from the committer. The conversation in the JIRA hasn't progressed for several days, so I assumed everyone agreed with this conservative approach. I would also like to thank Julian for the timely reminder. Perhaps I misunderstood. However, I agree with the solution described by Alessandro. This PR can be put on hold for now. Shall we move the discussion back to the JIRA? |



Jira Link
CALCITE-7463
Changes Proposed
The UnionToFilterRule produces incorrect results when applied to inputs that contain LIMIT.
Specifically, the rule incorrectly collapses:
into:
This transformation is not semantically equivalent.
Reproduction
Plan Before
Plan After (Incorrect)
Expected Behavior
The transformation should NOT be applied when any input of UNION contains LogicalSort(That contains ORDER BY, LIMIT, OFFSET).