[CALCITE-7405] Pre-process expressions for correlations before building projection in SqlToRelConverter #4779

Open
ian-bertolacci wants to merge 1 commit into apache:main from ian-bertolacci:CALCITE-7405-use-correct-input-for-convertNonAggregateSelectList

@ian-bertolacci ian-bertolacci commented Jan 28, 2026

To circumvent the bloat optimizations, it is necessary to provide the correlation variables (if they exist) to the builder.
The main change here is to accomplish the logic done in SqlToRelConverter.getCorrelations without having built the Project node.
In this way we (a) extract the correlation id to use for the RelBuilder, and (b) normalize/resolve the correlation ids and rewrite the expressions; all before invoking the RelBuilder.

I've done my best to reuse existing logic. SqlToRelConverter.getCorrelations and SqlToRelConverter.massageExpressionsForCorrelation require a lot of the same logic for resolving the correlation variables for the current scope, so that was moved to a common utility method (SqlToRelConverter.getCorrelationInfo).
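
A minimal sketch of the two pre-processing steps described above, under strong simplifying assumptions: expressions are modeled as plain strings rather than RexNode trees, a token of the form `$corN` stands in for a correlation variable, and the class and method names (`CorrelationPreprocess`, `findCorrelationIds`, `normalize`) are hypothetical and do not appear in the PR. It only illustrates the order of operations: find the correlation ids first, rewrite the expressions to one id, and only then hand both to whatever builds the projection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model: expressions are strings and "$corN" marks a correlation variable. */
class CorrelationPreprocess {
  private static final Pattern COR = Pattern.compile("\\$cor\\d+");

  /** Step (a): collect the correlation ids the expressions use, in first-seen order. */
  static Set<String> findCorrelationIds(List<String> exprs) {
    Set<String> ids = new LinkedHashSet<>();
    for (String e : exprs) {
      Matcher m = COR.matcher(e);
      while (m.find()) {
        ids.add(m.group());
      }
    }
    return ids;
  }

  /** Step (b): rewrite every correlation reference to a single canonical id. */
  static List<String> normalize(List<String> exprs, String canonicalId) {
    // quoteReplacement is needed because the id itself contains '$'.
    String replacement = Matcher.quoteReplacement(canonicalId);
    List<String> out = new ArrayList<>();
    for (String e : exprs) {
      out.add(COR.matcher(e).replaceAll(replacement));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> exprs = Arrays.asList("$cor0.DEPTNO + 1", "UPPER($cor1.NAME)", "$0");
    Set<String> ids = findCorrelationIds(exprs);
    if (!ids.isEmpty()) {
      // Both results are available before any projection is built.
      String canonical = ids.iterator().next();
      System.out.println(canonical + " -> " + normalize(exprs, canonical));
    }
  }
}
```

In this toy model the plain field reference `$0` is left untouched because it does not match the `$corN` pattern; only correlation references are rewritten.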

Member

@xuzifu666 xuzifu666 left a comment

Need to fix CI error first.

Comment thread .gitignore Outdated
/.vscode/*

# IDEA
/out
Member

Why change the .gitignore file?

Author

When I run with IntelliJ, git detects a ton of extra files.

@ian-bertolacci ian-bertolacci force-pushed the CALCITE-7405-use-correct-input-for-convertNonAggregateSelectList branch 3 times, most recently from 3784f65 to 5314e24 Compare January 30, 2026 00:01
@ian-bertolacci ian-bertolacci changed the title [CALCITE-7405] Use working Project input instead of Blackboard root when rebuilding Project for correlated subqueries [CALCITE-7405] Pre-process expressions for correlations before building projection in SqlToRelConverter Jan 30, 2026
@silundong
Contributor

If we create a temporary Project with LogicalProject.create before correlation detection, instead of using RelBuilder.projectNamed, would that resolve the issue? That could be much simpler.

@ian-bertolacci
Author

@silundong

If we create a temporary Project with LogicalProject.create before correlation detection, instead of using RelBuilder.projectNamed, would that resolve the issue? That could be much simpler.

That's not an unreasonable suggestion; however, I expect that change would also cause a lot of test changes, both in this project and in anything downstream.
I'll check and see (though it'll take me a few days).

@xiedeyantu
Member

I prefer a simpler, more intuitive processing logic, even if it affects some of the original plans. As you mentioned, are there other places where similar issues arise, and would it be better to have a unified logic for handling them?

@github-actions

github-actions Bot commented Mar 6, 2026

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@calcite.apache.org list. Thank you for your contributions.

@github-actions github-actions Bot added the stale label Mar 6, 2026
@ian-bertolacci
Author

ian-bertolacci commented Mar 6, 2026

I'm hoping to get more reviews on this.

I prefer a simpler, more intuitive processing logic, even if it affects some of the original plans.

That's fine, but I imagine that solution will take a significant amount of time and effort (which I do not have for this project), and my preference (as I imagine is the preference of many others) is that we solve this severe bug we know about sooner, rather than wait for a better approach to correlation in (at a minimum) SqlToRelConverter.

@github-actions github-actions Bot removed the stale label Mar 7, 2026
@xiedeyantu xiedeyantu added the request review request a review from committers/contributors label Mar 29, 2026
@caicancai
Member

I'm hoping to get more reviews on this.

I prefer a simpler, more intuitive processing logic, even if it affects some of the original plans.

That's fine, but I imagine that solution will take a significant amount of time and effort (which I do not have for this project), and my preference (as I imagine is the preference of many others) is that we solve this severe bug we know about sooner, rather than wait for a better approach to correlation in (at a minimum) SqlToRelConverter.

@ian-bertolacci Hello, are you still working on this project? I can take some time to review it. Do you want to continue?

@ian-bertolacci
Author

Yes, I would like it to be reviewed.

Comment thread .gitignore Outdated
/.vscode/*

# IDEA
/out
Member

.gitignore changes should be in a separate PR

Author

Moved to #4887

}

List<RexNode> newExprs = new ArrayList<>(exprs);
Consumer<RelNode> callback = (RelNode r) -> { };
Member

The callback is first assigned a no-op and then overwritten inside the if block. If someone later inserts code between the two assignments and uses the callback there, it will get the no-op version instead of the intended one.

Author

I'll make this final and assign the no-op in an else block.
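
A small sketch of that pattern, assuming nothing about the surrounding method: the class name `CallbackAssignment`, the `chooseCallback` helper, and the `String` payload are hypothetical stand-ins for the real `Consumer<RelNode>`. Because the local is `final` and assigned once on each branch, the compiler rejects any use of it before the branch and any second assignment after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy illustration; the names here are hypothetical and not taken from the PR. */
class CallbackAssignment {
  /**
   * Returns a callback that records nodes only when a correlation is present.
   * The final local is definitely assigned exactly once on every path, so no
   * code can observe a placeholder value between two assignments.
   */
  static Consumer<String> chooseCallback(boolean hasCorrelation, List<String> sink) {
    final Consumer<String> callback;
    if (hasCorrelation) {
      callback = sink::add;      // real work: remember the node
    } else {
      callback = node -> { };    // explicit no-op
    }
    return callback;
  }

  public static void main(String[] args) {
    List<String> seen = new ArrayList<>();
    chooseCallback(true, seen).accept("Project#1");
    chooseCallback(false, seen).accept("Project#2");
    System.out.println(seen);    // only the first node was recorded
  }
}
```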

/** Wrapper around optionally returned results from {@link #massageExpressionsForCorrelation}. */
private static class MassagedCorrelationExpressions {
  /** Modified expressions. */
  public final List<RexNode> exprs;
Member

private

@ian-bertolacci ian-bertolacci force-pushed the CALCITE-7405-use-correct-input-for-convertNonAggregateSelectList branch 2 times, most recently from 101efb2 to 9a044af Compare April 15, 2026 16:56
@ian-bertolacci
Author

Changes made, and the commit is rebased onto the current main head.

@ian-bertolacci ian-bertolacci force-pushed the CALCITE-7405-use-correct-input-for-convertNonAggregateSelectList branch from 9a044af to 06fb75b Compare April 15, 2026 17:17
@ian-bertolacci
Author

Where do we stand on this?

@mihaibudiu
Contributor

@silundong I think you are one of the most qualified to review this, if you have time

@silundong
Contributor

To circumvent the bloat optimizations, it is necessary to provide the correlation variables (if they exist) to the builder.

As I mentioned before, I created a temporary Project with LogicalProject.create before correlation detection, instead of using RelBuilder.projectNamed, like this:

RelNode project =
    LogicalProject.create(bb.root(), ImmutableList.of(), exprs, uniqueFieldNames,
        ImmutableSet.of());

final RelNode r;
final CorrelationUse p = getCorrelationUse(bb, project);
if (p != null) {
  assert p.r instanceof Project;
  // correlation variables have been normalized in p.r, we should use expressions
  // in p.r instead of the original exprs
  Project project1 = (Project) p.r;
  r = relBuilder.push(bb.root())
      .projectNamed(project1.getProjects(), uniqueFieldNames, true,
          ImmutableSet.of(p.id))
      .build();
} else {
  r = relBuilder.push(bb.root()).projectNamed(exprs, uniqueFieldNames, true).build();
}

This approach resolves the issue without causing any other test regressions. IMO, it's not only effective but also simpler and more intuitive. Do you see any drawbacks to it?

@ian-bertolacci
Author

ian-bertolacci commented Apr 27, 2026

@silundong that makes sense, and the added tests pass with your changes. I will push a commit with them.

Sorry that I did not understand this when you first made this comment several months ago.

@ian-bertolacci ian-bertolacci force-pushed the CALCITE-7405-use-correct-input-for-convertNonAggregateSelectList branch from 06fb75b to 56de3f3 Compare April 27, 2026 17:29