Skip to content

[CALCITE-7484] Add a rule to eliminate redundant aggregates functions over GROUP BY keys#4903

Merged
xuzifu666 merged 1 commit intoapache:mainfrom
xuzifu666:agg_function_rule
Apr 29, 2026
Merged

[CALCITE-7484] Add a rule to eliminate redundant aggregates functions over GROUP BY keys#4903
xuzifu666 merged 1 commit intoapache:mainfrom
xuzifu666:agg_function_rule

Conversation

@xuzifu666
Copy link
Copy Markdown
Member

sql(sql).withRule(CoreRules.AGGREGATE_REDUCE_FUNCTIONS, CoreRules.PROJECT_MERGE).check();
}

/** Test case for
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at this PR #4808, we should create new test files for the rule tests.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! done.

* <li>{@code MIN}</li>
* <li>{@code AVG}</li>
* <li>{@code ANY_VALUE}</li>
* <li>{@code FIRST_VALUE}</li>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think FIRST_VALUE and LAST_VALUE only work for window aggregates.

Copy link
Copy Markdown
Member Author

@xuzifu666 xuzifu666 Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should not support FIRST_VALUE/LAST_VALUE, I had removed them.

*/
@Test void testAggregateFunctionOfGroupByKeys() {
final String sql = "select sal, max(sal) as sal_max, min(sal) as sal_min,\n"
+ "avg(sal) sal_avg, any_value(sal) as sal_val, first_value(sal) as sal_first,\n"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query should be illegal, since FIRST_VALUE and LAST_VALUE are only defined for window aggregates. Please file an issue - this program should not be accepted.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out this issue; I have already documented the relevant jira https://issues.apache.org/jira/browse/CALCITE-7485.

@xuzifu666 xuzifu666 changed the title [CALCITE-7484] Add AggregateFunctionOfGroupByKeysRule to eliminate redundant aggregates over GROUP BY keys [CALCITE-7484] Add a rule to eliminate redundant aggregates over GROUP BY keys Apr 22, 2026
@xuzifu666 xuzifu666 changed the title [CALCITE-7484] Add a rule to eliminate redundant aggregates over GROUP BY keys [CALCITE-7484] Add a rule to eliminate redundant aggregates functions over GROUP BY keys Apr 22, 2026
default:
return null;
}
final int groupIndex = aggregate.getGroupSet().asList().indexOf(arg);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- query 1
SELECT
    deptno,
    MAX(deptno + sal) AS "max(deptno + sal)"
FROM emp
GROUP BY deptno, sal
ORDER BY deptno;

-- output
 deptno | max(deptno + sal) 
--------+-------------------
     10 |           1310.00
     10 |           5010.00
     10 |           2460.00
     20 |           2995.00
     20 |           3020.00
     20 |            820.00
     20 |           1120.00
     30 |           1530.00
     30 |           1280.00
     30 |           1630.00
     30 |           2880.00
     30 |            980.00
(12 rows)

-- query 2
SELECT
    deptno,
    deptno + sal AS "deptno + sal"
FROM emp
GROUP BY deptno, sal
ORDER BY deptno;

-- output
 deptno | deptno + sal 
--------+--------------
     10 |      1310.00
     10 |      5010.00
     10 |      2460.00
     20 |      2995.00
     20 |      3020.00
     20 |       820.00
     20 |      1120.00
     30 |      1530.00
     30 |      1280.00
     30 |      1630.00
     30 |      2880.00
     30 |       980.00
(12 rows)

Could you rewrite complex expressions such as AggFunc(col1 + col2)or something similar?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense!This scenario likely involves many branching scenarios and requires consideration of many factors. Perhaps we could first enable basic single-field optimization in this version, and I would document an issue to gradually improve the relevant rules.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can file 2 new issues:

  • handle aggregates whose argument is a deterministic function involving only constants and group keys
  • handle aggregates that return something other than the input, e.g., COUNT, STDDEV, STDDEV_POP, where results are constant (1, 0, NULL respectively)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestions! I think they provide clear directions for further optimization.
I've filed two jira for future follow-up.(https://issues.apache.org/jira/browse/CALCITE-7492 and https://issues.apache.org/jira/browse/CALCITE-7493)

* @see CoreRules#AGGREGATE_FUNCTION_OF_GROUP_BY_KEYS
*/
@Value.Enclosing
public class AggregateFunctionOfGroupByKeysRule
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the name AggregateReduceFunctionsOnGroupKeysRule might be more appropriate?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, done.

@xuzifu666
Copy link
Copy Markdown
Member Author

Hi @mihaibudiu , do you agree with the current changes? I have documented you mentioned window syntax issue separately in jira and will resolve it in coming days.

@xuzifu666 xuzifu666 requested a review from mihaibudiu April 27, 2026 06:20
default:
return null;
}
final int groupIndex = aggregate.getGroupSet().asList().indexOf(arg);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can file 2 new issues:

  • handle aggregates whose argument is a deterministic function involving only constants and group keys
  • handle aggregates that return something other than the input, e.g., COUNT, STDDEV, STDDEV_POP, where results are constant (1, 0, NULL respectively)

@xuzifu666 xuzifu666 added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Apr 28, 2026
@xuzifu666
Copy link
Copy Markdown
Member Author

If there are no further comments, I will merge this PR in 24 hours.

@sonarqubecloud
Copy link
Copy Markdown

@xuzifu666 xuzifu666 merged commit 3505dba into apache:main Apr 29, 2026
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LGTM-will-merge-soon Overall PR looks OK. Only minor things left.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants