Fix PPL CalciteException for non-ASCII string literals (e.g. Chinese characters) #5504

Open
gingeekrishna wants to merge 4 commits into opensearch-project:main from gingeekrishna:fix/21880-ppl-non-ascii-string-literal

gingeekrishna commented Jun 2, 2026

Summary

Hi @dai-chen

PPL queries containing non-ASCII string literals (Chinese, Arabic, etc.) fail with a CalciteException on OpenSearch 3.6.0, while the identical query worked on 3.1 and the equivalent SQL query works fine on 3.6.0.

Root cause: In CalciteRexNodeVisitor.visitLiteral(), the STRING case builds a VARCHAR/CHAR type using typeFactory.createSqlType(SqlTypeName.VARCHAR) without specifying a charset. Calcite defaults to ISO-8859-1, which cannot encode non-Latin characters, so NlsString.<init>() throws when RexBuilder.makeLiteral() builds the literal.

Fix: Explicitly create the type with UTF-8 charset and IMPLICIT collation via typeFactory.createTypeWithCharsetAndCollation() for both the CHAR(1) and VARCHAR branches of the STRING literal case.

org.apache.calcite.runtime.CalciteException: Failed to encode '未处置' in character set 'ISO-8859-1'
    at org.apache.calcite.util.NlsString.<init>(NlsString.java:155)
    at org.apache.calcite.rex.RexBuilder.clean(RexBuilder.java:2296)
    at org.apache.calcite.rex.RexBuilder.makeLiteral(RexBuilder.java:2070)
    at org.opensearch.sql.calcite.CalciteRexNodeVisitor.visitLiteral(CalciteRexNodeVisitor.java:127)
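
The encoding failure in the trace can be reproduced without Calcite. The following is a minimal sketch using only the JDK; the class name CharsetEncodingCheck and the helper canEncode are hypothetical and not part of this PR. It asks whether a string is encodable in a given charset, which appears to be the question that fails for ISO-8859-1 when the literal is built.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the PR or of Calcite.
final class CharsetEncodingCheck {

    /** Returns true when every character of text can be encoded in charset. */
    static boolean canEncode(String text, Charset charset) {
        // A fresh encoder is created per call, so no state is shared.
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes for the literal from the stack trace: 未处置
        String literal = "\u672a\u5904\u7f6e";
        System.out.println(canEncode(literal, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(literal, StandardCharsets.UTF_8));      // true
        System.out.println(canEncode("Hello", StandardCharsets.ISO_8859_1)); // true
    }
}
```

ASCII-only literals pass under either charset, which is consistent with the failure only showing up for Chinese or Arabic text.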

Changes

File | Change
CalciteRexNodeVisitor.java | Use UTF-8 charset when creating CHAR/VARCHAR types for string literals
CalciteRexNodeVisitorTest.java | Add regression test with Chinese, Arabic, and single non-ASCII character literals

Test plan

  • testVisitLiteralNonAsciiStringDoesNotThrow: verifies that Chinese (未处置), Arabic (مرحبا), and single non-ASCII character literals build successfully without throwing CalciteException
  • All existing CalciteRexNodeVisitorTest tests continue to pass

Fixes opensearch-project/OpenSearch#21880

visitLiteral() built VARCHAR/CHAR types using
typeFactory.createSqlType(SqlTypeName.VARCHAR) without specifying a
charset. Calcite defaults to ISO-8859-1, which cannot encode non-Latin
characters, causing a CalciteException at query time.

Fix: explicitly create the type with UTF-8 charset and IMPLICIT collation
via typeFactory.createTypeWithCharsetAndCollation() for both the CHAR(1)
and VARCHAR branches of the STRING literal case.

This is a regression introduced in 3.6.0 when the PPL/Calcite
integration was added. SQL queries were unaffected because the SQL path
uses a different literal-building flow.

Fixes opensearch-project/OpenSearch#21880

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an explicit UTF-8 charset/collation when producing Calcite string literals to prevent non-ASCII literals from throwing, and introduces a regression test for the reported failure.

Changes:

  • Update visitLiteral to build CHAR/VARCHAR types with UTF-8 charset and implicit collation.
  • Add a regression test covering Chinese/Arabic literals and the CHAR(1) path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java | Forces UTF-8 charset/collation for string literals to avoid Calcite NlsString rejection of non-ASCII.
core/src/test/java/org/opensearch/sql/calcite/CalciteRexNodeVisitorTest.java | Adds regression coverage for non-ASCII string literal visitation and CHAR(1) behavior.

Comment thread core/src/test/java/org/opensearch/sql/calcite/CalciteRexNodeVisitorTest.java Outdated

github-actions Bot (Contributor) commented Jun 2, 2026

PR Reviewer Guide 🔍

(Review updated until commit c54a4c2)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

github-actions Bot (Contributor) commented Jun 2, 2026

PR Code Suggestions ✨

Latest suggestions up to 9f484de

Explore these optional code suggestions:

Category: Possible issue
Suggestion: Prevent double charset application

The method calls super.createSqlType(typeName) which now invokes the overridden
createSqlType(SqlTypeName) that already applies UTF-8 charset for VARCHAR/CHAR. This
causes double application of charset/collation settings, potentially creating
inconsistent type hierarchies. Use super.createSqlType(typeName) from the parent
class directly to avoid the override.

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java [117-123]

 public RelDataType createSqlType(SqlTypeName typeName, boolean nullable) {
-  RelDataType type = createTypeWithNullability(super.createSqlType(typeName), nullable);
+  RelDataType baseType = super.createSqlType(typeName);
+  RelDataType type = createTypeWithNullability(baseType, nullable);
   if (typeName == SqlTypeName.VARCHAR || typeName == SqlTypeName.CHAR) {
     return createTypeWithCharsetAndCollation(type, StandardCharsets.UTF_8, SqlCollation.IMPLICIT);
   }
   return type;
 }
Suggestion importance [1-10]: 8

Why: Valid concern about potential double application of charset settings. When createSqlType(SqlTypeName, boolean) calls super.createSqlType(typeName), it may invoke the overridden createSqlType(SqlTypeName) method, causing charset to be applied twice for VARCHAR/CHAR types, which could lead to inconsistent type hierarchies.

Impact: Medium
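
Whether a super.createSqlType(typeName) call can re-enter an override is a question of Java dispatch rules. Here is a minimal sketch with toy classes; BaseTypeFactory and Utf8TypeFactory are hypothetical stand-ins that model types as strings, not the real OpenSearchTypeFactory or any Calcite class. It shows that a call through super binds to the parent's implementation, so the subclass override of the one-argument method is not re-entered and the charset tag is applied once.

```java
// Toy parent: stands in for the library type factory (hypothetical).
class BaseTypeFactory {
    String createSqlType(String typeName) {
        return typeName;
    }
}

// Toy subclass: tags string types with a charset marker (hypothetical).
class Utf8TypeFactory extends BaseTypeFactory {

    @Override
    String createSqlType(String typeName) {
        return withUtf8(super.createSqlType(typeName));
    }

    // Overload with nullability, mirroring the shape of the method in the diff.
    String createSqlType(String typeName, boolean nullable) {
        // The super call runs BaseTypeFactory.createSqlType, not the override above.
        String base = super.createSqlType(typeName);
        String type = nullable ? base : base + " NOT NULL";
        return withUtf8(type);
    }

    static String withUtf8(String type) {
        return type + " [UTF-8]";
    }
}

final class SuperDispatchDemo {
    public static void main(String[] args) {
        Utf8TypeFactory factory = new Utf8TypeFactory();
        System.out.println(factory.createSqlType("VARCHAR"));        // VARCHAR [UTF-8]
        System.out.println(factory.createSqlType("VARCHAR", false)); // VARCHAR NOT NULL [UTF-8]
    }
}
```

This only illustrates the language rule. If the parent implementation itself called another overridable method, that inner call would still dispatch to the subclass, so the real class would need to be checked against the actual Calcite code.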

Previous suggestions

Suggestions up to commit cd5d733
Category: Possible issue
Suggestion: Validate single character for CHAR type

The CHAR type creation should validate that the string length is exactly 1 before
creating the type. If value.toString() somehow produces a multi-character string
despite the length check, this could cause inconsistencies between the type
definition and actual data.

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java [141-146]

+String strValue = value.toString();
+if (strValue.length() != 1) {
+    throw new IllegalStateException("Expected single character for CHAR type, got: " + strValue.length());
+}
 return rexBuilder.makeLiteral(
-    value.toString(),
+    strValue,
     typeFactory.createTypeWithCharsetAndCollation(
         typeFactory.createSqlType(SqlTypeName.CHAR),
         StandardCharsets.UTF_8,
         SqlCollation.IMPLICIT));
Suggestion importance [1-10]: 3

Why: The suggestion adds defensive validation, but the code already checks value.toString().length() == 1 at line 138 before entering this branch. Adding redundant validation would be unnecessary and reduce code readability. The suggestion overlooks the existing guard condition.

Impact: Low
Suggestions up to commit 9e379cd
Category: Possible issue
Suggestion: Use VARCHAR for single-character strings

The single-character string handling creates a CHAR type, but multi-byte UTF-8
characters (like Chinese) may require more than one byte. Consider using VARCHAR for
all strings to avoid potential truncation or encoding issues with non-ASCII single
characters.

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java [141-146]

 return rexBuilder.makeLiteral(
     value.toString(),
     typeFactory.createTypeWithCharsetAndCollation(
-        typeFactory.createSqlType(SqlTypeName.CHAR),
+        typeFactory.createSqlType(SqlTypeName.VARCHAR),
         StandardCharsets.UTF_8,
-        SqlCollation.IMPLICIT));
+        SqlCollation.IMPLICIT),
+    true);
Suggestion importance [1-10]: 3

Why: While the concern about multi-byte UTF-8 characters is valid, the PR explicitly uses UTF-8 charset which handles multi-byte characters correctly. The CHAR vs VARCHAR distinction is intentional per the comment "To align Spark/PostgreSQL, Char(1) is useful, such as cast('1' to boolean) should return true". The test at lines 114-117 confirms single-character handling works correctly with UTF-8.

Impact: Low
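
The difference between character count and byte count discussed in this suggestion can be checked with the JDK alone. The sketch below is illustrative only; the class CharLengthVsByteLength and its helpers are hypothetical names, not code from the PR. It shows that String.length(), which the value.toString().length() == 1 guard relies on, counts UTF-16 code units rather than UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the PR.
final class CharLengthVsByteLength {

    /** UTF-16 code units, which is what String.length() reports. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Unicode code points, i.e. user-visible characters for these samples. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Bytes needed to store the string in UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String cjk = "\u672a";         // 未, a single BMP character
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair
        System.out.println(utf16Units(cjk) + " " + codePoints(cjk) + " " + utf8Bytes(cjk));       // 1 1 3
        System.out.println(utf16Units(emoji) + " " + codePoints(emoji) + " " + utf8Bytes(emoji)); // 2 1 4
    }
}
```

A single Chinese character therefore passes a length() == 1 guard even though it occupies three bytes in UTF-8, while a supplementary-plane character such as an emoji has length() == 2 and would not.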

- Remove unused realRexBuilder variable (context.rexBuilder is already
  a real ExtendedRexBuilder backed by TYPE_FACTORY via the constructor)
- Add charset assertions to verify resulting RelDataType carries UTF-8,
  so future accidental charset drops are caught
- Remove unused RexBuilder import

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>

github-actions Bot (Contributor) commented Jun 2, 2026

Persistent review updated to latest commit cd5d733

@lukeyan2023


I tried to cherry-pick this PR locally, but the build failed with the following doctest errors:

FAIL: /workspace/sql/doctest/../docs/user/ppl/cmd/eval.md
Doctest: /workspace/sql/doctest/../docs/user/ppl/cmd/eval.md

Traceback (most recent call last):
File "/usr/lib/python3.12/doctest.py", line 2249, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for /workspace/sql/doctest/../docs/user/ppl/cmd/eval.md
File "/workspace/sql/doctest/../docs/user/ppl/cmd/eval.md", line 0


File "/workspace/sql/doctest/../docs/user/ppl/cmd/eval.md", line 106, in /workspace/sql/doctest/../docs/user/ppl/cmd/eval.md
Failed example:
ppl_cmd.process("source=accounts | eval greeting = 'Hello ' + firstname | fields firstname, greeting")
Expected:
fetched rows / total rows = 4/4
+-----------+---------------+
| firstname | greeting |
|-----------+---------------|
| Amber | Hello Amber |
| Hattie | Hello Hattie |
| Nanette | Hello Nanette |
| Dale | Hello Dale |
+-----------+---------------+
Got:
{'reason': 'Invalid Query', 'details': 'VARCHAR CHARACTER SET "UTF-8" NOT NULL is not comparable to VARCHAR', 'type': 'SqlValidatorException'}
Error: Query returned no data

File "/workspace/sql/doctest/../docs/user/ppl/cmd/eval.md", line 131, in /workspace/sql/doctest/../docs/user/ppl/cmd/eval.md
Failed example:
ppl_cmd.process("source=accounts | eval full_info = 'Name: ' + firstname + ', Age: ' + CAST(age AS STRING) | fields firstname, age, full_info")
Expected:
fetched rows / total rows = 4/4
+-----------+-----+------------------------+
| firstname | age | full_info |
|-----------+-----+------------------------|
| Amber | 32 | Name: Amber, Age: 32 |
| Hattie | 36 | Name: Hattie, Age: 36 |
| Nanette | 28 | Name: Nanette, Age: 28 |
| Dale | 33 | Name: Dale, Age: 33 |
+-----------+-----+------------------------+
Got:
{'reason': 'Invalid Query', 'details': 'VARCHAR CHARACTER SET "UTF-8" NOT NULL is not comparable to VARCHAR', 'type': 'SqlValidatorException'}
Error: Query returned no data


Ran 25 tests in 45.981s

FAILED (failures=1)

Task :doctest:doctest FAILED
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended

Task :core:jacocoTestReport
[ant:jacocoReport] Classes in bundle 'core' do not match with execution data. For report generation the same class files must be used as at runtime.
[ant:jacocoReport] Execution data for class org/opensearch/sql/utils/YamlFormatter does not match.

[Incubating] Problems report is available at: file:///workspace/sql/build/reports/problems/problems-report.html

FAILURE: Build failed with an exception.

  • What went wrong:
    Execution failed for task ':doctest:doctest'.

Process 'command '/workspace/sql/doctest/bin/test-docs'' finished with non-zero exit value 1

  • Try:

Run with --stacktrace option to get the stack trace.
Run with --info or --debug option to get more log output.
Run with --scan to generate a Build Scan (powered by Develocity).
Get more help at https://help.gradle.org.

Deprecated Gradle features were used in this build, making it incompatible with Gradle 10.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/9.2.0/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD FAILED in 9m 42s

The previous fix added UTF-8 charset only to string literals in
visitLiteral(), leaving column VARCHAR types with no charset. Calcite
then rejected string concatenation (e.g. 'Hello ' + firstname) with:
VARCHAR CHARACTER SET "UTF-8" NOT NULL is not comparable to VARCHAR

Fix: move the UTF-8 + IMPLICIT collation enforcement into
OpenSearchTypeFactory.createSqlType() for VARCHAR/CHAR so both column
types and literal types carry the same charset consistently.
visitLiteral() reverts to plain createSqlType() calls since the factory
now handles encoding globally.

github-actions Bot (Contributor) commented Jun 8, 2026

Persistent review updated to latest commit 9f484de

@gingeekrishna

Author

Thanks for catching this, @lukeyan2023! The root cause was that the original fix only applied UTF-8 charset to string literals but left column VARCHAR types as plain VARCHAR (no charset). Calcite rejects concatenation between VARCHAR CHARACTER SET "UTF-8" and VARCHAR as incompatible.

The fix moves the UTF-8 enforcement into OpenSearchTypeFactory.createSqlType() for VARCHAR/CHAR types, so both column types and literal types carry the same charset consistently. visitLiteral() now just calls createSqlType() as before — the factory handles encoding globally.
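
To make the mismatch concrete, here is a toy model of the situation described above. Everything in it is an assumption for illustration: ToyStringType, ToyTypeFactory, and the rule "charsets must match exactly" are hypothetical and deliberately simplified, and they are not Calcite's actual classes or its comparability check.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

/** Toy stand-in for a string type: a name plus an optional charset (null means none). */
final class ToyStringType {
    final String name;
    final Charset charset;

    ToyStringType(String name, Charset charset) {
        this.name = name;
        this.charset = charset;
    }

    /** Toy rule (an assumption, not Calcite's code): charsets must match exactly. */
    static boolean comparable(ToyStringType a, ToyStringType b) {
        return Objects.equals(a.charset, b.charset);
    }
}

/** Toy factory that tags every VARCHAR it creates, as the moved fix intends. */
final class ToyTypeFactory {
    static ToyStringType varchar() {
        return new ToyStringType("VARCHAR", StandardCharsets.UTF_8);
    }
}

final class ComparabilityDemo {
    public static void main(String[] args) {
        // Original fix: only the literal carried a charset.
        ToyStringType literal = new ToyStringType("VARCHAR", StandardCharsets.UTF_8);
        ToyStringType column = new ToyStringType("VARCHAR", null);
        System.out.println(ToyStringType.comparable(literal, column)); // false

        // Factory-level fix: both sides come from the same factory.
        System.out.println(
            ToyStringType.comparable(ToyTypeFactory.varchar(), ToyTypeFactory.varchar())); // true
    }
}
```

Under this simplified rule, any type that is created outside the factory path would still lack the charset and fail the same check, so the approach depends on every string type going through the factory.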

Updated the branch — please let me know if the doctest passes on your end now.

@lukeyan2023


@gingeekrishna I pulled the latest changes and tested it locally again, but unfortunately, it's still failing with the errors below:
[2026-06-08T08:25:15,252][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [d50e0ad9-033a-43fc-933b-18d3ee1582ec] Incoming request source=table | bin identifier span=*** | fields + identifier | head 2
[2026-06-08T08:25:15,447][ERROR][o.o.s.p.r.RestPPLQueryAction] [docTestCluster-0] Error happened during query handling
org.apache.calcite.runtime.CalciteContextException: At line 0, column 0: VARCHAR CHARACTER SET "UTF-8" NOT NULL is not comparable to CHAR(1) NOT NULL
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:511)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:960)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:945)
at org.apache.calcite.rex.RexCallBinding.newError(RexCallBinding.java:155)
at org.apache.calcite.sql.type.ReturnTypes.lambda$static$18(ReturnTypes.java:1127)
at org.apache.calcite.sql.type.SqlTypeTransformCascade.inferReturnType(SqlTypeTransformCascade.java:59)
at org.apache.calcite.sql.fun.SqlStdOperatorTable.lambda$static$1(SqlStdOperatorTable.java:278)
at org.apache.calcite.sql.type.SqlTypeTransformCascade.inferReturnType(SqlTypeTransformCascade.java:66)
at org.apache.calcite.sql.SqlOperator.inferReturnType(SqlOperator.java:562)
at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:364)
at org.apache.calcite.tools.RelBuilder.call(RelBuilder.java:763)
at org.apache.calcite.tools.RelBuilder.call(RelBuilder.java:770)
at org.apache.calcite.tools.RelBuilder.call(RelBuilder.java:741)
at org.opensearch.sql.calcite.utils.binning.RangeFormatter.createRangeString(RangeFormatter.java:39)
at org.opensearch.sql.calcite.utils.binning.RangeFormatter.createRangeString(RangeFormatter.java:19)
at org.opensearch.sql.calcite.utils.binning.handlers.LogSpanHelper.createLogSpanExpression(LogSpanHelper.java:75)
at org.opensearch.sql.calcite.utils.binning.handlers.SpanBinHandler.handleNumericOrLogSpan(SpanBinHandler.java:85)
at org.opensearch.sql.calcite.utils.binning.handlers.SpanBinHandler.createExpression(SpanBinHandler.java:42)
at org.opensearch.sql.calcite.utils.BinUtils.createBinExpression(BinUtils.java:35)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitBin(CalciteRelNodeVisitor.java:972)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitBin(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Bin.accept(Bin.java:55)
at org.opensearch.sql.ast.AbstractNodeVisitor.visitChildren(AbstractNodeVisitor.java:117)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitChildren(CalciteRelNodeVisitor.java:209)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:432)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Project.accept(Project.java:65)
at org.opensearch.sql.ast.AbstractNodeVisitor.visitChildren(AbstractNodeVisitor.java:117)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitChildren(CalciteRelNodeVisitor.java:209)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitHead(CalciteRelNodeVisitor.java:732)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitHead(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Head.accept(Head.java:44)
at org.opensearch.sql.ast.AbstractNodeVisitor.visitChildren(AbstractNodeVisitor.java:117)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitChildren(CalciteRelNodeVisitor.java:209)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:432)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Project.accept(Project.java:65)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.analyze(CalciteRelNodeVisitor.java:204)
at org.opensearch.sql.executor.QueryService.analyze(QueryService.java:281)
at org.opensearch.sql.executor.QueryService.lambda$executeWithCalcite$0(QueryService.java:146)
at org.opensearch.sql.calcite.CalcitePlanContext.run(CalcitePlanContext.java:158)
at org.opensearch.sql.executor.QueryService.executeWithCalcite(QueryService.java:135)
at org.opensearch.sql.executor.QueryService.execute(QueryService.java:101)
at org.opensearch.sql.executor.execution.QueryPlan.execute(QueryPlan.java:82)
at org.opensearch.sql.opensearch.executor.OpenSearchQueryManager.lambda$schedule$1(OpenSearchQueryManager.java:84)
at org.opensearch.sql.opensearch.executor.OpenSearchQueryManager.lambda$withCurrentContext$2(OpenSearchQueryManager.java:111)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:952)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: VARCHAR CHARACTER SET "UTF-8" NOT NULL is not comparable to CHAR(1) NOT NULL
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:511)
at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:605)
... 49 more
[2026-06-08T08:25:15,494][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [0557a9fd-601d-4145-bc32-8d4d99b6bc1d] Incoming request source=table | bin identifier span=*** | fields + identifier | head 3
[2026-06-08T08:25:15,498][ERROR][o.o.s.p.r.RestPPLQueryAction] [docTestCluster-0] Error happened during query handling
org.apache.calcite.runtime.CalciteContextException: At line 0, column 0: VARCHAR CHARACTER SET "UTF-8" NOT NULL is not comparable to CHAR(1) NOT NULL
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:511)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:960)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:945)
at org.apache.calcite.rex.RexCallBinding.newError(RexCallBinding.java:155)
at org.apache.calcite.sql.type.ReturnTypes.lambda$static$18(ReturnTypes.java:1127)
at org.apache.calcite.sql.type.SqlTypeTransformCascade.inferReturnType(SqlTypeTransformCascade.java:59)
at org.apache.calcite.sql.fun.SqlStdOperatorTable.lambda$static$1(SqlStdOperatorTable.java:278)
at org.apache.calcite.sql.type.SqlTypeTransformCascade.inferReturnType(SqlTypeTransformCascade.java:66)
at org.apache.calcite.sql.SqlOperator.inferReturnType(SqlOperator.java:562)
at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:364)
at org.apache.calcite.tools.RelBuilder.call(RelBuilder.java:763)
at org.apache.calcite.tools.RelBuilder.call(RelBuilder.java:770)
at org.apache.calcite.tools.RelBuilder.call(RelBuilder.java:741)
at org.opensearch.sql.calcite.utils.binning.RangeFormatter.createRangeString(RangeFormatter.java:39)
at org.opensearch.sql.calcite.utils.binning.RangeFormatter.createRangeString(RangeFormatter.java:19)
at org.opensearch.sql.calcite.utils.binning.handlers.LogSpanHelper.createLogSpanExpression(LogSpanHelper.java:75)
at org.opensearch.sql.calcite.utils.binning.handlers.SpanBinHandler.handleNumericOrLogSpan(SpanBinHandler.java:85)
at org.opensearch.sql.calcite.utils.binning.handlers.SpanBinHandler.createExpression(SpanBinHandler.java:42)
at org.opensearch.sql.calcite.utils.BinUtils.createBinExpression(BinUtils.java:35)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitBin(CalciteRelNodeVisitor.java:972)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitBin(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Bin.accept(Bin.java:55)
at org.opensearch.sql.ast.AbstractNodeVisitor.visitChildren(AbstractNodeVisitor.java:117)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitChildren(CalciteRelNodeVisitor.java:209)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:432)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Project.accept(Project.java:65)
at org.opensearch.sql.ast.AbstractNodeVisitor.visitChildren(AbstractNodeVisitor.java:117)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitChildren(CalciteRelNodeVisitor.java:209)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitHead(CalciteRelNodeVisitor.java:732)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitHead(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Head.accept(Head.java:44)
at org.opensearch.sql.ast.AbstractNodeVisitor.visitChildren(AbstractNodeVisitor.java:117)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitChildren(CalciteRelNodeVisitor.java:209)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:432)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Project.accept(Project.java:65)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.analyze(CalciteRelNodeVisitor.java:204)
at org.opensearch.sql.executor.QueryService.analyze(QueryService.java:281)
at org.opensearch.sql.executor.QueryService.lambda$executeWithCalcite$0(QueryService.java:146)
at org.opensearch.sql.calcite.CalcitePlanContext.run(CalcitePlanContext.java:158)
at org.opensearch.sql.executor.QueryService.executeWithCalcite(QueryService.java:135)
at org.opensearch.sql.executor.QueryService.execute(QueryService.java:101)
at org.opensearch.sql.executor.execution.QueryPlan.execute(QueryPlan.java:82)
at org.opensearch.sql.opensearch.executor.OpenSearchQueryManager.lambda$schedule$1(OpenSearchQueryManager.java:84)
at org.opensearch.sql.opensearch.executor.OpenSearchQueryManager.lambda$withCurrentContext$2(OpenSearchQueryManager.java:111)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:952)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: VARCHAR CHARACTER SET "UTF-8" NOT NULL is not comparable to CHAR(1) NOT NULL
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:511)
at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:605)
... 49 more
[2026-06-08T08:25:15,507][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [6d3cff0f-f72f-4242-9871-c27385532d4b] Incoming request source=table | bin identifier bins=*** | fields + identifier | head 3
[2026-06-08T08:25:15,672][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [0b175f4c-2cde-4c9f-b67a-ed4a5386ee11] Incoming request source=table | bin identifier bins=*** | fields + identifier | head 1
[2026-06-08T08:25:15,720][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [3596cab3-ea90-46a4-8511-e07f2e8c0a6d] Incoming request source=table | bin identifier bins=*** | fields + identifier,identifier | head 3
[2026-06-08T08:25:15,786][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [c7ffff95-2934-413c-9926-6731d933779c] Incoming request source=table | bin identifier minspan=*** | fields + identifier,identifier | head 3
[2026-06-08T08:25:16,015][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [a0f72057-0756-460e-9c55-e651038db60a] Incoming request source=table | bin identifier minspan=*** | fields + identifier | head 1
[2026-06-08T08:25:16,076][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [c0783d20-a606-48ed-bade-ba287fd85848] Incoming request source=table | bin identifier start=*** end=*** | fields + identifier | head 1
[2026-06-08T08:25:16,155][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [33211fc6-73a2-47e8-b10e-cf11ce878eb1] Incoming request source=table | bin identifier start=*** end=*** | fields + identifier | head 1
[2026-06-08T08:25:16,217][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [5d3d56c2-7131-4a37-ae76-4d2fa15e9678] Incoming request source=table | bin identifier span=*** | fields + identifier | head 6
[2026-06-08T08:25:16,272][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [86eecd05-aa1f-425c-b5a4-f6fe5445e1c1] Incoming request source=table | bin time_identifier span=*** | fields + time_identifier,identifier | head 3
[2026-06-08T08:25:16,462][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [66101f90-26f9-4d16-afa3-b1227d02dd16] Incoming request source=table | bin time_identifier span=*** | fields + time_identifier,identifier | head 3
[2026-06-08T08:25:16,521][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [04521cc0-37c5-4654-a3a0-57f417b27de9] Incoming request source=table | bin time_identifier span=*** | fields + time_identifier,identifier | head 3
[2026-06-08T08:25:16,568][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [48e59bca-ea93-4732-b8dd-ebdc0a79fdd8] Incoming request source=table | bin time_identifier span=*** | fields + time_identifier,identifier | head 3
[2026-06-08T08:25:16,626][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [12ec86fb-ea28-42f0-a3b2-67c49cc3e65a] Incoming request source=table | bin time_identifier span=*** aligntime=*** | fields + time_identifier,identifier | head 3
[2026-06-08T08:25:16,680][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [51de4c82-6382-4946-bce6-cc5db9181471] Incoming request source=table | bin time_identifier span=*** aligntime=*** | fields + time_identifier,identifier | head 3
[2026-06-08T08:25:16,732][INFO ][o.o.s.p.PPLService ] [docTestCluster-0] [3259766d-1fe0-4a35-a62b-87577306bf09] Incoming request source=table | bin identifier | fields + identifier,identifier | head 3
[2026-06-08T08:25:16,735][ERROR][o.o.s.p.r.RestPPLQueryAction] [docTestCluster-0] Error happened during query handling
org.apache.calcite.runtime.CalciteContextException: At line 0, column 0: VARCHAR CHARACTER SET "UTF-8" NOT NULL is not comparable to CHAR(1) NOT NULL
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:511)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:960)
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:945)
at org.apache.calcite.rex.RexCallBinding.newError(RexCallBinding.java:155)
at org.apache.calcite.sql.type.ReturnTypes.lambda$static$18(ReturnTypes.java:1127)
at org.apache.calcite.sql.type.SqlTypeTransformCascade.inferReturnType(SqlTypeTransformCascade.java:59)
at org.apache.calcite.sql.fun.SqlStdOperatorTable.lambda$static$1(SqlStdOperatorTable.java:278)
at org.apache.calcite.sql.type.SqlTypeTransformCascade.inferReturnType(SqlTypeTransformCascade.java:66)
at org.apache.calcite.sql.SqlOperator.inferReturnType(SqlOperator.java:562)
at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:364)
at org.apache.calcite.tools.RelBuilder.call(RelBuilder.java:763)
at org.apache.calcite.tools.RelBuilder.call(RelBuilder.java:770)
at org.apache.calcite.tools.RelBuilder.call(RelBuilder.java:741)
at org.opensearch.sql.calcite.utils.binning.RangeFormatter.createRangeString(RangeFormatter.java:39)
at org.opensearch.sql.calcite.utils.binning.RangeFormatter.createRangeString(RangeFormatter.java:19)
at org.opensearch.sql.calcite.utils.binning.handlers.DefaultBinHandler.createNumericDefaultBinning(DefaultBinHandler.java:70)
at org.opensearch.sql.calcite.utils.binning.handlers.DefaultBinHandler.createExpression(DefaultBinHandler.java:42)
at org.opensearch.sql.calcite.utils.BinUtils.createBinExpression(BinUtils.java:35)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitBin(CalciteRelNodeVisitor.java:972)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitBin(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Bin.accept(Bin.java:55)
at org.opensearch.sql.ast.AbstractNodeVisitor.visitChildren(AbstractNodeVisitor.java:117)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitChildren(CalciteRelNodeVisitor.java:209)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:432)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Project.accept(Project.java:65)
at org.opensearch.sql.ast.AbstractNodeVisitor.visitChildren(AbstractNodeVisitor.java:117)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitChildren(CalciteRelNodeVisitor.java:209)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitHead(CalciteRelNodeVisitor.java:732)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitHead(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Head.accept(Head.java:44)
at org.opensearch.sql.ast.AbstractNodeVisitor.visitChildren(AbstractNodeVisitor.java:117)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitChildren(CalciteRelNodeVisitor.java:209)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:432)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.visitProject(CalciteRelNodeVisitor.java:189)
at org.opensearch.sql.ast.tree.Project.accept(Project.java:65)
at org.opensearch.sql.calcite.CalciteRelNodeVisitor.analyze(CalciteRelNodeVisitor.java:204)
at org.opensearch.sql.executor.QueryService.analyze(QueryService.java:281)
at org.opensearch.sql.executor.QueryService.lambda$executeWithCalcite$0(QueryService.java:146)
at org.opensearch.sql.calcite.CalcitePlanContext.run(CalcitePlanContext.java:158)
at org.opensearch.sql.executor.QueryService.executeWithCalcite(QueryService.java:135)
at org.opensearch.sql.executor.QueryService.execute(QueryService.java:101)
at org.opensearch.sql.executor.execution.QueryPlan.execute(QueryPlan.java:82)
at org.opensearch.sql.opensearch.executor.OpenSearchQueryManager.lambda$schedule$1(OpenSearchQueryManager.java:84)
at org.opensearch.sql.opensearch.executor.OpenSearchQueryManager.lambda$withCurrentContext$2(OpenSearchQueryManager.java:111)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:952)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: VARCHAR CHARACTER SET "UTF-8" NOT NULL is not comparable to CHAR(1) NOT NULL
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:511)
at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:605)
... 48 more

The previous fix patched createSqlType() for the no-arg and boolean
variants, but Calcite has many code paths for char type creation:
  - createSqlType(SqlTypeName, int precision)
  - RexBuilder.makeLiteral(String) → getDefaultCharset()
  - RelBuilder.literal(String) → getDefaultCharset()

All of these bypassed the per-method overrides, causing residual
'VARCHAR CHARACTER SET UTF-8 is not comparable to CHAR(1)' errors
in RangeFormatter and other callers (e.g. bin command).

Fix: override getDefaultCharset() in OpenSearchTypeFactory to return
UTF-8. This is the single source of truth Calcite uses across all
char type creation paths, making every VARCHAR/CHAR consistently
UTF-8 without needing per-call patches.

The per-method createSqlType overrides are removed as redundant.
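
The reasoning in this commit message can be illustrated with a toy model. The sketch below is an assumption-laden simplification and uses no Calcite or OpenSearch classes: `CharType`, `BaseTypeFactory`, `PerMethodPatchedFactory`, and `Utf8TypeFactory` are hypothetical names, and the comparability rule is reduced to "charsets must match". It only shows why patching one creation method leaves other paths on the old charset, while overriding the shared hook changes every path at once.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Toy model only: none of these classes are Calcite or OpenSearch types. */
final class SingleSourceOfTruthDemo {

  /** A character type reduced to its name and charset. */
  static final class CharType {
    final String name;
    final Charset charset;

    CharType(String name, Charset charset) {
      this.name = name;
      this.charset = charset;
    }

    /** Simplified comparability rule: the charsets must match. */
    boolean comparableTo(CharType other) {
      return charset.equals(other.charset);
    }
  }

  /** Every creation path consults getDefaultCharset(). */
  static class BaseTypeFactory {
    Charset getDefaultCharset() {
      return StandardCharsets.ISO_8859_1;
    }

    CharType varchar() {
      return new CharType("VARCHAR", getDefaultCharset());
    }

    CharType fixedChar(int precision) {
      return new CharType("CHAR(" + precision + ")", getDefaultCharset());
    }

    CharType literalType(String value) {
      return fixedChar(value.length());
    }
  }

  /** Per-method patch: only varchar() changes, the other paths do not. */
  static final class PerMethodPatchedFactory extends BaseTypeFactory {
    @Override
    CharType varchar() {
      return new CharType("VARCHAR", StandardCharsets.UTF_8);
    }
  }

  /** Single override of the shared hook: all paths change together. */
  static final class Utf8TypeFactory extends BaseTypeFactory {
    @Override
    Charset getDefaultCharset() {
      return StandardCharsets.UTF_8;
    }
  }

  public static void main(String[] args) {
    BaseTypeFactory patched = new PerMethodPatchedFactory();
    System.out.println(patched.varchar().comparableTo(patched.literalType("-"))); // false

    BaseTypeFactory unified = new Utf8TypeFactory();
    System.out.println(unified.varchar().comparableTo(unified.literalType("-"))); // true
  }
}
```

In the model, the per-method patch reproduces the shape of the residual error (a UTF-8 VARCHAR next to a CHAR(1) on the old default), and the single override removes it.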
@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Persistent review updated to latest commit c54a4c2

@gingeekrishna

gingeekrishna commented Jun 8, 2026

Author

@lukeyan2023 Thanks for catching the second failure!

Root cause: The bin command's RangeFormatter.createRangeString calls relBuilder.literal(BinConstants.DASH_SEPARATOR) (the "-" string). This goes through RexBuilder.makeLiteral(String) → typeFactory.getDefaultCharset(), which returns ISO-8859-1 by default in JavaTypeFactoryImpl, so the "-" literal gets type CHAR(1) with no UTF-8 charset, making it incompatible with VARCHAR CHARACTER SET "UTF-8" from the cast.

Similarly, relBuilder.cast(binValue, SqlTypeName.VARCHAR) creates a VARCHAR type through SqlTypeFactoryImpl.createSqlType() → SqlTypeUtil.addCharsetAndCollation() → typeFactory.getDefaultCharset().

Fix: Rather than patching each call site individually, I've overridden getDefaultCharset() in OpenSearchTypeFactory to return StandardCharsets.UTF_8. Since Calcite's SqlTypeUtil.addCharsetAndCollation() always calls typeFactory.getDefaultCharset() for all char type creation paths — createSqlType, makeLiteral, and cast — this single override ensures UTF-8 charset consistency across the entire type system.

The branch has been updated. The previous per-method createSqlType override is removed since it's now redundant.
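
As a small aside on the two failure modes discussed in this thread, the following JDK-only sketch checks which strings each charset can represent. It is an illustration, not code from the PR: `CharsetEncodeCheck` and `canEncode` are hypothetical names. It suggests why the "-" separator never hit the original encoding error (it is representable in ISO-8859-1) and surfaced only as a type mismatch, whereas the Chinese literal from the original report cannot be encoded in ISO-8859-1 at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetEncodeCheck {

  /** Returns true if every character of the text is representable in the charset. */
  static boolean canEncode(String text, Charset charset) {
    return charset.newEncoder().canEncode(text);
  }

  public static void main(String[] args) {
    // "\u672a\u5904\u7f6e" is the literal from the original report, written
    // with Unicode escapes so the source file encoding does not matter.
    String[] samples = {"-", "\u672a\u5904\u7f6e"};
    for (String sample : samples) {
      System.out.printf(
          "%s: ISO-8859-1=%b, UTF-8=%b%n",
          sample,
          canEncode(sample, StandardCharsets.ISO_8859_1),
          canEncode(sample, StandardCharsets.UTF_8));
    }
  }
}
```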

@lukeyan2023

lukeyan2023 commented Jun 8, 2026

@gingeekrishna I'm still unable to complete a successful local build. The test suite is consistently failing. However, the error messages from the failing tests also appear to be tied to this issue:
CalcitePPLTransposeTest > testTransposeWithLimitColumnName FAILED
java.lang.AssertionError:
Expected: is "LogicalProject(column_names=[$0], row 1=[$1], row 2=[$2], row 3=[$3])\n LogicalAggregate(group=[{1}], row 1_null=[MAX($0) FILTER $2], row 2_null=[MAX($0) FILTER $3], row 3_null=[MAX($0) FILTER $4])\n LogicalProject(value=[CAST($6):VARCHAR NOT NULL], $f7=[TRIM(FLAG(BOTH), ' ', $5)], $f8=[=($4, 1)], $f9=[=($4, 2)], $f10=[=($4, 3)])\n LogicalFilter(condition=[IS NOT NULL($6)])\n LogicalProject(ENAME=[$0], COMM=[$1], JOB=[$2], SAL=[$3], row_number_transpose=[$4], column_names=[$5], value=[CASE(=($5, 'ENAME'), CAST($0):VARCHAR NOT NULL, =($5, 'COMM'), NUMBER_TO_STRING($1), =($5, 'JOB'), CAST($2):VARCHAR NOT NULL, =($5, 'SAL'), NUMBER_TO_STRING($3), null:NULL)])\n LogicalJoin(condition=[true], joinType=[inner])\n LogicalProject(ENAME=[$1], COMM=[$6], JOB=[$2], SAL=[$5], row_number_transpose=[ROW_NUMBER() OVER ()])\n LogicalTableScan(table=[[scott, EMP]])\n LogicalValues(tuples=[[{ 'ENAME' }, { 'COMM' }, { 'JOB' }, { 'SAL' }]])\n"
but: was "LogicalProject(column_names=[$0], row 1=[$1], row 2=[$2], row 3=[$3])\n LogicalAggregate(group=[{1}], row 1_null=[MAX($0) FILTER $2], row 2_null=[MAX($0) FILTER $3], row 3_null=[MAX($0) FILTER $4])\n LogicalProject(value=[CAST($6):VARCHAR CHARACTER SET "UTF-8" NOT NULL], $f7=[TRIM(FLAG(BOTH), _UTF-8' ', $5)], $f8=[=($4, 1)], $f9=[=($4, 2)], $f10=[=($4, 3)])\n LogicalFilter(condition=[IS NOT NULL($6)])\n LogicalProject(ENAME=[$0], COMM=[$1], JOB=[$2], SAL=[$3], row_number_transpose=[$4], column_names=[$5], value=[CASE(=($5, _UTF-8'ENAME'), CAST($0):VARCHAR CHARACTER SET "UTF-8" NOT NULL, =($5, _UTF-8'COMM'), NUMBER_TO_STRING($1), =($5, _UTF-8'JOB'), CAST($2):VARCHAR CHARACTER SET "UTF-8" NOT NULL, =($5, _UTF-8'SAL'), NUMBER_TO_STRING($3), null:NULL)])\n LogicalJoin(condition=[true], joinType=[inner])\n LogicalProject(ENAME=[$1], COMM=[$6], JOB=[$2], SAL=[$5], row_number_transpose=[ROW_NUMBER() OVER ()])\n LogicalTableScan(table=[[scott, EMP]])\n LogicalValues(tuples=[[{ _UTF-8'ENAME' }, { _UTF-8'COMM' }, { _UTF-8'JOB' }, { _UTF-8'SAL' }]])\n"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at org.opensearch.sql.ppl.calcite.CalcitePPLAbstractTest.verifyLogical(CalcitePPLAbstractTest.java:162)
at org.opensearch.sql.ppl.calcite.CalcitePPLTransposeTest.testTransposeWithLimitColumnName(CalcitePPLTransposeTest.java:225)

CalcitePPLSearchTest > testSearchWithFilter FAILED
java.lang.AssertionError:
Expected: is "LogicalFilter(condition=[query_string(MAP('query', 'DEPTNO:20':VARCHAR))])\n LogicalTableScan(table=[[scott, EMP]])\n"
but: was "LogicalFilter(condition=[query_string(MAP(_UTF-8'query', _UTF-8'DEPTNO:20':VARCHAR CHARACTER SET "UTF-8"))])\n LogicalTableScan(table=[[scott, EMP]])\n"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at org.opensearch.sql.ppl.calcite.CalcitePPLAbstractTest.verifyLogical(CalcitePPLAbstractTest.java:162)
at org.opensearch.sql.ppl.calcite.CalcitePPLSearchTest.testSearchWithFilter(CalcitePPLSearchTest.java:50)

CalcitePPLSearchTest > testSearchWithoutTimestampShouldThrow SKIPPED

CalcitePPLSearchTest > testSearchWithAbsoluteTimeRange FAILED
java.lang.AssertionError:
Expected: is "LogicalFilter(condition=[query_string(MAP('query', '(@timestamp:>=2020\-10\-11T00\:00\:00Z) AND (@timestamp:<=2025\-01\-01T00\:00\:00Z)':VARCHAR))])\n LogicalTableScan(table=[[scott, LOGS]])\n"
but: was "LogicalFilter(condition=[query_string(MAP(_UTF-8'query', _UTF-8'(@timestamp:>=2020\-10\-11T00\:00\:00Z) AND (@timestamp:<=2025\-01\-01T00\:00\:00Z)':VARCHAR CHARACTER SET "UTF-8"))])\n LogicalTableScan(table=[[scott, LOGS]])\n"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at org.opensearch.sql.ppl.calcite.CalcitePPLAbstractTest.verifyLogical(CalcitePPLAbstractTest.java:162)
at org.opensearch.sql.ppl.calcite.CalcitePPLSearchTest.testSearchWithAbsoluteTimeRange(CalcitePPLSearchTest.java:76)

CalcitePPLTransposeTest > testSimpleCountWithTranspose FAILED
java.lang.AssertionError:
Expected: is "LogicalProject(column=[$0], row 1=[$1], row 2=[$2], row 3=[$3], row 4=[$4], row 5=[$5])\n LogicalAggregate(group=[{1}], row 1_null=[MAX($0) FILTER $2], row 2_null=[MAX($0) FILTER $3], row 3_null=[MAX($0) FILTER $4], row 4_null=[MAX($0) FILTER $5], row 5_null=[MAX($0) FILTER $6])\n LogicalProject(value=[CAST($3):VARCHAR NOT NULL], $f4=[TRIM(FLAG(BOTH), ' ', $2)], $f5=[=($1, 1)], $f6=[=($1, 2)], $f7=[=($1, 3)], $f8=[=($1, 4)], $f9=[=($1, 5)])\n LogicalFilter(condition=[IS NOT NULL($3)])\n LogicalProject(c=[$0], row_number_transpose=[$1], column=[$2], value=[CASE(=($2, 'c'), CAST($0):VARCHAR NOT NULL, null:NULL)])\n LogicalJoin(condition=[true], joinType=[inner])\n LogicalProject(c=[$0], row_number_transpose=[ROW_NUMBER() OVER ()])\n LogicalAggregate(group=[{}], c=[COUNT()])\n LogicalTableScan(table=[[scott, EMP]])\n LogicalValues(tuples=[[{ 'c' }]])\n"
but: was "LogicalProject(column=[$0], row 1=[$1], row 2=[$2], row 3=[$3], row 4=[$4], row 5=[$5])\n LogicalAggregate(group=[{1}], row 1_null=[MAX($0) FILTER $2], row 2_null=[MAX($0) FILTER $3], row 3_null=[MAX($0) FILTER $4], row 4_null=[MAX($0) FILTER $5], row 5_null=[MAX($0) FILTER $6])\n LogicalProject(value=[CAST($3):VARCHAR CHARACTER SET "UTF-8" NOT NULL], $f4=[TRIM(FLAG(BOTH), _UTF-8' ', $2)], $f5=[=($1, 1)], $f6=[=($1, 2)], $f7=[=($1, 3)], $f8=[=($1, 4)], $f9=[=($1, 5)])\n LogicalFilter(condition=[IS NOT NULL($3)])\n LogicalProject(c=[$0], row_number_transpose=[$1], column=[$2], value=[CASE(=($2, _UTF-8'c'), CAST($0):VARCHAR CHARACTER SET "UTF-8" NOT NULL, null:NULL)])\n LogicalJoin(condition=[true], joinType=[inner])\n LogicalProject(c=[$0], row_number_transpose=[ROW_NUMBER() OVER ()])\n LogicalAggregate(group=[{}], c=[COUNT()])\n LogicalTableScan(table=[[scott, EMP]])\n LogicalValues(tuples=[[{ _UTF-8'c' }]])\n"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at org.opensearch.sql.ppl.calcite.CalcitePPLAbstractTest.verifyLogical(CalcitePPLAbstractTest.java:162)
at org.opensearch.sql.ppl.calcite.CalcitePPLTransposeTest.testSimpleCountWithTranspose(CalcitePPLTransposeTest.java:38)

CalcitePPLTrendlineTest > testTrendlineMultipleFields PASSED

CalcitePPLTransposeTest > testMultipleAggregatesWithAliasesTranspose FAILED
java.lang.AssertionError:
Expected: is "LogicalProject(column=[$0], row 1=[$1], row 2=[$2], row 3=[$3], row 4=[$4], row 5=[$5])\n LogicalAggregate(group=[{1}], row 1_null=[MAX($0) FILTER $2], row 2_null=[MAX($0) FILTER $3], row 3_null=[MAX($0) FILTER $4], row 4_null=[MAX($0) FILTER $5], row 5_null=[MAX($0) FILTER $6])\n LogicalProject(value=[CAST($6):VARCHAR NOT NULL], $f7=[TRIM(FLAG(BOTH), ' ', $5)], $f8=[=($4, 1)], $f9=[=($4, 2)], $f10=[=($4, 3)], $f11=[=($4, 4)], $f12=[=($4, 5)])\n LogicalFilter(condition=[IS NOT NULL($6)])\n LogicalProject(avg_sal=[$0], max_sal=[$1], min_sal=[$2], cnt=[$3], row_number_transpose=[$4], column=[$5], value=[CASE(=($5, 'avg_sal'), NUMBER_TO_STRING($0), =($5, 'max_sal'), NUMBER_TO_STRING($1), =($5, 'min_sal'), NUMBER_TO_STRING($2), =($5, 'cnt'), CAST($3):VARCHAR NOT NULL, null:NULL)])\n LogicalJoin(condition=[true], joinType=[inner])\n LogicalProject(avg_sal=[$0], max_sal=[$1], min_sal=[$2], cnt=[$3], row_number_transpose=[ROW_NUMBER() OVER ()])\n LogicalAggregate(group=[{}], avg_sal=[AVG($0)], max_sal=[MAX($0)], min_sal=[MIN($0)], cnt=[COUNT()])\n LogicalProject(SAL=[$5])\n LogicalTableScan(table=[[scott, EMP]])\n LogicalValues(tuples=[[{ 'avg_sal' }, { 'max_sal' }, { 'min_sal' }, { 'cnt' }]])\n"
but: was "LogicalProject(column=[$0], row 1=[$1], row 2=[$2], row 3=[$3], row 4=[$4], row 5=[$5])\n LogicalAggregate(group=[{1}], row 1_null=[MAX($0) FILTER $2], row 2_null=[MAX($0) FILTER $3], row 3_null=[MAX($0) FILTER $4], row 4_null=[MAX($0) FILTER $5], row 5_null=[MAX($0) FILTER $6])\n LogicalProject(value=[CAST($6):VARCHAR CHARACTER SET "UTF-8" NOT NULL], $f7=[TRIM(FLAG(BOTH), _UTF-8' ', $5)], $f8=[=($4, 1)], $f9=[=($4, 2)], $f10=[=($4, 3)], $f11=[=($4, 4)], $f12=[=($4, 5)])\n LogicalFilter(condition=[IS NOT NULL($6)])\n LogicalProject(avg_sal=[$0], max_sal=[$1], min_sal=[$2], cnt=[$3], row_number_transpose=[$4], column=[$5], value=[CASE(=($5, _UTF-8'avg_sal'), NUMBER_TO_STRING($0), =($5, _UTF-8'max_sal'), NUMBER_TO_STRING($1), =($5, _UTF-8'min_sal'), NUMBER_TO_STRING($2), =($5, _UTF-8'cnt'), CAST($3):VARCHAR CHARACTER SET "UTF-8" NOT NULL, null:NULL)])\n LogicalJoin(condition=[true], joinType=[inner])\n LogicalProject(avg_sal=[$0], max_sal=[$1], min_sal=[$2], cnt=[$3], row_number_transpose=[ROW_NUMBER() OVER ()])\n LogicalAggregate(group=[{}], avg_sal=[AVG($0)], max_sal=[MAX($0)], min_sal=[MIN($0)], cnt=[COUNT()])\n LogicalProject(SAL=[$5])\n LogicalTableScan(table=[[scott, EMP]])\n LogicalValues(tuples=[[{ _UTF-8'avg_sal' }, { _UTF-8'max_sal' }, { _UTF-8'min_sal' }, { _UTF-8'cnt' }]])\n"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at org.opensearch.sql.ppl.calcite.CalcitePPLAbstractTest.verifyLogical(CalcitePPLAbstractTest.java:162)
at org.opensearch.sql.ppl.calcite.CalcitePPLTransposeTest.testMultipleAggregatesWithAliasesTranspose(CalcitePPLTransposeTest.java:88)

It seems these errors are caused by the test cases hardcoding their expected logical plan output. After the default charset is set to UTF-8, the actual logical plan no longer matches the plan the test cases expect.
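
One way to confirm that the charset annotations are the only difference between the expected and actual plans is to strip them before comparing. The helper below is a hypothetical debugging aid, not part of the repository or of this PR; the class name `PlanCharsetNormalizer` and both patterns are assumptions based solely on the `_UTF-8'...'` literal prefix and the `CHARACTER SET "UTF-8"` type suffix visible in the output above.

```java
import java.util.regex.Pattern;

final class PlanCharsetNormalizer {

  // Matches the "_UTF-8" prefix only when it directly precedes a quoted literal.
  private static final Pattern LITERAL_PREFIX = Pattern.compile("_UTF-8(?=')");

  // Matches the charset annotation appended to character type names.
  private static final Pattern TYPE_SUFFIX = Pattern.compile(" CHARACTER SET \"UTF-8\"");

  /** Returns the plan text with UTF-8 charset annotations removed. */
  static String normalize(String plan) {
    String withoutPrefixes = LITERAL_PREFIX.matcher(plan).replaceAll("");
    return TYPE_SUFFIX.matcher(withoutPrefixes).replaceAll("");
  }

  public static void main(String[] args) {
    String actual =
        "LogicalFilter(condition=[query_string(MAP(_UTF-8'query', "
            + "_UTF-8'DEPTNO:20':VARCHAR CHARACTER SET \"UTF-8\"))])\n";
    System.out.print(normalize(actual));
  }
}
```

If the normalized actual plan equals the expected string, the mismatch is purely cosmetic. Normalizing inside the real assertions would also hide genuine charset regressions, so updating the expected strings may be the safer long-term choice.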

@gingeekrishna

Author

Thanks for investigating! The test failures are platform-specific and won't affect CI.

The CI runs on Ubuntu with Java 21, where Charset.defaultCharset() returns UTF-8 by default (since Java 18). In that environment, Util.getDefaultCharset() = UTF-8 = the charset our override sets, so Calcite's BasicSqlType.generateTypeString() suppresses the CHARACTER SET "UTF-8" annotation from the plan (it only shows the charset when it differs from the JVM default). Plan strings are therefore identical to before on CI, so existing tests pass unchanged.

On a system with a non-UTF-8 JVM default (e.g. Windows with Java 17 or earlier), the annotation becomes visible, which is what you're seeing. The CI tests are not affected.

@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Persistent review updated to latest commit c54a4c2

@lukeyan2023

@gingeekrishna Hello, I tried building again on Ubuntu 22.04 with JDK 21, but the following errors still occur. Am I missing some configuration steps?
CalcitePPLNoMvTest > testNoMvBasic FAILED
java.lang.AssertionError:
Expected: is "LogicalProject(arr=[$8])\n LogicalSort(fetch=[1])\n LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], arr=[COALESCE(ARRAY_JOIN(ARRAY_COMPACT(array('web':VARCHAR, 'production':VARCHAR, 'east':VARCHAR)), '\n'), '':VARCHAR)])\n LogicalTableScan(table=[[scott, EMP]])\n"
but: was "LogicalProject(arr=[$8])\n LogicalSort(fetch=[1])\n LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], arr=[COALESCE(ARRAY_JOIN(ARRAY_COMPACT(array(_UTF-8'web':VARCHAR CHARACTER SET "UTF-8", _UTF-8'production':VARCHAR CHARACTER SET "UTF-8", _UTF-8'east':VARCHAR CHARACTER SET "UTF-8")), _UTF-8'\n'), _UTF-8'':VARCHAR CHARACTER SET "UTF-8")])\n LogicalTableScan(table=[[scott, EMP]])\n"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
at org.opensearch.sql.ppl.calcite.CalcitePPLAbstractTest.verifyLogical(CalcitePPLAbstractTest.java:162)
at org.opensearch.sql.ppl.calcite.CalcitePPLNoMvTest.testNoMvBasic(CalcitePPLNoMvTest.java:62)
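
To compare environments when the same test passes in one place and fails in another, it may help to print the charset-related settings from inside the JVM that runs the tests. The sketch below is a hypothetical diagnostic, not part of the PR. `CharsetDiagnostics` is an invented name, and `calcite.default.charset` is, as far as I know, the system property Calcite reads for its own default charset; that name is an assumption to verify against the Calcite version in use.

```java
import java.nio.charset.Charset;

final class CharsetDiagnostics {

  /** Collects the JVM and (assumed) Calcite charset settings as text. */
  static String describe() {
    StringBuilder report = new StringBuilder();
    report.append("java.version = ").append(System.getProperty("java.version")).append('\n');
    report.append("Charset.defaultCharset() = ").append(Charset.defaultCharset().name()).append('\n');
    report.append("file.encoding = ").append(System.getProperty("file.encoding")).append('\n');
    // native.encoding is only defined on newer JDKs; it prints "null" otherwise.
    report.append("native.encoding = ").append(System.getProperty("native.encoding")).append('\n');
    // Assumed property name; "(not set)" means no explicit override was passed.
    report.append("calcite.default.charset = ")
        .append(System.getProperty("calcite.default.charset", "(not set)"))
        .append('\n');
    return report.toString();
  }

  public static void main(String[] args) {
    System.out.print(describe());
  }
}
```

Build tools can launch test JVMs with their own arguments, so the values seen in a shell may differ from those seen by the tests. If Calcite compares the type's charset against its own configured default rather than the JVM default, a UTF-8 JVM alone would not suppress the annotation; printing both values is one way to check that.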

Development

Successfully merging this pull request may close these issues.

[BUG] PPL CalciteException: Failed to encode Chinese characters in ISO-8859-1 on 3.6.0 (works on 3.1)