Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1889,10 +1889,22 @@ public RelNode visitJoin(Join node, CalcitePlanContext context) {
}
} else {
// The join-with-criteria grammar doesn't allow empty join condition
RexNode joinCondition =
node.getJoinCondition()
.map(c -> rexVisitor.analyzeJoinCondition(c, context))
.orElse(context.relBuilder.literal(true));
RexNode joinCondition;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with a mixed condition like on a AND b.x = c.y ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this, collectBareFields returns Optional.empty(), since the Compare node isn't a bare field), so the whole expression falls through to analyzeJoinCondition unchanged. I added two tests (testJoinMixedConditionFallsThrough and testJoinMixedConditionWithInequality) to prove it produces the correct plan via the normal path.

List<String> bareFields = new ArrayList<>();
if (JoinAndLookupUtils.collectBareFields(node.getJoinCondition().get(), bareFields)) {
// Bare-field shorthand `on a [AND b ...]`. Resolving by stack position rather than by
// qualifier keeps a self-join `join on f t` a real cross-scan equi-join, not f = f.
joinCondition =
bareFields.stream()
.map(f -> buildJoinConditionByFieldName(context, f))
.reduce(context.rexBuilder::and)
.orElse(context.relBuilder.literal(true));
} else {
joinCondition =
node.getJoinCondition()
.map(c -> rexVisitor.analyzeJoinCondition(c, context))
.orElse(context.relBuilder.literal(true));
}
if (node.getJoinType() == SEMI || node.getJoinType() == ANTI) {
// semi and anti join only return left table outputs
context.relBuilder.join(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,10 @@
import org.apache.calcite.rel.core.JoinRelType;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.util.Pair;
import org.opensearch.sql.ast.expression.And;
import org.opensearch.sql.ast.expression.Field;
import org.opensearch.sql.ast.expression.QualifiedName;
import org.opensearch.sql.ast.expression.UnresolvedExpression;
import org.opensearch.sql.ast.tree.Join;
import org.opensearch.sql.ast.tree.Lookup;
import org.opensearch.sql.calcite.CalcitePlanContext;
Expand All @@ -37,6 +41,24 @@ static JoinRelType translateJoinType(Join.JoinType joinType) {
}
}

/**
* Collects the names of bare single-part-field join criteria (e.g. `on a AND b` -> [a, b]).
* Returns false for anything else (qualified field, comparison, OR); {@code out} is only valid
* when this returns true, so callers must check the return before reading it.
*/
static boolean collectBareFields(UnresolvedExpression expr, List<String> out) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The out list is mutated as a side effect and is only valid when the method returns true — the Javadoc says so, which is good. But this is the classic boolean-return-plus-out-param pattern that's easy to misuse: a caller that reads out after a false return gets a partially-populated list (the recursion adds left-side fields before the right side returns false).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I refactored the code to Optional<List>, so therefore the list is only assembled when every node in the AND-tree is a bare field, so partial state can't exist.

I also added JoinAndLookupUtilsTest with 5 cases including mixedLeftSideReturnsEmptyNoPartialState, confirming that And(Compare(...), Field("b")) returns Optional.empty(), not a list containing "b".

if (expr instanceof And and) {
return collectBareFields(and.getLeft(), out) && collectBareFields(and.getRight(), out);
}
if (expr instanceof Field field
&& field.getField() instanceof QualifiedName qn
&& qn.getParts().size() == 1) {
out.add(qn.getParts().get(0));
return true;
}
return false;
}

/* ------For Lookup------ */

/**
Expand Down
7 changes: 5 additions & 2 deletions docs/user/ppl/cmd/join.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,9 @@ source = table1 | left anti join left = l right = r on l.a = r.a table2
source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ]
source = table1 | inner join on table1.a = table2.a table2 | fields table1.a, table2.a, table1.b, table1.c
source = table1 | inner join on a = c table2 | fields a, b, c, d
source = table1 | inner join on a table2 | fields a, table2.a, b, c
source = table1 | inner join on a AND b table2 | fields a, table2.a, b, table2.b
source = table1 | join on a table2 | fields a, table2.a, b, c
source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields l.a, r.a
source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields t1.a, t2.a
source = table1 | join left = l right = r on l.a = r.a [ source = table2 ] as s | fields l.a, s.a
Expand All @@ -40,7 +43,7 @@ The basic `join` syntax supports the following parameters.

| Parameter | Required/Optional | Description |
| --- | --- | --- |
| `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. |
| `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. A bare field name `f` (or `f AND g ...`) is shorthand for an equi-join on the common field(s) and keeps both key columns. |
| `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. |
| `joinType` | Optional | The type of join to perform. Valid values are `left`, `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |
| `left` | Optional | An alias for the left dataset (typically a subsearch) used to avoid ambiguous field names. Specify as `left = <leftAlias>`. |
Expand Down Expand Up @@ -71,7 +74,7 @@ The extended `join` syntax supports the following parameters.

| Parameter | Required/Optional | Description |
| --- | --- | --- |
| `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. |
| `<joinCriteria>` | Required | A comparison expression specifying how to join the datasets. Must be placed after the `on` or `where` keyword in the query. A bare field name `f` (or `f AND g ...`) is shorthand for an equi-join on the common field(s); it keeps both key columns and differs from `<join-field-list>`, which merges duplicates. |
| `<right-dataset>` | Required | The right dataset, which can be an index or a subsearch, with or without an alias. |
| `type` | Optional | The join type when using extended syntax. Valid values are `left`, `outer` (same as `left`), `semi`, `anti`, and performance-sensitive types (`right`, `full`, and `cross`). Default is `inner`. |
| `<join-field-list>` | Optional | A list of fields used to build the join criteria. These fields must exist in both datasets. If not specified, all fields common to both datasets are used as join keys. |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -928,6 +928,106 @@ public void testJoinWhenLegacyNotPreferred() throws IOException {
});
}

@Test
public void testJoinWithImplicitField() throws IOException {
// Keep-both, so co-named columns (other than the key) resolve to the left side.
JSONObject actual =
executeQuery(
String.format(
"source=%s | inner join on name %s | fields name, age, state, country, occupation,"
+ " salary",
TEST_INDEX_STATE_COUNTRY, TEST_INDEX_OCCUPATION));
verifySchema(
actual,
schema("name", "string"),
schema("age", "int"),
schema("state", "string"),
schema("country", "string"),
schema("occupation", "string"),
schema("salary", "int"));
verifyDataRows(
actual,
rows("Jake", 70, "California", "USA", "Engineer", 100000),
rows("Hello", 30, "New York", "USA", "Artist", 70000),
rows("John", 25, "Ontario", "Canada", "Doctor", 120000),
rows("Jane", 20, "Quebec", "Canada", "Scientist", 90000),
rows("David", 40, "Washington", "USA", "Doctor", 120000),
rows("David", 40, "Washington", "USA", "Unemployed", 0));

// The shorthand and the explicit qualified-equality form are output-identical.
JSONObject explicitForm =
executeQuery(
String.format(
"source=%s | inner join on %s.name = %s.name %s | fields name, age, state, country,"
+ " occupation, salary",
TEST_INDEX_STATE_COUNTRY,
TEST_INDEX_STATE_COUNTRY,
TEST_INDEX_OCCUPATION,
TEST_INDEX_OCCUPATION));
assertJsonEquals(explicitForm.toString(), actual.toString());
}

@Test
public void testJoinNoPrefixImplicitField() throws IOException {
// No-prefix `join on name` matches the explicit qualified form, not the field-list `join name`.
JSONObject actual =
executeQuery(
String.format(
"source=%s | join on name %s | fields name, age, state, occupation, salary",
TEST_INDEX_STATE_COUNTRY, TEST_INDEX_OCCUPATION));
JSONObject explicitForm =
executeQuery(
String.format(
"source=%s | join on %s.name = %s.name %s | fields name, age, state, occupation,"
+ " salary",
TEST_INDEX_STATE_COUNTRY,
TEST_INDEX_STATE_COUNTRY,
TEST_INDEX_OCCUPATION,
TEST_INDEX_OCCUPATION));
assertJsonEquals(explicitForm.toString(), actual.toString());
}

@Test
public void testLeftJoinWithImplicitField() throws IOException {
// Keep-both does not coalesce, so unmatched-left rows keep the non-null left key, not a NULL.
JSONObject actual =
executeQuery(
String.format(
"source=%s | left join on name %s | fields name, age, state, occupation, salary",
TEST_INDEX_STATE_COUNTRY, TEST_INDEX_OCCUPATION));
verifySchema(
actual,
schema("name", "string"),
schema("age", "int"),
schema("state", "string"),
schema("occupation", "string"),
schema("salary", "int"));
verifyDataRows(
actual,
rows("Jake", 70, "California", "Engineer", 100000),
rows("Hello", 30, "New York", "Artist", 70000),
rows("John", 25, "Ontario", "Doctor", 120000),
rows("Jane", 20, "Quebec", "Scientist", 90000),
rows("David", 40, "Washington", "Doctor", 120000),
rows("David", 40, "Washington", "Unemployed", 0),
rows("Jim", 27, "B.C", null, null),
rows("Peter", 57, "B.C", null, null),
rows("Rick", 70, "B.C", null, null));
}

@Test
public void testAliasedSelfJoinWithImplicitField() throws IOException {
// Self-join on the unique `name`: a real equi-join matches each of the 8 rows to itself (8
// rows), not the 64-row cross product a tautology would give.
JSONObject actual =
executeQuery(
String.format(
"source=%s | inner join left=l right=r on name %s | fields name, r.name",
TEST_INDEX_STATE_COUNTRY, TEST_INDEX_STATE_COUNTRY));
verifyNumOfRows(actual, 8);
verifySchema(actual, schema("name", "string"), schema("r.name", "string"));
}

@Test
public void testJoinComparing() throws IOException {
JSONObject actual =
Expand Down
5 changes: 3 additions & 2 deletions ppl/src/main/antlr/OpenSearchPPLParser.g4
Original file line number Diff line number Diff line change
Expand Up @@ -729,8 +729,9 @@ sourceFilterArg

// join
joinCommand
: JOIN (joinOption)* (fieldList)? right = tableOrSubqueryClause
| sqlLikeJoinType? JOIN (joinOption)* sideAlias joinHintList? joinCriteria right = tableOrSubqueryClause
// Criteria alt listed first - so `join on a` reads `on` as the criteria keyword, not a field.
: sqlLikeJoinType? JOIN (joinOption)* sideAlias joinHintList? joinCriteria right = tableOrSubqueryClause

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before: the alternative reorder affects every join query's parse, not just the shorthand. testJoinFieldListStillParsesAsFieldList and testJoinPrefixWithoutCriteriaKeywordIsSyntaxError are good guards.

Please still call the grammar precedence change out explicitly in the PR description so reviewers don't miss it — the new comment covers the why well, but a note that existing queries were re-verified would close it out.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do-adding a note to the PR description that the reorder affects all join queries' parse path, and that existing behavior is verified by testJoinFieldListStillParsesAsFieldList + testJoinPrefixWithoutCriteriaKeywordIsSyntaxError (158 AstBuilder + 55 CalcitePPLJoin tests).

| JOIN (joinOption)* (fieldList)? right = tableOrSubqueryClause
;

sqlLikeJoinType
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -326,6 +326,8 @@ public UnresolvedPlan visitJoinCommand(OpenSearchPPLParser.JoinCommandContext ct
if (ctx.fieldList() != null) {
joinFields = Optional.of(getFieldList(ctx.fieldList()));
}
// Keep a bare `on <field>` criteria verbatim; the planner expands it to an equi-join. Folding
// it into joinFields here would instead merge the key into one column.
return new Join(
projectExceptMeta(right),
leftAlias,
Expand Down
Loading
Loading