-
Notifications
You must be signed in to change notification settings - Fork 25.7k
ESQL: Fix double release in inline stats when LocalRelation is reused
#136467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Fix double release in inline stats when LocalRelation is reused
#136467
Conversation
- Replace `LocalSupplier.of` with new `CopyingLocalSupplier` in `PruneColumns` - Add `INLINE_STATS_AFTER_PRUNED_AGGREGATE` capability - Add test cases for inline stats after aggregate pruning
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
LocalRelation is reused
|
@kanoshiou thanks for this patch! I'm however thinking if we shouldn't have another approach here: instead of patching the optimiser rules to use the @astefan, WDYT? |
@kanoshiou Your changes to |
astefan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, look into @bpintea's suggestion for a less invasive and more targeted approach to this bug fix. Please, let us know if you are able to work on this bug fix in short time, otherwise we are ready to take it over. Thank you.
astefan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had something else in mind when I did my first review.
Using a CopyingLocalSupplier should happen in InlineJoin.firstSubPlan:
public static LogicalPlanTuple firstSubPlan(LogicalPlan optimizedPlan, Set<LocalRelation> subPlansResults) {
final Holder<LogicalPlan> stubReplacedSubPlanHolder = new Holder<>();
final Holder<LogicalPlan> originalSubPlanHolder = new Holder<>();
// Collect the first inlinejoin (bottom up in the tree)
optimizedPlan.forEachUp(InlineJoin.class, ij -> {
// extract the right side of the plan and replace its source
if (stubReplacedSubPlanHolder.get() == null) {
if (ij.right().anyMatch(p -> p instanceof StubRelation)) {
stubReplacedSubPlanHolder.set(replaceStub(ij.left(), ij.right()));
} else if (ij.right() instanceof LocalRelation relation
&& (subPlansResults.isEmpty() || subPlansResults.contains(relation) == false)
|| ij.right() instanceof LocalRelation == false && ij.right().anyMatch(p -> p instanceof LocalRelation)) {
// In case the plan was optimized further and the StubRelation was replaced with a LocalRelation
// or the right hand side became a LocalRelation alltogether, there is no need to replace the source of the
// right-hand side anymore.
stubReplacedSubPlanHolder.set(ij.right());
// TODO: INLINE STATS this is essentially an optimization similar to the one in PruneInlineJoinOnEmptyRightSide
// this further supports the idea of running the optimization step again after the substitutions (see EsqlSession
// executeSubPlan() method where we could run the optimizer after the results are replaced in place).
}
originalSubPlanHolder.set(ij.right());
}
});
LogicalPlanTuple tuple = null;
if (stubReplacedSubPlanHolder.get() != null) {
var plan = stubReplacedSubPlanHolder.get();
plan = plan.transformUp(LocalRelation.class, lr -> {
if (lr.supplier() instanceof CopyingLocalSupplier == false) {
return new LocalRelation(lr.source(), lr.output(), new CopyingLocalSupplier(lr.supplier().get()));
}
return lr;
});
plan.setOptimized();
tuple = new LogicalPlanTuple(plan, originalSubPlanHolder.get());
}
return tuple;
}
|
Thank you for the thoughtful patch, @astefan — it really helped clarify the underlying issue! It looks like your current approach addresses the problem outside the logical optimization phase, which seems like the right direction for tackling the root cause. While reviewing your code, I was wondering if the |
I don't know. It needs to go through a round of testing to double check this. |
|
@elasticmachine test this please |
|
@elasticmachine test this please |
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/inlinestats.csv-spec
Outdated
Show resolved
Hide resolved
…fix-inline-stats-pruned-aggregates
astefan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. CI is happy with the change in ReplaceRowAsLocalRelation, I am happy as well.
9eb00fc to
e5444f3
Compare
|
@elasticmachine test this please |
|
Thanks for your review and sorry for the delay in replying to your comment and updating the code, @astefan! I learned a lot about |
bpintea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM!
| | stats PRezTcny = median(network.cost), `network.cost` = count(network.cost) | ||
| | drop network.cost | ||
| | inline stats QaoRGYyf = count(*) | ||
| | drop `PRezTcny` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These queries are generated. This one in particular is relatively clean, but those below are randomised and those identifiers are eye pain (most `` quotations aren't needed either). Usually, they can also be shrunk down to a "minimal" query that reproduces the issue.
Ideally, we'd minimise the queries and use "human-friendly" identifiers, but at a minimum, maybe indicate their source by adding a comment with the originating issue.
| if (stubReplacedSubPlanHolder.get() != null) { | ||
| var plan = stubReplacedSubPlanHolder.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor:
| if (stubReplacedSubPlanHolder.get() != null) { | |
| var plan = stubReplacedSubPlanHolder.get(); | |
| var plan = stubReplacedSubPlanHolder.get(); | |
| if (plan != null) { | |
| * And uses the left-hand side's source as its source: | ||
| * Aggregate[[],[MAX(x{r}#99,true[BOOLEAN]) AS y#102]] | ||
| * \_Limit[1000[INTEGER],false] | ||
| * \_LocalRelation[[x{r}#99],[IntVectorBlock[vector=ConstantIntVector[positions=1, value=1]]]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This javadoc might require an update to be inline with the added paragraph, but this is minor, can remain as is.
…fix-inline-stats-pruned-aggregates
|
@elasticmachine test this please |
💔 Backport failed
You can use sqren/backport to manually backport by running |
…ed (elastic#136467) When the optimizer creates a `LocalRelation` using `LocalSupplier.of()` (which delegates to `ImmediateLocalSupplier`), the same page instance can be referenced and released multiple times during inline stats execution. This happens because the `LocalRelation` serves as a source for both sides of the `InlineJoin` - once for the left-hand side and again for the right-hand side, causing a double release error and connection closure. This PR modifies the `InlineJoin.firstSubPlan()` method to ensure that all `LocalRelation` nodes use `CopyingLocalSupplier` (if they don't already) after extracting the subplan. This prevents double release issues when the same page is reused across executions. Closes elastic#135679 (cherry picked from commit 968b623)
…ed (elastic#136467) When the optimizer creates a `LocalRelation` using `LocalSupplier.of()` (which delegates to `ImmediateLocalSupplier`), the same page instance can be referenced and released multiple times during inline stats execution. This happens because the `LocalRelation` serves as a source for both sides of the `InlineJoin` - once for the left-hand side and again for the right-hand side, causing a double release error and connection closure. This PR modifies the `InlineJoin.firstSubPlan()` method to ensure that all `LocalRelation` nodes use `CopyingLocalSupplier` (if they don't already) after extracting the subplan. This prevents double release issues when the same page is reused across executions. Closes elastic#135679
When the optimizer creates a
LocalRelationusingLocalSupplier.of()(which delegates toImmediateLocalSupplier), the same page instance can be referenced and released multiple times during inline stats execution. This happens because theLocalRelationserves as a source for both sides of theInlineJoin- once for the left-hand side and again for the right-hand side, causing a double release error and connection closure.This PR modifies the
InlineJoin.firstSubPlan()method to ensure that allLocalRelationnodes useCopyingLocalSupplier(if they don't already) after extracting the subplan. This prevents double release issues when the same page is reused across executions.Closes #135679