Skip to content

[CELEBORN-2340] Explicitly trim Flusher thread pool cache when flusher queue is empty#3696

Open
s0nskar wants to merge 2 commits into
apache:mainfrom
s0nskar:fix_memory_leak
Open

[CELEBORN-2340] Explicitly trim Flusher thread pool cache when flusher queue is empty#3696
s0nskar wants to merge 2 commits into
apache:mainfrom
s0nskar:fix_memory_leak

Conversation

@s0nskar
Copy link
Copy Markdown
Contributor

@s0nskar s0nskar commented May 19, 2026

What changes were proposed in this pull request?

We are using 0.5.3 and noticed that after a worker processes a heavy shuffle workload, its Netty direct memory metric (usedDirectMemory) spikes to a high value and then gets stuck and never recovers — even after the all flush ends and the worker becomes completely idle. The memory stays stuck until the worker is restarted.

Screenshot 2026-05-19 at 2 25 38 PM

Although i noticed that there are fixes which tries to handle this by using pinnedMemory – #3018 and #3099 but i think that this PR is the correct way to handle this instead of relying on pinnedMemory.

Why are the changes needed?

PooledByteBufAllocator maintains a per-thread PoolThreadCache for each Flusher thread. When a ByteBuf is released, instead of immediately returning its slot to the PoolArena, it goes into this thread-local cache. But the PoolArena's view of the underlying PoolChunk is not updated. usedDirectMemory counts all PoolChunk native memory regardless of how much is truly in use vs. sitting in thread caches.

The thread cache is only swept when the owning thread crosses a allocation threshold, but since the worker is in pause state after all flush items end, all flush thread gets blocked on workingQueue(index).take(). The cache is never swept, PoolChunks in thread cache are never freed, and usedDirectMemory stays frozen.

In this PR, we are changing take() with poll() and on poll() timeout we are explicitly calling allocator.trimCurrentThreadCache() to flush cached PoolChunks back to the PoolArena, allowing fully-free chunks to be destroyed and usedDirectMemory to recover.

A similar pattern is already used by:

https://github.com/apache/celeborn/blob/main/worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/ReadBufferDispatcher.java#L131

Does this PR resolve a correctness bug?

  • Yes

Does this PR introduce any user-facing change?

  • Yes

How was this patch tested?

@s0nskar
Copy link
Copy Markdown
Contributor Author

s0nskar commented May 19, 2026

cc: @leixm @SteNicholas @FMX PTAL

Copy link
Copy Markdown
Contributor

@zaynt4606 zaynt4606 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but we need a JIRA ticket to merge

@zaynt4606 zaynt4606 changed the title Explicitly trim Flusher thread pool cache when flusher queue is empty [CELEBORN-xxxx] Explicitly trim Flusher thread pool cache when flusher queue is empty May 21, 2026
@SteNicholas SteNicholas requested a review from Copilot May 25, 2026 06:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the worker-side storage flusher loop to proactively release Netty PooledByteBufAllocator per-thread cache memory when flusher threads are idle, aiming to allow usedDirectMemory to drop after heavy shuffle workloads complete.

Changes:

  • Replace workingQueues(index).take() with poll(..., 1000ms) in flusher worker threads.
  • On poll timeout (queue idle), call PooledByteBufAllocator.trimCurrentThreadCache() to flush thread-local cached buffers back to the arenas.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@SteNicholas SteNicholas force-pushed the main branch 2 times, most recently from 1d92a40 to cf8d472 Compare May 27, 2026 02:11
@s0nskar s0nskar changed the title [CELEBORN-xxxx] Explicitly trim Flusher thread pool cache when flusher queue is empty [CELEBORN-2340] Explicitly trim Flusher thread pool cache when flusher queue is empty May 27, 2026
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@s0nskar
Copy link
Copy Markdown
Contributor Author

s0nskar commented May 27, 2026

Added the jira and handled copilot suggestion.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment on lines +93 to 97
processIOException(e, DiskStatus.READ_OR_WRITE_FAILURE)
logWarning(s"Flusher-$this-thread-$index encounter exception.", t)
}
} catch {
case t: Throwable =>
val e = ExceptionUtils.wrapThrowableToIOException(t)
task.notifier.setException(e)
processIOException(e, DiskStatus.READ_OR_WRITE_FAILURE)
logWarning(s"Flusher-$this-thread-$index encounter exception.", t)
lastBeginFlushTime.set(index, -1)
}
Comment on lines +101 to +107
} else {
allocator match {
case alloc: PooledByteBufAllocator =>
// Free buffer pool memory to main direct memory when flush thread is idle.
alloc.trimCurrentThreadCache
case _ =>
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants