fix: add acquire fence in bthread_join for ARM memory visibility #3276

Open
guoliushui wants to merge 2 commits into apache:master from guoliushui:fix_bthread_join_memory_alignment


@guoliushui

Ensure memory visibility on ARM architecture after thread join.

What problem does this PR solve?

Issue Number: fix #3274

Problem Summary:

What is changed and the side effects?

Changed:

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

@guoliushui
Author

Verified with a test program that the fix works.


Copilot AI left a comment


Pull request overview

This PR addresses weak-memory-ordering visibility issues where bthread_join() may return before the joining thread can reliably observe the joined bthread’s prior memory writes (reported on ARM/aarch64).

Changes:

  • Add an acquire fence after TaskGroup::join() observes that the joined bthread has advanced version_butex.
  • Gate the fence with ARM architecture preprocessor checks (__aarch64__ / __arm__).
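A minimal sketch of the pattern described above, under stated assumptions: it is not brpc code, and `FakeTaskMeta`, `finish_task`, `join_sketch`, and `run_demo` are hypothetical names. The version word is modeled as a `std::atomic<unsigned>`, and a `yield()` loop stands in for blocking in `butex_wait`. The joiner reads the version with relaxed loads until it changes, then issues an acquire fence that pairs with the joined side's release increment, so the result written before the increment is visible. The sketch keeps the fence unconditional because in portable C++ it also restrains compiler reordering; on x86-64 an acquire fence typically compiles to no CPU instruction, while on AArch64 compilers usually emit `dmb ishld`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a bthread's TaskMeta; NOT brpc's real type.
struct FakeTaskMeta {
    std::atomic<unsigned> version{0};  // models the word *version_butex points to
    int result = 0;                    // plain data written by the joined task
};

// Joined side: write results, then publish completion with a release RMW.
void finish_task(FakeTaskMeta& m, int value) {
    m.result = value;
    m.version.fetch_add(1, std::memory_order_release);
}

// Joiner side: wait until the version moves past `expected`, then fence.
int join_sketch(FakeTaskMeta& m, unsigned expected) {
    while (m.version.load(std::memory_order_relaxed) == expected) {
        std::this_thread::yield();  // brpc would block in butex_wait here
    }
    // Pairs with the release fetch_add in finish_task: everything written
    // before the version bump is visible after this fence.
    std::atomic_thread_fence(std::memory_order_acquire);
    return m.result;
}

// Runs one "task" on a std::thread and joins it through join_sketch.
int run_demo() {
    FakeTaskMeta m;
    std::thread t([&m] { finish_task(m, 42); });
    int r = join_sketch(m, 0);
    t.join();
    return r;
}
```

Calling `run_demo()` returns 42: the fence guarantees the joiner observes `result` as it was written before the version bump.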


Comment thread on src/bthread/task_group.cpp (outdated)
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@altman08

altman08 commented Apr 22, 2026

We have recently started using ARM as well and would like to verify this bug. Could you share code that reproduces it?

@chenBright
Contributor

butex_wait has acquire semantics. Is the problem only present when the loop is not entered?

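```cpp
// Loop until the joined bthread bumps its version word (which happens when it ends).
while (*m->version_butex == expected_version) {
    // butex_wait blocks while the word still equals expected_version.
    // EWOULDBLOCK (value already changed) and EINTR are not errors here;
    // the loop condition is simply re-checked.
    if (butex_wait(m->version_butex, expected_version, NULL) < 0 &&
        errno != EWOULDBLOCK && errno != EINTR) {
        return errno;
    }
}
```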
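One way to frame this question, sketched under assumptions: if the version word were a `std::atomic<unsigned>` (hypothetical; `Shared`, `publish_version`, `wait_for_version`, and `run_variant_demo` are illustrative names, not brpc APIs), the loop condition itself could be an acquire load. Then whichever read ends the loop, whether on the first check with the body skipped or after several waits, is an acquire read of the value that ended it, so neither path needs a trailing fence. This is an alternative design shown for comparison, not what the PR does.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state; the counter models the butex word.
struct Shared {
    std::atomic<unsigned> version{0};
    int payload = 0;  // plain data written before the version is bumped
};

void publish_version(Shared& s, int value) {
    s.payload = value;                                  // ordinary write
    s.version.fetch_add(1, std::memory_order_release);  // publish
}

// Variant: the loop condition is an acquire load. Whichever load observes the
// new version and ends the loop (on the first check or after several waits)
// synchronizes with the release above, so no trailing fence is required.
int wait_for_version(Shared& s, unsigned expected) {
    while (s.version.load(std::memory_order_acquire) == expected) {
        std::this_thread::yield();  // stands in for butex_wait(...)
    }
    return s.payload;
}

int run_variant_demo() {
    Shared s;
    std::thread writer([&s] { publish_version(s, 7); });
    int seen = wait_for_version(s, 0);
    writer.join();
    return seen;
}
```

The trade-off is cost per check: on AArch64 an acquire load is typically an `ldar` (or `ldapr`) on every evaluation of the condition, whereas the PR leaves the loop reads alone and pays for a single fence after the loop exits.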

@guoliushui
Author

guoliushui commented Apr 22, 2026

> butex_wait has acquire semantics. Is the problem only present when the loop is not entered?

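We considered this at the time, but in the end decided not to distinguish whether butex_wait was reached, for two reasons:

  1. Even though butex_wait performs an acquire internally, new entries may accumulate in the invalidate queue between that acquire and the plain read that exits the while loop. Only a fence placed after the while loop exits and before the user data is read guarantees that the user data is visible.
  2. Distinguishing the two cases adds complexity for negligible gain: it takes an extra variable and a conditional check only to occasionally save one DMB ISHLD (10 to several tens of ns), which hardly matters in a blocking wait function.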

Contributor

@chenBright chenBright left a comment


LGTM

@chenBright chenBright requested a review from wwbmmm April 22, 2026 06:06
@condy0919
Contributor

@guoliushui When I tested on Kunpeng 950 earlier, I also found that bthread_join(..) on a child task would hang, leaving the client unable to exit.

@guoliushui
Author

> @guoliushui When I tested on Kunpeng 950 earlier, I also found that bthread_join(..) on a child task would hang, leaving the client unable to exit.

So far we have not seen any hangs on Kunpeng 920.

@guoliushui
Author

> We have recently started using ARM as well and would like to verify this bug. Could you share code that reproduces it?

You can take a look here: https://www.cnblogs.com/guoliushui/p/19902091
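The linked post contains the author's actual reproduction, which is not copied here. As a hedged stand-in, the hypothetical harness below (`count_stale_reads` is an illustrative name) runs the classic message-passing pattern: in each round a writer stores `data` and then publishes `flag` with a release store, and a reader spins on `flag` with relaxed loads before reading `data`. With the acquire fence in place, the C++ memory model forbids reading the previous round's `data`, so the stale count must be zero on every architecture. Compiling with `-DDROP_FENCE` removes the fence; on weakly ordered hardware such as ARM, stale reads then become possible but are not guaranteed to show up in any given run, and on x86-64 they typically never appear. `data` is a relaxed atomic so that the fence-less variant remains a well-defined program rather than a data race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical message-passing (MP) litmus harness, not the linked post's code.
// Round r: the writer stores data = r, then flag = r (release). The reader spins
// on flag with relaxed loads, optionally issues an acquire fence, reads data,
// and acknowledges the round so the writer cannot run ahead.
long count_stale_reads(int rounds) {
    std::atomic<int> data{0}, flag{0}, ack{0};
    long stale = 0;

    std::thread writer([&] {
        for (int r = 1; r <= rounds; ++r) {
            data.store(r, std::memory_order_relaxed);
            flag.store(r, std::memory_order_release);
            while (ack.load(std::memory_order_acquire) != r) {
                std::this_thread::yield();  // wait for the reader to finish round r
            }
        }
    });

    for (int r = 1; r <= rounds; ++r) {
        while (flag.load(std::memory_order_relaxed) != r) {
            std::this_thread::yield();  // like the join loop re-reading the version
        }
#ifndef DROP_FENCE
        // Pairs with the writer's release store of flag.
        std::atomic_thread_fence(std::memory_order_acquire);
#endif
        if (data.load(std::memory_order_relaxed) != r) ++stale;  // saw round r - 1
        ack.store(r, std::memory_order_release);
    }
    writer.join();
    return stale;
}
```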

@altman08

> You can take a look here: https://www.cnblogs.com/guoliushui/p/19902091

Thanks. Our machines are also Kunpeng 920, and we were able to reproduce the problem.



Development

Successfully merging this pull request may close these issues.

bthread_join lacks acquire fence on ARM, causing stale reads of joined bthread's memory writes

5 participants