Skip to content

Avoid publishing interpreter frame with invalid IP#129795

Open
BrzVlad wants to merge 1 commit into
dotnet:mainfrom
BrzVlad:fix-clrinterp-stack-overflow
Open

Avoid publishing interpreter frame with invalid IP#129795
BrzVlad wants to merge 1 commit into
dotnet:mainfrom
BrzVlad:fix-clrinterp-stack-overflow

Conversation

@BrzVlad

@BrzVlad BrzVlad commented Jun 24, 2026

Copy link
Copy Markdown
Member

When we need to obtain a new InterpMethodContextFrame for called method execution, we try to obtain a preallocated one from the list. If there is none, then we just allocate a new one with alloca and initialize the fields, including the ip to 0 initially. The problem is that the compiler is free to reorder this initialization in whichever way it considers optimal. If a stack overflow gets triggered at some point during execution (at a location that we don't normally expect), the unwinder might actually observe a pushed interp frame that has a non-null junk ip. We fix this by making sure the ip is properly zeroed before we publish the new interpreter frame on the list. Order is achieved just via compiler barriers.

Should fix stackoverflowtester on win-arm64.

"Assert failure(PID 46068 [0x0000b3f4], Thread: 46860 [0xb70c]): m_crawl.GetCodeInfo()->IsValid()"

CORECLR! StackFrameIterator::NextRaw + 0x764 (0x00007ff8`157b6e04)"
CORECLR! StackFrameIterator::Filter + 0xBD4 (0x00007ff8`157b57b4)"
CORECLR! StackFrameIterator::Init + 0x258 (0x00007ff8`157b60d8)"
CORECLR! Thread::StackWalkFramesEx + 0x178 (0x00007ff8`157b7e78)"
CORECLR! Thread::StackWalkFrames + 0x1A8 (0x00007ff8`157b7cd8)"
CORECLR! LogCallstackForLogWorker + 0x19C (0x00007ff8`1585139c)"
CORECLR! LogStackOverflowStackTraceThread + 0x10 (0x00007ff8`15851fe0)"
KERNEL32! BaseThreadInitThunk + 0x40 (0x00007ff8`8f128740)"
<no module>! <no symbol> + 0x0 (0x392cfff8`8fe64594)"
    File: D:\a\_work\1\s\src\coreclr\vm\stackwalk.cpp:2306"
    Image: C:\h\w\BD470A90\p\corerun.exe"

Copilot AI review requested due to automatic review settings June 24, 2026 11:41
@BrzVlad

BrzVlad commented Jun 24, 2026

Copy link
Copy Markdown
Member Author

/azp run runtime-interpreter

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@dotnet-policy-service

Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@BrzVlad

BrzVlad commented Jun 24, 2026

Copy link
Copy Markdown
Member Author

/azp run runtime-interpreter

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@BrzVlad

BrzVlad commented Jun 24, 2026

Copy link
Copy Markdown
Member Author

@janvorli Miraculously, this actually fixed the failing test. It would be great if you could reproduce the failure and validate the theory, but I believe the fix is simplistic enough anyway and it doesn't impact any hot path.

pFrame->pNext = pChildFrame;
// We make sure that a new frame can't be seen with invalid ip/next when a stack
// overflow is triggered at a location outside of our control.
VolatileStoreWithoutBarrier(&pChildFrame->ip, (const int32_t*)nullptr);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wish there was a less verbose way when we really need just a compiler barrier between the other fields writes and the pNext field write. But I understand that since compiler barrier like _ReadWriteBarrier on Windows is deprecated, we will have to live with that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is what mono was doing with mono_compiler_barrier. Apparently there is the cross platform std::atomic_signal_fence(std::memory_order_acq_rel) which should achieve these everywhere and might be a good alternative.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it seems we could use that one. It doesn't seem to contradict any of the rules in https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/clr-code-guide.md#211-using-standard-headers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants