Fix openStream race between timeout and stream closure#436

Open
ChrisSchinnerl wants to merge 4 commits into master from chris/default-timeout

@ChrisSchinnerl
Member

While digging through the indexd logs because of a constant flow of failed sector integrity checks, I noticed that a fair number of these failures are caused by timeouts. After adding some logging, it turns out that we are running into the following race:

  • Each integrity check batch has a 5-minute timeout to avoid starving hosts
  • That timeout is applied implicitly in openStream using SetDeadline
  • At the same time, openStream spins up a goroutine to close the stream if ctx is done
  • The RPC fails with an i/o timeout, but the if ctx.Err() != nil check doesn't trigger because the stream deadline fires before the context is done

We rely on the context to determine whether an RPC failed because we aborted it or for some other reason. To avoid this race, this PR updates openStream to rely only on the goroutine for context timeouts. Only if the context has no deadline do we call SetDeadline on the stream with a sane default, which should never be the case in indexd.

I have deployed this on Zeus for testing and haven't seen any false positives yet.

Copilot AI review requested due to automatic review settings May 28, 2026 13:27

@claude claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ce68fabc9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread rhp/v4/rpc.go
Contributor

Copilot AI left a comment

Pull request overview

This PR adjusts RHPv4 stream timeout handling to avoid a race between net.Conn deadlines and context cancellation, ensuring callers can reliably attribute failures to context-driven aborts vs transport timeouts.

Changes:

  • Update openStream to only apply a SetDeadline fallback when the context has no deadline, and otherwise rely on context cancellation to close the stream.
  • Simplify the context/close goroutine to always close the stream when triggered.
  • Update the timeout-related test expectations and add a changeset entry.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| rhp/v4/rpc.go | Changes how stream deadlines are applied and how streams are closed on context completion. |
| rhp/v4/rpc_test.go | Adjusts timeout error detection in TestRPCTimeout to account for different error strings. |
| .changeset/only_set_default_timeout_in_openstream_when_the_context_has_no_deadline_and_handle_context_deadlines_via_goroutine.md | Documents the behavioral change as a patch changeset. |

Comment thread rhp/v4/rpc.go
Comment thread rhp/v4/rpc.go
Comment thread rhp/v4/rpc_test.go
