Skip to content

fix(auth): add HTTP timeouts to bridge-token and Client endpoints#558

Open
RapidPoseidon wants to merge 1 commit intomainfrom
poseidon/add-credential-http-timeouts
Open

fix(auth): add HTTP timeouts to bridge-token and Client endpoints#558
RapidPoseidon wants to merge 1 commit intomainfrom
poseidon/add-credential-http-timeouts

Conversation

@RapidPoseidon
Copy link
Copy Markdown
Contributor

Summary

_get_bridge_tokens, _poll_read_key, and _create_client in CredentialManager all hit identity endpoints with requests.* and no timeout=. A slow/hung identity server or a deliberate slowloris holds the calling thread forever.

poll_timeout caps the total polling loop only — a hang on any individual request still blocks forever. The bridge and create-client calls had no cap whatsoever.

Fix

Single _HTTP_TIMEOUT = (10.0, 30.0) class constant (connect, read), threaded through all three call sites.

Test plan

  • uv run pyright src/rapidata/rapidata_client → 0 errors
  • All requests.* call sites in the file now have timeout=.

🔗 Session: https://session-bc38cc85.poseidon.rapidata.internal/

`_get_bridge_tokens`, `_poll_read_key`, and `_create_client` all called
`requests.*` without a `timeout=` argument. A slow or hung identity
server — or a slowloris — would pin the calling thread indefinitely.
The existing `poll_timeout` only caps the total polling loop, not
individual requests, and the bridge/create-client calls had no cap at
all.

Add a shared `(connect=10s, read=30s)` pair and thread it through all
three call sites.

Session: https://session-bc38cc85.poseidon.rapidata.internal/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: lino <lino@rapidata.ai>
@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Code Review

Overview

This PR correctly identifies and fixes a real reliability issue: all three requests.* call sites in CredentialManager lacked HTTP timeouts, meaning a slow or hung identity server could pin the calling thread indefinitely. The fix is minimal and well-scoped.


Strengths

  • Correct approach: Using a (connect, read) tuple is the right pattern for requests — it independently caps the TCP handshake and the response read, which is strictly better than a single scalar timeout.
  • Single constant: Centralising the value as _HTTP_TIMEOUT means it can be adjusted in one place. Avoids scattered magic numbers.
  • All call sites covered: The diff touches all three requests.* calls in the file. No omissions.
  • Reasonable values: 10 s connect / 30 s read is appropriate for interactive auth flows.

Issues

1. Multi-line comment violates project conventions (minor)

CLAUDE.md says: "Never write multi-paragraph docstrings or multi-line comment blocks — one short line max."

The 4-line block on _HTTP_TIMEOUT should be trimmed. The why is still worth keeping, but in one line:

# (connect, read) pair; outer poll_timeout only bounds the loop, not individual requests.
_HTTP_TIMEOUT: Tuple[float, float] = (10.0, 30.0)

2. Polling loop can exceed poll_timeout by up to N × 40 s (worth noting)

In _poll_read_key, the guard is:

while time.time() - start_time < self.poll_timeout:

But each iteration can block for up to connect_timeout + read_timeout = 40 s before the loop condition is re-evaluated. If the server consistently responds slowly, the actual wall-clock time becomes poll_timeout + (number_of_slow_requests × 40 s). With the default poll_timeout=300 s this is bounded — far better than the previous infinite hang — but callers assuming a hard cap of poll_timeout seconds will be surprised.

A low-cost improvement would be to pass the remaining budget as the read timeout:

remaining = self.poll_timeout - (time.time() - start_time)
if remaining <= 0:
    break
read_timeout = min(self._HTTP_TIMEOUT[1], remaining)
response = requests.get(..., timeout=(self._HTTP_TIMEOUT[0], read_timeout))

This is not a blocker, but worth a follow-up issue or a note in the docstring.

3. A single timeout on _get_bridge_tokens / _create_client aborts the entire auth flow (existing behaviour, worth noting)

requests.Timeout is caught by the except requests.RequestException blocks and returns None, which propagates as an auth failure. A transient 10 s connect stall on a cold server would fail the whole login. The current PR does not make this worse — it was already the case that any RequestException would abort — but adding a one-retry on Timeout specifically would improve resilience. Not required for this PR.

4. No automated tests

The test plan is manual (pyright + eyeball). A unit test using unittest.mock.patch on requests.post / requests.get to assert that timeout= is always forwarded would lock in this fix and prevent regression. Given the small scope of the change, even a single parametrised test would be sufficient.


Summary

The fix is correct and the values are sensible. The two actionable items before merge are:

  1. Trim the comment to one line (required per CLAUDE.md).
  2. Consider bounding the polling-loop read timeout to the remaining budget (optional but recommended).

Everything else is pre-existing behaviour or a follow-up. Good catch on the missing timeouts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant