
fix: handle missing content-length header in _get_tokenizer_config_size#763

Open
fern-support wants to merge 1 commit into main from fern-support/pylon-19987-tokenizer-none-guard

Conversation


fern-support (Collaborator) commented May 5, 2026

Summary

  • _get_tokenizer_config_size assumes a content-length or x-goog-stored-content-length header is always present, but servers using chunked transfer encoding omit both, leaving size as None
  • int(None) then raises a TypeError
  • The code comment on line 94 even acknowledged this case, but the guard was never added

Fix

Add an explicit None check that raises a ValueError before the int() cast. Both callers already wrap this function in try/except Exception with "Skip the size logging, this is not critical", so the ValueError is caught and the tokenizer download proceeds normally.

Closes #762


Note

Low risk: adds a simple guard in non-critical size-logging code to avoid a TypeError when servers omit Content-Length headers (e.g., chunked transfer encoding).

Overview
Prevents _get_tokenizer_config_size from calling int(None) when tokenizer config HEAD responses omit Content-Length/x-goog-stored-content-length by explicitly raising a ValueError.

Callers already treat this as best-effort logging, so tokenizer downloads proceed while size logging is skipped when the header is missing.
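The caller-side best-effort pattern can be sketched like this. All names here are illustrative stand-ins (including the inline stub of the fixed helper); the repo's actual callers are get_hf_tokenizer and async_get_hf_tokenizer, whose bodies may differ.

```python
import logging

logger = logging.getLogger(__name__)


def _get_tokenizer_config_size(headers: dict) -> int:
    # Minimal stand-in for the fixed helper: raises ValueError when the
    # size headers are absent (e.g. chunked transfer encoding).
    size = headers.get("content-length") or headers.get(
        "x-goog-stored-content-length"
    )
    if size is None:
        raise ValueError("size headers missing")
    return int(size)


def download_tokenizer(headers: dict) -> str:
    try:
        size = _get_tokenizer_config_size(headers)
        logger.info("tokenizer config size: %d bytes", size)
    except Exception:
        # Skip the size logging, this is not critical.
        pass
    # The download itself proceeds regardless of whether size logging worked.
    return "tokenizer downloaded"
```

Because the size lookup is purely informational, swallowing the exception here is deliberate: a missing header should never block the actual download.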

Reviewed by Cursor Bugbot for commit f9b1fd1.

When a server uses chunked transfer encoding, neither content-length nor
x-goog-stored-content-length headers are present, leaving size as None.
The subsequent int() cast then raises a TypeError.

Add an explicit None check that raises a ValueError instead, which is
caught gracefully by the existing try/except in both callers (get_hf_tokenizer
and async_get_hf_tokenizer); the tokenizer download proceeds normally.

Fixes: #762
fern-support force-pushed the fern-support/pylon-19987-tokenizer-none-guard branch from 02aa9f9 to f9b1fd1 on May 5, 2026 at 16:46


Development

Successfully merging this pull request may close these issues.

Bug: uncaught type error in _get_tokenizer_config_size when server omits content length
