Hey team! While reviewing the manually_maintained modules, I noticed a potential failure point in tokenizers.py when initializing local tokenizers.
Currently, `_get_tokenizer_config_size` assumes that the `content-length` or `x-goog-stored-content-length` header will always be present in the HTTP response. However, when the server responds with chunked transfer encoding, both headers are omitted. In that case `size` stays `None`, and the subsequent `int(typing.cast(int, size))` raises a `TypeError`, causing tokenizer initialization to fail unexpectedly.
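Here's a minimal sketch of a more defensive version. The `httpx.Response` parameter and the `len(response.content)` fallback are assumptions for illustration (the SDK may use a different HTTP client and signature), not the library's actual implementation:

```python
import typing

import httpx  # assumption: stand-in for whatever HTTP client the SDK uses


def _get_tokenizer_config_size(response: httpx.Response) -> int:
    # Prefer the explicit length headers when the server sends them.
    size: typing.Optional[str] = response.headers.get(
        "content-length"
    ) or response.headers.get("x-goog-stored-content-length")
    if size is None:
        # With chunked transfer encoding both headers are omitted, so fall
        # back to the downloaded body's length instead of crashing on
        # int(None).
        return len(response.content)
    return int(size)
```

The key point is simply that the missing-header case needs an explicit branch; whether the fallback reads the body length or re-requests with a `HEAD` call is a design choice for the maintainers.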