Hey team! While reviewing the manually_maintained modules, I noticed a potential failure point in tokenizers.py when initializing local tokenizers.
Currently, `_get_tokenizer_config_size` assumes that the `content-length` or `x-goog-stored-content-length` header will always be present in the HTTP response. However, when the server responds with chunked transfer encoding, both headers are omitted. In that case `size` stays `None`, and the subsequent `int(typing.cast(int, size))` raises a `TypeError`, causing tokenizer initialization to fail unexpectedly.
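Here's a minimal sketch of a more defensive version. The `httpx.Response` parameter and the `len(response.content)` fallback are assumptions for illustration (the SDK may use a different HTTP client and signature), not the library's actual implementation:

```python
import typing

import httpx  # assumption: stand-in for whatever HTTP client the SDK uses


def _get_tokenizer_config_size(response: httpx.Response) -> int:
    # Prefer the explicit length headers when the server sends them.
    size: typing.Optional[str] = response.headers.get(
        "content-length"
    ) or response.headers.get("x-goog-stored-content-length")
    if size is None:
        # With chunked transfer encoding both headers are omitted, so fall
        # back to the downloaded body's length instead of crashing on
        # int(None).
        return len(response.content)
    return int(size)
```

The key point is simply that the missing-header case needs an explicit branch; whether the fallback reads the body length or re-requests with a `HEAD` call is a design choice for the maintainers.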