Environment details
- Programming language: Python
- OS: Debian
- Language runtime version: Python 3.13.5
- Package version: 1.73.1
|
output_contents.append(chunk.candidates[0].content) |
In a streaming response, the model returns data in multiple chunks. If the model streams a thought followed by a tool call, they arrive in separate chunks. Because output_contents is populated with output_contents.append(chunk.candidates[0].content) during the stream, it is a list of Content objects.
By calling .extend(output_contents) in record_history, the SDK is dumping every single chunk as a separate Content entry into the history array. Instead of rolling up the parts into a single Content(role="model", parts=[...]), it’s producing history that looks like this:
[
Content(role="user", parts=["Do a task..."]),
Content(role="model", parts=["<thought>..."]),
Content(role="model", parts=["</thought>"]),
Content(role="model", parts=[FunctionCall(...)])
]
Gemini's Context Caching relies on strict left-to-right byte matching of the context payload. The cache expects alternating user and model turns. Even though the API might technically tolerate back-to-back model turns by silently collapsing them on the backend during inference, the raw JSON structure of the request has changed.
Because the serialized JSON array now contains multiple consecutive {"role": "model", ...} objects instead of one merged object, the byte-prefix of the request differs from whatever standard single-turn format was cached, resulting in a 100% cache miss (Cached Read = 0).
Environment details
python-genai/google/genai/chats.py
Line 318 in cd66b68
In a streaming response, the model returns data in multiple chunks. If the model streams a thought followed by a tool call, they arrive in separate chunks. Because output_contents is populated with output_contents.append(chunk.candidates[0].content) during the stream, it is a list of Content objects.
By calling .extend(output_contents) in record_history, the SDK is dumping every single chunk as a separate Content entry into the history array. Instead of rolling up the parts into a single Content(role="model", parts=[...]), it’s producing history that looks like this:
Gemini's Context Caching relies on strict left-to-right byte matching of the context payload. The cache expects alternating user and model turns. Even though the API might technically tolerate back-to-back model turns by silently collapsing them on the backend during inference, the raw JSON structure of the request has changed.
Because the serialized JSON array now contains multiple consecutive {"role": "model", ...} objects instead of one merged object, the byte-prefix of the request differs from whatever standard single-turn format was cached, resulting in a 100% cache miss (Cached Read = 0).