Description
When calling generate_content with candidate_count > 1, response.candidates[0].content.parts contains every candidate's text (one entry per candidate, in candidate order), not just candidate 0's own response. As a consequence, response.text — which joins all parts of candidates[0] — returns the concatenation of every candidate's response instead of a single candidate's text.
This is structurally surprising, undocumented (as far as I can find in the API reference and SDK docs), and easy to mishandle in downstream code that assumes parts of one candidate belong only to that candidate.
Environment
- Programming language: Python 3.10
- Package version: reproduced on google-genai 1.69.0 and 2.6.0
- Model: gemini-2.5-flash (also reproduces on gemini-2.5-flash-lite)
- API: Gemini Developer API (key-based, not Vertex)
Reproduction
```python
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])


def non_thought_texts(parts):
    # Keep only non-empty text parts that are not marked as thoughts.
    return [p.text for p in (parts or [])
            if getattr(p, 'text', None) and not getattr(p, 'thought', False)
            and p.text.strip()]


for cc in (1, 2, 3):
    print(f"\n=== candidate_count = {cc} ===")
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=[types.Content(role='user',
                                parts=[types.Part(text='Write one short greeting.')])],
        config=types.GenerateContentConfig(
            max_output_tokens=4096,
            temperature=1.0,
            top_p=0.95,
            candidate_count=cc if cc > 1 else None,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )
    candidates = response.candidates
    print(f"len(response.candidates) = {len(candidates)}")

    # Text parts carried by candidate 0 (expected: exactly one).
    c0_texts = non_thought_texts(candidates[0].content.parts)
    print(f"candidates[0].non_thought_texts: count={len(c0_texts)}, "
          f"lens={[len(t) for t in c0_texts]}")

    # Compare each sibling's own text with the i-th text part of candidate 0.
    for i in range(1, len(candidates)):
        ci_texts = non_thought_texts(candidates[i].content.parts)
        print(f"candidates[{i}].non_thought_texts: count={len(ci_texts)}, "
              f"lens={[len(t) for t in ci_texts]}")
        if i < len(c0_texts) and ci_texts:
            match = c0_texts[i] == ci_texts[0]
            print(f" candidates[0].non_thought_texts[{i}] == "
                  f"candidates[{i}].non_thought_texts[0]? {match}")

    # response.text should equal candidate 0's own text only.
    rt = response.text or ''
    total_c0 = sum(len(t) for t in c0_texts)
    print(f"response.text len = {len(rt)}; sum(c[0].non_thought_lens) = {total_c0}")
    print(f"response.text equals concat of c[0].non_thought_texts? "
          f"{rt == ''.join(c0_texts)}")
```
Observed output
```
=== candidate_count = 1 ===
len(response.candidates) = 1
candidates[0].non_thought_texts: count=1, lens=[3]
response.text len = 3; sum(c[0].non_thought_lens) = 3
response.text equals concat of c[0].non_thought_texts? True

=== candidate_count = 2 ===
len(response.candidates) = 2
candidates[0].non_thought_texts: count=2, lens=[3, 3]
candidates[1].non_thought_texts: count=1, lens=[3]
 candidates[0].non_thought_texts[1] == candidates[1].non_thought_texts[0]? True
response.text len = 6; sum(c[0].non_thought_lens) = 6
response.text equals concat of c[0].non_thought_texts? True

=== candidate_count = 3 ===
len(response.candidates) = 3
candidates[0].non_thought_texts: count=3, lens=[3, 9, 3]
candidates[1].non_thought_texts: count=1, lens=[9]
 candidates[0].non_thought_texts[1] == candidates[1].non_thought_texts[0]? True
candidates[2].non_thought_texts: count=1, lens=[3]
 candidates[0].non_thought_texts[2] == candidates[2].non_thought_texts[0]? True
response.text len = 15; sum(c[0].non_thought_lens) = 15
response.text equals concat of c[0].non_thought_texts? True
```
Note the equality checks in the output: candidates[0].non_thought_texts[i] is identical to candidates[i].non_thought_texts[0] for every i >= 1. So candidates[0].content.parts literally contains a copy of every sibling candidate's text.
With thinking enabled
The pattern extends: candidates[0].content.parts becomes [thought_0, text_0, thought_1, text_1, ..., thought_{N-1}, text_{N-1}], and each candidates[i].content.parts for i >= 1 is [thought_i, text_i]. The response.text accessor drops thought parts, but the underlying packing is the same, so it still ends up joining the sibling candidates' text bodies.
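To make the layout concrete, here is a minimal sketch that rebuilds the observed packing with stand-in objects and shows why joining candidate 0's non-thought parts yields every candidate's text. The `Part`, `Content`, and `Candidate` dataclasses and both helper functions are hypothetical illustrations, not the SDK's own types, and the join is only an assumption about how `response.text` behaves, based on the output above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for the SDK types; only the attributes
# used in this report (text, thought, content.parts) are modeled.
@dataclass
class Part:
    text: Optional[str] = None
    thought: bool = False


@dataclass
class Content:
    parts: List[Part] = field(default_factory=list)


@dataclass
class Candidate:
    content: Content


def pack_like_observed(own_parts):
    """Build candidates in the observed shape: candidate 0 carries everyone's parts."""
    flat = [p for parts in own_parts for p in parts]
    candidates = [Candidate(Content(list(flat)))]
    candidates += [Candidate(Content(list(parts))) for parts in own_parts[1:]]
    return candidates


def joined_non_thought_text(candidate):
    """Join non-thought text parts (assumed to approximate response.text)."""
    return "".join(p.text for p in candidate.content.parts
                   if p.text and not p.thought)


own = [
    [Part("thinking 0", thought=True), Part("Hi!")],
    [Part("thinking 1", thought=True), Part("Hello!")],
    [Part("thinking 2", thought=True), Part("Hey!")],
]
candidates = pack_like_observed(own)
layout = ["thought" if p.thought else "text" for p in candidates[0].content.parts]
print(layout)
print(joined_non_thought_text(candidates[0]))  # prints "Hi!Hello!Hey!"
```

Candidate 0's join returns all three greetings, while `joined_non_thought_text(candidates[1])` returns only `Hello!`, which mirrors the asymmetry reported above.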
Verified across
| Combination | Lengths observed |
| --- | --- |
| flash, cc=2, temp=0.7 | c[0]=[30,31], c[1]=[31] |
| flash, cc=3, temp=0.7 | c[0]=[20,29,27], c[1]=[29], c[2]=[27] |
| flash, cc=3, temp=1.0 | c[0]=[179,347,201], c[1]=[347], c[2]=[201] |
| flash, cc=3, long output | c[0]=[2404,2347,1709], c[1]=[2347], c[2]=[1709] |
| flash, cc=3, thinking on | c[0]=[t,47,t,43,t,75], c[1]=[t,43], c[2]=[t,75] |
| flash-lite, cc=3 | same pattern |
In every case candidates[0].non_thought_texts[i] == candidates[i].non_thought_texts[0] (byte equality). Reproduced on both SDK 1.69.0 and 2.6.0.
Expected behavior
One of the following:
- candidates[0].content.parts should contain only candidate 0's own content. Each sibling candidate's content lives in candidates[i] already, so duplicating it into candidates[0].parts is redundant and surprising.
- If the current packing is intentional, it should be documented prominently in the candidate_count reference (both API docs and SDK docstrings), and the response.text property should either (a) materialize only candidates[0]'s own portion or (b) raise / warn more clearly than the current "returning text result from the first candidate" message, which doesn't hint that the result concatenates every sibling.
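Until one of these lands, calling code could check at runtime whether a response has the packed shape. The following is a rough sketch of such a check; `looks_packed` and `make` are hypothetical names, the objects are `SimpleNamespace` stand-ins rather than SDK types, and the check assumes (as in every run above) that each candidate has exactly one non-thought text part of its own.

```python
from types import SimpleNamespace as NS


def non_thought_texts(candidate):
    """Collect non-empty, non-thought text parts of one candidate."""
    content = getattr(candidate, "content", None)
    parts = getattr(content, "parts", None) or []
    return [p.text for p in parts
            if getattr(p, "text", None) and not getattr(p, "thought", False)]


def looks_packed(candidates):
    """True if candidates[0] holds one text per candidate, each matching its sibling."""
    if len(candidates) < 2:
        return False
    first = non_thought_texts(candidates[0])
    if len(first) != len(candidates):
        return False
    for i in range(1, len(candidates)):
        own = non_thought_texts(candidates[i])
        if not own or first[i] != own[0]:
            return False
    return True


def make(texts):
    # Stand-in candidate built from plain strings (illustration only).
    return NS(content=NS(parts=[NS(text=t, thought=False) for t in texts]))


packed = [make(["Hi!", "Hello!", "Hey!"]), make(["Hello!"]), make(["Hey!"])]
isolated = [make(["Hi!"]), make(["Hello!"]), make(["Hey!"])]
print(looks_packed(packed), looks_packed(isolated))  # prints "True False"
```

A check like this is only a heuristic: a candidate that legitimately returns several text parts could in principle produce a false positive.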
Actual behavior
response.text returns the joined text of every candidate when candidate_count > 1. Downstream code that uses response.text (a natural default) silently ships N candidates concatenated as one reply. Code that iterates response.candidates[i].content.parts and expects per-candidate isolation also breaks unless it knows to ignore parts[1:] of candidates[0].
Suggested fix
Either:
- Stop populating candidates[0].content.parts with sibling text, so that each candidate holds only its own content. This is the least-surprising shape and matches what the documentation implies.
- Or, if the underlying API legitimately returns the data this way, have the SDK normalize it before exposing candidates[0] to the user, and make response.text raise on candidate_count > 1 rather than silently returning a concatenation.
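As an illustration of what the normalization option might look like, here is a sketch that drops the sibling copies from the tail of candidate 0's parts. `strip_sibling_parts` is a hypothetical helper, not an SDK function, and it relies on the ordering observed in this report: candidate 0's own parts come first, followed by each sibling's parts in candidate order.

```python
from types import SimpleNamespace as NS


def _key(part):
    # Compare parts by the two attributes this report looks at.
    return (getattr(part, "text", None), bool(getattr(part, "thought", False)))


def strip_sibling_parts(candidates):
    """Return per-candidate part lists, with sibling copies removed from candidate 0.

    The input objects are not modified.
    """
    per_candidate = [list(c.content.parts or []) for c in candidates]
    if len(per_candidate) < 2:
        return per_candidate
    sibling_parts = [p for parts in per_candidate[1:] for p in parts]
    first = per_candidate[0]
    own_len = len(first) - len(sibling_parts)
    # Trim only when the tail of candidate 0 is exactly the siblings' parts
    # and candidate 0 would still keep at least one part of its own.
    if own_len > 0 and sibling_parts and \
            [_key(p) for p in first[own_len:]] == [_key(p) for p in sibling_parts]:
        per_candidate[0] = first[:own_len]
    return per_candidate


def part(text, thought=False):
    return NS(text=text, thought=thought)


t1, b1 = part("thinking 1", True), part("Hello!")
packed = [
    NS(content=NS(parts=[part("thinking 0", True), part("Hi!"), t1, b1])),
    NS(content=NS(parts=[t1, b1])),
]
cleaned = strip_sibling_parts(packed)
print([[p.text for p in parts] for parts in cleaned])
```

The `own_len > 0` guard matters: if the API ever returns isolated candidates that happen to have identical text, the helper leaves them untouched instead of emptying candidate 0.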
Workaround
For each candidate i, read the first non-thought, non-empty text part rather than relying on response.text or joining candidates[0].content.parts:
```python
def candidate_own_text(candidate):
    # Return the first non-thought, non-empty text part (stripped), or None.
    for part in (candidate.content.parts or []):
        if getattr(part, 'thought', False):
            continue
        text = (getattr(part, 'text', '') or '').strip()
        if text:
            return text
    return None


per_candidate_texts = [candidate_own_text(c) for c in response.candidates]
```
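Code paths that only ever expect a single reply may prefer to fail loudly instead of picking a part. The sketch below shows one way to do that; `strict_text` is a hypothetical helper written for this report, and the demo objects are `SimpleNamespace` stand-ins rather than real SDK responses.

```python
from types import SimpleNamespace as NS


def strict_text(response):
    """Return the joined non-thought text, but only for single-candidate responses."""
    candidates = getattr(response, "candidates", None) or []
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly 1 candidate, got {len(candidates)}; "
            "read each candidate explicitly instead"
        )
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    return "".join(p.text for p in parts
                   if getattr(p, "text", None) and not getattr(p, "thought", False))


def make(texts):
    # Stand-in candidate built from plain strings (illustration only).
    return NS(content=NS(parts=[NS(text=t, thought=False) for t in texts]))


single = NS(candidates=[make(["Hi!"])])
multi = NS(candidates=[make(["Hi!", "Hello!"]), make(["Hello!"])])

print(strict_text(single))  # prints "Hi!"
try:
    strict_text(multi)
except ValueError as exc:
    print(f"refused: {exc}")
```

This keeps single-candidate callers simple while forcing multi-candidate callers onto an explicit per-candidate path such as `candidate_own_text`.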