candidate_count > 1: candidates[0].content.parts contains every candidate's text; response.text concatenates them all

## Description

When calling `generate_content` with `candidate_count > 1`, `response.candidates[0].content.parts` contains **every candidate's text** (one entry per candidate, in candidate order), not just candidate 0's own response. As a consequence, `response.text` — which joins all parts of `candidates[0]` — returns the concatenation of every candidate's response instead of a single candidate's text.

This is structurally surprising, undocumented (as far as I can find in the API reference and SDK docs), and easy to mishandle in downstream code that assumes `parts` of one candidate belong only to that candidate.

## Environment

- Programming language: Python 3.10
- Package version: reproduced on `google-genai` `1.69.0` and `2.6.0`
- Model: `gemini-2.5-flash` (also reproduces on `gemini-2.5-flash-lite`)
- API: Gemini Developer API (key-based, not Vertex)

## Reproduction

```python
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ['GEMINI_API_KEY'])

def non_thought_texts(parts):
    return [p.text for p in (parts or [])
            if getattr(p, 'text', None) and not getattr(p, 'thought', False)
            and p.text.strip()]

for cc in (1, 2, 3):
    print(f"\n=== candidate_count = {cc} ===")
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=[types.Content(role='user',
                                parts=[types.Part(text='Write one short greeting.')])],
        config=types.GenerateContentConfig(
            max_output_tokens=4096,
            temperature=1.0,
            top_p=0.95,
            candidate_count=cc if cc > 1 else None,
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        ),
    )
    candidates = response.candidates
    print(f"len(response.candidates) = {len(candidates)}")
    c0_texts = non_thought_texts(candidates[0].content.parts)
    print(f"candidates[0].non_thought_texts: count={len(c0_texts)}, "
          f"lens={[len(t) for t in c0_texts]}")
    for i in range(1, len(candidates)):
        ci_texts = non_thought_texts(candidates[i].content.parts)
        print(f"candidates[{i}].non_thought_texts: count={len(ci_texts)}, "
              f"lens={[len(t) for t in ci_texts]}")
        if i < len(c0_texts) and ci_texts:
            match = c0_texts[i] == ci_texts[0]
            print(f"  candidates[0].non_thought_texts[{i}] == "
                  f"candidates[{i}].non_thought_texts[0]?  {match}")
    rt = response.text or ''
    total_c0 = sum(len(t) for t in c0_texts)
    print(f"response.text len = {len(rt)}; sum(c[0].non_thought_lens) = {total_c0}")
    print(f"response.text equals concat of c[0].non_thought_texts? "
          f"{rt == ''.join(c0_texts)}")
```

### Observed output

```
=== candidate_count = 1 ===
len(response.candidates) = 1
candidates[0].non_thought_texts: count=1, lens=[3]
response.text len = 3; sum(c[0].non_thought_lens) = 3
response.text equals concat of c[0].non_thought_texts? True

=== candidate_count = 2 ===
len(response.candidates) = 2
candidates[0].non_thought_texts: count=2, lens=[3, 3]
candidates[1].non_thought_texts: count=1, lens=[3]
  candidates[0].non_thought_texts[1] == candidates[1].non_thought_texts[0]?  True
response.text len = 6; sum(c[0].non_thought_lens) = 6
response.text equals concat of c[0].non_thought_texts? True

=== candidate_count = 3 ===
len(response.candidates) = 3
candidates[0].non_thought_texts: count=3, lens=[3, 9, 3]
candidates[1].non_thought_texts: count=1, lens=[9]
  candidates[0].non_thought_texts[1] == candidates[1].non_thought_texts[0]?  True
candidates[2].non_thought_texts: count=1, lens=[3]
  candidates[0].non_thought_texts[2] == candidates[2].non_thought_texts[0]?  True
response.text len = 15; sum(c[0].non_thought_lens) = 15
response.text equals concat of c[0].non_thought_texts? True
```

Note the equality checks: `candidates[0].non_thought_texts[i]` compares equal (Python string equality) to `candidates[i].non_thought_texts[0]` for every `i >= 1`. So `candidates[0].content.parts` literally contains a copy of every sibling candidate's text.

### With thinking enabled

The pattern extends: `candidates[0].content.parts` becomes `[thought_0, text_0, thought_1, text_1, ..., thought_{N-1}, text_{N-1}]`, and each `candidates[i].content.parts` for `i >= 1` is `[thought_i, text_i]`. The `response.text` accessor drops thought parts, but the underlying packing is the same, so it still ends up joining the sibling candidates' bodies.
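To make the reported layout concrete, here is a small offline model of it. It makes no API calls. `Part`, `pack_candidates`, and `joined_text` are hypothetical stand-ins written for this illustration, not SDK types, and `joined_text` only mimics the behaviour described in this report (join the non-thought text parts of `candidates[0]`).

```python
from dataclasses import dataclass


@dataclass
class Part:
    """Stand-in for a response part (hypothetical, not the SDK class)."""
    text: str
    thought: bool = False


def pack_candidates(own_parts):
    """Reproduce the reported layout from each candidate's own parts.

    own_parts[i] is what candidate i would be expected to contain. In the
    reported layout, candidate 0 additionally carries a copy of every
    sibling's parts, in candidate order.
    """
    packed = [list(parts) for parts in own_parts]
    for sibling in own_parts[1:]:
        packed[0].extend(sibling)
    return packed


def joined_text(parts):
    """Join non-thought text parts, as this report describes response.text."""
    return ''.join(p.text for p in parts if not p.thought)


own = [
    [Part('thinking 0', thought=True), Part('Hi!')],
    [Part('thinking 1', thought=True), Part('Hello there!')],
    [Part('thinking 2', thought=True), Part('Hey!')],
]
packed = pack_candidates(own)

print([len(p) for p in packed])  # [6, 2, 2]
print(joined_text(packed[0]))    # Hi!Hello there!Hey!
```

The first candidate ends up with six parts and its joined text is all three greetings run together, which is the shape seen in the thinking-enabled runs.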

### Verified across

| Combination | Lengths observed |
|---|---|
| flash, cc=2, temp=0.7 | `c[0]=[30,31]`, `c[1]=[31]` |
| flash, cc=3, temp=0.7 | `c[0]=[20,29,27]`, `c[1]=[29]`, `c[2]=[27]` |
| flash, cc=3, temp=1.0 | `c[0]=[179,347,201]`, `c[1]=[347]`, `c[2]=[201]` |
| flash, cc=3, long output | `c[0]=[2404,2347,1709]`, `c[1]=[2347]`, `c[2]=[1709]` |
| flash, cc=3, thinking on | `c[0]=[t,47,t,43,t,75]`, `c[1]=[t,43]`, `c[2]=[t,75]` |
| flash-lite, cc=3 | same pattern |

In the table, `t` marks a thought part. In every case `candidates[0].non_thought_texts[i] == candidates[i].non_thought_texts[0]` held (string equality). Reproduced on both SDK 1.69.0 and 2.6.0.

## Expected behavior

One of the following:

1. `candidates[0].content.parts` should contain only candidate 0's own content. Each sibling candidate's content lives in `candidates[i]` already, so duplicating it into `candidates[0].parts` is redundant and surprising.
2. If the current packing is intentional, it should be **documented prominently** in the `candidate_count` reference (both API docs and SDK docstrings), and the `response.text` property should either (a) materialize only `candidates[0]`'s own portion or (b) raise / warn more clearly than the current `"returning text result from the first candidate"` message, which doesn't hint that the result concatenates every sibling.

## Actual behavior

`response.text` returns the joined text of every candidate when `candidate_count > 1`. Downstream code that uses `response.text` (a natural default) silently ships N candidates concatenated as one reply. Code that iterates `response.candidates[i].content.parts` and expects per-candidate isolation also breaks unless it knows to ignore everything after candidate 0's own parts in `candidates[0]` (`parts[1:]` with thinking disabled, `parts[2:]` with thinking enabled).

## Suggested fix

Either:

- Stop populating `candidates[0].content.parts` with sibling text — let each candidate hold only its own content. This is the least-surprising shape and matches what the documentation implies.
- Or, if the underlying API legitimately returns the data this way, have the SDK normalize it before exposing `candidates[0]` to the user, and make `response.text` raise on `candidate_count > 1` rather than silently returning a concatenation.

## Workaround

For each candidate `i`, read the first non-thought, non-empty `text` part rather than relying on `response.text` or joining `candidates[0].content.parts`:

```python
def candidate_own_text(candidate):
    for part in (candidate.content.parts or []):
        if getattr(part, 'thought', False):
            continue
        text = (getattr(part, 'text', '') or '').strip()
        if text:
            return text
    return None

per_candidate_texts = [candidate_own_text(c) for c in response.candidates]
```

