fix(results): correct column generation in compare/ranking to_pandas#582

Open
RapidPoseidon wants to merge 1 commit into main from fix(results)/compare-to-pandas-column-bugs
Conversation

@RapidPoseidon
Contributor

Summary

RapidataResults.to_pandas() had four column-generation bugs in the compare/ranking branch. All four are fixed in this PR; positional A/B asset assignment is left as-is per backend ordering guarantees.

| # | Bug | Before | After |
|---|-----|--------|-------|
| 2 | `split_details=True` bypassed the compare path. Asset metrics stayed as raw dict values in single columns. | `aggregatedResults: {a:0, b:3}` (one column) | `A_aggregatedResults=0`, `B_aggregatedResults=3` (split + per-vote rows) |
| 3 | Asset detection picked "the first dict with ≥2 entries", so `privateMetadata` with 2+ keys hijacked detection and zeroed every metric column. | `assetA='campaign_id'`, `A_aggregatedResults=0` | `assetA='aurora.png'`, `A_aggregatedResults=3` |
| 4 | Ranking with ≥3 assets silently dropped everything past index 1. | only `A_*`/`B_*` columns | per-asset `<asset>_<metric>` columns |
| 5 | `detailedResults` leaked into the dataframe as a list-valued column. | `detailedResults` column with nested dicts | omitted from row body; consumed only by `split_details=True` |

Implementation notes

  • New _extract_compare_assets helper resolves the asset list deterministically: prefer assetUrls (the canonical list emitted by the backend), fall back to aggregatedResults (with Both/Neither filtered).
  • to_pandas now routes Compare/Ranking through the compare-aware path even when split_details=True, so A/B (or per-asset) splitting is preserved alongside per-vote row expansion.
  • A metric dict is only treated as comparative if it shares at least one key with the asset list — this keeps unrelated dicts (privateMetadata, summary, …) out of the metric columns instead of producing A_<key>=None noise.
  • Lists in result rows are skipped when building the base row; detailedResults is consumed exclusively by the split_details=True branch.
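
For readers who want the resolution order and the key-overlap guard in concrete form, here is a minimal sketch. It is not the code in this PR: the function names, the `EXCLUDED` set, and the assumption that `assetUrls` is a dict keyed by asset name are all illustrative.

```python
from typing import Any

# Assumption: these pseudo-options are what "Both/Neither filtered" refers to.
EXCLUDED = {"Both", "Neither"}


def extract_compare_assets(result: dict[str, Any]) -> list[str]:
    """Hypothetical stand-in for the _extract_compare_assets helper."""
    # Preferred source: assetUrls (assumed here to be a dict keyed by asset name).
    asset_urls = result.get("assetUrls")
    if isinstance(asset_urls, dict) and asset_urls:
        return [k for k in asset_urls if k not in EXCLUDED]
    # Fallback: keys of aggregatedResults, minus the pseudo-options.
    aggregated = result.get("aggregatedResults")
    if isinstance(aggregated, dict):
        return [k for k in aggregated if k not in EXCLUDED]
    return []


def is_comparative(values: Any, assets: list[str]) -> bool:
    """A dict counts as a per-asset metric only if it shares a key with the assets."""
    return isinstance(values, dict) and bool(set(assets).intersection(values))
```

With this shape, a `privateMetadata` dict that appears before `assetUrls` no longer influences detection, and it fails the `is_comparative` check because none of its keys are asset names.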

Test plan

Reproduced each bug against the current main and confirmed the fix on the new branch with the following inputs (run as a standalone script importing rapidata_results.py directly):

  • Standard compare with assetUrls + aggregatedResults + detailedResults → detailedResults no longer in columns; A_/B_ columns generated as expected
  • Compare with privateMetadata={...} (≥2 keys) preceding assetUrls → assetA/assetB correctly resolved; metric columns no longer all 0
  • Compare with split_details=True → A_/B_ columns retained, one row per detailed vote, votedFor/userDetails_* flattened in
  • Compare with Both/Neither voting enabled → Both_<metric>/Neither_<metric> columns still emitted (regression)
  • Ranking with 3 assets → per-asset <asset>_<metric> columns; no data dropped
  • split_details=True with no detailedResults field still raises ValueError (regression)
  • pyright src/rapidata/rapidata_client/results/rapidata_results.py → 0 errors, 0 warnings

🔗 Session: https://session-aaa156f7.poseidon.rapidata.internal/

The compare/ranking branch of `RapidataResults.to_pandas` had four
column-generation bugs:

- `split_details=True` skipped the compare path entirely, leaving
  per-asset metric dicts (`aggregatedResults` etc.) as raw dict values
  in single columns instead of split into `A_`/`B_` columns.
- The asset list was inferred from "the first dict with >=2 entries",
  so a `privateMetadata` dict with multiple keys hijacked detection and
  zeroed every metric column. Asset detection now prefers `assetUrls`
  and falls back to `aggregatedResults`.
- Ranking results with 3+ assets silently dropped data past index 1.
  Ranking now emits one `<asset>_<metric>` column per asset.
- `detailedResults` (a list) leaked into the standard dataframe as a
  list-valued column. Lists are now excluded from the row body and
  only consumed when `split_details=True`.

Metric dicts that don't share any key with the asset list (e.g.
`privateMetadata`) are also skipped instead of producing
`A_<key>=None` columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: lino <lino@rapidata.ai>
@claude

claude Bot commented May 4, 2026

Code Review

Good fix for a set of real, reproducible bugs. The PR description is excellent — the bug table and implementation notes make the intent clear. Here are my findings:


Critical

_extract_compare_assets type check may silently fail (line 207)

asset_urls = result.get("assetUrls")
if isinstance(asset_urls, dict) and asset_urls:
    return [k for k in asset_urls.keys() if k not in excluded]

The code assumes assetUrls is a dict. If the backend actually sends it as a list (e.g. ["img1.png", "img2.png"]), this check fails silently and falls through to the aggregatedResults fallback — defeating the whole purpose of preferring assetUrls. The docstring calls it "the canonical asset list", which strongly implies it's a list. If it is, the fix should be:

if isinstance(asset_urls, list) and asset_urls:
    return [k for k in asset_urls if k not in excluded]

Please verify the actual wire format before merging.
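
If the wire format cannot be confirmed quickly, one defensive option is to accept both shapes. The sketch below is only an illustration under that assumption; `asset_names` and `EXCLUDED` are hypothetical names, not part of the PR.

```python
from typing import Any

# Assumption: pseudo-options that should never be treated as assets.
EXCLUDED = {"Both", "Neither"}


def asset_names(asset_urls: Any) -> list[str]:
    """Return asset identifiers whether assetUrls arrives as a dict or a list."""
    if isinstance(asset_urls, dict):
        candidates = list(asset_urls)  # dict keys, in insertion order
    elif isinstance(asset_urls, (list, tuple)):
        candidates = [a for a in asset_urls if isinstance(a, str)]
    else:
        candidates = []  # None or unexpected type: caller falls back
    return [a for a in candidates if a not in EXCLUDED]
```

An empty return value would then signal the caller to use the aggregatedResults fallback, so neither shape fails silently.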


Important

is_compare should use orderType, not asset count (line 239)

is_compare = len(assets) == 2

A Ranking with exactly 2 candidates produces assetA/assetB + A_/B_ columns, which looks like Compare output. Ranking semantics are different (ordered position matters) so the column naming should reflect that. The orderType is available:

is_compare = self["info"].get("orderType") == "Compare"

Default changed from 0 to None — potential breaking change (lines 268–269)

# Before
row[f"A_{key}"] = values.get(asset, 0)

# After
base_row[f"A_{key}"] = values.get(assets[0])  # None if key absent

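None is semantically correct (absence ≠ zero), but it changes results for callers doing arithmetic on these columns: missing entries now typically surface as NaN in the DataFrame, so element-wise operations propagate NaN and .mean() skips those entries instead of counting them as 0. This is a silent behavioral change. Worth noting in the PR description or changelog.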

split_details=True now raises ValueError for Compare/Ranking with no detailedResults

_has_detailed_results() is evaluated before routing to _compare_to_pandas, so Compare/Ranking results without detailedResults now raise ValueError when split_details=True. But _compare_to_pandas already handles this gracefully by falling back to rows.append(base_row). The PR marks this as an intentional regression-preservation, but it means compare results with split_details=True are now stricter than the compare-without-split path. If there's a use case where detailedResults is optional for compare, this will surprise users.

Silent row skipping with no warning (line 236)

if len(assets) < 2:
    continue

Results with fewer than 2 assets are silently dropped. This can produce a DataFrame with fewer rows than len(self["results"]) and no indication of why. A managed_print warning (consistent with the existing one on line 46) would help debugging.
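
A rough sketch of what a counted warning could look like. It uses the standard-library `warnings` module as a stand-in for `managed_print`, whose signature is not shown here, and the `assets` field and function name are hypothetical.

```python
import warnings
from typing import Any


def rows_with_enough_assets(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep results with at least 2 assets and report how many were dropped."""
    kept: list[dict[str, Any]] = []
    skipped = 0
    for result in results:
        assets = result.get("assets", [])  # hypothetical field for illustration
        if len(assets) < 2:
            skipped += 1
            continue
        kept.append(result)
    if skipped:
        # One summary warning instead of one message per dropped row.
        warnings.warn(
            f"Skipped {skipped} of {len(results)} results with fewer than 2 assets",
            stacklevel=2,
        )
    return kept
```

Emitting a single summary keeps the output readable on large result sets while still explaining a row-count mismatch.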


Minor

_has_detailed_results() only inspects the first result (line 91)

If the first result lacks detailedResults but subsequent ones have it, the method returns False and the ValueError fires prematurely. This is pre-existing, but it becomes more visible now that Compare/Ranking also go through this check.
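
One possible shape for a check that scans every result rather than only the first, written as a standalone function with a hypothetical name rather than the actual method:

```python
from typing import Any


def has_detailed_results(results: list[dict[str, Any]]) -> bool:
    """True if any result carries a detailedResults field, not just the first."""
    return any("detailedResults" in result for result in results)
```

`any` stops at the first match, so the common case where the first result already has the field costs no more than the current check.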

Ranking metric column naming (line 276)

For ranking with N>2 assets, asset identifiers are stored as asset_1, asset_2, ... but metric columns are named {actual_asset_name}_{metric}. If asset names contain special characters or are long paths, column names can be hard to work with. A urllib.parse.quote or basename extraction might help, but this is a design question for later.
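
As a starting point for that later discussion, a basename-based column name could be derived roughly as follows. `metric_column` is a hypothetical helper, and the sketch assumes asset identifiers are URLs or POSIX-style paths.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def metric_column(asset: str, metric: str) -> str:
    """Build '<basename>_<metric>' from an asset URL or path."""
    # Drop scheme, host, and query string; fall back to the raw string if no path.
    path = urlsplit(asset).path or asset
    name = posixpath.basename(path.rstrip("/")) or asset
    # Decode percent-escapes so the column name stays readable.
    return f"{unquote(name)}_{metric}"
```

Note that two assets in different directories can share a basename, so a real implementation would need a collision check before adopting this scheme.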


Tests

There are no committed unit tests. Given the PR fixes four distinct bugs, each with specific triggering conditions (privateMetadata with 2+ keys, split_details=True on compare, ranking with N≥3, detailedResults leaking), regression tests would be valuable. Manual verification is described thoroughly, but these edge cases are easy to silently re-break.


Summary

The core logic improvements are sound — _extract_compare_assets, the asset_set.intersection guard, and the list-skipping in base_row are all clear improvements. The critical issue is verifying the assetUrls type assumption. The is_compare detection by asset count is a correctness concern for 2-asset Ranking cases. Everything else is lower priority.
