fix(results): correct column generation in compare/ranking to_pandas #582
RapidPoseidon wants to merge 1 commit into `main` from …
The compare/ranking branch of `RapidataResults.to_pandas` had four column-generation bugs:

- `split_details=True` skipped the compare path entirely, leaving per-asset metric dicts (`aggregatedResults` etc.) as raw dict values in single columns instead of splitting them into `A_`/`B_` columns.
- The asset list was inferred from "the first dict with >=2 entries", so a `privateMetadata` dict with multiple keys hijacked detection and zeroed every metric column. Asset detection now prefers `assetUrls` and falls back to `aggregatedResults`.
- Ranking results with 3+ assets silently dropped data past index 1. Ranking now emits one `<asset>_<metric>` column per asset.
- `detailedResults` (a list) leaked into the standard dataframe as a list-valued column. Lists are now excluded from the row body and only consumed when `split_details=True`.

Metric dicts that don't share any key with the asset list (e.g. `privateMetadata`) are also skipped instead of producing `A_<key>=None` columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: lino <lino@rapidata.ai>
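To make the second bug concrete, here is a minimal sketch contrasting the old "first dict with >=2 entries" heuristic with the `assetUrls`-then-`aggregatedResults` precedence described above. The function names, the `NON_ASSET_KEYS` set, and the sample payload are hypothetical; the real `_extract_compare_assets` helper may look different. Because the wire format of `assetUrls` is not pinned down here, the sketch accepts either a list or a dict.

```python
from typing import Any

# Hypothetical: keys in aggregatedResults that are votes, not assets.
NON_ASSET_KEYS = {"Both", "Neither"}


def detect_assets_old(result: dict[str, Any]) -> list[str]:
    """Old heuristic: keys of the first dict value with >= 2 entries."""
    for value in result.values():
        if isinstance(value, dict) and len(value) >= 2:
            return list(value.keys())
    return []


def detect_assets(result: dict[str, Any]) -> list[str]:
    """Prefer assetUrls, fall back to aggregatedResults minus Both/Neither."""
    asset_urls = result.get("assetUrls")
    if isinstance(asset_urls, (list, dict)) and asset_urls:
        # Iterating a dict yields its keys; iterating a list yields its items.
        return [a for a in asset_urls if a not in NON_ASSET_KEYS]
    aggregated = result.get("aggregatedResults")
    if isinstance(aggregated, dict):
        return [k for k in aggregated if k not in NON_ASSET_KEYS]
    return []


example = {
    "privateMetadata": {"campaign_id": "c-1", "owner": "team"},
    "assetUrls": ["aurora.png", "borealis.png"],
    "aggregatedResults": {"aurora.png": 3, "borealis.png": 0, "Both": 1},
}
print(detect_assets_old(example))  # ['campaign_id', 'owner']
print(detect_assets(example))      # ['aurora.png', 'borealis.png']
```

Checking `assetUrls` first means an unrelated dict such as `privateMetadata` can no longer win asset detection, whatever its position in the payload.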
Code Review

Good fix for a set of real, reproducible bugs. The PR description is excellent: the bug table and implementation notes make the intent clear. Here are my findings:

Critical
```python
asset_urls = result.get("assetUrls")
if isinstance(asset_urls, dict) and asset_urls:
    return [k for k in asset_urls.keys() if k not in excluded]
```

The code assumes …

```python
if isinstance(asset_urls, list) and asset_urls:
    return [k for k in asset_urls if k not in excluded]
```

Please verify the actual wire format before merging.

Important
```python
is_compare = len(assets) == 2
```

A Ranking with exactly 2 candidates produces …

```python
is_compare = self["info"].get("orderType") == "Compare"
```

Default changed from …

```python
# Before
row[f"A_{key}"] = values.get(asset, 0)
# After
base_row[f"A_{key}"] = values.get(assets[0])  # None if key absent
```
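The points above (branching on `orderType` rather than on asset count, and `None` instead of `0` for missing keys) can be sketched together with the PR's ranking and skip rules. This is a hypothetical, stdlib-only illustration: `build_metric_columns` and its `order_type` parameter are not from the codebase, and treating every non-`Compare` order type as a ranking is an assumption.

```python
from typing import Any


def build_metric_columns(
    result: dict[str, Any], assets: list[str], order_type: str
) -> dict[str, Any]:
    """Hypothetical sketch: flatten one result into per-asset metric columns.

    Compare -> A_<metric>/B_<metric> (plus Both_/Neither_ when present);
    any other order type -> one <asset>_<metric> column per asset.
    """
    is_compare = order_type == "Compare"  # decided by type, not len(assets)
    labels = ["A", "B"] if is_compare else assets
    row: dict[str, Any] = {}
    for key, values in result.items():
        if isinstance(values, list):
            continue  # e.g. detailedResults: left to the split_details path
        if not isinstance(values, dict):
            row[key] = values  # scalar fields pass through unchanged
            continue
        if not any(asset in values for asset in assets):
            continue  # e.g. privateMetadata: shares no key with the assets
        for label, asset in zip(labels, assets):
            row[f"{label}_{key}"] = values.get(asset)  # None if key absent
        if is_compare:
            for extra in ("Both", "Neither"):
                if extra in values:
                    row[f"{extra}_{key}"] = values[extra]
    return row


ranking = {
    "aggregatedResults": {"a.png": 5, "b.png": 2, "c.png": 1},
    "privateMetadata": {"campaign_id": "c-1", "owner": "team"},
    "detailedResults": [{"votedFor": "a.png"}],
}
print(build_metric_columns(ranking, ["a.png", "b.png", "c.png"], "Ranking"))
# {'a.png_aggregatedResults': 5, 'b.png_aggregatedResults': 2, 'c.png_aggregatedResults': 1}
```

When rows like these are loaded into a pandas DataFrame, a `None` in an otherwise numeric column typically shows up as `NaN`; callers who relied on the old `0` default could restore it with `DataFrame.fillna(0)`.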
Silent row skipping with no warning (line 236)

```python
if len(assets) < 2:
    continue
```

Results with fewer than 2 assets are silently dropped. This can produce a DataFrame with fewer rows than …

Minor
If the first result lacks …

Ranking metric column naming (line 276)

For ranking with N>2 assets, asset identifiers are stored as …

Tests

There are no committed unit tests. Given the PR fixes four distinct bugs, each with specific triggering conditions (…

Summary

The core logic improvements are sound …
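For the silent-skip point under Important, one option is to keep skipping results with fewer than two assets but emit a single summary warning so the row-count mismatch is visible. A minimal sketch using the standard `warnings` module; `rows_with_skip_warning` and the `detect` callback are hypothetical stand-ins for the real loop and asset-detection helper.

```python
import warnings
from typing import Any, Callable


def rows_with_skip_warning(
    results: list[dict[str, Any]],
    detect: Callable[[dict[str, Any]], list[str]],
) -> list[dict[str, Any]]:
    """Hypothetical sketch: collect rows, but warn instead of dropping silently."""
    rows: list[dict[str, Any]] = []
    skipped = 0
    for result in results:
        assets = detect(result)
        if len(assets) < 2:
            skipped += 1  # still skipped, but counted
            continue
        rows.append({"assetA": assets[0], "assetB": assets[1]})
    if skipped:
        # One warning per call, pointing at the caller's line.
        warnings.warn(
            f"Skipped {skipped} of {len(results)} results with fewer than 2 assets",
            stacklevel=2,
        )
    return rows
```

A single aggregated warning keeps the output readable for large result sets, while still telling the caller why the DataFrame is shorter than the input.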
Summary
`RapidataResults.to_pandas()` had four column-generation bugs in the compare/ranking branch. All four are fixed in this PR; positional A/B asset assignment is left as-is per backend ordering guarantees.

| Bug | Before | After |
|---|---|---|
| `split_details=True` bypassed the compare path. Asset metrics stayed as raw dict values in single columns. | `aggregatedResults: {a:0, b:3}` (one column) | `A_aggregatedResults=0, B_aggregatedResults=3` (split + per-vote rows) |
| `privateMetadata` with 2+ keys hijacked detection and zeroed every metric column. | `assetA='campaign_id', A_aggregatedResults=0` | `assetA='aurora.png', A_aggregatedResults=3` |
| Ranking with 3+ assets silently dropped data past index 1. | `A_*`/`B_*` columns | `<asset>_<metric>` columns |
| `detailedResults` leaked into the dataframe as a list-valued column. | `detailedResults` column with nested dicts | Only consumed when `split_details=True` |

Implementation notes
- The `_extract_compare_assets` helper resolves the asset list deterministically: prefer `assetUrls` (the canonical list emitted by the backend), fall back to `aggregatedResults` (with `Both`/`Neither` filtered).
- `to_pandas` now routes Compare/Ranking through the compare-aware path even when `split_details=True`, so A/B (or per-asset) splitting is preserved alongside per-vote row expansion.
- Metric dicts that share no key with the asset list are skipped, keeping non-asset dicts (`privateMetadata`, `summary`, …) out of the metric columns instead of producing `A_<key>=None` noise.
- `detailedResults` is consumed exclusively by the `split_details=True` branch.

Test plan
Reproduced each bug against the current `main` and confirmed the fix on the new branch with the following inputs (run as a standalone script importing `rapidata_results.py` directly):

- … → `detailedResults` no longer in columns; `A_`/`B_` columns generated as expected
- `privateMetadata={...}` (≥2 keys) preceding `assetUrls` → `assetA`/`assetB` correctly resolved; metric columns no longer all 0
- `split_details=True` → `A_`/`B_` columns retained, one row per detailed vote, `votedFor`/`userDetails_*` flattened
- … → `Both_<metric>`/`Neither_<metric>` columns still emitted (regression)
- … → `<asset>_<metric>` columns; no data dropped
- `split_details=True` with no `detailedResults` field still raises `ValueError` (regression)
- `pyright src/rapidata/rapidata_client/results/rapidata_results.py` → 0 errors, 0 warnings

🔗 Session: https://session-aaa156f7.poseidon.rapidata.internal/
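The `split_details=True` expectations in the test plan (one row per detailed vote, `votedFor`/`userDetails_*` flattened, `ValueError` when `detailedResults` is missing) can be illustrated with a small sketch. It assumes each `detailedResults` entry is a dict such as `{"votedFor": ..., "userDetails": {...}}`; that shape, the `expand_detailed_rows` name, and the error message are assumptions, not the library's API.

```python
from typing import Any


def expand_detailed_rows(
    base_row: dict[str, Any], result: dict[str, Any]
) -> list[dict[str, Any]]:
    """Hypothetical sketch of split_details=True: one row per detailed vote.

    Nested dicts in each vote (assumed: userDetails) are flattened into
    <field>_<key> columns; the per-asset metric columns in base_row are kept.
    """
    details = result.get("detailedResults")
    if not isinstance(details, list):
        raise ValueError("split_details=True requires a detailedResults field")
    rows: list[dict[str, Any]] = []
    for vote in details:
        row = dict(base_row)  # copy so base_row is not mutated
        for field, value in vote.items():
            if isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    row[f"{field}_{sub_key}"] = sub_value
            else:
                row[field] = value
        rows.append(row)
    return rows


base = {"A_aggregatedResults": 3, "B_aggregatedResults": 0}
sample = {"detailedResults": [
    {"votedFor": "aurora.png", "userDetails": {"country": "CH"}},
    {"votedFor": "borealis.png", "userDetails": {"country": "DE"}},
]}
rows = expand_detailed_rows(base, sample)
print(rows[0])
# {'A_aggregatedResults': 3, 'B_aggregatedResults': 0, 'votedFor': 'aurora.png', 'userDetails_country': 'CH'}
```

Copying `base_row` per vote is what lets the A/B (or per-asset) columns survive alongside per-vote expansion, which is the behaviour the first bug fix restores.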