feat(audience): add locate / line / transcription qualification examples #578

Open
RapidPoseidon wants to merge 1 commit into main from poseidon/audience-example-types

Summary

The Rapidata dashboard now offers Locate / Line / Transcription audience types alongside the existing Classify and Compare (app-frontend#2342, with the LineExampleTruth multi-box backend support landing in rapidata-backend#3992 + rapidata-backend#4051). The Python SDK only had typed helpers for Classify and Compare. This PR adds the missing three so that the SDK matches the dashboard's surface and the dashboard's "Create it with Python" snippet on those forms can point at a real public method.

What's new

Three typed methods on RapidataAudience, each calling through to a matching method on AudienceExampleHandler:

audience.add_locate_example(
    instruction="Click the dog's face",
    truth=[Box(x_min=0.3, y_min=0.2, x_max=0.55, y_max=0.45)],
    datapoint="https://assets.rapidata.ai/dog.jpg",
)

audience.add_line_example(
    instruction="Trace the horizon line",
    datapoint="https://assets.rapidata.ai/landscape.jpg",
    truth=[Box(x_min=0.0, y_min=0.45, x_max=1.0, y_max=0.55)],  # optional
)

audience.add_transcription_example(
    instruction="Select the words actually spoken",
    sentence="the quick brown fox jumps over the lazy dog",
    truth=[1, 2, 3],  # word indices
    datapoint="https://assets.rapidata.ai/fox.mp3",
)

All three:

  • mirror the keyword-arg shape of the existing add_classification_example / add_compare_example (same context, media_context, explanation, settings),
  • compute randomCorrectProbability as the chance that a random answer is correct (calculate_boxes_coverage(truth) for Locate; the same for Line when boxes are given and a flat 0.1 when they are not; len(correct) / len(total) for Transcription),
  • run through _try_start_recruiting after the example is added.
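The random-baseline rules above can be sketched in plain Python. This is an illustration only, not the SDK's code: the function names, the tuple stand-in for `Box`, and whitespace word-splitting are assumptions, and the Locate helper assumes non-overlapping boxes so that areas can simply be summed (the PR's `calculate_boxes_coverage` computes a true union area instead).

```python
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for the SDK's Box: (x_min, y_min, x_max, y_max) in [0, 1].
BoxTuple = Tuple[float, float, float, float]


def locate_baseline(boxes: Sequence[BoxTuple]) -> float:
    """Chance that a uniformly random click lands inside a truth box.

    Simplifying assumption: boxes do not overlap, so areas are summed.
    """
    return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes)


def line_baseline(boxes: Optional[Sequence[BoxTuple]]) -> float:
    """Box coverage when boxes are given, otherwise a flat 0.1.

    Assumption: an empty list is treated like "no boxes given".
    """
    return locate_baseline(boxes) if boxes else 0.1


def transcription_baseline(sentence: str, correct_indices: Sequence[int]) -> float:
    """Fraction of the sentence's words that are marked correct.

    Assumption: words are separated by whitespace.
    """
    words = sentence.split()
    return len(correct_indices) / len(words)
```

With the values from the snippets above, the Locate box covers roughly 6% of the image and the Transcription example marks 3 of 9 words as correct.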

Refactors / housekeeping

  • validation.rapids.box: extracted calculate_boxes_coverage from the private RapidsManager._calculate_boxes_coverage so both the rapid and audience layers can reuse the union-area sweep-line. The old method on RapidsManager is kept as a thin alias so any external callers (and any docs that reference it) keep working.
  • IExampleTruthLineExampleTruth codegen now carries boundingBoxes / requiredPrecision / requiredCompleteness (matching prod after rapidata-backend#4051). The local generator-cli can't materialise Java in this dev environment, so the model file was hand-patched in the same shape the generator would produce; the next CI regen will overwrite it idempotently.
  • Schema files refreshed from prod via bash openapi/generate-schema.sh.
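For readers unfamiliar with the union-area computation mentioned above, here is a minimal sketch of one way to do it with a sweep over x. It is not the code in `validation.rapids.box`; the name `union_area` and the tuple representation of a box are assumptions made for this example.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def union_area(rects: Sequence[Rect]) -> float:
    """Area covered by at least one rectangle, counting overlaps once."""
    # Every distinct x edge bounds a vertical strip of the plane.
    xs = sorted({x for x0, _, x1, _ in rects for x in (x0, x1)})
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        # y-intervals of the rectangles that span this whole strip, sorted by start.
        spans = sorted(
            (y0, y1) for x0, y0, x1, y1 in rects if x0 <= left and x1 >= right
        )
        covered = 0.0
        cur_lo = cur_hi = None
        for lo, hi in spans:
            if cur_hi is None or lo > cur_hi:
                # Gap before this interval: close the previous merged run.
                if cur_hi is not None:
                    covered += cur_hi - cur_lo
                cur_lo, cur_hi = lo, hi
            else:
                # Overlapping or touching interval: extend the current run.
                cur_hi = max(cur_hi, hi)
        if cur_hi is not None:
            covered += cur_hi - cur_lo
        total += covered * (right - left)
    return total
```

Two 0.5 x 0.5 boxes that overlap in a 0.25 x 0.25 corner give 0.25 + 0.25 - 0.0625 = 0.4375, which a plain sum of areas would overstate.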

Docs

  • docs/audiences.md: added a method-picker table for all five example types and a runnable snippet showing the three new helpers. Updated the existing 'settings' note to reflect that all add_*_example helpers accept settings, not just classify/compare.

Test plan

  • pyright src/rapidata/rapidata_client — no new errors above main's baseline (the 3 remaining errors are pre-existing missing-import warnings for pandas / pydantic in this isolated environment).
  • Smoke: audience.add_locate_example(...) against a staging audience, verify the resulting rapid carries a LocateBoxTruth with the expected boxes and requiredPrecision / requiredCompleteness propagated.
  • Smoke: audience.add_line_example(..., truth=None) and audience.add_line_example(..., truth=[Box(...)]); verify that both round-trip: the first produces an empty LineTruth, the second a BoundingBoxTruth (per the rapidata-backend#4051 mapping).
  • Smoke: audience.add_transcription_example(...) and verify correctWords reach the rapid as TranscriptionTruth.correctWords.

🔗 Session: session-2de1ad62

Match the dashboard's audience-creation flow: `RapidataAudience` now
exposes a typed helper for each of the new audience example types backed
by rapidata-backend#3992 and rapidata-backend#4051.

Public API on `RapidataAudience`:
- `add_locate_example(instruction, truth: list[Box], datapoint, ...)`
  — labelers click inside the correct region(s) on an image. Random
  baseline is the union-area of `truth`.
- `add_line_example(instruction, datapoint, truth: list[Box] | None, ...)`
  — labelers draw a line. When `truth` is omitted the example is scored
  by annotator consensus (empty `LineExampleTruth` on the wire); when
  provided the line points are graded against the union of the boxes
  exactly like Locate.
- `add_transcription_example(instruction, sentence, truth: list[int],
  datapoint, ...)` — labelers pick the correct words out of a spoken
  transcription. `truth` is a list of word indices (0-based).
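
To make the 0-based indexing concrete, the sketch below maps a `truth` list back to the words it selects. The helper `words_at` is hypothetical and assumes the sentence is split on whitespace; how the SDK or backend actually tokenizes is not stated here.

```python
def words_at(sentence: str, indices: list[int]) -> list[str]:
    """Return the words a 0-based index list selects (whitespace split assumed)."""
    words = sentence.split()
    return [words[i] for i in indices]


selected = words_at("the quick brown fox jumps over the lazy dog", [1, 2, 3])
print(selected)
```

Under that assumption, indices 1, 2 and 3 select "quick", "brown" and "fox"; index 0 would be "the".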

All three forward to the existing `AudienceExampleHandler`, mirror the
existing `add_classification_example` / `add_compare_example` signature
shape (same context / media_context / explanation / settings keyword
args), and run through `_try_start_recruiting` once at least one
example exists.

Refactors:
- Extracted `RapidsManager._calculate_boxes_coverage` to a module-level
  `calculate_boxes_coverage` in `validation.rapids.box` so the audience
  example handler can compute the same union-area random baseline that
  the rapid layer does. The original method on `RapidsManager` is kept
  as a thin alias for backwards compatibility.
- Regenerated openapi schemas + clients against prod so the new
  optional fields on `IExampleTruthLineExampleTruth` (boundingBoxes,
  requiredPrecision, requiredCompleteness) are typed correctly. The
  generated model file was patched by hand because the local
  generator-cli run can't materialise Java; the next CI regen will
  produce the same code.
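
The "thin alias" pattern from the first refactor bullet can look roughly like this. All names ending in `_sketch` / `Sketch` are hypothetical, the coverage body is a placeholder that just sums areas, and whether the real method is a static or instance method is an assumption.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def calculate_boxes_coverage_sketch(boxes: Sequence[Rect]) -> float:
    """Module-level function that both layers can import.

    Placeholder body: sums box areas and ignores overlaps.
    """
    return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes)


class RapidsManagerSketch:
    @staticmethod
    def _calculate_boxes_coverage(boxes: Sequence[Rect]) -> float:
        """Old entry point, kept so existing callers keep working."""
        return calculate_boxes_coverage_sketch(boxes)
```

Callers of the old method and of the new function get the same result, so the extraction does not change behavior for existing code.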

Docs:
- `docs/audiences.md`: added a method-picker table covering all five
  example types and a runnable snippet showing Locate / Line /
  Transcription. Updated the settings note to reflect that all
  `add_*_example` helpers accept `settings`.

Tests:
- pyright on `rapidata_client` reports no new errors over the main
  baseline (the 3 remaining errors are pre-existing missing-import
  warnings for pandas / pydantic at lint time).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: manuel <manuel@rapidata.ai>