Skip to content

refactor(sdk): SdkEventKind enum + typed discriminated SdkEvent — kill stringly-typed event boundaries #148

@hadamrd

Description

@hadamrd

Why

The hot-fix in #147 patched a two-field-name mismatch that broke critic + PO + brainstormer SDK paths in production. The bug shape:

# Producer (worker SDK):
emit({"kind": "final_result", ...})

# Consumer (critic SDK):
if event.get("type") == "result":  # WRONG: both field name AND value

This is exactly what enums and typed events prevent. Quality manifesto cites "no stringly-typed boundaries" — yet the SDK event boundary is the most-stringly-typed surface in the codebase.

The dogfood-week catch list (three iteration-probe bugs #97/#120/#128, the brainstormer empty-output bug fixed in #147) ALL had the same shape: state name as a string literal compared by == with no type-checker assistance.

What

  1. New forge_loop/_sdk_events.py (or extend events.py): SdkEventKind(str, Enum) enumerating EVERY kind the worker SDK emits.
    • ASSISTANT_TEXT
    • ASSISTANT_THINKING
    • TOOL_USE
    • TOOL_RESULT
    • SYSTEM_MESSAGE
    • TURN_START
    • WORKER_MCP_FILTERED
    • FINAL_RESULT
    • COST_TELEMETRY
    • ERROR
  2. _worker_sdk.emit emits typed SdkEvent (Pydantic discriminated union) instead of raw dicts.
  3. _critic_sdk._capture and every other consumer reads event.kind is SdkEventKind.FINAL_RESULT (identity check, type-checker enforced).
  4. SDKRunResult field names canonicalized — final_result_text is the source of truth; remove the legacy last_message field; consumers read result.final_result_text.
  5. Typed Pydantic discriminated union: SdkEvent = Annotated[Union[AssistantTextEvent, ToolUseEvent, ...], Field(discriminator="kind")].

Acceptance

  • No string-literal kind checks anywhere in _critic_sdk / _worker_sdk / _brainstormer_sdk.
  • mypy strict on these modules passes (after the broader strict cascade).
  • Every consumer uses the enum.
  • The hot-fix hot-fix(critic-sdk): event capture field-name mismatch broke brainstormer #147's regression tests still pass — but using the enum constants instead of string literals.
  • Manifesto-driven: critic flags any new string-literal kind comparison as sev1.

Quality manifesto rule this ticket adds

No stringly-typed cross-module event boundaries. If module A emits an event consumed by module B, the event KIND is an enum imported from a shared module. String comparisons on dynamic event keys are sev1.

Test plan

  • Refactor existing tests to use enums (regression-pin the enum names).
  • New adversarial test: a producer typo SdkEventKind("typo") fails at construction (str enums are validated).
  • Property test: every kind defined in the enum has a corresponding typed Event subclass.

File pointers

  • src/forge_loop/_sdk_events.py (new — the enum + discriminated union)
  • src/forge_loop/_worker_sdk.py — emit typed
  • src/forge_loop/_critic_sdk.py — consume typed
  • src/forge_loop/_brainstormer_sdk.py — consume typed
  • tests/test_sdk_events.py (new)

Metadata

Metadata

Assignees

No one assigned

    Labels

    loop:readyLoop runner will autonomously attempt this issuepriority:p1Important, near-term

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions