diff --git a/CHANGELOG.md b/CHANGELOG.md index d165dbb..b965f6d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). ## [Unreleased] ### Added +- **PersonResolver — one answer to "who is this?"** (`dotty-behaviour/household/resolver.py`) — identity resolution was smeared across consumers, and the 2026-06-06 audit found four separate identity bugs because of it. All resolution now funnels through one module with `Person.id` as the canonical key space. Fixed by the consolidation: **room_view roster recognition silently failing whenever `id != display_name`** (the VLM echoes display names; validation compared ids — confirmed 3/3, both the core and greeter paths), **multi-word display names never matching** (the NAME parser was single-token, so "Mary Anne" was a 100% silent miss — confirmed 3/3), **the greeter's calendar lookup dropping a person's own events on a case mismatch** (`[Hudson]` ≠ `hudson` — confirmed 2/3), and **bracketless `calendar_prefix:` YAML never matching**. `summarize_for_prompt` now matches person tags case-insensitively and accepts the resolver's tag set; the room_view test fakes were also fixed to carry real ids (the old fakes re-derived ids from display names, which is exactly what masked the bug). - **Bridge systemd unit loads API keys from `${BRIDGE_DIR}/.env`** (#15) — `zeroclaw-bridge.service.template` and `scripts/install-bridge.sh` now emit `EnvironmentFile=-${BRIDGE_DIR}/.env`. `install-bridge.sh` creates a mode-0600 stub `.env` containing `OPENROUTER_API_KEY=` (and commented `VISION_API_KEY` / `VLM_API_KEY` placeholders) when one isn't already present, so the missing-vision-key failure surfaces as the bridge's existing ERROR ("camera offline") instead of a silent confabulation. Existing `.env` files are preserved. ### Changed diff --git a/CLAUDE.md b/CLAUDE.md index eae237d..78f2b1e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -199,3 +199,17 @@ For hardware specs, protocol details, model internals, latent capabilities, and - xiaozhi-esp32 firmware (upstream): https://github.com/78/xiaozhi-esp32 - StackChan (hardware + firmware patches): https://github.com/m5stack/StackChan - Emotion protocol: https://xiaozhi.dev/en/docs/development/emotion/ + +## Agent skills + +### Issue tracker + +Issues live as GitHub issues on `BrettKinny/dotty-stackchan` (the `origin` remote), managed via the `gh` CLI. See `docs/agents/issue-tracker.md`. + +### Triage labels + +Five canonical triage roles use their default label strings (`needs-triage`, `needs-info`, `ready-for-agent`, `ready-for-human`, `wontfix`), orthogonal to the existing `status:*` / `area:*` labels. See `docs/agents/triage-labels.md`. + +### Domain docs + +Single-context: one `CONTEXT.md` + `docs/adr/` at the repo root. See `docs/agents/domain.md`. diff --git a/docs/agents/domain.md b/docs/agents/domain.md new file mode 100644 index 0000000..a4f449f --- /dev/null +++ b/docs/agents/domain.md @@ -0,0 +1,38 @@ +# Domain Docs + +How the engineering skills should consume this repo's domain documentation when exploring the codebase. + +This is a **single-context** repo: one `CONTEXT.md` + `docs/adr/` at the repo root cover the whole Dotty stack. (Note: `CLAUDE.md` at the root is the always-on architecture reference and the authoritative starting point regardless — `CONTEXT.md` is the lazily-grown domain glossary the producer skills maintain.) + +## Before exploring, read these + +- **`CONTEXT.md`** at the repo root. +- **`docs/adr/`** — read ADRs that touch the area you're about to work in. + +If any of these files don't exist, **proceed silently**. Don't flag their absence; don't suggest creating them upfront. The producer skill (`/grill-with-docs`) creates them lazily when terms or decisions actually get resolved. + +## File structure + +Single-context repo: + +``` +/ +├── CLAUDE.md ← always-on architecture reference (already exists) +├── CONTEXT.md ← domain glossary (created lazily by /grill-with-docs) +├── docs/adr/ ← architectural decision records (created lazily) +│ ├── 0001-....md +│ └── 0002-....md +└── ... +``` + +## Use the glossary's vocabulary + +When your output names a domain concept (in an issue title, a refactor proposal, a hypothesis, a test name), use the term as defined in `CONTEXT.md`. Don't drift to synonyms the glossary explicitly avoids. + +If the concept you need isn't in the glossary yet, that's a signal — either you're inventing language the project doesn't use (reconsider) or there's a real gap (note it for `/grill-with-docs`). + +## Flag ADR conflicts + +If your output contradicts an existing ADR, surface it explicitly rather than silently overriding: + +> _Contradicts ADR-0007 (...) — but worth reopening because…_ diff --git a/docs/agents/issue-tracker.md b/docs/agents/issue-tracker.md new file mode 100644 index 0000000..afd0d0f --- /dev/null +++ b/docs/agents/issue-tracker.md @@ -0,0 +1,32 @@ +# Issue tracker: GitHub + +Issues and PRDs for this repo live as GitHub issues on `BrettKinny/dotty-stackchan`. Use the `gh` CLI for all operations. + +## Conventions + +- **Create an issue**: `gh issue create --title "..." --body "..."`. Use a heredoc for multi-line bodies. +- **Read an issue**: `gh issue view --comments`, filtering comments by `jq` and also fetching labels. +- **List issues**: `gh issue list --state open --json number,title,body,labels,comments --jq '[.[] | {number, title, body, labels: [.labels[].name], comments: [.comments[].body]}]'` with appropriate `--label` and `--state` filters. +- **Comment on an issue**: `gh issue comment --body "..."` +- **Apply / remove labels**: `gh issue edit --add-label "..."` / `--remove-label "..."` +- **Close**: `gh issue close --comment "..."` + +Infer the repo from `git remote -v` — `gh` does this automatically when run inside a clone. Note this clone has two remotes: `origin` → `BrettKinny/dotty-stackchan` (the canonical repo) and `fork` → `pboushy/dotty-stackchan`. Target `origin` unless told otherwise (e.g. `gh issue list --repo BrettKinny/dotty-stackchan`). + +## Existing label conventions + +This repo already uses workflow labels that the triage labels complement, not replace: + +- `status:active`, `status:bench-pending`, `status:blocked`, `status:speculative` — lifecycle state (see the `dotty-task-runner` skill). +- `area:firmware`, `area:bridge`, `area:xiaozhi`, `area:dashboard`, `area:infra`, `area:docs`, `area:behaviour` — which part of the stack. +- `safety` — child-safety / correctness bug. + +Keep applying these alongside the triage roles in `docs/agents/triage-labels.md`. + +## When a skill says "publish to the issue tracker" + +Create a GitHub issue. + +## When a skill says "fetch the relevant ticket" + +Run `gh issue view --comments`. diff --git a/docs/agents/triage-labels.md b/docs/agents/triage-labels.md new file mode 100644 index 0000000..f22fe45 --- /dev/null +++ b/docs/agents/triage-labels.md @@ -0,0 +1,19 @@ +# Triage Labels + +The skills speak in terms of five canonical triage roles. This file maps those roles to the actual label strings used in this repo's issue tracker. + +| Label in mattpocock/skills | Label in our tracker | Meaning | +| -------------------------- | -------------------- | ---------------------------------------- | +| `needs-triage` | `needs-triage` | Maintainer needs to evaluate this issue | +| `needs-info` | `needs-info` | Waiting on reporter for more information | +| `ready-for-agent` | `ready-for-agent` | Fully specified, ready for an AFK agent | +| `ready-for-human` | `ready-for-human` | Requires human implementation | +| `wontfix` | `wontfix` | Will not be actioned | + +`wontfix` already exists in this repo. The other four are created on first use (`gh label create ` or auto-created when applied via the API). + +These triage roles are **orthogonal** to the repo's existing `status:*` lifecycle labels (`status:active`, `status:bench-pending`, `status:blocked`, `status:speculative`) and `area:*` labels — apply both as appropriate rather than treating them as alternatives. + +When a skill mentions a role (e.g. "apply the AFK-ready triage label"), use the corresponding label string from this table. + +Edit the right-hand column to match whatever vocabulary you actually use. diff --git a/dotty-behaviour/calendar_/cache.py b/dotty-behaviour/calendar_/cache.py index f94189d..d0b659c 100644 --- a/dotty-behaviour/calendar_/cache.py +++ b/dotty-behaviour/calendar_/cache.py @@ -23,6 +23,11 @@ class Event(TypedDict): _EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b") + + +def _fold_person_tag(value: str) -> str: + """Case/whitespace-fold a person tag for comparison.""" + return " ".join((value or "").lower().split()) _ISO_TS_RE = re.compile( r"\b\d{4}-\d{2}-\d{2}" r"(?:T\d{2}:\d{2}(?::\d{2})?(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)?\b" @@ -54,7 +59,7 @@ def bucket_by_person(events: list[Event]) -> dict[str, list[Event]]: def summarize_for_prompt( events: list[Event], *, - person: str | None = None, + person: str | set[str] | None = None, include_household: bool = True, household_bucket: str = "_household", ) -> list[str]: @@ -63,11 +68,25 @@ def summarize_for_prompt( Strips ISO timestamps, emails, calendar IDs. Emits only short `HH:MM summary` / `all-day summary` strings. Every prompt / response that surfaces calendar data MUST go through here. + + `person` filters to one person's events. It accepts either a single + name or a set of equivalent tags (id, display name, calendar + prefix — see PersonResolver.calendar_tags). Matching is case- and + whitespace-insensitive: the event's person comes from a free-typed + `[Name]` title prefix, so `[Hudson]` must match identity `hudson` + (audit 2026-06-06: the old exact compare dropped a person's own + events from their greeting). """ + if person is None: + wanted: set[str] | None = None + elif isinstance(person, str): + wanted = {_fold_person_tag(person)} + else: + wanted = {_fold_person_tag(p) for p in person} out: list[str] = [] for ev in events: - if person is not None: - if ev["person"] != person and not ( + if wanted is not None: + if _fold_person_tag(ev["person"]) not in wanted and not ( include_household and ev["person"] == household_bucket ): continue @@ -77,7 +96,7 @@ def summarize_for_prompt( clean_summary = " ".join(clean_summary.split()) if not clean_summary: continue - if ev["person"] != household_bucket and person is None: + if ev["person"] != household_bucket and wanted is None: tag = f"[{ev['person']}] " else: tag = "" diff --git a/dotty-behaviour/greeter/calendar_facade.py b/dotty-behaviour/greeter/calendar_facade.py index 47128d3..cb4779e 100644 --- a/dotty-behaviour/greeter/calendar_facade.py +++ b/dotty-behaviour/greeter/calendar_facade.py @@ -32,7 +32,7 @@ def summarize_for_prompt( self, events: list[Any], *, - person: str | None = None, + person: str | set[str] | None = None, include_household: bool = True, ) -> list[str]: return summarize_for_prompt( diff --git a/dotty-behaviour/greeter/greeter.py b/dotty-behaviour/greeter/greeter.py index 4c2da85..fe3df20 100644 --- a/dotty-behaviour/greeter/greeter.py +++ b/dotty-behaviour/greeter/greeter.py @@ -32,6 +32,7 @@ from typing import Any, Awaitable, Callable, Optional from zoneinfo import ZoneInfo +from household import PersonResolver from perception import PerceptionEvent, PerceptionState log = logging.getLogger("dotty-behaviour.greeter") @@ -80,6 +81,7 @@ def __init__( self._tts = tts_pusher self._kid_mode = kid_mode_provider self._household = household_registry + self._resolver = PersonResolver(household_registry) self._clock = clock tz_name = os.environ.get("TZ", "Australia/Brisbane") try: @@ -255,8 +257,16 @@ async def _generate_greeting( events_summary: list[str] = [] try: events = self._calendar.get_events() + # `identity` is a canonical person id; calendar events carry + # the free-typed `[Name]` title prefix. The resolver expands + # the id to every tag that means this person (id, display + # name, configured calendar_prefix) so their own events + # survive the case/name-space gap (audit 2026-06-06). + person_tags = self._resolver.calendar_tags(identity) events_summary = self._calendar.summarize_for_prompt( - events, person=identity, include_household=True, + events, + person=person_tags or identity, + include_household=True, ) or [] except Exception: log.warning( @@ -326,16 +336,9 @@ def _build_prompt( ) def _lookup_person(self, identity: str) -> Any: - if not self._household or not identity or identity == "unknown": - return None - try: - return self._household.get(identity) - except Exception: - log.debug( - "greeter: household lookup failed for %s", - identity, exc_info=True, - ) - return None + # PersonResolver owns the id lookup (case fold, unknown/empty + # handling, exception safety). + return self._resolver.resolve(identity) @staticmethod def _post_process(text: str) -> str: diff --git a/dotty-behaviour/household/__init__.py b/dotty-behaviour/household/__init__.py index 73edb69..aa6ad2e 100644 --- a/dotty-behaviour/household/__init__.py +++ b/dotty-behaviour/household/__init__.py @@ -10,10 +10,12 @@ HouseholdRegistry, Person, ) +from .resolver import PersonResolver __all__ = [ "DEFAULT_HOUSEHOLD_PATH", "DEFAULT_PERSON_FALLBACK", "HouseholdRegistry", "Person", + "PersonResolver", ] diff --git a/dotty-behaviour/household/registry.py b/dotty-behaviour/household/registry.py index 0a1b415..3dd0754 100644 --- a/dotty-behaviour/household/registry.py +++ b/dotty-behaviour/household/registry.py @@ -189,13 +189,13 @@ def roster_ids_with_appearance(self) -> set[str]: def get_by_calendar_prefix(self, prefix: str) -> Optional[Person]: """Look up a person by their `[Name]` calendar prefix. - Case-insensitive; brackets optional.""" + Case-insensitive; brackets optional — in the query AND in the + YAML (`calendar_prefix: Brett` and `calendar_prefix: "[Brett]"` + both work; the old code only matched the bracketed form).""" self._reload_if_changed() - if not prefix: + key = _normalise_calendar_prefix(prefix) + if not key: return None - key = prefix.strip().lower() - if not key.startswith("["): - key = f"[{key}]" person_id = self._by_prefix.get(key) return self._people.get(person_id) if person_id else None @@ -291,7 +291,9 @@ def _reload(self) -> None: continue people[person.id] = person if person.calendar_prefix: - by_prefix[person.calendar_prefix.strip().lower()] = person.id + key = _normalise_calendar_prefix(person.calendar_prefix) + if key: + by_prefix[key] = person.id for phrase in person.self_id_phrases: norm = phrase.strip().lower() if norm: @@ -379,6 +381,16 @@ def _parse_person(raw_id: str, entry: Any) -> Optional[Person]: ) +def _normalise_calendar_prefix(value: str) -> str: + """Canonical form for calendar-prefix keys: lowercase, trimmed, + brackets stripped. Used for both YAML storage and lookups so the + two can never disagree on bracket handling.""" + key = (value or "").strip().lower() + if key.startswith("[") and key.endswith("]"): + key = key[1:-1].strip() + return key + + def _opt_str(v: Any) -> Optional[str]: if v is None: return None diff --git a/dotty-behaviour/household/resolver.py b/dotty-behaviour/household/resolver.py new file mode 100644 index 0000000..1679d53 --- /dev/null +++ b/dotty-behaviour/household/resolver.py @@ -0,0 +1,154 @@ +"""PersonResolver — the single answer to "who is this?". + +Identity-bearing strings reach dotty-behaviour from three directions, +each speaking its own name space: + + * the room_view VLM reply — a display name (possibly multi-word, any + case, maybe trailing punctuation), + * perception-bus ``identity`` fields — canonical person ids, + * calendar ``[Name]`` title prefixes — free-typed by humans. + +Before this module each consumer re-implemented its own mapping, and +each grew its own bug: roster recognition silently failing whenever +``id != display_name`` (twice), multi-word display names never +matching, and the greeter's calendar lookup dropping a person's own +events on a case mismatch (AUDIT-REPORT 2026-06-06). All resolution now +funnels through here. The canonical key space is ``Person.id`` +(lowercase) — resolve once at the edge, pass ids everywhere else. + +Stateless and cheap: every call delegates to the (hot-reloading) +HouseholdRegistry, so a household.yaml edit is picked up without a +restart, same as the registry itself. +""" +from __future__ import annotations + +import logging +import re +from typing import Iterable, Optional + +from .registry import HouseholdRegistry, Person + +log = logging.getLogger("dotty-behaviour.household.resolver") + +_TRAILING_PUNCT_RE = re.compile(r"[\s.!?,:;]+$") + + +def _fold(value: str) -> str: + """Case- and whitespace-fold a name for comparison.""" + return " ".join((value or "").lower().split()) + + +def _strip_brackets(value: str) -> str: + v = (value or "").strip() + if v.startswith("[") and v.endswith("]"): + v = v[1:-1].strip() + return v + + +class PersonResolver: + """Maps identity-bearing strings onto household ``Person`` records. + + Construct with a HouseholdRegistry (or None — every resolve then + returns None, the same graceful degrade as an empty registry). + """ + + def __init__(self, registry: Optional[HouseholdRegistry]) -> None: + self._registry = registry + + # ------------------------------------------------------------------ + # Canonical-id lookup + # ------------------------------------------------------------------ + def resolve(self, identity: Optional[str]) -> Optional[Person]: + """Look up a canonical person id (case-folded). ``unknown`` and + empty strings resolve to None.""" + if self._registry is None or not identity: + return None + ident = identity.strip() + if not ident or ident.lower() == "unknown": + return None + try: + return self._registry.get(ident) + except Exception: + log.debug("resolve(%r) raised", identity, exc_info=True) + return None + + # ------------------------------------------------------------------ + # VLM name-space (room_view replies) + # ------------------------------------------------------------------ + def resolve_vlm_name(self, name: Optional[str]) -> Optional[Person]: + """Map a room_view ``NAME:`` token back to a Person. + + The prompt offers the VLM *display names*, so the reply must be + matched against display_name as well as id — case- and + whitespace-folded, trailing punctuation tolerated, multi-word + names supported. + """ + if self._registry is None or not name: + return None + folded = _fold(_TRAILING_PUNCT_RE.sub("", name)) + if not folded or folded == "unknown": + return None + person = self.resolve(folded) + if person is not None: + return person + for p in self._iter_people(): + if _fold(p.display_name) == folded: + return p + return None + + # ------------------------------------------------------------------ + # Calendar name-space ([Name] title prefixes) + # ------------------------------------------------------------------ + def resolve_calendar_tag(self, tag: Optional[str]) -> Optional[Person]: + """Map a calendar ``[Name]`` prefix (brackets optional) to a + Person: explicit ``calendar_prefix`` first, then id, then + display name.""" + if self._registry is None or not tag: + return None + bare = _strip_brackets(tag) + if not bare: + return None + try: + person = self._registry.get_by_calendar_prefix(bare) + except Exception: + log.debug("get_by_calendar_prefix(%r) raised", tag, exc_info=True) + person = None + if person is not None: + return person + person = self.resolve(bare) + if person is not None: + return person + folded = _fold(bare) + for p in self._iter_people(): + if _fold(p.display_name) == folded: + return p + return None + + def calendar_tags(self, identity: Optional[str]) -> set[str]: + """Folded tag set that refers to ``identity`` in calendar events + — feed to ``summarize_for_prompt(person=...)`` so a person's own + events match whether the human typed ``[hudson]``, ``[Hudson]``, + or the configured ``calendar_prefix``. Unknown identities fall + back to ``{folded identity}`` (today's exact-string behaviour, + minus the case sensitivity).""" + person = self.resolve(identity) + if person is None: + folded = _fold(identity or "") + return {folded} if folded and folded != "unknown" else set() + tags = {person.id, _fold(person.display_name)} + if person.calendar_prefix: + tags.add(_fold(_strip_brackets(person.calendar_prefix))) + return {t for t in tags if t} + + # ------------------------------------------------------------------ + def _iter_people(self) -> Iterable[Person]: + if self._registry is None: + return () + try: + return self._registry.iter() + except Exception: + log.debug("registry.iter() raised", exc_info=True) + return () + + +__all__ = ["PersonResolver"] diff --git a/dotty-behaviour/routes/vision.py b/dotty-behaviour/routes/vision.py index 462f9ee..6b67a96 100644 --- a/dotty-behaviour/routes/vision.py +++ b/dotty-behaviour/routes/vision.py @@ -29,7 +29,7 @@ import config from dispatch import VLMClient -from household import HouseholdRegistry +from household import HouseholdRegistry, PersonResolver from perception import PerceptionEvent, PerceptionState from vision.room_view import ( ROOM_VIEW_NO_PERSON, @@ -188,18 +188,17 @@ async def vision_explain( state.state.setdefault(device_id, {})["last_room_view_capture_t"] = ( now_wall ) - roster_ids = ( - household.roster_ids_with_appearance() - if household is not None else set() - ) raw = await vlm.describe_image( b64_image, room_view_question, system_prompt=ROOM_VIEW_SYSTEM_PROMPT, timeout_s=config.VISION_TIMEOUT_SEC, ) + # The VLM echoes back a *display name* from the prompt; the + # resolver maps it to the canonical Person.id that everything + # downstream (bus identity, registry.get, greeters) keys on. parsed_desc, room_match_person_id, parsed_mood = ( - parse_room_view_response(raw, roster_ids) + parse_room_view_response(raw, PersonResolver(household)) ) description = parsed_desc or ROOM_VIEW_NO_PERSON effective_question = room_view_question diff --git a/dotty-behaviour/tests/test_calendar.py b/dotty-behaviour/tests/test_calendar.py index acd5f59..8ba6197 100644 --- a/dotty-behaviour/tests/test_calendar.py +++ b/dotty-behaviour/tests/test_calendar.py @@ -99,6 +99,29 @@ def test_summarize_for_prompt_includes_household_for_person() -> None: assert any("brett thing" in line for line in out) +def test_summarize_for_prompt_person_match_is_case_insensitive() -> None: + # Audit 2026-06-06: the event person comes from a free-typed + # `[Hudson]` title prefix while identities are lowercase ids — the + # old exact compare dropped a person's own events. + events = [_event(person="Hudson", summary="library day")] + out = summarize_for_prompt(events, person="hudson") + assert any("library day" in line for line in out) + + +def test_summarize_for_prompt_accepts_tag_set() -> None: + # PersonResolver.calendar_tags hands over a set of equivalent tags + # (id, display name, calendar prefix) — any of them must match. + events = [ + _event(person="Mum", summary="yoga"), + _event(person="hudson", summary="hudson thing"), + ] + out = summarize_for_prompt( + events, person={"maryanne", "mary anne", "mum"}, + ) + assert any("yoga" in line for line in out) + assert all("hudson thing" not in line for line in out) + + def test_summarize_for_prompt_skips_household_when_excluded() -> None: events = [ _event(person="brett", summary="brett thing"), diff --git a/dotty-behaviour/tests/test_greeter.py b/dotty-behaviour/tests/test_greeter.py index ab1b010..d5a9e3c 100644 --- a/dotty-behaviour/tests/test_greeter.py +++ b/dotty-behaviour/tests/test_greeter.py @@ -122,6 +122,54 @@ async def body() -> None: asyncio.run(go()) +def test_greeting_prompt_includes_own_calendar_events_despite_case() -> None: + # Audit 2026-06-06 (confirmed 2/3): identity is a lowercase person id + # ("hudson") but the calendar event's person tag comes from the + # `[Hudson]` title prefix uncased — the old exact compare dropped the + # person's own events from their greeting prompt. + async def go() -> None: + with tempfile.TemporaryDirectory() as td: + tdp = Path(td) + household = _household_with( + tdp, + """ +people: + hudson: + display_name: Hudson +""", + ) + cache = CalendarCache() + cache.events = [ + Event( + person="Hudson", time="10:00", summary="library day", + start_iso="2026-06-11T10:00:00", calendar_id="cal", + ) + ] + llm, llm_calls = _llm_factory("Morning Hudson, library day!") + tts = _RecordingTTS() + g, state = _make( + tdp, llm=llm, tts=tts, household=household, cache=cache, + ) + + async def body() -> None: + state.broadcast( + PerceptionEvent( + device_id="dev-1", + name="face_recognized", + data={"identity": "hudson"}, + ts=time.time(), + ) + ) + await let_consumer_settle() + await asyncio.sleep(0.02) + assert len(llm_calls) == 1 + assert "library day" in llm_calls[0] + + await _drive(g, body) + + asyncio.run(go()) + + def test_unknown_face_skipped_by_default() -> None: async def go() -> None: with tempfile.TemporaryDirectory() as td: diff --git a/dotty-behaviour/tests/test_household_registry.py b/dotty-behaviour/tests/test_household_registry.py index a1332e7..0188003 100644 --- a/dotty-behaviour/tests/test_household_registry.py +++ b/dotty-behaviour/tests/test_household_registry.py @@ -77,6 +77,25 @@ def test_lookup_by_calendar_prefix() -> None: assert r.get_by_calendar_prefix("nobody") is None +def test_lookup_by_calendar_prefix_bracketless_yaml() -> None: + # Audit 2026-06-06: `calendar_prefix: Brett` (no brackets) was stored + # bare but looked up bracketed, so it never matched. Both YAML forms + # must behave identically now. + with tempfile.TemporaryDirectory() as td: + path = _write_yaml( + Path(td), + """ +people: + brett: + display_name: Brett + calendar_prefix: Brett +""", + ) + r = HouseholdRegistry(path=path) + assert r.get_by_calendar_prefix("Brett").id == "brett" # type: ignore[union-attr] + assert r.get_by_calendar_prefix("[Brett]").id == "brett" # type: ignore[union-attr] + + def test_match_self_id_strips_leading_punct() -> None: with tempfile.TemporaryDirectory() as td: path = _write_yaml( diff --git a/dotty-behaviour/tests/test_person_resolver.py b/dotty-behaviour/tests/test_person_resolver.py new file mode 100644 index 0000000..8b4bb54 --- /dev/null +++ b/dotty-behaviour/tests/test_person_resolver.py @@ -0,0 +1,135 @@ +"""PersonResolver tests — the single seam mapping identity-bearing +strings (VLM names, bus identities, calendar tags) onto Person.id. + +Built on the real HouseholdRegistry + a temp household.yaml so the +resolver is exercised against the registry's actual case-folding and +prefix-normalisation behaviour, not a re-implementation (the audit +found the old room_view test fake masked the id/display_name bug by +re-implementing the helper). +""" + +from __future__ import annotations + +import tempfile +from pathlib import Path + +from household import HouseholdRegistry, PersonResolver + +_YAML = """ +people: + brett: + display_name: Brett + appearance: tall, dark hair + calendar_prefix: "[Brett]" + dad: + display_name: Greg + appearance: grey beard + maryanne: + display_name: Mary Anne + appearance: curly hair + calendar_prefix: Mum +""" + + +def _resolver(yaml_text: str = _YAML) -> PersonResolver: + td = Path(tempfile.mkdtemp(prefix="dotty-resolver-")) + path = td / "household.yaml" + path.write_text(yaml_text, encoding="utf-8") + return PersonResolver(HouseholdRegistry(path=path)) + + +# --------------------------------------------------------------------------- +# resolve — canonical-id lookup +# --------------------------------------------------------------------------- + + +def test_resolve_by_id_case_folded() -> None: + r = _resolver() + assert r.resolve("brett").id == "brett" # type: ignore[union-attr] + assert r.resolve("BRETT").id == "brett" # type: ignore[union-attr] + assert r.resolve(" brett ").id == "brett" # type: ignore[union-attr] + + +def test_resolve_unknown_and_empty_return_none() -> None: + r = _resolver() + assert r.resolve("unknown") is None + assert r.resolve("UNKNOWN") is None + assert r.resolve("") is None + assert r.resolve(None) is None + assert r.resolve("nobody") is None + + +def test_resolve_with_no_registry_returns_none() -> None: + r = PersonResolver(None) + assert r.resolve("brett") is None + assert r.resolve_vlm_name("Brett") is None + assert r.resolve_calendar_tag("[Brett]") is None + assert r.calendar_tags("brett") == {"brett"} + + +# --------------------------------------------------------------------------- +# resolve_vlm_name — room_view NAME tokens +# --------------------------------------------------------------------------- + + +def test_vlm_name_matches_display_name_when_id_differs() -> None: + # id "dad", display "Greg" — the audit's silent-miss case. + r = _resolver() + assert r.resolve_vlm_name("Greg").id == "dad" # type: ignore[union-attr] + + +def test_vlm_name_matches_multi_word_display_name() -> None: + r = _resolver() + assert r.resolve_vlm_name("Mary Anne").id == "maryanne" # type: ignore[union-attr] + # Case + internal-whitespace folding + assert r.resolve_vlm_name("mary anne").id == "maryanne" # type: ignore[union-attr] + + +def test_vlm_name_tolerates_trailing_punctuation() -> None: + r = _resolver() + assert r.resolve_vlm_name("Greg.").id == "dad" # type: ignore[union-attr] + assert r.resolve_vlm_name("Mary Anne! ").id == "maryanne" # type: ignore[union-attr] + + +def test_vlm_name_id_takes_priority_then_unknown_is_none() -> None: + r = _resolver() + assert r.resolve_vlm_name("brett").id == "brett" # type: ignore[union-attr] + assert r.resolve_vlm_name("unknown") is None + assert r.resolve_vlm_name("Alice") is None + assert r.resolve_vlm_name("") is None + + +# --------------------------------------------------------------------------- +# resolve_calendar_tag / calendar_tags — [Name] prefixes +# --------------------------------------------------------------------------- + + +def test_calendar_tag_resolves_prefix_id_and_display() -> None: + r = _resolver() + # Explicit calendar_prefix (bracketed in YAML, any query form) + assert r.resolve_calendar_tag("[Brett]").id == "brett" # type: ignore[union-attr] + assert r.resolve_calendar_tag("brett").id == "brett" # type: ignore[union-attr] + # Bare calendar_prefix in YAML ("Mum") — audit F832's broken case + assert r.resolve_calendar_tag("[Mum]").id == "maryanne" # type: ignore[union-attr] + assert r.resolve_calendar_tag("mum").id == "maryanne" # type: ignore[union-attr] + # Fallbacks: id, then display name + assert r.resolve_calendar_tag("[dad]").id == "dad" # type: ignore[union-attr] + assert r.resolve_calendar_tag("[Greg]").id == "dad" # type: ignore[union-attr] + assert r.resolve_calendar_tag("[Granny]") is None + + +def test_calendar_tags_expand_id_display_and_prefix() -> None: + r = _resolver() + assert r.calendar_tags("maryanne") == {"maryanne", "mary anne", "mum"} + assert r.calendar_tags("dad") == {"dad", "greg"} + # The case-mismatch bug: identity "brett" must match a "[Brett]" tag + # after both sides fold. + assert "brett" in r.calendar_tags("brett") + + +def test_calendar_tags_unknown_identity_falls_back_to_folded_self() -> None: + r = _resolver() + assert r.calendar_tags("Visitor") == {"visitor"} + assert r.calendar_tags("unknown") == set() + assert r.calendar_tags("") == set() + assert r.calendar_tags(None) == set() diff --git a/dotty-behaviour/tests/test_routes_vision.py b/dotty-behaviour/tests/test_routes_vision.py index fa80627..29b3dab 100644 --- a/dotty-behaviour/tests/test_routes_vision.py +++ b/dotty-behaviour/tests/test_routes_vision.py @@ -188,13 +188,19 @@ def test_unregister_vision_waiter_after_signal_is_idempotent() -> None: @dataclass class _FakePerson: + id: str display_name: str appearance: str = "" @dataclass class _FakeHousehold: - """Minimal HouseholdRegistry stand-in for room_view tests.""" + """Minimal HouseholdRegistry stand-in for room_view tests. + + Carries real `id` fields distinct from display names — the audit + found the previous fake derived ids as `display_name.lower()`, + which masked the id-vs-display-name parsing bug in production. + """ people: list[_FakePerson] = field(default_factory=list) @@ -207,10 +213,16 @@ def render_roster_for_vlm(self, *, max_line_chars: int = 80) -> str: def iter(self): # noqa: A003 — matches HouseholdRegistry method return tuple(self.people) + def get(self, person_id: str): # matches HouseholdRegistry method + if not person_id: + return None + for p in self.people: + if p.id == person_id.lower(): + return p + return None + def roster_ids_with_appearance(self) -> set[str]: - return { - p.display_name.lower() for p in self.people if p.appearance - } + return {p.id for p in self.people if p.appearance} def _override_household(client: TestClient, fake: _FakeHousehold) -> None: @@ -230,8 +242,8 @@ def test_room_view_sentinel_matches_roster_and_broadcasts_face_recognized() -> N ) _override_vlm(client, fake_vlm) _override_household(client, _FakeHousehold(people=[ - _FakePerson("Brett", appearance="tall, dark hair, goatee"), - _FakePerson("Hudson", appearance="small child, blond"), + _FakePerson("brett", "Brett", appearance="tall, dark hair, goatee"), + _FakePerson("hudson", "Hudson", appearance="small child, blond"), ])) state = client.app.state.perception @@ -293,7 +305,7 @@ def test_room_view_no_match_does_not_broadcast() -> None: ) _override_vlm(client, fake_vlm) _override_household(client, _FakeHousehold(people=[ - _FakePerson("Brett", appearance="tall, dark hair"), + _FakePerson("brett", "Brett", appearance="tall, dark hair"), ])) state = client.app.state.perception @@ -330,7 +342,7 @@ def test_room_view_cooldown_blocks_second_call() -> None: ) _override_vlm(client, fake_vlm) _override_household(client, _FakeHousehold(people=[ - _FakePerson("Brett", appearance="tall, dark hair"), + _FakePerson("brett", "Brett", appearance="tall, dark hair"), ])) state = client.app.state.perception diff --git a/dotty-behaviour/tests/test_vision_room_view.py b/dotty-behaviour/tests/test_vision_room_view.py index dc0091f..e99ee06 100644 --- a/dotty-behaviour/tests/test_vision_room_view.py +++ b/dotty-behaviour/tests/test_vision_room_view.py @@ -5,6 +5,7 @@ from dataclasses import dataclass from typing import Iterable, Optional +from household import PersonResolver from vision.room_view import ( ROOM_VIEW_MOODS, ROOM_VIEW_SENTINEL, @@ -16,6 +17,7 @@ @dataclass class _FakePerson: + id: str display_name: str appearance: Optional[str] = None @@ -31,6 +33,14 @@ def render_roster_for_vlm(self, *, max_line_chars: int = 80) -> str: def iter(self): # noqa: A003 — matches HouseholdRegistry's method return tuple(self.people) + def get(self, person_id: str): # matches HouseholdRegistry's method + if not person_id: + return None + for p in self.people: + if p.id == person_id.lower(): + return p + return None + # --------------------------------------------------------------------------- # Sentinel / constants @@ -75,9 +85,9 @@ def test_build_question_substitutes_roster_and_name_choices() -> None: reg = _FakeRegistry( roster=" Brett: tall, dark hair\n Hudson: small child, blond", people=( - _FakePerson("Brett", appearance="tall, dark hair"), - _FakePerson("Hudson", appearance="small child, blond"), - _FakePerson("Ghost", appearance=""), # no appearance — excluded + _FakePerson("brett", "Brett", appearance="tall, dark hair"), + _FakePerson("hudson", "Hudson", appearance="small child, blond"), + _FakePerson("ghost", "Ghost", appearance=""), # no appearance — excluded ), ) q = build_room_view_question(reg) @@ -106,23 +116,34 @@ def iter(self): # --------------------------------------------------------------------------- -_ROSTER = {"brett", "hudson"} +# Roster deliberately includes: +# * dad — id != lowercased display name (the audit's silent-miss case) +# * maryanne — multi-word display name (the audit's 100%-miss case) +_RESOLVER = PersonResolver(_FakeRegistry( # type: ignore[arg-type] + roster="(unused)", + people=( + _FakePerson("brett", "Brett", appearance="tall"), + _FakePerson("hudson", "Hudson", appearance="small child"), + _FakePerson("dad", "Greg", appearance="grey beard"), + _FakePerson("maryanne", "Mary Anne", appearance="curly hair"), + ), +)) def test_parse_empty_string_returns_all_none() -> None: - assert parse_room_view_response("", _ROSTER) == (None, None, None) + assert parse_room_view_response("", _RESOLVER) == (None, None, None) def test_parse_whitespace_only_returns_all_none() -> None: - assert parse_room_view_response(" \n ", _ROSTER) == (None, None, None) + assert parse_room_view_response(" \n ", _RESOLVER) == (None, None, None) def test_parse_no_one_in_view_returns_all_none() -> None: - assert parse_room_view_response("no one in view", _ROSTER) == ( + assert parse_room_view_response("no one in view", _RESOLVER) == ( None, None, None, ) # Case-insensitive + tolerates surrounding noise - assert parse_room_view_response("No one in view.", _ROSTER) == ( + assert parse_room_view_response("No one in view.", _RESOLVER) == ( None, None, None, ) @@ -132,18 +153,45 @@ def test_parse_valid_format_with_roster_match() -> None: "DESC: adult with goatee and dark sweater | " "NAME: Brett | MOOD: engaged" ) - desc, pid, mood = parse_room_view_response(raw, _ROSTER) + desc, pid, mood = parse_room_view_response(raw, _RESOLVER) assert desc == "adult with goatee and dark sweater" assert pid == "brett" assert mood == "engaged" +def test_parse_resolves_display_name_to_canonical_id() -> None: + # Audit 2026-06-06 (confirmed 3/3): the prompt vocabulary is display + # names but validation used ids, so id != display_name was a silent + # miss. The VLM says "Greg"; the canonical id is "dad". + raw = "DESC: adult with grey beard | NAME: Greg | MOOD: neutral" + desc, pid, mood = parse_room_view_response(raw, _RESOLVER) + assert desc == "adult with grey beard" + assert pid == "dad" + assert mood == "neutral" + + +def test_parse_matches_multi_word_display_name() -> None: + # Audit 2026-06-06 (confirmed 3/3): the old single-token NAME regex + # could never match "Mary Anne". + raw = "DESC: woman with curly hair | NAME: Mary Anne | MOOD: excited" + desc, pid, mood = parse_room_view_response(raw, _RESOLVER) + assert desc == "woman with curly hair" + assert pid == "maryanne" + assert mood == "excited" + + +def test_parse_name_match_is_case_insensitive() -> None: + raw = "DESC: small child | NAME: HUDSON | MOOD: engaged" + _, pid, _ = parse_room_view_response(raw, _RESOLVER) + assert pid == "hudson" + + def test_parse_valid_format_with_unknown_name_strips_identity() -> None: raw = ( "DESC: stranger in a red jacket | " "NAME: unknown | MOOD: neutral" ) - desc, pid, mood = parse_room_view_response(raw, _ROSTER) + desc, pid, mood = parse_room_view_response(raw, _RESOLVER) assert desc == "stranger in a red jacket" assert pid is None assert mood == "neutral" @@ -154,7 +202,7 @@ def test_parse_valid_format_with_off_roster_name_strips_identity() -> None: "DESC: young woman with curly hair | " "NAME: Alice | MOOD: excited" ) - desc, pid, mood = parse_room_view_response(raw, _ROSTER) + desc, pid, mood = parse_room_view_response(raw, _RESOLVER) assert desc == "young woman with curly hair" assert pid is None # Alice not in roster assert mood == "excited" @@ -162,7 +210,7 @@ def test_parse_valid_format_with_off_roster_name_strips_identity() -> None: def test_parse_format_mismatch_falls_back_to_description_only() -> None: raw = "I see a person sitting at a desk reading a book." - desc, pid, mood = parse_room_view_response(raw, _ROSTER) + desc, pid, mood = parse_room_view_response(raw, _RESOLVER) assert desc == "I see a person sitting at a desk reading a book." assert pid is None assert mood is None @@ -170,7 +218,7 @@ def test_parse_format_mismatch_falls_back_to_description_only() -> None: def test_parse_invalid_mood_drops_mood_keeps_match() -> None: raw = "DESC: small child | NAME: Hudson | MOOD: chaotic" - desc, pid, mood = parse_room_view_response(raw, _ROSTER) + desc, pid, mood = parse_room_view_response(raw, _RESOLVER) assert desc == "small child" assert pid == "hudson" assert mood is None @@ -179,7 +227,7 @@ def test_parse_invalid_mood_drops_mood_keeps_match() -> None: def test_parse_missing_mood_field_still_returns_match() -> None: # Older replies / non-conforming models may omit MOOD entirely. raw = "DESC: small child | NAME: Hudson" - desc, pid, mood = parse_room_view_response(raw, _ROSTER) + desc, pid, mood = parse_room_view_response(raw, _RESOLVER) assert desc == "small child" assert pid == "hudson" assert mood is None @@ -191,7 +239,7 @@ def test_parse_trailing_punctuation_tolerated() -> None: # before ` | MOOD: ...`) is NOT supported; the comment in bridge.py # was wishful — keep the test honest. raw = "DESC: tall adult | NAME: Brett." - desc, pid, mood = parse_room_view_response(raw, _ROSTER) + desc, pid, mood = parse_room_view_response(raw, _RESOLVER) assert desc == "tall adult" assert pid == "brett" assert mood is None diff --git a/dotty-behaviour/vision/room_view.py b/dotty-behaviour/vision/room_view.py index fda0213..04b3a9b 100644 --- a/dotty-behaviour/vision/room_view.py +++ b/dotty-behaviour/vision/room_view.py @@ -13,7 +13,7 @@ import logging import re -from typing import Iterable, Optional, Protocol +from typing import Any, Iterable, Optional, Protocol log = logging.getLogger("dotty-behaviour.vision.room_view") @@ -69,9 +69,12 @@ # Parser regex. Anchored at start, allows whitespace flexibility, and # tolerates trailing punctuation around the name (e.g. `NAME: Hudson.`). +# The NAME group permits internal spaces/apostrophes — the prompt offers +# display names, and "Mary Anne" must parse (audit 2026-06-06: the old +# single-token pattern made multi-word members a 100% silent miss). _ROOM_VIEW_RESP_RE = re.compile( r"^\s*DESC:\s*(?P.+?)\s*" - r"\|\s*NAME:\s*(?P[A-Za-z_][A-Za-z0-9_-]*)\s*" + r"\|\s*NAME:\s*(?P[A-Za-z_][A-Za-z0-9_' -]*?)\s*" # Accept ANY single-word MOOD value here so an out-of-vocab reply # ("chaotic") still parses the desc + name cleanly — the parser # validates the vocab and drops invalid moods to None. @@ -95,6 +98,13 @@ def render_roster_for_vlm(self, *, max_line_chars: int = ...) -> str: ... def iter(self) -> Iterable: ... # noqa: A003 — matches registry method +class _NameResolverLike(Protocol): + """Structural shape we need from household.PersonResolver — the one + method that maps a VLM NAME token back to a Person (or None).""" + + def resolve_vlm_name(self, name: str) -> Optional[Any]: ... + + def build_room_view_question( registry: Optional[_RegistryLike], ) -> Optional[str]: @@ -126,15 +136,23 @@ def build_room_view_question( def parse_room_view_response( raw: str, - roster_ids: set[str], + resolver: _NameResolverLike, ) -> tuple[Optional[str], Optional[str], Optional[str]]: """Parse the VLM's room_view reply into `(description, person_id, mood)`. + The returned person_id is CANONICAL (`Person.id`) — the NAME token + the VLM echoes back is a *display name* from the prompt vocabulary, + and `resolver.resolve_vlm_name()` owns the mapping back (case fold, + multi-word names, id-vs-display-name). The old set-membership check + compared display names against ids, so any member whose id differed + from their lowercased display name was a silent miss (audit + 2026-06-06, confirmed 3/3). + Behaviour: * Empty input → (None, None, None) * "no one in view" sentinel → (None, None, None) - * Format match + name in roster → (desc, person_id, mood_or_None) - * Format match + name == "unknown" or off-roster → (desc, None, mood_or_None) + * Format match + name resolves → (desc, person_id, mood_or_None) + * Format match + name == "unknown" or unresolvable → (desc, None, mood_or_None) * Format mismatch → (raw_stripped, None, None) — graceful degrade to v1 behaviour so we never lose the description signal even when the model deviates from the requested format. @@ -157,11 +175,18 @@ def parse_room_view_response( # v1 path so a botched format never costs us the description. return cleaned, None, None desc = m.group("desc").strip() - name = m.group("name").strip().lower() + name = m.group("name").strip() raw_mood = (m.group("mood") or "").strip().lower() mood = raw_mood if raw_mood in ROOM_VIEW_MOODS else None if not desc: desc = None - if name == "unknown" or name not in roster_ids: + if name.lower() == "unknown": + return desc, None, mood + try: + person = resolver.resolve_vlm_name(name) + except Exception: + log.exception("room_view: name resolution raised for %r", name) + person = None + if person is None: return desc, None, mood - return desc, name, mood + return desc, person.id, mood