Windows: lone-surrogate UnicodeEncodeError crashes every autosave (shell.py text writes need errors="replace") #97

@DogmaLabsTech

Summary

On Windows, the PostToolUse autosave crashes on every run with:

UnicodeEncodeError: 'utf-8' codec can't encode character '\udc8f' ... surrogates not allowed
  File ".../pipeline/shell.py", line 166, in cmd_parse_haiku
    f.write(r.text)

Because the save dies before save-position ever runs, last-save.json never advances, so post-tool-hook.sh re-fires the save on nearly every tool call. On a busy session that means hundreds of 0-byte logs in .remember/logs/autonomous/, a logs/hook-errors.log full of identical tracebacks, and remember.md / now.md getting clobbered with failed-save noise instead of real handoffs.

Environment

  • Windows 11, Git Bash (MSYS), Python 3.13
  • remember plugin v0.7.3 (claude-plugins-official marketplace)

Root cause

The model response text read from the pipe (raw = sys.stdin.read(), passed to _parse_response) can contain lone surrogates (\udcXX), the standard artefact of surrogateescape decoding of non-UTF-8 bytes on Windows shells. Writing that text to a file opened with strict encoding="utf-8" raises UnicodeEncodeError. Every text-output write in shell.py is exposed:

  • cmd_extract — extract temp file
  • cmd_build_prompt / cmd_build_ndc_prompt — prompt temp file
  • cmd_parse_haiku — haiku temp file and output_file ← the observed crash
  • cmd_consolidate — recent/archive temp files
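To illustrate the mechanism described above, here is a minimal sketch of how a lone surrogate can arise and why the strict write then fails. The byte values 0x8f and 0x9d are chosen only to match the characters in the traceback; where the non-UTF-8 bytes actually come from in the plugin is an assumption.

```python
# Illustrative input: bytes that are not valid UTF-8 (values are assumptions).
raw_bytes = b"summary\x8f with \x9d"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
text = raw_bytes.decode("utf-8", errors="surrogateescape")
assert text == "summary\udc8f with \udc9d"

# A strict UTF-8 encode, which is what open(..., encoding="utf-8") uses by
# default, rejects lone surrogates.
try:
    text.encode("utf-8")
    strict_failed = False
    reason = None
except UnicodeEncodeError as exc:
    strict_failed = True
    reason = exc.reason  # human-readable cause reported by the codec
```

The same text re-encodes cleanly only when an error handler other than "strict" is supplied.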

Repro (deterministic, no API needed)

import os, tempfile

s = "summary\udc8f with \udc9d lone surrogates"
fd, p = tempfile.mkstemp()
os.close(fd)                                # mkstemp returns an open fd; close it so it does not leak
with open(p, "w", encoding="utf-8") as f:   # current behaviour
    f.write(s)                               # -> UnicodeEncodeError

Fix

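Add errors="replace" to the text-mode writes in shell.py. Alternatives: errors="surrogateescape" to write the original bytes back out, errors="surrogatepass" to keep the surrogate code points (the resulting file is then not valid UTF-8), or sanitizing r.text at the source in _parse_response. Representative diff: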

-    with os.fdopen(fd, "w", encoding="utf-8") as f:
+    with os.fdopen(fd, "w", encoding="utf-8", errors="replace") as f:
         f.write(r.text)
@@
-        with open(output_file, "w", encoding="utf-8") as f:
+        with open(output_file, "w", encoding="utf-8", errors="replace") as f:
             f.write(r.text)

Applied to all text-output writes: extract, build-prompt, build-ndc-prompt, parse-haiku (×2), consolidate (×2).
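For comparison, a small sketch of what each candidate error handler does with a lone surrogate when encoding to UTF-8. The sanitize_text helper is hypothetical, not part of shell.py. It only shows one way the "sanitize at the source" option might look.

```python
s = "summary\udc8f"

# "replace": the lone surrogate is substituted with "?"; output is valid UTF-8.
replaced = s.encode("utf-8", errors="replace")

# "surrogatepass": the surrogate code point itself is encoded as three bytes,
# which strict UTF-8 decoders will reject.
passed = s.encode("utf-8", errors="surrogatepass")

# "surrogateescape": U+DCNN is turned back into the single byte 0xNN.
escaped = s.encode("utf-8", errors="surrogateescape")


def sanitize_text(text: str) -> str:
    """Hypothetical helper: return text that is always safe for strict UTF-8 writes."""
    return text.encode("utf-8", errors="replace").decode("utf-8")


clean = sanitize_text(s)
clean.encode("utf-8")  # strict encode now succeeds
```

Only "replace" guarantees output that any UTF-8 reader accepts, at the cost of losing the original byte. That trade-off seems reasonable for log and handoff files.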

Note on a config-only mitigation

There isn't one on a box without jq: config() in scripts/log.sh only reads config.json when real jq is on PATH, otherwise it silently falls back to defaults. So raising thresholds.delta_lines_trigger to throttle the spam doesn't work without first installing jq. The encoding fix is the only reliable path.
