diff --git a/app.yaml b/app.yaml
index f7fad526d..023fb1f15 100644
--- a/app.yaml
+++ b/app.yaml
@@ -6,11 +6,25 @@ build_env_variables:
 instance_class: F4
 automatic_scaling:
   max_concurrent_requests: 100
+  # Cost cap: bounds worst-case fan-out if a render storm or crash loop drives
+  # autoscaling (issue #694). 8 F4 instances is far above routine load; raise
+  # deliberately if organic traffic ever needs it. Mirror this into
+  # .app.prod.yaml -- scripts/validate-app-prod-config.mjs enforces it there.
+  max_instances: 8

 handlers:
 - url: /static
   static_dir: public/static
   secure: always
+  http_headers:
+    # The embeddable web component (sd-component.js) is hotlinked from
+    # third-party origins (issue #688); its engine worker's WASM fetch (and
+    # any @font-face/CSS asset loads) are cross-origin CORS requests against
+    # these assets. They are public, immutable, content-hashed files served
+    # without credentials, so a wildcard origin is appropriate and adds no
+    # risk. Mirror this into .app.prod.yaml --
+    # scripts/validate-app-prod-config.mjs enforces it there.
+    Access-Control-Allow-Origin: "*"

 - url: /$
   static_files: public/index.html
diff --git a/docs/dev/deploy.md b/docs/dev/deploy.md
index 439ee399f..87e56ffdf 100644
--- a/docs/dev/deploy.md
+++ b/docs/dev/deploy.md
@@ -1,6 +1,6 @@
 # Deploying to production

-**Last reviewed:** 2026-05-08
+**Last reviewed:** 2026-07-01

 The web app at `app.simlin.com` runs on Google App Engine standard. GAE serves the static React SPA built from `src/app` and runs the Express backend in `src/server` (Firebase Auth, models persisted in Firestore as protobuf). `@simlin/mcp`, `@simlin/serve`, `pysimlin`, and `simlin-cli` are released separately to npm/PyPI -- they aren't part of this deploy.
@@ -27,6 +27,7 @@ export NODE_ENV=production
 pnpm clean # cargo clean + each package's clean script
 pnpm build # pnpm -r run build: Rust+WASM, then every TS package
 pnpm --filter @simlin/app run deploy:assemble # copy build/ and build-component/ into public/; drop symlinks
+scripts/check-upload-file-count.sh . # abort if the upload set would trip GAE's 10,000-file cap
 gcloud app deploy ./.app.prod.yaml # upload the repo minus .gcloudignore, switch traffic
 # A bash trap runs the cleanup below on EXIT/INT/TERM, even if any step above fails:
 pnpm --filter @simlin/app run deploy:clean # git checkout the symlinks and index.html; rm build artifacts
@@ -103,6 +104,8 @@ On the instance GAE runs `pnpm install`, then the root `start` script: `node src
 - `runtime`: `nodejs24`. (`nodejs18` is EOL on GAE; `nodejs16`, the runtime of the last deploy before May 2026, is gone entirely -- which is why there's no "redeploy the old commit" rollback.)
 - `build_env_variables.GOOGLE_NODE_RUN_SCRIPTS`: `''`. The local deploy script already runs the monorepo build and stages the exact artifact to upload; this prevents App Engine's Node buildpack from running the root `build` script again during staging.
+- `automatic_scaling.max_instances`: `8`. Cost cap so a render storm or crash loop can't fan out F4 instances without bound (issue #694). **Mirror this into `.app.prod.yaml`** -- both deploy scripts run `scripts/validate-app-prod-config.mjs`, which fails the deploy if it's missing or not a positive integer. Raise it deliberately if organic traffic ever needs more instances.
+- The `/static` handler's `http_headers` with `Access-Control-Allow-Origin: "*"`. Third-party pages hotlink `sd-component.js`, and its engine worker's WASM fetch (plus fonts/CSS assets) are cross-origin CORS requests against `/static` (issue #688); without this header embeds load their data but the engine never initializes. The assets are public, immutable, and served without credentials, so the wildcard adds no risk. **Mirror this into `.app.prod.yaml`** -- `scripts/validate-app-prod-config.mjs` fails the deploy if the header is missing.
 - The handler list: `/static`, `/`, `/new`, `/legal*`, `/privacy`, `robots.txt`, `ads.txt`, favicon, `manifest.json`, then `/.*` -> `script: auto`. The `/` and `/new` static HTML handlers carry CSP/HSTS headers because they bypass Express Helmet. The SPA's dynamic routes like `/:username/:projectName` fall through to `/.*`, i.e. the Express server.
 - `env_variables`: the committed `app.yaml` has none; the server needs a couple (next section). They live in `.app.prod.yaml`.
@@ -122,10 +125,10 @@ The server loads `config/default.json`, then `config/production.json` when `NODE
 ## Pre-deploy checklist

 - [ ] CI is green on the commit you're deploying.
-- [ ] `git status` is clean.
+- [ ] `git status` is clean. Note this only makes the tracked-`public/` mutation and cleanup reliable -- it does NOT bound what gets uploaded. The upload set is whatever `.gcloudignore` leaves in, so gitignored/untracked junk (venvs, nested cargo `target/` dirs, scratch checkouts) still counts toward GAE Standard's hard 10,000-file cap (issue #695). Both deploy scripts run [`scripts/check-upload-file-count.sh`](/scripts/check-upload-file-count.sh) immediately before `gcloud app deploy` and abort with a per-top-level-directory breakdown if the cap would be hit; fix by deleting the offending dirs or adding them to `.gcloudignore`.
 - [ ] `gcloud config get-value project` is the production project.
 - [ ] `gcloud app versions list --service=default` shows a known-good current version -- note its ID, that's your rollback target. (If GAE has garbage-collected it, there is no rollback; see below.)
-- [ ] `.app.prod.yaml` reconciled against `app.yaml` (`runtime: nodejs24`, `build_env_variables.GOOGLE_NODE_RUN_SCRIPTS: ''`, handlers, `authentication__seshcookie__key` set to the value already in use).
+- [ ] `.app.prod.yaml` reconciled against `app.yaml` (`runtime: nodejs24`, `build_env_variables.GOOGLE_NODE_RUN_SCRIPTS: ''`, `automatic_scaling.max_instances: 8`, handlers including the `/static` `Access-Control-Allow-Origin: "*"` header, `authentication__seshcookie__key` set to the value already in use).
 - [ ] `wasm-opt --version` works; `rustup show` lists the `wasm32-unknown-unknown` target.

 ## Post-deploy smoke test
@@ -133,8 +136,11 @@ The server loads `config/default.json`, then `config/production.json` when `NODE
 Against the `--no-promote` version URL, then again on production:

 - `curl -sI https:///` -> 200 HTML. View source: it links a hashed `/static/js/index..js` (literal `<%= PUBLIC_URL %>` means the build was skipped) and `/static/css/index..css`.
+- `curl -s https:///healthz` -> 200 `ok`. This is the only check that exercises the Node server: `/` is a GAE static handler and stays green even when every Express instance is crash-looping (e.g. `ServerInitError`). A WASM preload failure aborts boot before the route mounts, so it shows up as a non-responding instance (connection failure / GAE 5xx), not a 503 -- treat any non-200 here as down. (The route's 503 branch is defense-in-depth, not the expected failure signal.)
 - `curl -sI https:///static/js/sd-component.js` -> 200 -- the embeddable web component; external sites `` from a different origin (e.g. `python3 -m http.server` on localhost) and confirm the diagram renders and simulates with no console errors.
 - `curl -sI` on `/robots.txt`, `/manifest.json`, `/favicon.ico`, `/legal/`, `/privacy/` -> 200; `curl -I http:///` -> 301 to https.
 - Browser: log in with Google, land on Home, no console errors.
 - New-user flow: sign in with a fresh account, claim a username, confirm the example projects appear and one opens and simulates.
@@ -161,5 +167,5 @@ The `frontend` job in [`.github/workflows/ci.yaml`](/.github/workflows/ci.yaml)
 Things to know that don't have a clean fix yet:

 - `pnpm deploy:web` deploys from the workspace root, so GAE's Node buildpack installs the *whole workspace's* dependency set on the instance -- `@rsbuild/*`, `jest`, `slate`, `radix`, rspress, vite, and every other package's deps (~590 MB / 1171 packages), none needed by the server at runtime. App Engine standard always reinstalls from the deployed `package.json` + lockfile and has no vendored-`node_modules` escape hatch, so the only lever is the deployed manifest. The smaller-deploy fix is implemented as **`pnpm deploy:web:staged`** (see below); it is locally proven but still pending a real `gcloud --no-promote` test, so `deploy:web` remains the default. Tracked in [docs/tech-debt.md](/docs/tech-debt.md) "Web deploy uploads the whole monorepo and GAE installs the full dep set".
-- Server-side PNG preview (`src/server/render.ts`) parses and rasterizes user-uploaded models in-process with no size cap beyond the 10 MB request body limit and no timeout.
-- There's no error reporting or alerting. Cloud Logging and the GAE metrics dashboard are it.
+- Server-side PNG preview (`src/server/render.ts`) renders user-uploaded models in per-request `worker_threads` workers (each with its own WASM instance) with a 10 s total wall-clock budget per request (queue wait included) and at most 2 concurrent renders -- restoring the isolation the 2022 deploy had (issue #694). What remains rough: there's no model-complexity cap below the 10 MB request body limit, so a pathological model still costs a bounded 10 s worker per attempt before failing with a 500.
+- There's no error reporting or alerting. Cloud Logging and the GAE metrics dashboard are it. The Express `/healthz` route exists as an uptime-check target (see the smoke test above), but no Cloud Monitoring notification channel, uptime check, or alerting policy points at it yet -- that ops-side setup is tracked in [issue #693](https://github.com/bpowers/simlin/issues/693).
diff --git a/scripts/build-deploy-staging.mjs b/scripts/build-deploy-staging.mjs
index e27d39f55..eb7dba434 100644
--- a/scripts/build-deploy-staging.mjs
+++ b/scripts/build-deploy-staging.mjs
@@ -240,6 +240,12 @@ function verify(stagingDir) {
   };

   check(fs.existsSync(path.join(stagingDir, 'lib/index.js')), 'lib/index.js missing');
+  // render.ts spawns this sibling via __dirname at runtime (issue #694); a
+  // deploy without it 500s every preview while everything else looks healthy.
+  check(
+    fs.existsSync(path.join(stagingDir, 'lib/render-worker.js')),
+    'lib/render-worker.js missing (preview renders would 500 at runtime)',
+  );
   check(fs.existsSync(path.join(stagingDir, 'config/production.json')), 'config/production.json missing');
   check(
     fs.existsSync(path.join(stagingDir, 'default_projects')) &&
diff --git a/scripts/check-upload-file-count.sh b/scripts/check-upload-file-count.sh
new file mode 100755
index 000000000..c86f278d2
--- /dev/null
+++ b/scripts/check-upload-file-count.sh
@@ -0,0 +1,57 @@
+#!/usr/bin/env bash
+#
+# Pre-flight gate: fail if the gcloud upload set for DIR meets or exceeds
+# App Engine Standard's hard per-deploy file cap (10,000 files).
+#
+# Why this exists (issue #695): `gcloud app deploy` uploads everything that
+# .gcloudignore does not exclude, which is INDEPENDENT of git tracking
+# status. A `git status`-clean tree can still hold gitignored artifacts
+# (cargo target dirs, Python venvs, third_party checkouts) or brand-new
+# untracked scratch dirs that blow the cap -- and without this gate the
+# failure only surfaces inside `gcloud app deploy`, after the expensive
+# clean+build. `gcloud meta list-files-for-upload` is the only check that
+# reflects the real upload set, so we run it once here, immediately before
+# the deploy, and turn a late upload failure into a fast, actionable error.
+#
+# Usage: check-upload-file-count.sh DIR [MAX_FILES]
+# MAX_FILES defaults to 10000 (the GAE Standard cap). It is overridable
+# so the failure path can be exercised by hand against a small directory
+# without building a 10k-file fixture.
+
+set -euo pipefail
+
+if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
+  echo "usage: $0 DIR [MAX_FILES]" >&2
+  exit 2
+fi
+
+DIR="$1"
+MAX_FILES="${2:-10000}"
+
+UPLOAD_LIST="$(mktemp)"
+trap 'rm -f "$UPLOAD_LIST"' EXIT
+
+# Enumerate exactly once: on a polluted tree this walk covers 100k+ files
+# and is the slow part, so both the count and the per-directory breakdown
+# below are derived from this single capture. Running from inside DIR makes
+# gcloud emit paths relative to DIR, which the breakdown's `cut` relies on.
+(cd "$DIR" && gcloud meta list-files-for-upload .) > "$UPLOAD_LIST"
+
+UPLOAD_COUNT="$(wc -l < "$UPLOAD_LIST" | tr -d '[:space:]')"
+
+if [ "$UPLOAD_COUNT" -ge "$MAX_FILES" ]; then
+  {
+    echo "ERROR: gcloud would upload $UPLOAD_COUNT files from $DIR, but App Engine"
+    echo " Standard rejects deploys of $MAX_FILES files or more."
+    echo ""
+    echo "The upload set is whatever .gcloudignore leaves in -- a clean 'git status'"
+    echo "does NOT bound it (gitignored and untracked files still upload). Largest"
+    echo "top-level directories in the upload set; delete the junk ones or add them"
+    echo "to .gcloudignore:"
+    echo ""
+    cut -d/ -f1 "$UPLOAD_LIST" | sort | uniq -c | sort -rn | head -10
+  } >&2
+  exit 1
+fi
+
+echo " upload set: $UPLOAD_COUNT files (cap: $MAX_FILES)"
diff --git a/scripts/deploy-web-staged.sh b/scripts/deploy-web-staged.sh
index cab12c233..4ca44d089 100755
--- a/scripts/deploy-web-staged.sh
+++ b/scripts/deploy-web-staged.sh
@@ -77,6 +77,13 @@ bash "$REPO_ROOT/scripts/verify-deploy-build.sh"
 echo "==> Assembling self-contained server staging dir (scripts/build-deploy-staging.mjs)"
 node "$REPO_ROOT/scripts/build-deploy-staging.mjs" "$STAGING_DIR" "$REPO_ROOT/.app.prod.yaml"

+# The staging dir is bounded by construction (build-deploy-staging.mjs copies
+# an explicit file list), so this gate is cheap here -- it exists to catch a
+# regression in the staging assembly (e.g. accidentally vendoring a
+# node_modules tree) before the upload starts. See issue #695.
+echo "==> Checking upload file count against the GAE 10k cap (scripts/check-upload-file-count.sh)"
+bash "$REPO_ROOT/scripts/check-upload-file-count.sh" "$STAGING_DIR"
+
 echo "==> gcloud app deploy $STAGING_DIR/app.yaml"
 gcloud app deploy "$STAGING_DIR/app.yaml" "$@"

diff --git a/scripts/deploy-web.sh b/scripts/deploy-web.sh
index 722f99992..97a926301 100755
--- a/scripts/deploy-web.sh
+++ b/scripts/deploy-web.sh
@@ -70,6 +70,14 @@ pnpm build
 echo "==> Staging app build into public/ (pnpm --filter @simlin/app run deploy:assemble)"
 pnpm --filter @simlin/app run deploy:assemble

+# Gate on the real upload set right before the deploy: this deploy uploads
+# from the repo root, where the upload set is whatever .gcloudignore leaves
+# in -- independent of git status, and including files the build steps above
+# just created. Failing here (instead of inside gcloud app deploy) names the
+# offending directories and still runs the cleanup trap. See issue #695.
+echo "==> Checking upload file count against the GAE 10k cap (scripts/check-upload-file-count.sh)"
+bash "$REPO_ROOT/scripts/check-upload-file-count.sh" "$REPO_ROOT"
+
 echo "==> gcloud app deploy ./.app.prod.yaml"
 gcloud app deploy "$REPO_ROOT/.app.prod.yaml" "$@"

diff --git a/scripts/tests/validate-app-prod-config.test.mjs b/scripts/tests/validate-app-prod-config.test.mjs
index f9c9d1959..d86375c6d 100644
--- a/scripts/tests/validate-app-prod-config.test.mjs
+++ b/scripts/tests/validate-app-prod-config.test.mjs
@@ -3,6 +3,29 @@ import { describe, it } from 'node:test';

 import { validateAppProdConfig } from '../validate-app-prod-config.mjs';

+const MAX_INSTANCES_MESSAGE =
+  'automatic_scaling.max_instances must be set to a positive integer (cost cap; mirror the committed app.yaml)';
+
+const STATIC_CORS_MESSAGE =
+  'handlers must include a /static handler with http_headers.Access-Control-Allow-Origin set to "*" (cross-origin embeds, issue #688; mirror the committed app.yaml)';
+
+// Every fixture that isn't specifically exercising the max_instances check
+// carries this block so its expected messages stay focused on one concern.
+const scalingBlock = `
+automatic_scaling:
+  max_instances: 8
+`;
+
+// Likewise for fixtures not exercising the /static CORS check.
+const staticHandlerBlock = `
+handlers:
+- url: /static
+  static_dir: public/static
+  secure: always
+  http_headers:
+    Access-Control-Allow-Origin: "*"
+`;
+
 const validConfig = `
 runtime: nodejs24

@@ -12,7 +35,7 @@ build_env_variables:
 env_variables:
   NODE_ENV: production
   authentication__seshcookie__key: production-secret
-`;
+${scalingBlock}${staticHandlerBlock}`;

 function messagesFor(source) {
   return validateAppProdConfig(source, '.app.prod.yaml').map((error) => error.message);
@@ -27,6 +50,7 @@ describe('validateAppProdConfig', () => {
     const messages = messagesFor(`
 # build_env_variables.GOOGLE_NODE_RUN_SCRIPTS: ''
 # env_variables.authentication__seshcookie__key: production-secret
+# automatic_scaling.max_instances: 8
 runtime: nodejs24
 `);

@@ -34,6 +58,8 @@ runtime: nodejs24
       'build_env_variables.GOOGLE_NODE_RUN_SCRIPTS must be set to an empty string',
       'env_variables.NODE_ENV must be set to production',
       'env_variables.authentication__seshcookie__key must be set to the existing production session key',
+      MAX_INSTANCES_MESSAGE,
+      STATIC_CORS_MESSAGE,
     ]);
   });

@@ -43,7 +69,7 @@ env_variables:
   NODE_ENV: production
   GOOGLE_NODE_RUN_SCRIPTS: ''
   authentication__seshcookie__key: production-secret
-`);
+${scalingBlock}${staticHandlerBlock}`);

     assert.deepEqual(messages, ['build_env_variables.GOOGLE_NODE_RUN_SCRIPTS must be set to an empty string']);
   });
@@ -55,7 +81,7 @@ build_env_variables:
 env_variables:
   NODE_ENV: production
   authentication__seshcookie__key: production-secret
-`);
+${scalingBlock}${staticHandlerBlock}`);

     assert.deepEqual(messages, ['build_env_variables.GOOGLE_NODE_RUN_SCRIPTS must be set to an empty string']);
   });
@@ -67,7 +93,7 @@ build_env_variables:
   authentication__seshcookie__key: production-secret
 env_variables:
   NODE_ENV: production
-`);
+${scalingBlock}${staticHandlerBlock}`);

     assert.deepEqual(messages, [
       'env_variables.authentication__seshcookie__key must be set to the existing production session key',
@@ -83,7 +109,7 @@ build_env_variables:
 env_variables:
   NODE_ENV: production
   authentication__seshcookie__key: ${value}
-`),
+${scalingBlock}${staticHandlerBlock}`),
         ['env_variables.authentication__seshcookie__key must be set to the existing production session key'],
       );
     }
@@ -95,7 +121,7 @@ build_env_variables:
   GOOGLE_NODE_RUN_SCRIPTS: ''
 env_variables:
   authentication__seshcookie__key: production-secret
-`);
+${scalingBlock}${staticHandlerBlock}`);

     assert.deepEqual(messages, ['env_variables.NODE_ENV must be set to production']);
   });
@@ -108,19 +134,119 @@ build_env_variables:
   NODE_ENV: production
 env_variables:
   authentication__seshcookie__key: production-secret
-`,
+${scalingBlock}${staticHandlerBlock}`,
       `
 build_env_variables:
   GOOGLE_NODE_RUN_SCRIPTS: ''
 env_variables:
   NODE_ENV: development
   authentication__seshcookie__key: production-secret
-`,
+${scalingBlock}${staticHandlerBlock}`,
     ]) {
       assert.deepEqual(messagesFor(source), ['env_variables.NODE_ENV must be set to production']);
     }
   });

+  it('rejects a missing automatic_scaling block', () => {
+    const messages = messagesFor(`
+build_env_variables:
+  GOOGLE_NODE_RUN_SCRIPTS: ''
+env_variables:
+  NODE_ENV: production
+  authentication__seshcookie__key: production-secret
+${staticHandlerBlock}`);
+
+    assert.deepEqual(messages, [MAX_INSTANCES_MESSAGE]);
+  });
+
+  it('rejects max_instances values that are not positive integers', () => {
+    for (const value of ['0', '-2', '2.5', 'unlimited', "''"]) {
+      assert.deepEqual(
+        messagesFor(`
+build_env_variables:
+  GOOGLE_NODE_RUN_SCRIPTS: ''
+env_variables:
+  NODE_ENV: production
+  authentication__seshcookie__key: production-secret
+automatic_scaling:
+  max_instances: ${value}
+${staticHandlerBlock}`),
+        [MAX_INSTANCES_MESSAGE],
+        `max_instances: ${value} should be rejected`,
+      );
+    }
+  });
+
+  it('rejects a config with no handlers list', () => {
+    const messages = messagesFor(`
+build_env_variables:
+  GOOGLE_NODE_RUN_SCRIPTS: ''
+env_variables:
+  NODE_ENV: production
+  authentication__seshcookie__key: production-secret
+${scalingBlock}`);
+
+    assert.deepEqual(messages, [STATIC_CORS_MESSAGE]);
+  });
+
+  it('rejects a /static handler without the CORS header', () => {
+    const messages = messagesFor(`
+build_env_variables:
+  GOOGLE_NODE_RUN_SCRIPTS: ''
+env_variables:
+  NODE_ENV: production
+  authentication__seshcookie__key: production-secret
+${scalingBlock}
+handlers:
+- url: /static
+  static_dir: public/static
+  secure: always
+`);
+
+    assert.deepEqual(messages, [STATIC_CORS_MESSAGE]);
+  });
+
+  it('rejects a /static handler whose CORS header is not the wildcard', () => {
+    const messages = messagesFor(`
+build_env_variables:
+  GOOGLE_NODE_RUN_SCRIPTS: ''
+env_variables:
+  NODE_ENV: production
+  authentication__seshcookie__key: production-secret
+${scalingBlock}
+handlers:
+- url: /static
+  static_dir: public/static
+  secure: always
+  http_headers:
+    Access-Control-Allow-Origin: https://app.simlin.com
+`);
+
+    assert.deepEqual(messages, [STATIC_CORS_MESSAGE]);
+  });
+
+  it('finds the /static handler anywhere in the handlers list', () => {
+    const messages = messagesFor(`
+build_env_variables:
+  GOOGLE_NODE_RUN_SCRIPTS: ''
+env_variables:
+  NODE_ENV: production
+  authentication__seshcookie__key: production-secret
+${scalingBlock}
+handlers:
+- url: /$
+  static_files: public/index.html
+  upload: public/index.html
+- url: /static
+  static_dir: public/static
+  secure: always
+  http_headers:
+    Access-Control-Allow-Origin: '*'
+`);
+
+    assert.deepEqual(messages, []);
+  });
+
   it('rejects malformed YAML', () => {
     const messages = messagesFor(`
 build_env_variables:
diff --git a/scripts/validate-app-prod-config.mjs b/scripts/validate-app-prod-config.mjs
index f017949e3..da056e90e 100644
--- a/scripts/validate-app-prod-config.mjs
+++ b/scripts/validate-app-prod-config.mjs
@@ -60,6 +60,34 @@ export function validateAppProdConfig(source, filename = '.app.prod.yaml') {
     });
   }

+  // Cost cap: without max_instances a render storm or crash loop can fan out
+  // F4 instances without bound (issue #694). The committed app.yaml carries
+  // the reference value; the operator must mirror it here.
+  const scaling = config.automatic_scaling;
+  const maxInstances = isRecord(scaling) ? scaling.max_instances : undefined;
+  if (!Number.isInteger(maxInstances) || maxInstances <= 0) {
+    errors.push({
+      message:
+        'automatic_scaling.max_instances must be set to a positive integer (cost cap; mirror the committed app.yaml)',
+    });
+  }
+
+  // Cross-origin embed contract (issue #688): third-party pages hotlink
+  // sd-component.js, and its engine worker/WASM loads are cross-origin
+  // requests against /static. Without the wildcard ACAO header the embed
+  // silently fails to initialize the engine -- a regression no same-origin
+  // smoke check can catch, so enforce the committed app.yaml's header here.
+  const handlers = Array.isArray(config.handlers) ? config.handlers : [];
+  const staticHandler = handlers.find((handler) => isRecord(handler) && handler.url === '/static');
+  const headers = isRecord(staticHandler) ? staticHandler.http_headers : undefined;
+  const allowOrigin = isRecord(headers) ? headers['Access-Control-Allow-Origin'] : undefined;
+  if (allowOrigin !== '*') {
+    errors.push({
+      message:
+        'handlers must include a /static handler with http_headers.Access-Control-Allow-Origin set to "*" (cross-origin embeds, issue #688; mirror the committed app.yaml)',
+    });
+  }
+
   return errors;
 }

diff --git a/scripts/verify-deploy-build.sh b/scripts/verify-deploy-build.sh
index 186145225..50fc48b2c 100755
--- a/scripts/verify-deploy-build.sh
+++ b/scripts/verify-deploy-build.sh
@@ -182,6 +182,16 @@ else
   pass "src/server/lib/index.js exists"
 fi

+# 9. The preview render worker entry ships next to the compiled server.
+# render.ts spawns lib/render-worker.js via __dirname at runtime
+# (issue #694); if it's missing, every preview request 500s while the
+# rest of the server looks healthy.
+if [ ! -f src/server/lib/render-worker.js ]; then
+  fail "src/server/lib/render-worker.js missing (preview renders would 500 at runtime)"
+else
+  pass "src/server/lib/render-worker.js exists"
+fi
+
 echo ""
 if [ "$errors" -gt 0 ]; then
   echo "verify-deploy-build: FAILED ($errors error(s))" >&2
diff --git a/src/engine/CLAUDE.md b/src/engine/CLAUDE.md
index 8fac063fc..20e9ba4c0 100644
--- a/src/engine/CLAUDE.md
+++ b/src/engine/CLAUDE.md
@@ -29,6 +29,7 @@ For build/test/lint commands, see [docs/dev/commands.md](/docs/dev/commands.md).
 - `src/patch.ts` -- Model patching logic
 - `src/worker-protocol.ts` -- Worker message protocol
 - `src/backend-factory.ts` / `.browser.ts` / `.node.ts` -- Platform-specific backend factories
+- `src/worker-trampoline.ts` -- Cross-origin embed support (issue #688): pure decision/construction functions plus an injectable spawn shell that boots the engine worker through a same-origin blob: trampoline when the resolved chunk URL is cross-origin (third-party pages hotlinking sd-component.js). The bundler-facing constraints (inline `new Worker(new URL(...))` pattern, classic-worker downgrade under UMD, `publicPath: 'auto'` deriving from `self.location` in classic worker chunks) are documented in the module header; `backend-factory.browser.ts` and `engine-worker.ts` are the two consumers
 - `src/internal/` -- Internal modules (project, model, memory, error, import-export)
 - `src/internal/wasmgen.ts` -- `simlin_model_compile_to_wasm` FFI wrapper + the pure `parseWasmLayout` / `readStridedSeries` decoders for the per-model wasm blob (re-exported via `@simlin/engine/internal`)
 - `src/internal/canonicalize.ts` -- pure `canonicalizeIdent`, a faithful port of the Rust canonicalizer (used to resolve caller names to wasm-layout slots); not re-exported from the `internal` barrel
@@ -54,6 +55,7 @@ For build/test/lint commands, see [docs/dev/commands.md](/docs/dev/commands.md).
 - `tests/race.test.ts` -- Concurrency tests
 - `tests/cleanup.test.ts` -- Resource cleanup tests
 - `tests/wasmgen.test.ts`, `tests/canonicalize.test.ts` -- Unit tests for the pure layout decoders and `canonicalizeIdent`
+- `tests/worker-trampoline.test.ts` -- Unit tests for the cross-origin worker trampoline (origin decision, trampoline source, spawn interception with fake Worker/URL)
 - `tests/wasm-backend.test.ts`, `tests/wasm-model.test.ts`, `tests/worker-wasm.test.ts` -- wasm-vs-VM parity through `DirectBackend`, the `Model`/`Sim` facade, and the Web Worker
 - `tests/wasm-ltm.test.ts` -- LTM-on-wasm parity through the TypeScript surface: drives `Model.simulate({ engine: 'wasm', enableLtm: true })` end-to-end and asserts the resulting `Run.links` match the VM (link set, polarities, per-step scores). Includes a `WorkerBackend` twin and an Unsupported-LTM case that surfaces as a rejection without falling back to the VM
 - `tests/ltm-test-helpers.ts` -- shared helpers for the LTM tests (`linksByKey`, `expectScoresClose`); kept separate from the test files so the wasm and worker LTM suites compare links the same way
diff --git a/src/engine/src/backend-factory.browser.ts b/src/engine/src/backend-factory.browser.ts
index 2cae9716e..453aa2a35 100644
--- a/src/engine/src/backend-factory.browser.ts
+++ b/src/engine/src/backend-factory.browser.ts
@@ -9,21 +9,71 @@
  * keeping the main thread free for UI interaction. The Worker is created
  * lazily on first access and reused for all subsequent operations.
  *
+ * When the resolved worker chunk URL is cross-origin -- the embeddable web
+ * component hotlinked from a third-party page (issue #688) -- the worker is
+ * created through a same-origin blob trampoline instead, because the Worker
+ * constructor enforces the same-origin policy regardless of CORS. See
+ * worker-trampoline.ts for the mechanism.
+ *
  * This is selected at build time via tsconfig path mapping for browser builds.
  */

 import { EngineBackend } from './backend';
 import { WorkerBackend } from './worker-backend';
+import { spawnWithTrampoline } from './worker-trampoline';
 import type { WorkerRequest, WorkerResponse } from './worker-protocol';

+// Bundlers that implement webpack's module variables (rspack, webpack)
+// rewrite this free identifier to their runtime publicPath; with
+// assetPrefix 'auto' that value is derived from the embedding