
Performance: add dry-run mode and sanity-range assertions to CodeVitals posting #49999

Merged
LiamSarsfield merged 13 commits into trunk from add/codevitals-dry-run-sanity-checks
Jun 29, 2026

@LiamSarsfield LiamSarsfield commented Jun 26, 2026

Contributor

Proposed changes

CodeVitals is append-only with no rollback, so a single bad metric (wrong key, out-of-range value, scale error) can't be removed once it's posted. This PR adds a safety layer to post-to-codevitals.js before FORMS-707 widens the metric set.

  • --dry-run (pnpm report:dry) builds and prints the payload, skips the POST, and needs no token, so CI can smoke-test it.
  • Sanity ranges. The poster checks each typed metric against SANITY_RANGES before posting and skips anything out of range or non-finite; valid metrics in the same run still post. It fails closed on an unknown metric type, a null/NaN/string value, or a keyed scenario missing its metricType, instead of posting it unchecked.
  • Two failure classes, two exit codes. Anything wrong before the live POST (bad results file, no usable metrics, duplicate key, sanity failure, malformed or non-http(s) CODEVITALS_URL) exits 2 and always fails the build. A failure during the POST exits 1, which --allow-codevitals-failure can suppress for a genuine outage. That flag used to swallow local data bugs too, since both shared exit 1.
  • Token stays out of logs. The script builds the URL with new URL() before attaching the token, and scrubs any caught error and its cause chain before logging or rethrowing, so err.message, err.cause, and util.inspect(err) carry no token.
  • Guarded main() so importing the helpers in tests can't trigger a post. The guard compares real filesystem paths, so a checkout with a space, non-ASCII char, or symlink still runs.
  • Apex host by default. www.codevitals.run now 301-redirects the API, and fetch retries a redirected POST as a bodiless GET, so a metric sent to the old www. default never landed. The default is now https://codevitals.run.
  • Tests and docs. 42 node:test cases (pnpm test:unit, no Docker/token/network) cover the guard, the exit-code split, and token redaction; pnpm test runs them before the perf runner. The README documents the staging-key convention and the bad-data escalation steps.
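
To make the dry-run bullet concrete, here is a minimal sketch of the contract: build the payload, print it, and return before the token is read or the network is touched. `buildPayload`, `postMetrics`, the payload shape, and the injected `getToken`/`send` callbacks are assumptions for illustration; the real post-to-codevitals.js may be organised differently.

```javascript
// Hypothetical payload builder; the real payload shape is not shown in this PR.
function buildPayload(metrics, hash) {
	return { hash, metrics: { ...metrics } };
}

// In dry-run mode the payload is printed and returned; the token and the
// network are only touched on the live path.
async function postMetrics({ metrics, hash, dryRun, getToken, send }) {
	const payload = buildPayload(metrics, hash);
	if (dryRun) {
		console.log('[dry-run] would post:', JSON.stringify(payload, null, 2));
		return { posted: false, payload };
	}
	const token = getToken(); // read only when actually posting
	await send(payload, token);
	return { posted: true, payload };
}
```

Passing a `send` that throws is one way to prove that a dry run never posts, which is the idea behind the poisoned-fetch test mentioned later in this thread.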

Related product discussion/links

  • FORMS-713: implementation ticket.
  • FORMS-696: parent performance-tracking effort, whose runbook holds the escalation steps the README points to.

Does this pull request change what data or activity we track or use?

No. This hardens the existing CodeVitals posting tool. Shipped Jetpack code is untouched, and the LCP metric posts as before.

Testing instructions

  • From tools/performance, run pnpm report:dry: it prints the payload and exits 0 without posting (no token needed).
  • Force a failure: point RESULTS_PATH at a results file whose median LCP is outside [100, 60000] (e.g. 70000). The metric is skipped and the command exits 2.
  • Confirm a normal file (LCP ~120ms) still posts the single LCP metric.
  • Confirm the host: pnpm report:dry prints CodeVitals URL: https://codevitals.run.
  • Run pnpm test:unit for the 42 unit tests (no Docker/token/network); pnpm test runs them before the perf runner.

…ls posting

CodeVitals is append-only with no self-service rollback, so a bad metric
(wrong key, out-of-range value, scale error) permanently pollutes the trend
graph. This adds the Phase 0 safety layer before we expand the metrics surface.

- --dry-run flag: build and print the payload, skip the POST, no token
  required (usable as a CI smoke test). Exposed as `pnpm report:dry`.
- Sanity-range assertions: each typed metric is checked against SANITY_RANGES
  in scenarios.js before posting. Out-of-range values are logged and skipped,
  and the script exits non-zero so CI surfaces them. Because
  run-performance-tests.js spawns this script, the gate guards both
  `pnpm report` and the integrated `pnpm test` path.
- Documented the staging-key convention and the bad-data escalation path
  in the README.

FORMS-713
@LiamSarsfield LiamSarsfield requested a review from a team as a code owner June 26, 2026 09:21
@github-actions github-actions Bot added the Docs label Jun 26, 2026
github-actions Bot commented Jun 26, 2026

Contributor

Thank you for your PR!

When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:

  • ✅ Include a description of your PR changes.
  • ✅ Add a "[Status]" label (In Progress, Needs Review, ...).
  • ✅ Add testing instructions.
  • ✅ Specify whether this PR includes any changes to data or privacy.
  • ✅ Add changelog entries to affected projects

This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖


Follow this PR Review Process:

  1. Ensure all required checks appearing at the bottom of this PR are passing.
  2. Make sure to test your changes on all platforms that it applies to. You're responsible for the quality of the code you ship.
  3. You can use GitHub's Reviewers functionality to request a review.
  4. When it's reviewed and merged, you will be pinged in Slack to deploy the changes to WordPress.com simple once the build is done.

If you have questions about anything, reach out in #jetpack-developers for guidance!

@github-actions github-actions Bot added the [Status] Needs Author Reply label Jun 26, 2026
www.codevitals.run 301-redirects the API to the apex. On a 301 fetch
retries a POST as a GET with no body, so a metric posted to the www
default would never land. Default CODEVITALS_URL to https://codevitals.run.
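
The redirect behaviour behind this change follows the Fetch standard's rule for rewriting a request on redirect. The sketch below restates that rule as a pure function; `requestAfterRedirect` is a hypothetical helper for illustration and is not part of this PR.

```javascript
// Restates the Fetch standard's redirect rule: 301/302 turn a POST into a GET,
// and 303 turns anything except GET/HEAD into a GET. The body is dropped
// whenever the method is rewritten. 307/308 keep method and body.
function requestAfterRedirect(status, method, body) {
	const m = method.toUpperCase();
	const rewrite =
		((status === 301 || status === 302) && m === 'POST') ||
		(status === 303 && m !== 'GET' && m !== 'HEAD');
	return rewrite ? { method: 'GET', body: null } : { method: m, body };
}
```

Under that rule a metric POSTed to a host that answers 301 arrives at the redirect target as a GET with no payload, which is why the default now points at the apex host directly.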
The guard returned true for any metricType absent from SANITY_RANGES (a
typo, a forgotten row, or the legacy untyped path), and coercion let null
pass min-0 ranges and let numeric strings through as strings. Both holes
let values post unchecked to an append-only store. checkSanityRange now
rejects a typed metric with no range row and any non-finite value; only a
genuinely untyped legacy entry passes unchecked.

Guard main() behind an import.meta.url check so the pure helpers can be
imported, and add node:test coverage (pnpm test:unit) pinning the
fail-closed contract: over-range skipped, non-finite/string/typo rejected,
boundaries inclusive, untyped legacy passes.
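
A minimal sketch of the fail-closed contract described above. The entry shape `{ metricType, value }`, the `{ min, max }` row format, and the boolean return are assumptions; only the LCP bounds of [100, 60000] come from this PR's testing instructions. The finite check runs first here, which matches the follow-up later in this thread that applies it to untyped entries too.

```javascript
// Assumed shape: metricType -> inclusive { min, max }.
const SANITY_RANGES = { lcp: { min: 100, max: 60000 } };

function checkSanityRange({ metricType, value }, ranges = SANITY_RANGES) {
	// No coercion: null, NaN, Infinity and numeric strings are all rejected.
	if (typeof value !== 'number' || !Number.isFinite(value)) return false;
	// A genuinely untyped legacy entry has no range to check against.
	if (metricType === undefined) return true;
	// Fail closed on a typo or a forgotten row.
	if (!Object.prototype.hasOwnProperty.call(ranges, metricType)) return false;
	const { min, max } = ranges[metricType];
	return value >= min && value <= max; // boundaries are inclusive
}
```

Checking `typeof value` before the range comparison is what keeps `null` from being coerced to 0 and slipping through a range whose minimum is 0.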
The round-1 import guard compared import.meta.url against a raw
file://${argv[1]} string. Node percent-encodes and symlink-resolves
import.meta.url but argv[1] stays raw, so on a checkout whose path
contains a space or non-ASCII char (or via /tmp -> /private/tmp), the
match failed and main() silently never ran: the CLI exited 0 having
posted nothing. Replace it with isDirectInvocation(), which compares
realpath'd filesystem paths.

Also:
- Reject non-finite values for every entry, typed or untyped, by moving
  the finite check above the untyped early return (never post null/NaN).
- Gate the unit tests on the integrated path: pnpm test now runs
  node --test before the perf runner, so the guard is enforced wherever
  this tool runs (tools/performance is outside the monorepo CI matrix).
- Add integration coverage for postToCodeVitals (in-range posts,
  out-of-range skipped + validationFailed, missing file throws) and a CLI
  test that runs the script from a path with a space (regression guard
  for the bug above). Dry-run now returns the built payload so the
  integration test can assert it.
- Point the README default and the perf runner's result link at the apex
  host, matching the POST default.

Build the request URL with new URL() before attaching the token, so a
malformed CODEVITALS_URL throws a generic error instead of a parse error
that echoes the secret, and scrub token=... from any caught error before
logging or rethrowing. Add live-POST tests (fetch stubbed) covering the
payload, a non-OK response, and a malformed-URL redaction regression.
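
The ordering described above can be sketched as follows. `buildPostUrl` is a hypothetical name, and the `token` query parameter is inferred from the `token=...` scrubbing mentioned in this commit message; the request path is left out because the PR text does not state it.

```javascript
function buildPostUrl(baseUrl, token) {
	let url;
	try {
		// Parse the base URL while the token is not yet part of the input.
		url = new URL(baseUrl);
	} catch {
		// Generic message: the parser's own error could echo the offending input.
		throw new Error('Invalid CodeVitals URL');
	}
	// The secret is attached only after parsing has succeeded.
	url.searchParams.set('token', token);
	return url;
}
```

Because the token is never concatenated into the string handed to `new URL()`, a malformed CODEVITALS_URL cannot surface the secret through a parse error.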
Redacting only the top-level error.message left the token in err.cause
and util.inspect(err) when an upstream fetch error echoed the URL. Walk
the caught error's whole cause chain and scrub it in place before logging
or rethrowing, so the full error object is token-free. Also make the CLI
test fixture explicitly ESM so it runs across the supported Node range,
and document that CODEVITALS_URL must be origin-only.
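
A minimal sketch of scrubbing a whole error chain in place. The function names match those used in this thread, but the bodies are assumptions, and the sketch already folds in two refinements described in later commits: best-effort assignment for getter-only fields and redaction of a primitive string cause.

```javascript
const redact = (text, token) => text.split(token).join('[REDACTED]');

// Best-effort write: some errors expose getter-only fields, and a failed
// assignment must not replace the original error with a TypeError.
function safeAssign(obj, key, value) {
	try {
		obj[key] = value;
	} catch {
		/* leave the field untouched */
	}
}

function sanitizeErrorChain(err, token) {
	if (!token) return err;
	const seen = new Set(); // guards against a cyclic cause chain
	let current = err;
	while (current && typeof current === 'object' && !seen.has(current)) {
		seen.add(current);
		for (const key of ['message', 'stack']) {
			if (typeof current[key] === 'string') {
				safeAssign(current, key, redact(current[key], token));
			}
		}
		// A primitive string cause is redacted before the walk advances.
		if (typeof current.cause === 'string') {
			safeAssign(current, 'cause', redact(current.cause, token));
		}
		current = current.cause;
	}
	return err;
}
```

Mutating the error objects rather than building a new message means anything that later inspects the same object sees the redacted text.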
@LiamSarsfield LiamSarsfield requested a review from a team as a code owner June 26, 2026 14:03
@LiamSarsfield LiamSarsfield marked this pull request as draft June 26, 2026 15:47
@LiamSarsfield LiamSarsfield removed the request for review from a team June 26, 2026 15:48
extractScenarioMetrics now throws when a scenario sets metricKey but no
metricType, instead of emitting an untyped entry that checkSanityRange
would pass unchecked. That closes the one gap left in the fail-closed
guard: a future keyed metric posting any finite value to the append-only
store. The current lcp scenario is unaffected.

Also harden two tests: a dry run with a poisoned fetch proves it never
posts, and the non-OK path now puts the token in the response body to
prove the whole error (message, cause, util.inspect) is scrubbed.
…s build

A sanity-check failure and a CodeVitals network outage both exited the
poster with code 1, so --allow-codevitals-failure (meant to tolerate
outages) also silently tolerated bad local data. Give validation failures
a distinct exit code (2) that run-performance-tests.js never suppresses,
and add a CLI test asserting an out-of-range dry run exits with it.

Also extend sanitizeErrorChain to scrub the token from custom enumerable
string error properties, not just message/stack/cause. Native fetch never
populates these; this is belt-and-suspenders for a non-native HTTP client.
…or causes

Closes two gaps the round-6 hardening left open:

- A keyed scenario missing its metricType threw a plain Error, which main()
  mapped to exit 1 — suppressible under --allow-codevitals-failure, despite
  being local bad data exactly like an out-of-range metric. It now throws a
  ValidationError that exitCodeForError maps to VALIDATION_FAILED_EXIT_CODE (2),
  so the runner always fails the build on it.
- sanitizeErrorChain walked the cause chain but never redacted a primitive
  string cause (new Error(m, { cause: someUrl })); cause is non-enumerable, so
  the own-property pass missed it too, leaking the token into util.inspect. It
  is now redacted in place before the walk advances.

Also makes run-performance-tests.js import-safe (guards main() with
isDirectInvocation, mirroring post-to-codevitals.js) and extracts the
build-fail decision into shouldFailBuildOnPostError, so the cross-file
validation/outage contract now has committed regression coverage.
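
The cross-file contract can be sketched as below. `ValidationError`, `exitCodeForError`, `VALIDATION_FAILED_EXIT_CODE` and `shouldFailBuildOnPostError` are named in this thread, but their signatures and bodies here are assumptions, as is the `TRANSPORT_FAILED_EXIT_CODE` name.

```javascript
const TRANSPORT_FAILED_EXIT_CODE = 1; // assumed name for the suppressible code
const VALIDATION_FAILED_EXIT_CODE = 2;

class ValidationError extends Error {
	constructor(message, options) {
		super(message, options); // forwards { cause } when given
		this.name = 'ValidationError';
	}
}

// Poster side: local bad data maps to 2, everything else to 1.
function exitCodeForError(err) {
	return err instanceof ValidationError
		? VALIDATION_FAILED_EXIT_CODE
		: TRANSPORT_FAILED_EXIT_CODE;
}

// Runner side: only a transport failure may be waived by the flag.
function shouldFailBuildOnPostError(exitCode, allowCodevitalsFailure) {
	if (exitCode === 0) return false;
	if (exitCode === VALIDATION_FAILED_EXIT_CODE) return true;
	return !allowCodevitalsFailure;
}
```

Keeping the decision in one small function is what lets the runner's behaviour be pinned by a unit test rather than by spawning the whole pipeline.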

Tests 27 -> 31, all green.
…odes

Round-8 review (Codex) surfaced three HIGH-confidence issues, all reproduced
firsthand:

- A native fetch abort rejects with a DOMException whose message/stack are
  getter-only; sanitizeErrorChain wrote to them and threw a TypeError out of
  the catch, so the 30s timeout path produced 'Cannot set property message'
  instead of the intended timeout error. Redaction is now best-effort via
  safeAssign (an abort message carries no token, so skipping it is safe).

- A run that skips a bad metric (validationFailed=true) but still posts a valid
  one would, if that POST then failed, rethrow a plain Error -> main() exit 1 ->
  suppressible under --allow-codevitals-failure, downgrading local bad data to a
  tolerable outage. The catch now rethrows a ValidationError when validationFailed,
  so it stays exit 2 (always fatal). ValidationError's constructor now forwards
  { cause } so the wrapped transport error is preserved.

- 'pnpm test' / 'test:unit' ran unscoped 'node --test', which recursively
  discovers tests inside the gitignored plugin/ checkout (a jetpack-production
  mirror) and fails before the runner starts. Scoped to 'scripts/*.test.js'.

Also adds a real-SCENARIOS contract test (every posted exact-key scenario must
declare a metricType with a matching SANITY_RANGES row) to guard FORMS-707.
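
The contract that test pins could be expressed roughly as follows. The scenario shape `{ name, metricKey, metricType }` and the helper name `findContractViolations` are hypothetical; the real test asserts against the actual SCENARIOS and SANITY_RANGES exports.

```javascript
// Returns a list of human-readable problems; an empty list means every
// keyed scenario declares a metricType that has a SANITY_RANGES row.
function findContractViolations(scenarios, sanityRanges) {
	const problems = [];
	for (const scenario of scenarios) {
		if (!scenario.metricKey) continue; // not posted under an exact key
		if (!scenario.metricType) {
			problems.push(`${scenario.name}: metricKey without metricType`);
		} else if (!Object.prototype.hasOwnProperty.call(sanityRanges, scenario.metricType)) {
			problems.push(`${scenario.name}: no SANITY_RANGES row for "${scenario.metricType}"`);
		}
	}
	return problems;
}
```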

Tests 31 -> 34, all green; reachable-only-at-2-metrics paths now have coverage.
…integrity exit code

Round-9 review (Codex) found the exit-code taxonomy was incomplete: only sanity
and misconfig failures used the unsuppressible code 2, while other local failures
threw a plain Error -> exit 1 -> suppressible under --allow-codevitals-failure.
That contradicts the PR's own contract that local data-integrity failures always
fail the build. Reproduced firsthand: empty measurements exited 1; a missing
measurements object and a measurement without a summary crashed with a TypeError
(also exit 1).

Adopt the clean invariant: everything before the live POST is local data-integrity
work and fails as a ValidationError (exit 2); only a failure during/after the live
POST is a transport error (exit 1, suppressible). Concretely:

- Results file not found / invalid JSON / no measurements object -> ValidationError.
- A measurement with no summary is now skipped (like a missing/errored one) instead
  of crashing on summary.median; if nothing is left to post, the no-metrics guard
  fails closed.
- 'No metrics to post' and 'Invalid CodeVitals URL' -> ValidationError.
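
The pre-POST rules in the list above can be sketched as follows. `collectMedians`, the `error` field on a measurement, and the exact messages other than 'No metrics to post' are assumptions; `measurements` and `summary.median` are taken from the commit message.

```javascript
class ValidationError extends Error {}

// Everything here runs before the live POST, so every failure is a ValidationError.
function collectMedians(results) {
	if (!results || typeof results.measurements !== 'object' || results.measurements === null) {
		throw new ValidationError('Results file has no measurements object');
	}
	const medians = {};
	for (const [name, measurement] of Object.entries(results.measurements)) {
		// Missing, errored, or summary-less measurements are skipped, not dereferenced.
		if (!measurement || measurement.error || !measurement.summary) continue;
		medians[name] = measurement.summary.median;
	}
	if (Object.keys(medians).length === 0) {
		throw new ValidationError('No metrics to post'); // fail closed
	}
	return medians;
}
```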

Also adds the missing live-POST coverage Codex flagged: an OK response whose body
fails to parse is a transport failure (exit 1) with the token scrubbed everywhere.

Tests 34 -> 38, all green.
… metric keys

Two more pre-POST fail-closed guards on the same data-integrity invariant
(local bad config exits 2, never suppressible by --allow-codevitals-failure):

- A non-http(s) CODEVITALS_URL (file:, ftp:, …) parsed cleanly and reached
  fetch as a generic exit-1 transport error the runner could suppress. A bad
  scheme is a local misconfiguration, not a network outage, so reject it as a
  ValidationError before the token is attached or fetch runs.
- Two scenarios posting the same CodeVitals key silently clobbered one with the
  other and posted the survivor with validationFailed:false. A duplicate key is
  a scenario-config bug; fail closed before posting a coin-flip value to the
  append-only trend. Unreachable on today's single metric; guards the FORMS-707
  multi-metric foundation.
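
Both guards are small enough to sketch. `assertHttpUrl`, `assertUniqueKeys`, the entry shape `{ key, value }`, and the error messages other than 'Invalid CodeVitals URL' are assumptions for illustration.

```javascript
class ValidationError extends Error {}

// Rejects anything that is not http(s) before a token is attached or fetch runs.
function assertHttpUrl(raw) {
	let url;
	try {
		url = new URL(raw);
	} catch {
		throw new ValidationError('Invalid CodeVitals URL');
	}
	if (url.protocol !== 'http:' && url.protocol !== 'https:') {
		throw new ValidationError('CodeVitals URL must use http or https');
	}
	return url;
}

// Fails closed on a duplicate key instead of letting one entry overwrite another.
function assertUniqueKeys(entries) {
	const seen = new Set();
	for (const { key } of entries) {
		if (seen.has(key)) throw new ValidationError(`Duplicate CodeVitals key: ${key}`);
		seen.add(key);
	}
}
```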

Tests (+4, 42 total): non-http(s) URL exits 2 without reaching fetch; a
duplicate key exits 2; a mixed valid/invalid run keeps the valid metric and
excludes the rejected key from the payload (pins the core contract on payload
contents, not just the flag); invalid-JSON and malformed-URL branches now assert
exit-2 classification directly.
@LiamSarsfield LiamSarsfield removed the [Status] Needs Author Reply label Jun 29, 2026
@LiamSarsfield LiamSarsfield marked this pull request as ready for review June 29, 2026 11:27
@LiamSarsfield LiamSarsfield requested a review from kraftbj June 29, 2026 11:27

@CGastrell CGastrell left a comment

Contributor

LGTM! Tests run fine. Only nit found by agent:

Tests mutate module-level SCENARIOS/SANITY_RANGES (push/pop). Safe only because node:test runs top-level tests sequentially — fragile if concurrency is ever enabled. try/finally cleanup mitigates.
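
For reference, the try/finally pattern the nit refers to can be wrapped in a small helper like the one below. `withTemporaryEntry` is a hypothetical name and is not taken from the PR's tests; it covers only synchronous callbacks, so it would not make the tests safe under concurrency.

```javascript
// Adds an entry to a shared module-level array for the duration of fn,
// then removes that same entry even if fn throws.
function withTemporaryEntry(list, entry, fn) {
	list.push(entry);
	try {
		return fn();
	} finally {
		const index = list.lastIndexOf(entry);
		if (index !== -1) list.splice(index, 1);
	}
}
```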

@LiamSarsfield LiamSarsfield merged commit d5fe2e1 into trunk Jun 29, 2026
75 checks passed
@LiamSarsfield LiamSarsfield deleted the add/codevitals-dry-run-sanity-checks branch June 29, 2026 14:33
LiamSarsfield added a commit that referenced this pull request Jun 30, 2026
…sted commits

Two FORMS-705 loop-integrity items not covered by the FORMS-713 PR (#49999):

- Commit-time timestamp: the poster stamped Date.now(), but CodeVitals orders a
  trend by the posted timestamp and the Scheduler reads the latest point to decide
  'last tested'. run-performance-tests.js now captures the plugin HEAD commit time
  (git show -s --format=%ct), measure-lcp.js carries it in results.git.timestamp,
  and post-to-codevitals.js stamps it (resolvePostTimestamp), falling back to build
  time with a warning only when no commit time is available.

- Cross-commit dedup: before a live post, the poster queries the same gitaudit
  evolution endpoint the Scheduler reads (metric 58) and skips if the hash already
  has a point, so a re-run / retryBuild / double-trigger can't append a duplicate to
  the append-only store. Fails open (a flaky read never blocks a post) and is gated
  after the dry-run return, so the token-free CI smoke test still makes no network
  call. Configurable via CODEVITALS_DEDUP_URL / CODEVITALS_REPO /
  CODEVITALS_DEDUP_METRIC_ID; disable with --no-dedup / CODEVITALS_SKIP_DEDUP.

Adds 10 unit tests (52 total). The remaining FORMS-705 items are TeamCity pipeline
config, not repo code; docs/teamcity-codevitals-runbook.md gives step-by-step UI
instructions for them, including the GATE-1 read/write host reconciliation that the
dedup read depends on.
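
A rough sketch of the timestamp fallback described in the first item. `resolvePostTimestamp` and `results.git.timestamp` are named in the commit message; the return shape, the injected `now`/`warn` parameters, and the conversion from seconds to milliseconds are assumptions (`git show -s --format=%ct` prints seconds since the epoch, while `Date.now()` returns milliseconds).

```javascript
function resolvePostTimestamp(results, now = Date.now, warn = console.warn) {
	// Assumption: results.git.timestamp holds the raw %ct value in seconds.
	const commitSeconds = Number(results?.git?.timestamp);
	if (Number.isFinite(commitSeconds) && commitSeconds > 0) {
		return { timestamp: commitSeconds * 1000, source: 'commit' };
	}
	// Fall back to build time, loudly, only when no commit time is available.
	warn('No commit timestamp in results; falling back to build time');
	return { timestamp: now(), source: 'build' };
}
```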