feat(webapp): add per-worker Node.js heap metrics #3437

Merged
ericallam merged 2 commits into main from feat/webapp-nodejs-heap-metrics
Apr 23, 2026
Conversation

@ericallam
Member

@ericallam ericallam commented Apr 23, 2026

Summary

Adds direct V8 heap and process-memory gauges to the webapp's OpenTelemetry meter. The webapp already exports per-cluster-worker Node.js runtime metrics (event-loop lag / utilization, active handles, active requests, libuv threadpool size) via a custom meter under the trigger.dev scope. Heap and memory were missing; this PR adds them alongside, in the same observable-batch pattern.

New gauges

| Metric | Source | Unit |
| --- | --- | --- |
| nodejs.memory.heap.used | process.memoryUsage().heapUsed | bytes |
| nodejs.memory.heap.total | process.memoryUsage().heapTotal | bytes |
| nodejs.memory.heap.limit | v8.getHeapStatistics().heap_size_limit | bytes |
| nodejs.memory.external | process.memoryUsage().external | bytes |
| nodejs.memory.array_buffers | process.memoryUsage().arrayBuffers | bytes |
| nodejs.memory.rss | process.memoryUsage().rss | bytes |

Gated by the existing INTERNAL_OTEL_NODEJS_METRICS_ENABLED flag, same as the adjacent event-loop / handle gauges. Zero overhead when disabled.
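
As a rough illustration of where each gauge's value comes from, the sketch below reads all six values in one pass. The `readMemorySnapshot` helper and `MemorySnapshot` type are hypothetical names for this example and are not taken from tracer.server.ts.

```typescript
import * as v8 from "node:v8";

// Hypothetical shape for this sketch; keys mirror the gauge names listed above.
type MemorySnapshot = {
  "nodejs.memory.heap.used": number;
  "nodejs.memory.heap.total": number;
  "nodejs.memory.heap.limit": number;
  "nodejs.memory.external": number;
  "nodejs.memory.array_buffers": number;
  "nodejs.memory.rss": number;
};

// Reads every value from the two Node.js calls named in the PR description.
function readMemorySnapshot(): MemorySnapshot {
  const mem = process.memoryUsage();
  const heap = v8.getHeapStatistics();
  return {
    "nodejs.memory.heap.used": mem.heapUsed,
    "nodejs.memory.heap.total": mem.heapTotal,
    "nodejs.memory.heap.limit": heap.heap_size_limit,
    "nodejs.memory.external": mem.external,
    "nodejs.memory.array_buffers": mem.arrayBuffers,
    "nodejs.memory.rss": mem.rss,
  };
}

// All values are reported in bytes.
console.log(readMemorySnapshot());
```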

Why

@opentelemetry/host-metrics publishes process.memory.usage, which is RSS only. RSS is roughly the sum of V8 heap, external memory (Buffers, etc.), native code, and thread stacks. Without a direct heap metric it is not possible to size the V8 heap cap (--max-old-space-size) from metrics alone, because RSS overstates heap by the external + native footprint. A worker can have a 4 GB RSS with a 2.5 GB heap and 1.5 GB of buffers; only the heap counts against --max-old-space-size, while the buffers do not.
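
The arithmetic in that example can be worked through as follows. This is only a sketch: the `nonHeapFootprint` helper is a hypothetical name, and 1 GB is assumed to mean 1024^3 bytes.

```typescript
// Assumption for this sketch: 1 GB = 1024^3 bytes.
const GiB = 1024 ** 3;

// Hypothetical helper: the part of RSS that is not V8 heap
// (buffers, native code, thread stacks).
function nonHeapFootprint(rssBytes: number, heapBytes: number): number {
  return rssBytes - heapBytes;
}

const rss = 4 * GiB;
const heap = 2.5 * GiB;

const nonHeap = nonHeapFootprint(rss, heap); // 1.5 GiB, the buffer share in the example
const heapShare = heap / rss; // 0.625: only this fraction of RSS counts against the heap cap

console.log({ nonHeapGiB: nonHeap / GiB, heapShare });
```

Sizing --max-old-space-size from the 4 GB RSS figure alone would over-provision the heap by the 1.5 GB that never lives in it.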

nodejs.memory.heap.limit also surfaces the effective V8 heap limit (read from v8.getHeapStatistics().heap_size_limit), which reflects the configured --max-old-space-size, so operators can see the current limit in the same dashboard as actual usage rather than cross-referencing container environment variables.
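
For comparison, the manual cross-referencing that the gauge replaces might look like the sketch below. The `parseMaxOldSpaceSizeMiB` helper is hypothetical, handles only the `--flag=value` form, and assumes the flag is passed via NODE_OPTIONS or on the command line.

```typescript
import * as v8 from "node:v8";

const MiB = 1024 * 1024;

// Hypothetical helper: finds --max-old-space-size=<MiB> in an argument list.
// Only the `=` form is handled; `-` and `_` are both accepted in the flag name.
// The last occurrence wins, so command-line flags listed after NODE_OPTIONS override it.
function parseMaxOldSpaceSizeMiB(args: readonly string[]): number | undefined {
  let found: number | undefined;
  for (const arg of args) {
    const match = /^--max[-_]old[-_]space[-_]size=(\d+)$/.exec(arg);
    if (match) found = Number(match[1]);
  }
  return found;
}

const nodeOptions = (process.env.NODE_OPTIONS ?? "").split(/\s+/).filter(Boolean);
const configuredMiB = parseMaxOldSpaceSizeMiB([...nodeOptions, ...process.execArgv]);

// The value the nodejs.memory.heap.limit gauge reports, converted from bytes to MiB.
const limitMiB = v8.getHeapStatistics().heap_size_limit / MiB;

console.log({ configuredMiB, limitMiB });
```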

Risk

Minimal. Observable gauges are sampled at the configured metric-export interval. v8.getHeapStatistics() and process.memoryUsage() are each microsecond-level calls, and six gauges are added to the same batch callback that already reads ~20 other Node.js runtime values per sample. Same registration pattern as the existing event-loop metrics in the file.
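
The batch-callback shape described here can be sketched roughly as below. To keep the example self-contained, `GaugeLike` and `BatchResultLike` are hypothetical stand-ins for the observable gauge and batch result types in @opentelemetry/api, and `observeMemory` is an illustrative name rather than the function in tracer.server.ts. With a real meter, a callback like this would typically be registered once through `meter.addBatchObservableCallback`.

```typescript
import * as v8 from "node:v8";

// Hypothetical stand-ins so the sketch runs without @opentelemetry/api installed.
interface GaugeLike {
  readonly name: string;
}
interface BatchResultLike {
  observe(gauge: GaugeLike, value: number): void;
}

const gauges = {
  heapUsed: { name: "nodejs.memory.heap.used" },
  heapTotal: { name: "nodejs.memory.heap.total" },
  heapLimit: { name: "nodejs.memory.heap.limit" },
  external: { name: "nodejs.memory.external" },
  arrayBuffers: { name: "nodejs.memory.array_buffers" },
  rss: { name: "nodejs.memory.rss" },
};

// One callback, two cheap reads, six observations per sample.
function observeMemory(result: BatchResultLike): void {
  const mem = process.memoryUsage();
  const heap = v8.getHeapStatistics();
  result.observe(gauges.heapUsed, mem.heapUsed);
  result.observe(gauges.heapTotal, mem.heapTotal);
  result.observe(gauges.heapLimit, heap.heap_size_limit);
  result.observe(gauges.external, mem.external);
  result.observe(gauges.arrayBuffers, mem.arrayBuffers);
  result.observe(gauges.rss, mem.rss);
}

// A recording stand-in plays the role of the exporter for one sample.
const recorded = new Map<string, number>();
observeMemory({
  observe(gauge, value) {
    recorded.set(gauge.name, value);
  },
});
console.log(recorded);
```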

Test plan

  • Deploy and confirm the six new gauges appear at the configured exporter
  • In cluster mode, confirm per-worker granularity (one series per cluster worker, tagged by process.executable.name / service.instance.id)
  • Confirm nodejs.memory.heap.limit reports the configured --max-old-space-size value in bytes

Extends the existing nodejs.* OTel gauges in tracer.server.ts with direct
V8 heap + process memory readings via v8.getHeapStatistics() and
process.memoryUsage():

- nodejs.memory.heap.used      - V8 heap currently in use
- nodejs.memory.heap.total     - V8 heap reserved
- nodejs.memory.heap.limit     - configured max-old-space-size
- nodejs.memory.external       - C++ objects bound to JS (Buffer, etc.)
- nodejs.memory.array_buffers  - ArrayBuffer/SharedArrayBuffer memory
- nodejs.memory.rss            - resident set size

@opentelemetry/host-metrics already publishes process.memory.usage (RSS),
but RSS overstates V8 heap by the external + native footprint. Without a
direct heap metric it's impossible to size NODE_MAX_OLD_SPACE_SIZE against
actual V8 usage. These gauges land in the same trigger.dev scope and
carry the same per-worker tags (process.executable.name,
service.instance.id) so they're queryable alongside the existing
event-loop + handle metrics on a per-cluster-worker basis.
@changeset-bot

changeset-bot Bot commented Apr 23, 2026

⚠️ No Changeset found

Latest commit: 75ea38f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@coderabbitai
Contributor

coderabbitai Bot commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2f762888-2148-4003-a2a4-1abf5f5e043d

📥 Commits

Reviewing files that changed from the base of the PR and between 574733b and 75ea38f.

📒 Files selected for processing (1)
  • apps/webapp/app/v3/tracer.server.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/webapp/app/v3/tracer.server.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Deno Runtime

Walkthrough

Adds per-worker Node.js heap metrics to the webapp OpenTelemetry meter and a changelog entry at .server-changes/nodejs-heap-metrics.md. tracer.server.ts now imports node:v8, collects process.memoryUsage() and v8.getHeapStatistics() during metric scrapes, registers six new observable gauges under nodejs.memory.* (V8 heap used/total/limit, external, array buffers, and RSS), observes them in the existing batch callback alongside threadpool/handles/requests/event-loop metrics, and updates the event-loop utilization baseline after each collection.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly and concisely describes the main change: adding per-worker Node.js heap metrics to the webapp's OpenTelemetry instrumentation. |
| Description check | ✅ Passed | The PR description is well-structured with clear sections (Summary, New gauges, Why, Risk, Test plan) covering rationale, implementation details, risk assessment, and testing approach, though it lacks explicit mention of following the contributing guide and testing verification. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


devin-ai-integration[bot]

This comment was marked as resolved.

…lative

lastEventLoopUtilization was set once at init and never reassigned,
so every performance.eventLoopUtilization(current, last) diff was
computed against the process-start snapshot. The nodejs.event_loop.utilization
gauge was therefore a cumulative average over process lifetime rather
than a per-interval measurement.

Rotate the baseline immediately after computing the diff.
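
A minimal sketch of the rotated-baseline pattern this commit describes, using `performance.eventLoopUtilization` from node:perf_hooks. The `sampleEventLoopUtilization` and `lastElu` names are illustrative and not necessarily those used in tracer.server.ts.

```typescript
import { performance } from "node:perf_hooks";

type Elu = ReturnType<typeof performance.eventLoopUtilization>;

// Baseline snapshot; rotated on every sample so each diff covers one interval.
let lastElu: Elu = performance.eventLoopUtilization();

function sampleEventLoopUtilization(): number {
  const current = performance.eventLoopUtilization();
  // Diff between the current snapshot and the previous baseline.
  const diff = performance.eventLoopUtilization(current, lastElu);
  // Rotate the baseline. Without this assignment every diff is taken against
  // the process-start snapshot and the gauge becomes a lifetime average.
  lastElu = current;
  return diff.utilization;
}

// Each call reports utilization since the previous call.
setTimeout(() => {
  console.log(sampleEventLoopUtilization());
}, 100);
```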
@ericallam ericallam merged commit ca39956 into main Apr 23, 2026
42 checks passed
@ericallam ericallam deleted the feat/webapp-nodejs-heap-metrics branch April 23, 2026 21:41
3 participants