feat(webapp): add per-worker Node.js heap metrics #3437
Extends the existing nodejs.* OTel gauges in tracer.server.ts with direct V8 heap and process-memory readings via v8.getHeapStatistics() and process.memoryUsage():

- nodejs.memory.heap.used - V8 heap used after the last GC
- nodejs.memory.heap.total - V8 heap reserved
- nodejs.memory.heap.limit - configured max-old-space-size
- nodejs.memory.external - C++ objects bound to JS (Buffer, etc.)
- nodejs.memory.array_buffers - ArrayBuffer/SharedArrayBuffer memory
- nodejs.memory.rss - resident set size

@opentelemetry/host-metrics already publishes process.memory.usage (RSS), but RSS overstates the V8 heap by the external + native footprint. Without a direct heap metric it's impossible to size NODE_MAX_OLD_SPACE_SIZE against actual V8 usage. These gauges land in the same trigger.dev scope and carry the same per-worker tags (process.executable.name, service.instance.id), so they're queryable alongside the existing event-loop and handle metrics on a per-cluster-worker basis.
Walkthrough: Adds per-worker Node.js heap metrics to the webapp OpenTelemetry meter and a changelog entry at .server-changes/nodejs-heap-metrics.md.
…lative lastEventLoopUtilization was set once at init and never reassigned, so every performance.eventLoopUtilization(current, last) diff was computed against the process-start snapshot. The nodejs.event_loop.utilization gauge was therefore a cumulative average over the process lifetime rather than a per-interval measurement. The fix rotates the baseline immediately after computing each diff.
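The baseline-rotation fix described above can be sketched as follows (variable and function names are illustrative, not the actual tracer.server.ts code):

```typescript
import { performance } from "node:perf_hooks";

// Baseline snapshot; must be reassigned on every sample, not fixed at init.
let lastEventLoopUtilization = performance.eventLoopUtilization();

function sampleEventLoopUtilization(): number {
  const current = performance.eventLoopUtilization();
  // Diff against the previous sample rather than the process-start snapshot.
  const diff = performance.eventLoopUtilization(current, lastEventLoopUtilization);
  // Rotate the baseline immediately after computing the diff, so the next
  // sample measures only the most recent interval.
  lastEventLoopUtilization = current;
  return diff.utilization;
}
```

Without the reassignment, each diff spans the whole process lifetime and the gauge converges to a long-run average instead of tracking per-interval load.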
Summary
Adds direct V8 heap and process-memory gauges to the webapp's OpenTelemetry meter. The webapp already exports per-cluster-worker Node.js runtime metrics (event-loop lag / utilization, active handles, active requests, libuv threadpool size) via a custom meter under the trigger.dev scope. Heap and memory were missing; this PR adds them alongside, in the same observable-batch pattern.

New gauges
- nodejs.memory.heap.used: process.memoryUsage().heapUsed
- nodejs.memory.heap.total: process.memoryUsage().heapTotal
- nodejs.memory.heap.limit: v8.getHeapStatistics().heap_size_limit
- nodejs.memory.external: process.memoryUsage().external
- nodejs.memory.array_buffers: process.memoryUsage().arrayBuffers
- nodejs.memory.rss: process.memoryUsage().rss

Gated by the existing INTERNAL_OTEL_NODEJS_METRICS_ENABLED flag, same as the adjacent event-loop / handle gauges. Zero overhead when disabled.

Why
@opentelemetry/host-metrics publishes process.memory.usage, which is RSS only. RSS is the sum of V8 heap, external memory (Buffers, etc.), native code, and thread stacks. Without a direct heap metric it is not possible to size the V8 heap cap (--max-old-space-size) from metrics alone, because RSS overstates the heap by the external + native footprint. A worker can have a 4 GB RSS with a 2.5 GB heap and 1.5 GB of buffers; the former constrains --max-old-space-size, the latter does not. nodejs.memory.heap.limit also surfaces the configured --max-old-space-size (read from v8.getHeapStatistics().heap_size_limit), so operators can see the current limit in the same dashboard as actual usage rather than cross-referencing container environment variables.

Risk
Minimal. Observable gauges are sampled at the configured metric-export interval.
v8.getHeapStatistics() and process.memoryUsage() are each microsecond-level calls, and six gauges are added to the same batch callback that already reads ~20 other Node.js runtime values per sample. Same registration pattern as the existing event-loop metrics in the file.

Test plan
- Gauges carry the per-worker tags (process.executable.name / service.instance.id)
- nodejs.memory.heap.limit reports the configured --max-old-space-size value in bytes
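One way to check the last item (illustrative, not part of the PR) is to print v8.getHeapStatistics().heap_size_limit under a known --max-old-space-size. Note that heap_size_limit typically sits slightly above the configured old-space cap, since it also covers young-generation and other reserved space:

```typescript
// Run under a known cap, e.g.:
//   node --max-old-space-size=2048 check-heap-limit.js
import * as v8 from "node:v8";

const limitBytes = v8.getHeapStatistics().heap_size_limit;
const limitMiB = limitBytes / (1024 * 1024);
console.log(`heap_size_limit: ${limitBytes} bytes (~${limitMiB.toFixed(0)} MiB)`);
```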