diff --git a/charts/netdata/README.md b/charts/netdata/README.md
index 55c5218..92f2d09 100644
--- a/charts/netdata/README.md
+++ b/charts/netdata/README.md
@@ -2313,6 +2313,122 @@ In case of `child` instance it is a bit simpler. By default, hostPath: `/var/lib
 in: `/var/lib/netdata`. You can disable it, but this option is pretty much required in a real life scenario, as without it
 each pod deletion will result in a new replication node for a parent.
 
+### Collecting logs with OpenTelemetry
+
+Netdata can ingest, store, and visualize your cluster's container logs through OpenTelemetry.
+This relies on two components — **both disabled by default**:
+
+- **`netdataOpentelemetry`** — Netdata's OpenTelemetry receiver. It runs as its own `Deployment`
+  (the `netdata-otel` node) and listens for OTLP data on port `4317`. It receives, stores, and
+  displays the logs in the Netdata UI. It does **not** collect logs itself.
+- **`otel-collector`** — the upstream [OpenTelemetry Collector](https://opentelemetry.io/docs/platforms/kubernetes/helm/collector/),
+  bundled as an optional subchart. It runs as a `DaemonSet` (one pod per node), reads each node's
+  local container log files, and forwards them over OTLP to `netdataOpentelemetry`.
+
+`netdataOpentelemetry` is the destination; `otel-collector` is what feeds it. The collector is
+bundled only as a convenient, working default, and is off by default because it is not the only
+option. If you already run a log pipeline (Fluent Bit, Vector, an existing Collector, or any
+OTLP-capable agent), leave `otel-collector` disabled and point that pipeline's OTLP exporter at
+the `netdata-otel` service on port `4317` instead.
+
+**Log flow:**
+
+```
+container stdout/stderr
+        │  the container runtime writes them to node-local files (/var/log/pods/…)
+        ▼
+otel-collector DaemonSet (one pod per node, needs host log access)
+        │  reads each node's log files, pushes over OTLP
+        ▼
+netdata-otel:4317  →  stored and shown in the Netdata UI
+```
+
+#### Securing the endpoint with TLS
+
+By default the `netdata-otel` receiver listens on port `4317` in **plaintext** — TLS is **disabled**
+(`tls_cert_path`, `tls_key_path`, and `tls_ca_cert_path` are unset in
+`netdataOpentelemetry.configs.otel.data`). The steps below turn it on with a self-signed
+certificate. TLS affects **both sides**: the receiver must serve the certificate, and every client
+(including the bundled `otel-collector`) must be switched to TLS, or it will stop delivering data.
+
+**1. Generate a self-signed certificate and key** (Linux, `openssl`):
+
+```
+openssl req -x509 -newkey rsa:4096 -nodes \
+  -keyout tls.key -out tls.crt -days 365 \
+  -subj "/CN=netdata-otel"
+```
+
+**2. Create a Kubernetes TLS secret** from those files, in the chart's namespace:
+
+```
+kubectl create secret tls netdata-otel-tls \
+  --cert=tls.crt --key=tls.key \
+  --namespace
+```
+
+**3. Mount the secret into the receiver and point the config at it.** The certificate paths live
+inside `netdataOpentelemetry.configs.otel.data`, which is a single block — supply it in full with
+the two `tls_*_path` values filled in (keep the `metrics` and `logs` sections in sync with the
+chart's `values.yaml`). Mounting the secret alone does nothing until these paths are set:
+
+```yaml
+netdataOpentelemetry:
+  extraVolumes:
+    - name: otel-tls
+      secret:
+        secretName: netdata-otel-tls
+  extraVolumeMounts:
+    - name: otel-tls
+      mountPath: /etc/netdata/otel-certs
+      readOnly: true
+  configs:
+    otel:
+      data: |
+        endpoint:
+          path: "0.0.0.0:4317"
+          tls_cert_path: /etc/netdata/otel-certs/tls.crt
+          tls_key_path: /etc/netdata/otel-certs/tls.key
+          tls_ca_cert_path: null
+        metrics:
+          print_flattened: false
+          buffer_samples: 10
+          throttle_charts: 100
+          chart_configs_dir: otel.d/v1/metrics
+        logs:
+          journal_dir: otel/v1
+          size_of_journal_file: "100MB"
+          number_of_journal_files: 10
+          size_of_journal_files: "1GB"
+          duration_of_journal_files: "7 days"
+          duration_of_journal_file: "2 hours"
+          store_otlp_json: false
+```
+
+**4. Switch every client to TLS.** A TLS listener rejects plaintext connections, so any sender must
+be reconfigured — otherwise logs silently stop arriving. For the bundled `otel-collector`, enable
+TLS on its OTLP exporter. Because the certificate is self-signed, skip verification with
+`insecure_skip_verify` — this keeps the connection encrypted but does not validate the certificate
+chain (only the `tls` block is overridden; the exporter's `endpoint` is kept from the chart
+defaults):
+
+```yaml
+otel-collector:
+  config:
+    exporters:
+      otlp:
+        tls:
+          insecure: false
+          insecure_skip_verify: true
+```
+
+Apply the same change to any external OTLP client — Fluent Bit, Vector, or another Collector —
+pointing at the `netdata-otel` service.
+
+> For production, replace the self-signed certificate with one issued by a trusted CA, give it a
+> `CN`/`SAN` that matches the `netdata-otel` service DNS name, and have clients trust that CA via
+> `tls_ca_cert_path` instead of skipping verification.
+
 ### Service discovery and supported services
 
 Netdata's [service discovery](https://github.com/netdata/agent-service-discovery/), which is installed as part of the
diff --git a/templates/netdata-README.md.gotmpl b/templates/netdata-README.md.gotmpl
index 7a0b29e..60b6a2c 100644
--- a/templates/netdata-README.md.gotmpl
+++ b/templates/netdata-README.md.gotmpl
@@ -221,6 +221,122 @@ In case of `child` instance it is a bit simpler. By default, hostPath: `/var/lib
 in: `/var/lib/netdata`. You can disable it, but this option is pretty much required in a real life scenario, as without it
 each pod deletion will result in a new replication node for a parent.
 
+### Collecting logs with OpenTelemetry
+
+Netdata can ingest, store, and visualize your cluster's container logs through OpenTelemetry.
+This relies on two components — **both disabled by default**:
+
+- **`netdataOpentelemetry`** — Netdata's OpenTelemetry receiver. It runs as its own `Deployment`
+  (the `netdata-otel` node) and listens for OTLP data on port `4317`. It receives, stores, and
+  displays the logs in the Netdata UI. It does **not** collect logs itself.
+- **`otel-collector`** — the upstream [OpenTelemetry Collector](https://opentelemetry.io/docs/platforms/kubernetes/helm/collector/),
+  bundled as an optional subchart. It runs as a `DaemonSet` (one pod per node), reads each node's
+  local container log files, and forwards them over OTLP to `netdataOpentelemetry`.
+
+`netdataOpentelemetry` is the destination; `otel-collector` is what feeds it. The collector is
+bundled only as a convenient, working default, and is off by default because it is not the only
+option. If you already run a log pipeline (Fluent Bit, Vector, an existing Collector, or any
+OTLP-capable agent), leave `otel-collector` disabled and point that pipeline's OTLP exporter at
+the `netdata-otel` service on port `4317` instead.
+
+**Log flow:**
+
+```
+container stdout/stderr
+        │  the container runtime writes them to node-local files (/var/log/pods/…)
+        ▼
+otel-collector DaemonSet (one pod per node, needs host log access)
+        │  reads each node's log files, pushes over OTLP
+        ▼
+netdata-otel:4317  →  stored and shown in the Netdata UI
+```
+
+#### Securing the endpoint with TLS
+
+By default the `netdata-otel` receiver listens on port `4317` in **plaintext** — TLS is **disabled**
+(`tls_cert_path`, `tls_key_path`, and `tls_ca_cert_path` are unset in
+`netdataOpentelemetry.configs.otel.data`). The steps below turn it on with a self-signed
+certificate. TLS affects **both sides**: the receiver must serve the certificate, and every client
+(including the bundled `otel-collector`) must be switched to TLS, or it will stop delivering data.
+
+**1. Generate a self-signed certificate and key** (Linux, `openssl`):
+
+```
+openssl req -x509 -newkey rsa:4096 -nodes \
+  -keyout tls.key -out tls.crt -days 365 \
+  -subj "/CN=netdata-otel"
+```
+
+**2. Create a Kubernetes TLS secret** from those files, in the chart's namespace:
+
+```
+kubectl create secret tls netdata-otel-tls \
+  --cert=tls.crt --key=tls.key \
+  --namespace
+```
+
+**3. Mount the secret into the receiver and point the config at it.** The certificate paths live
+inside `netdataOpentelemetry.configs.otel.data`, which is a single block — supply it in full with
+the two `tls_*_path` values filled in (keep the `metrics` and `logs` sections in sync with the
+chart's `values.yaml`). Mounting the secret alone does nothing until these paths are set:
+
+```yaml
+netdataOpentelemetry:
+  extraVolumes:
+    - name: otel-tls
+      secret:
+        secretName: netdata-otel-tls
+  extraVolumeMounts:
+    - name: otel-tls
+      mountPath: /etc/netdata/otel-certs
+      readOnly: true
+  configs:
+    otel:
+      data: |
+        endpoint:
+          path: "0.0.0.0:4317"
+          tls_cert_path: /etc/netdata/otel-certs/tls.crt
+          tls_key_path: /etc/netdata/otel-certs/tls.key
+          tls_ca_cert_path: null
+        metrics:
+          print_flattened: false
+          buffer_samples: 10
+          throttle_charts: 100
+          chart_configs_dir: otel.d/v1/metrics
+        logs:
+          journal_dir: otel/v1
+          size_of_journal_file: "100MB"
+          number_of_journal_files: 10
+          size_of_journal_files: "1GB"
+          duration_of_journal_files: "7 days"
+          duration_of_journal_file: "2 hours"
+          store_otlp_json: false
+```
+
+**4. Switch every client to TLS.** A TLS listener rejects plaintext connections, so any sender must
+be reconfigured — otherwise logs silently stop arriving. For the bundled `otel-collector`, enable
+TLS on its OTLP exporter. Because the certificate is self-signed, skip verification with
+`insecure_skip_verify` — this keeps the connection encrypted but does not validate the certificate
+chain (only the `tls` block is overridden; the exporter's `endpoint` is kept from the chart
+defaults):
+
+```yaml
+otel-collector:
+  config:
+    exporters:
+      otlp:
+        tls:
+          insecure: false
+          insecure_skip_verify: true
+```
+
+Apply the same change to any external OTLP client — Fluent Bit, Vector, or another Collector —
+pointing at the `netdata-otel` service.
+
+> For production, replace the self-signed certificate with one issued by a trusted CA, give it a
+> `CN`/`SAN` that matches the `netdata-otel` service DNS name, and have clients trust that CA via
+> `tls_ca_cert_path` instead of skipping verification.
+
 ### Service discovery and supported services
 
 Netdata's [service discovery](https://github.com/netdata/agent-service-discovery/), which is installed as part of the