diff --git a/docs/INSTALL.md b/docs/INSTALL.md new file mode 100644 index 0000000..8891308 --- /dev/null +++ b/docs/INSTALL.md @@ -0,0 +1,103 @@ +# `pup install` — design + +This doc covers the design and rollout plan for `pup install ` — pup's unified entry point for installing Datadog components on a target. The first concrete leaf is `pup install ssi linux`, the rest follow in subsequent PRs. + +## Motivation + +Today, multiple agent-skills install Datadog components by embedding `bash -c "$(curl -L install.datadoghq.com/scripts/install_script_agent7.sh)"` in their SKILL.md. Security scanners (Snyk, etc.) correctly flag this as RCE-shaped — and the script itself lives outside any repo we control, with no pinned version or audit trail. + +Workarounds we evaluated: + +- **Pointing skills at docs** — breaks the "user does nothing" promise that SSI sells. +- **Vendoring the install script in the agent-skills repo** — Datadog ships updates per release; we'd inherit a sync treadmill. +- **Manual apt/yum install in the skill** — only works for the agent; the SSI components (apm-inject, language libs) aren't deb/rpm packages, they're OCI artifacts delivered by the `datadog-installer` binary. +- **Snyk policy exception** — pragmatic but doesn't fix the underlying audit-trail issue. + +The clean answer is to move install logic out of markdown and into pup. The skill becomes a single line (`pup install ssi linux --host `), and the install steps execute in compiled, reviewed Rust — using HTTP fetch + SHA-256 verification + signed package install rather than fetch-and-execute shell. This is the same shape every modern install CLI (helm, brew, AWS CLI) already uses. + +## Command surface + +``` +pup install [args] +``` + +Two dimensions: + +- **Component** — what to install (`ssi`, `agent`, `dbm`, `llm-obs`, `csm`, …) +- **Platform** — where to install (`linux`, `k8s`, `docker`, `windows`, …) + +Today, only `ssi linux` is wired up (and as scaffolding — see "Status" below). 
The matrix grows as future PRs add components/platforms. + +### Examples + +Today (after this scaffolding PR): + +``` +pup install ssi linux --host bastion.example.com --user ec2-user --key ~/.ssh/id_ed25519 +pup install ssi linux --host bastion.example.com --dry-run +``` + +Future (out of scope for this PR): + +``` +pup install agent linux --host # just the agent, no SSI +pup install ssi k8s --context # SSI via helm + DatadogAgent CR +pup install dbm linux --host --db pg # Database Monitoring +pup install llm-obs local --runtime python # LLM Obs SDK +``` + +## Linux SSI install flow + +This is what `pup install ssi linux --host ` will do once implemented: + +1. Open an SSH session to the target via the `russh` client (already an optional dep of pup; used today by the auth tunnel server). +2. Detect the host's package manager by reading `/etc/os-release` (`$ID` + `$ID_LIKE`). +3. Configure the Datadog package repo on the target: + - **Debian / Ubuntu**: write `/etc/apt/sources.list.d/datadog.list` (with `signed-by=`), download `keys.datadoghq.com/DATADOG_APT_KEY_*.public` keys, import via `gpg --no-default-keyring`. + - **RHEL / CentOS / Amazon / Rocky / Alma**: write `/etc/yum.repos.d/datadog.repo` referencing the same key URLs. +4. Install signed packages via the host's package manager: `datadog-agent` (+ `datadog-signing-keys` on apt). +5. Download the `datadog-installer` binary from the OCI registry (`gcr.io/datadoghq/installer-package/blobs/sha256:`), verify against a pinned SHA-256, then upload it to the remote host. +6. Execute `datadog-installer setup` on the remote host to install `apm-inject` + the language libraries and to write `/etc/ld.so.preload`. (The exact subcommand and flags — `--flavor "APM SSI"`, `--apm-instrumentation-enabled=host`, etc. — need verification with whoever owns `datadog-installer`. See "Open questions" below.) +7. Patch `/etc/datadog-agent/datadog.yaml` with `DD_API_KEY` and `DD_SITE` (sourced from pup's existing auth config). +8. 
Restart the agent and confirm `datadog-agent status` reports healthy. + +The critical property: nowhere in this flow does pup execute a remote shell script. Every step is either (a) an `apt-get` / `yum` install against signed package repos, (b) an HTTP fetch of a key file or binary followed by signature/hash verification, or (c) executing a known binary with known arguments. The `curl | bash` pattern is gone. + +## Security model + +- **No fetch-and-execute.** No `curl | bash`, no `bash -c "$(curl ...)"`, no `wget | sh`. All downloads write to a file, then verify, then either parse or execute. +- **Signed packages.** Agent + SSI components install via apt/yum with the Datadog signing keys imported into a keyring; signature verification is enforced by the package manager. +- **Pinned binary.** The `datadog-installer` binary is downloaded by SHA-256 (pinned in pup's source) rather than "whatever the latest URL serves." This trades update friction for a real audit trail. +- **Auth surface.** SSH credentials are passed via flags / config (same shape as existing ssh tooling); Datadog API credentials use pup's existing `auth` infrastructure. +- **No new outbound endpoints beyond what the official install script already hits** — `apt.datadoghq.com`, `yum.datadoghq.com`, `keys.datadoghq.com`, `gcr.io` / `public.ecr.aws`. + +## Status (as of this PR) + +This is a **scaffolding PR**. What's in vs out: + +**In:** +- The `pup install` command surface in `src/main.rs` (clap subcommands `Install` → `Ssi` → `Linux`). +- A new `src/commands/install.rs` module with the `ssi_linux` dispatcher. +- `--dry-run` prints the planned install steps for the given target. +- Without `--dry-run`, the command bails with an actionable error that points at this doc. +- Unit tests for both behaviors. + +**Out (follow-up PRs):** +- The actual `russh` client session (connect, exec, file transfer). +- The OCI registry manifest fetch + blob download + SHA-256 verification. 
+- The repo file / keyring setup on the remote host. +- The `datadog-installer setup` invocation. +- The agent restart + health check. +- Integration tests against a real Linux host. +- Multi-distro test matrix (Debian 11/12, Ubuntu 22.04/24.04, RHEL 8/9, Amazon Linux 2/2023, Rocky 9). + +The intent is to land the *surface* now so the pup team can review the shape, and to land the implementation in small, reviewable follow-ups. + +## Open questions for the pup team + +1. **Where should the install module live?** This PR puts everything in `src/commands/install.rs`. Once the implementation lands the install logic might want its own top-level `src/install/` module with submodules (`ssh.rs`, `oci.rs`, `linux_ssi.rs`). Happy to refactor at that point. +2. **SSH transport.** `russh` is already an optional dep but currently used only as a server (`src/tunnel.rs`). Should the install module reuse `russh` as a client, or shell out to the system `ssh` binary? `russh` keeps pup self-contained (especially on Windows); shelling out is simpler but adds a runtime dependency. +3. **`datadog-installer` CLI surface.** The exact subcommand and flags used by `install-ssi.sh` (`setup --flavor "APM SSI"` per one read of the script) need to be confirmed with whoever owns that binary as a public, stable contract. If the contract isn't stable, we need a different integration point. +4. **OCI version pinning.** Who owns bumping the `datadog-installer` SHA-256 in pup when Datadog ships a new release? Manual on each agent release, or auto-bumped from a manifest? +5. **k8s parity.** `pup install ssi k8s` is the natural next leaf. The mechanics are entirely different (helm + DatadogAgent CR + admission webhook) — does it deserve its own command tree, or stay under `install ssi`? +6. **Auth integration.** Pup's existing `auth` config gives access to Datadog API. 
The install flow needs `DD_API_KEY` + `DD_SITE` to write `/etc/datadog-agent/datadog.yaml` — should the install command reuse `cfg.api_key` / `cfg.site`, or take its own flags? diff --git a/src/commands/install.rs b/src/commands/install.rs new file mode 100644 index 0000000..a7eee45 --- /dev/null +++ b/src/commands/install.rs @@ -0,0 +1,150 @@ +//! `pup install ` — unified install entry point. +//! +//! **Status: scaffolding.** The CLI surface and module structure land in this PR; +//! the actual install flow (russh client, OCI registry fetch, signed package +//! install, `datadog-installer setup`) ships in follow-up PRs. +//! +//! Motivation lives in `docs/INSTALL.md` — short version: skills currently +//! embed `bash -c "$(curl …install_script_agent7.sh)"` to install the agent + +//! SSI on remote hosts, which scanners (correctly) flag as RCE-shaped and which +//! references a script that lives outside any repo we control. Moving the +//! install flow into pup turns the SKILL.md into a one-liner (`pup install …`) +//! and shifts the install logic into compiled Rust that already gets reviewed. +//! +//! Today this returns a structured "not yet implemented" error that lists the +//! planned steps. `--dry-run` prints the same plan without erroring so callers +//! can wire up the surface end-to-end. + +use anyhow::{bail, Result}; + +use crate::config::Config; + +/// Install Single Step Instrumentation on a remote Linux host. +/// +/// The `--dry-run` flag returns successfully after printing the install plan. +/// Without it, returns an error today — the actual install lands in a follow-up +/// PR per the checklist in `docs/INSTALL.md`. 
+pub async fn ssi_linux( + _cfg: &Config, + host: String, + user: String, + key: Option<String>, + port: u16, + dry_run: bool, +) -> Result<()> { + let plan = LinuxSsiPlan { + host: &host, + user: &user, + key: key.as_deref(), + port, + }; + + if dry_run { + plan.print(); + return Ok(()); + } + + bail!( + "`pup install ssi linux` is scaffolded but not yet executable in this build. \ + Re-run with --dry-run to see the planned steps for {user}@{host}:{port}, \ + or track the follow-up implementation work in docs/INSTALL.md.", + ) +} + +/// Captured arguments for a single Linux-SSI install. Kept narrow on purpose: +/// the actual install code lives in follow-up PRs and will likely want a +/// richer config struct (e.g. proxy settings, sudo strategy, distro override). +struct LinuxSsiPlan<'a> { + host: &'a str, + user: &'a str, + key: Option<&'a str>, + port: u16, +} + +impl LinuxSsiPlan<'_> { + fn print(&self) { + println!("# Planned `pup install ssi linux` steps"); + println!("#"); + println!( + "# Target: {user}@{host}:{port}{key}", + user = self.user, + host = self.host, + port = self.port, + key = match self.key { + Some(k) => format!(" (key: {k})"), + None => " (key: SSH agent / default identity)".to_string(), + } + ); + println!("#"); + println!("# 1. Open an SSH session via russh client."); + println!("# 2. Detect distro family by reading /etc/os-release ($ID + $ID_LIKE)."); + println!("# 3. Configure Datadog package repo for the host:"); + println!("# - Debian/Ubuntu → add apt repo signed by"); + println!("# keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public"); + println!("# - RHEL/CentOS/Amazon/Rocky/Alma → write /etc/yum.repos.d/datadog.repo"); + println!("# 4. Install signed packages: datadog-agent (+ datadog-signing-keys on apt)."); + println!("# 5.
Download the `datadog-installer` binary from the OCI registry"); + println!("# (gcr.io/datadoghq/installer-package), verify its SHA-256 against a"); + println!("# pinned manifest, and upload it to the remote host."); + println!("# 6. Run `datadog-installer setup` to install apm-inject + language"); + println!("# libraries and to write /etc/ld.so.preload."); + println!("# 7. Patch /etc/datadog-agent/datadog.yaml with DD_API_KEY + DD_SITE."); + println!("# 8. Restart the agent and confirm `datadog-agent status` is healthy."); + println!("#"); + println!("# None of these steps run today; this command is scaffolding so the pup team"); + println!("# can review the shape before the implementation lands. See docs/INSTALL.md."); + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::test_support::test_config; + + #[tokio::test] + async fn dry_run_returns_ok_and_does_not_bail() { + // ssi_linux's scaffolding doesn't read from cfg yet, so we just need + // *any* Config to satisfy the signature. + let cfg = test_config("http://unused.invalid"); + let result = ssi_linux( + &cfg, + "bastion.example.com".to_string(), + "ec2-user".to_string(), + Some("~/.ssh/id_ed25519".to_string()), + 22, + /* dry_run */ true, + ) + .await; + assert!( + result.is_ok(), + "dry-run should succeed; got {:?}", + result.err() + ); + } + + #[tokio::test] + async fn non_dry_run_bails_with_actionable_message() { + let cfg = test_config("http://unused.invalid"); + let err = ssi_linux( + &cfg, + "host.example.com".to_string(), + "root".to_string(), + None, + 22, + /* dry_run */ false, + ) + .await + .expect_err("non-dry-run should bail until the real impl lands"); + let msg = format!("{err:#}"); + // The error message must tell the user what to do next, not just say + // "not implemented" with no context.
+ assert!( + msg.contains("--dry-run"), + "error should point at --dry-run as the working flow, got: {msg}" + ); + assert!( + msg.contains("docs/INSTALL.md"), + "error should reference the design doc, got: {msg}" + ); + } +} diff --git a/src/commands/mod.rs b/src/commands/mod.rs index a8bb41c..8cc3eb6 100644 --- a/src/commands/mod.rs +++ b/src/commands/mod.rs @@ -47,6 +47,8 @@ pub mod hamr; pub mod idp; pub mod incidents; pub mod infrastructure; +#[cfg(not(target_arch = "wasm32"))] +pub mod install; pub mod integrations; pub mod investigations; pub mod kafka; diff --git a/src/main.rs b/src/main.rs index 73f8985..1ae8e54 100644 --- a/src/main.rs +++ b/src/main.rs @@ -1551,6 +1551,37 @@ enum Commands { #[command(subcommand)] action: InfraActions, }, + /// Install Datadog components on a target (host, cluster, …) + /// + /// Install Datadog observability components on a target — the unified entry + /// point for the install flows historically delivered by curl-bash one-liners + /// (`install_script_agent7.sh`, `install-ssi.sh`, etc.). + /// + /// COMMAND SHAPE: + /// pup install [args] + /// + /// what to install: ssi, agent, dbm, llm-obs, … + /// where to install: linux, k8s, … + /// + /// EXAMPLES (today): + /// # Install Datadog Agent + Single Step Instrumentation on a remote Linux host + /// pup install ssi linux --host bastion.example.com --user ec2-user --key ~/.ssh/id_ed25519 + /// + /// EXAMPLES (planned, follow-up PRs): + /// pup install agent linux --host # just the agent + /// pup install ssi k8s --context # SSI via helm + DatadogAgent CR + /// pup install dbm linux --host --db pg # Database Monitoring + /// pup install llm-obs local --runtime python # LLM Obs SDK + /// + /// STATUS: scaffolding. Command surface and module structure land in this PR; + /// the russh client + OCI registry client + installer-binary invocation are + /// follow-up work. See docs/INSTALL.md for the full plan. 
+ #[cfg(not(target_arch = "wasm32"))] + #[command(verbatim_doc_comment)] + Install { + #[command(subcommand)] + action: InstallActions, + }, /// Manage third-party integrations /// /// Manage third-party integrations with external services. @@ -4431,6 +4462,48 @@ enum InfraHostActions { Get { hostname: String }, } +// ---- Install: Datadog component install on a target (host, cluster, …) ---- +// +// Shape: `pup install [args]`. Scaffolding only in this +// PR — the working install flows ship in follow-up PRs. See docs/INSTALL.md. +#[cfg(not(target_arch = "wasm32"))] +#[derive(Subcommand)] +enum InstallActions { + /// Install Single Step Instrumentation (Datadog Agent + APM auto-injection) + Ssi { + #[command(subcommand)] + action: InstallSsiActions, + }, +} + +#[cfg(not(target_arch = "wasm32"))] +#[derive(Subcommand)] +enum InstallSsiActions { + /// Install SSI on a remote Linux host over SSH + /// + /// Replaces the historical `bash -c "$(curl install_script_agent7.sh)"` flow + /// with a native Rust install: SSH client, signed package install (apt/yum), + /// SHA-256-verified OCI binary download for `datadog-installer`, then + /// `installer setup` to enable SSI. No remote shell scripts are executed. 
+ Linux { + /// Hostname or IP address of the target machine (set the SSH user with --user) + #[arg(long)] + host: String, + /// SSH user + #[arg(long, default_value = "root")] + user: String, + /// Path to SSH private key (falls back to the SSH agent / default identity) + #[arg(long)] + key: Option<String>, + /// SSH port + #[arg(long, default_value_t = 22)] + port: u16, + /// Print the install plan without connecting or making changes + #[arg(long)] + dry_run: bool, + }, +} + // ---- IDP (Internal Developer Portal) ---- #[derive(Subcommand)] enum IdpActions { @@ -11914,6 +11987,21 @@ async fn main_inner() -> anyhow::Result<()> { }, } } + // --- Install (Datadog component install on a target) --- + #[cfg(not(target_arch = "wasm32"))] + Commands::Install { action } => match action { + InstallActions::Ssi { action } => match action { + InstallSsiActions::Linux { + host, + user, + key, + port, + dry_run, + } => { + commands::install::ssi_linux(&cfg, host, user, key, port, dry_run).await?; + } + }, + }, // --- IDP (Internal Developer Portal) --- Commands::Idp { action } => { cfg.validate_auth()?;
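Review note on step 2 of the Linux SSI flow (detect the package manager from `/etc/os-release` via `$ID` + `$ID_LIKE`): the follow-up PR could keep this step as a pure function over the file contents, so it is unit-testable without an SSH session. The sketch below is only an illustration under assumptions. `PkgFamily`, `unquote`, and `detect_pkg_family` are hypothetical names that do not exist in pup, and the list of distro IDs is a guess at what the test matrix in docs/INSTALL.md needs, not a verified mapping.

```rust
/// Package-manager family of the target host.
/// Hypothetical type for illustration; not part of pup today.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PkgFamily {
    Apt,
    Yum,
}

/// Strip one pair of matching surrounding quotes from an os-release value.
fn unquote(value: &str) -> &str {
    let v = value.trim();
    for q in ['"', '\''] {
        if v.len() >= 2 && v.starts_with(q) && v.ends_with(q) {
            return &v[1..v.len() - 1];
        }
    }
    v
}

/// Classify a host from the *contents* of its /etc/os-release file.
/// `ID` is checked first, then each space-separated entry of `ID_LIKE`.
/// Returns `None` for distros this sketch does not recognise.
pub fn detect_pkg_family(os_release: &str) -> Option<PkgFamily> {
    let mut ids: Vec<String> = Vec::new();
    for line in os_release.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let Some((key, value)) = line.split_once('=') else {
            continue;
        };
        match key.trim() {
            // ID takes priority regardless of where it appears in the file.
            "ID" => ids.insert(0, unquote(value).to_ascii_lowercase()),
            "ID_LIKE" => ids.extend(
                unquote(value)
                    .split_whitespace()
                    .map(|s| s.to_ascii_lowercase()),
            ),
            _ => {}
        }
    }
    // Assumed ID strings; extend as the distro matrix grows.
    ids.iter().find_map(|id| match id.as_str() {
        "debian" | "ubuntu" => Some(PkgFamily::Apt),
        "rhel" | "centos" | "fedora" | "amzn" | "rocky" | "almalinux" => Some(PkgFamily::Yum),
        _ => None,
    })
}
```

With this shape, the SSH layer only has to return the file contents as a string, and an unrecognised distro surfaces as `None` that the caller can turn into an actionable error instead of guessing a package manager.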