feat(scheduler): KubernetesBackend with client-go for runtime CronJob CRUD (#162 part 2b)#170

Merged
initializ-mk merged 1 commit into `feat/issue-162-k8s-scheduler` from `feat/issue-162-k8s-backend`
Jun 15, 2026

@initializ-mk

Summary

Second half of part 2 of the #162 stack. Stacked on top of #169 (`feat/issue-162-k8s-scheduler`) — the base is that branch, not main. After #169 merges I'll rebase this to main.

Adds the real `KubernetesBackend` that implements `scheduler.Backend` against `k8s.io/client-go`, plus the forge.yaml-driven backend selection in the runner. The cluster's CronJob controller takes over timing; the agent stays out of the tick loop.

Behavior

| Method | KubernetesBackend behavior |
| --- | --- |
| `Start` / `Stop` / `Reload` | No-ops: the cluster owns timing and there is no cached state |
| `Sync(declared)` | Reconciles cluster CronJobs against declared yaml entries: create absent, update on drift, prune dropped yaml entries. Preserves LLM-sourced entries unconditionally. Same rule as `FileBackend.Sync`. |
| `Set(s)` | Gated by `AllowDynamic` when source = LLM. `AllowDynamic: false` (default) returns an error referencing the config flag. yaml-source bypasses the gate. |
| `Delete(id)` | Refuses to delete yaml-sourced CronJobs (Sync is the only declarative removal path). Gated by `AllowDynamic` for LLM-sourced. |
| `List` | Filters by `forge.agent.id` label, so unrelated CronJobs in the namespace don't appear in `schedule_list`. |
| `Get` | Direct API Get by deterministic name. |
| `History` | Returns empty + warns once. Audit stream's `schedule_fire` / `schedule_complete` events are the canonical source. |
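
For reviewers who want the `Sync` / `Set` / `Delete` rules in one place, here is a minimal in-memory sketch of the table above. It is not the client-go implementation: `memBackend`, its `store` map, the simplified `Schedule` struct, and both error values are hypothetical names for illustration, and the error wording is assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// Source and Schedule are simplified stand-ins for the real scheduler types.
type Source string

const (
	SourceYAML Source = "yaml"
	SourceLLM  Source = "llm"
)

type Schedule struct {
	ID     string
	Cron   string
	Source Source
}

// Hypothetical errors; the real messages are not shown in this PR.
var (
	errDynamicDisabled = errors.New("dynamic schedules disabled: enable AllowDynamic in the config")
	errYAMLManaged     = errors.New("yaml-sourced schedule: remove it from the manifest and re-sync")
)

// memBackend keeps schedules in a map instead of talking to a cluster.
type memBackend struct {
	allowDynamic bool
	store        map[string]Schedule
}

// Sync reconciles the store against the declared yaml entries.
func (b *memBackend) Sync(declared []Schedule) {
	want := make(map[string]Schedule, len(declared))
	for _, s := range declared {
		s.Source = SourceYAML
		want[s.ID] = s
		if cur, ok := b.store[s.ID]; !ok || cur != s {
			b.store[s.ID] = s // create absent, update on drift
		}
	}
	for id, cur := range b.store {
		if _, ok := want[id]; !ok && cur.Source == SourceYAML {
			delete(b.store, id) // prune dropped yaml entries; LLM entries stay
		}
	}
}

// Set gates LLM-sourced writes on allowDynamic; yaml-sourced writes bypass it.
func (b *memBackend) Set(s Schedule) error {
	if s.Source == SourceLLM && !b.allowDynamic {
		return errDynamicDisabled
	}
	b.store[s.ID] = s
	return nil
}

// Delete refuses yaml-sourced entries and gates LLM-sourced ones.
func (b *memBackend) Delete(id string) error {
	cur, ok := b.store[id]
	if !ok {
		return fmt.Errorf("schedule %q not found", id)
	}
	if cur.Source == SourceYAML {
		return errYAMLManaged
	}
	if !b.allowDynamic {
		return errDynamicDisabled
	}
	delete(b.store, id)
	return nil
}

func main() {
	b := &memBackend{allowDynamic: true, store: map[string]Schedule{}}
	_ = b.Set(Schedule{ID: "llm-1", Cron: "0 * * * *", Source: SourceLLM})
	b.Sync([]Schedule{{ID: "daily", Cron: "0 9 * * *"}})
	b.Sync(nil) // "daily" was dropped from yaml; "llm-1" is left alone
	fmt.Println(len(b.store), b.Delete("llm-1"))
}
```

The sketch does not model an ID collision between a yaml entry and an LLM entry; how the real backend resolves that case is not stated here.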

Backend selection

`runner.go` picks the backend off `forge.yaml`:

| `scheduler.backend` | Behavior |
| --- | --- |
| `auto` (default) / unset | KubernetesBackend when in-cluster, FileBackend otherwise |
| `file` | Always FileBackend |
| `kubernetes` | Always KubernetesBackend; errors at startup when not in-cluster (`FORGE_IN_CLUSTER=true` overrides for tests) |

The previous `syncYAMLSchedules` helper is replaced by `Backend.Sync(declaredSchedules())` which works for both modes uniformly.
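
The selection table can be read as a small pure function. The sketch below is an assumption-laden illustration, not the `selectScheduleBackend` helper itself: `selectBackend`, `backendKind`, and `inCluster` are hypothetical names, the `KUBERNETES_SERVICE_HOST` check is only one plausible way to detect an in-cluster runtime, and rejecting unknown values is assumed behavior.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

type backendKind string

const (
	kindFile       backendKind = "file"
	kindKubernetes backendKind = "kubernetes"
)

// inCluster is an assumed detection rule: the FORGE_IN_CLUSTER override
// from the table, else the service-host variable Kubernetes sets in pods.
func inCluster(getenv func(string) string) bool {
	if getenv("FORGE_IN_CLUSTER") == "true" {
		return true
	}
	return getenv("KUBERNETES_SERVICE_HOST") != ""
}

// selectBackend maps the forge.yaml scheduler.backend value to a backend kind.
func selectBackend(setting string, inCluster bool) (backendKind, error) {
	switch setting {
	case "", "auto":
		if inCluster {
			return kindKubernetes, nil
		}
		return kindFile, nil
	case "file":
		return kindFile, nil
	case "kubernetes":
		if !inCluster {
			return "", errors.New("scheduler.backend=kubernetes requires an in-cluster runtime")
		}
		return kindKubernetes, nil
	default:
		// Assumption: unknown values are rejected rather than treated as auto.
		return "", fmt.Errorf("unknown scheduler.backend %q", setting)
	}
}

func main() {
	kind, err := selectBackend("auto", inCluster(os.Getenv))
	fmt.Println(kind, err)
}
```

Passing the in-cluster flag as a parameter keeps the decision testable without touching the process environment.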

Round-trip Schedule ↔ CronJob

| Schedule field | K8s representation |
| --- | --- |
| `ID` | label `forge.schedule.id` + name suffix |
| `Cron` | `spec.schedule` |
| `Source` | label `forge.schedule.source` (`yaml` / `llm`) |
| `Enabled` | `spec.suspend` (inverted) |
| `Task` | annotation `forge.schedule.task` |
| `Skill` | annotation `forge.schedule.skill` |
| `Channel` / `ChannelTarget` | annotations |
| `RunCount` / `LastStatus` | annotations |
| `LastRun` | `status.lastScheduleTime` (read-only) |

The in-memory CronJob the runtime builds is byte-equivalent to the YAML `scheduler.CronJobYAML` emits, so #162 part 3's manifest-applied resources don't churn against the runtime's reconcile loop.
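
A rough sketch of the round trip, under stated assumptions: `cronJob` is a hypothetical stand-in that carries only the fields the mapping touches (the real object is a `batch/v1` CronJob), and `toCronJob` / `fromCronJob` are illustrative names. Only the label and annotation keys spelled out in the table are used; the channel, run-count, and last-status annotations and the name suffix are left out because their exact form is not given here.

```go
package main

import "fmt"

// Schedule is a trimmed copy of the fields whose K8s keys the table names.
type Schedule struct {
	ID      string
	Cron    string
	Source  string
	Task    string
	Skill   string
	Enabled bool
}

// cronJob is a hypothetical stand-in for the batch/v1 CronJob fields used here.
type cronJob struct {
	Labels      map[string]string
	Annotations map[string]string
	Schedule    string // spec.schedule
	Suspend     *bool  // spec.suspend
}

// toCronJob writes a Schedule into labels, annotations, and spec fields.
func toCronJob(s Schedule) cronJob {
	suspend := !s.Enabled // Enabled is stored inverted
	return cronJob{
		Labels: map[string]string{
			"forge.schedule.id":     s.ID,
			"forge.schedule.source": s.Source,
		},
		Annotations: map[string]string{
			"forge.schedule.task":  s.Task,
			"forge.schedule.skill": s.Skill,
		},
		Schedule: s.Cron,
		Suspend:  &suspend,
	}
}

// fromCronJob reads the same fields back. A nil Suspend counts as enabled,
// matching the Kubernetes default of suspend=false.
func fromCronJob(cj cronJob) Schedule {
	return Schedule{
		ID:      cj.Labels["forge.schedule.id"],
		Cron:    cj.Schedule,
		Source:  cj.Labels["forge.schedule.source"],
		Task:    cj.Annotations["forge.schedule.task"],
		Skill:   cj.Annotations["forge.schedule.skill"],
		Enabled: cj.Suspend == nil || !*cj.Suspend,
	}
}

func main() {
	s := Schedule{ID: "daily", Cron: "0 9 * * *", Source: "yaml", Task: "report", Enabled: true}
	fmt.Println(fromCronJob(toCronJob(s)) == s)
}
```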

Files

| File | Change |
| --- | --- |
| `forge-cli/runtime/scheduler_k8s_backend.go` (new, ~330 lines) | `KubernetesBackend` impl + `K8sBackendConfig` + helpers |
| `forge-cli/runtime/scheduler_k8s_backend_test.go` (new) | 9 tests against `fake.Clientset` |
| `forge-cli/runtime/runner.go` | `selectScheduleBackend` helper; `declaredSchedules` adapter; drops the old `syncYAMLSchedules` (covered by `Backend.Sync`) |
| `forge-cli/go.mod` / `go.sum` | `k8s.io/api`, `k8s.io/apimachinery`, `k8s.io/client-go` @ v0.36.2 |
| `docs/deployment/scheduler-kubernetes.md` | RBAC table, annotation round-trip reference, "what's not in the K8s backend" section |

Test plan

Dependency footprint

| Module | Purpose |
| --- | --- |
| `k8s.io/api/batch/v1` | CronJob / JobSpec types |
| `k8s.io/api/core/v1` | PodSpec / Container / EnvVar / SecretKeyRef types |
| `k8s.io/apimachinery/pkg/apis/meta/v1` | ObjectMeta + ListOptions |
| `k8s.io/apimachinery/pkg/api/errors` | `IsNotFound` for the Get-or-Create pattern |
| `k8s.io/client-go/kubernetes` | Typed clientset interface |
| `k8s.io/client-go/rest` | `InClusterConfig` for the runtime construction |
| `k8s.io/client-go/kubernetes/fake` (test-only) | Fake clientset for unit tests |

Goreleaser binary size impact: roughly +10 MB from transitive dependencies, the standard cost for any K8s-aware Go binary. Forge-core stays free of client-go; the dependency is contained to forge-cli/runtime.

What's next

Part 3 wires `scheduler.CronJobYAML` into the `forge package` build pipeline so a packaged deploy produces:

  • One `CronJob.yaml` per `forge.yaml schedules[]` entry
  • A credential-less Secret template (operator populates via `forge auth secret-yaml`)
  • A Role + RoleBinding scoped to the agent's namespace

Refs #162

… CRUD (#162 part 2b)

Second half of part 2 of the #162 stack. Builds on the
ScheduleBackend interface + FileBackend refactor + manifest
helpers shipped in part 2 (PR #169). Adds the real K8s
runtime backend, dependency on k8s.io/client-go, and the
forge.yaml-driven backend selection in the runner.

forge-cli/runtime/scheduler_k8s_backend.go
  KubernetesBackend implements scheduler.Backend by delegating
  persistence + timing to the cluster's CronJob controller:

  - Start/Stop/Reload are no-ops (cluster owns timing)
  - Sync reconciles cluster CronJobs against declared yaml
    entries: create, update on drift, prune dropped yaml
    entries, PRESERVE LLM-sourced entries unconditionally
  - Set / Delete are gated by AllowDynamic (default false);
    yaml-sourced CronJobs cannot be deleted via direct Delete
    (Sync is the only removal path for declarative entries)
  - List filters by forge.agent.id label so unrelated CronJobs
    in the namespace don't appear in schedule_list output
  - History returns empty + warns once; the audit stream's
    schedule_fire/complete events are the canonical source
  - CronJobs are constructed in-memory to match the YAML the
    forge-core scheduler.CronJobYAML emits byte-for-byte, so
    runtime reconcile doesn't churn against forge package
    manifests (#162 part 3)
  - Round-trips Schedule <-> CronJob via labels (agent.id,
    schedule.id, schedule.source) and annotations (task, skill,
    channel, channel_target, run_count, last_status)
  - LastRun read from CronJob.Status.LastScheduleTime
  - NewKubernetesBackendWithClient testing seam accepts an
    explicit kubernetes.Interface (fake.Clientset in tests)

forge-cli/runtime/runner.go
  Backend selection wired off forge.yaml scheduler.backend:
  - "kubernetes" — always K8s; errors at startup when not
    in-cluster (FORGE_IN_CLUSTER=true overrides for tests)
  - "file"       — always FileBackend
  - "auto"/""    — K8s when in-cluster, file otherwise
  Drops the old syncYAMLSchedules helper now that the runner
  calls Backend.Sync(declaredSchedules()) for both modes.

go.mod
  k8s.io/api, k8s.io/apimachinery, k8s.io/client-go @ v0.36.2.

Tests (9 cases against fake.Clientset):
  - Sync creates CronJobs for declared entries with the
    expected labels + ConcurrencyPolicy=Forbid
  - Sync is idempotent (no churn on no-op re-run)
  - Sync updates on cron drift
  - Sync prunes yaml entries removed from the manifest
  - Sync preserves LLM-sourced entries on yaml-only re-Sync
  - Dynamic Set is gated by AllowDynamic with an actionable
    error referencing the config flag
  - Dynamic Delete of a yaml-sourced schedule is refused with
    an error pointing operators at the manifest
  - List filters by forge.agent.id (unrelated CronJobs in the
    namespace are not returned)
  - History returns empty + does not error

Docs:
  - docs/deployment/scheduler-kubernetes.md gains the RBAC
    table, the annotation round-trip reference, and the
    "what's not in the K8s backend" section flagging
    schedule_history deferral + cross-namespace out-of-scope
    + token rotation as a follow-up.

Refs #162