Symptom
When forge runs in-cluster and `scheduler.kubernetes.service_url` is not set in `forge.yaml`, the agent fails to start with:
```
Error: kubernetes scheduler backend: scheduler.kubernetes.service_url is required
```
Operators hit this on any cluster deploy that uses the default `scheduler.backend: auto` (picks `kubernetes` when `scheduler.InCluster()` reports true) and didn't explicitly add a `service_url`.
Why it's a footgun
The build-time schedule-manifest stage already knows how to derive a sensible default:
```go
// forge-cli/build/schedule_manifest_stage.go:70-82
serviceURL := cfg.Scheduler.Kubernetes.ServiceURL
if serviceURL == "" {
	port := 8080
	if bc.Spec != nil && bc.Spec.Runtime != nil && bc.Spec.Runtime.Port != 0 {
		port = bc.Spec.Runtime.Port
	}
	serviceURL = fmt.Sprintf("http://%s.%s.svc:%d/", agentID, ns, port)
}
```
The runtime at `forge-cli/runtime/scheduler_k8s_backend.go:104-106` does not:
```go
if cfg.ServiceURL == "" {
	return nil, fmt.Errorf("kubernetes scheduler backend: scheduler.kubernetes.service_url is required")
}
```
So an operator who runs `forge package` once gets working CronJob manifests with the right Service DNS, but if their pod starts without an explicit `service_url` (e.g. they didn't re-render `forge.yaml` from the build output), the runtime refuses to come up. Two adjacent code paths make two different decisions for the same missing field.
Proposed fix
Mirror the build-stage default in the runtime:
- `NewKubernetesBackend` (and the `-WithClient` test seam) derives `http://<agent_id>.<namespace>.svc:<port>/` when `cfg.ServiceURL` is empty.
- `K8sBackendConfig` gains a `Port int` field; `selectScheduleBackend` plumbs `r.cfg.Port` into it.
- The `service_url is required` error path is removed (there's no scenario where we can't derive a default in-cluster).
- Operator override semantics unchanged: an explicit `scheduler.kubernetes.service_url` always wins.
- Tests pin both branches (default-derivation + explicit override unchanged).
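A minimal sketch of the derivation the bullets above describe, to make the intended behavior concrete. `deriveServiceURL` and `defaultPort` are hypothetical names used only for illustration; the real change would live inside `NewKubernetesBackend` and read from `K8sBackendConfig`. Falling back to 8080 when the port is zero is an assumption that mirrors the build stage.

```go
package main

import "fmt"

// defaultPort mirrors the build stage's fallback of 8080 (assumption:
// the runtime should use the same fallback when no port is plumbed in).
const defaultPort = 8080

// deriveServiceURL is a hypothetical helper, not an existing forge function.
// An explicit service_url always wins; otherwise the in-cluster Service DNS
// name is built the same way the schedule-manifest build stage builds it.
func deriveServiceURL(explicit, agentID, namespace string, port int) string {
	if explicit != "" {
		return explicit
	}
	if port == 0 {
		port = defaultPort
	}
	return fmt.Sprintf("http://%s.%s.svc:%d/", agentID, namespace, port)
}

func main() {
	// Default derivation with no port configured.
	fmt.Println(deriveServiceURL("", "my-agent", "prod", 0))
	// Default derivation with a plumbed-in port.
	fmt.Println(deriveServiceURL("", "my-agent", "prod", 9090))
	// Explicit override is returned unchanged.
	fmt.Println(deriveServiceURL("http://custom.example/", "my-agent", "prod", 9090))
}
```

The three calls in `main` correspond to the branches the tests would pin: default derivation (with and without a configured port) and the unchanged explicit override.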
Repro
`forge.yaml` without a `scheduler:` block at all; deploy the image to any K8s namespace. Pod logs:
```
{"level":"info","msg":"schedule tools registered"}
Error: kubernetes scheduler backend: scheduler.kubernetes.service_url is required
```
Workarounds today
Either set `scheduler.backend: file` (skip K8s entirely) or set `scheduler.kubernetes.service_url` to the same value the build stage would derive.
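For reference, the two workarounds might look roughly like this in `forge.yaml`. The nesting is inferred from the dotted key names above, and `<agent_id>`, `<namespace>`, and `<port>` are placeholders to fill in, not real values.

```yaml
# Option A: skip the Kubernetes backend entirely.
scheduler:
  backend: file

# Option B: keep the Kubernetes backend and set the URL explicitly
# (use instead of Option A, not alongside it).
# scheduler:
#   kubernetes:
#     service_url: "http://<agent_id>.<namespace>.svc:<port>/"
```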
Scope
Code: `forge-cli/runtime/scheduler_k8s_backend.go`, `forge-cli/runtime/runner.go` (one-line plumbing).
Tests: `forge-cli/runtime/scheduler_k8s_backend_test.go`.
Docs: `docs/deployment/scheduler-kubernetes.md` and `docs/core-concepts/scheduling.md` to mention the new default.