Merged
17 commits
5053411
feat: add aggregationMethod field to PrometheusSource (sum, count, av…
Copilot Jun 18, 2026
d32b360
fix: correctly track max for negative values in aggregateRangeValues
Copilot Jun 18, 2026
90494fe
feat: add e2e tests for aggregation methods, update discovery docs, a…
Copilot Jun 18, 2026
210edf3
Fix flaky discovery aggregation e2e assertion
Copilot Jun 18, 2026
3c10c73
Clarify aggregation assertion comment in e2e test
Copilot Jun 18, 2026
c9dbdc2
Remove brittle score-comparison assertion from e2e test
Copilot Jun 18, 2026
d9b9423
Clarify Prometheus sample-count comment in e2e test
Copilot Jun 18, 2026
db495c0
feat: add queryType enum field, use enum types for AggregationMethod …
Copilot Jun 19, 2026
18eb34a
fix: consistent queryType defaulting and doc accuracy
Copilot Jun 19, 2026
c6c1518
feat: change queryType default to range, add e2e and unit tests for b…
Copilot Jun 19, 2026
c99cbf9
feat: add convention to document resources explored with timing and t…
Copilot Jun 19, 2026
427b637
fix: extract prometheusStatusSuccess constant and add lint convention
Copilot Jun 19, 2026
adcb937
refactor: change Step field from string to metav1.Duration
Copilot Jun 19, 2026
c157ee1
feat: add "none" aggregationMethod as default — allows self-contained…
Copilot Jun 19, 2026
80a48cf
refactor: make aggregationMethod nullable instead of using "none" sen…
Copilot Jun 19, 2026
95798b0
docs: add SVG line graphs illustrating query types and aggregation me…
Copilot Jun 19, 2026
911790e
docs: enhance SVG diagrams with vertical lines, shading, and explicit…
Copilot Jun 19, 2026
4 changes: 4 additions & 0 deletions .github/copilot-instructions.md
@@ -34,6 +34,9 @@ make docs-gen # regenerate AI docs from source
 - Pod builder is a pure function in internal/podbuilder/ (no k8s client)
 - Pacing logic lives exclusively in internal/pacing/
 - Don't manually edit generated files — run make docs-gen
+- Documentation must never contain unverified information — verify all examples against a real cluster before merging
+- Always document which resources you looked at in which order (short summary + time spent + tokens consumed + context consumed)
+- Always lint and fix linter issues locally before pushing code
 
 ## Testing Patterns
 
@@ -58,6 +61,7 @@ api/v1alpha1 — Package v1alpha1 contains API Schema definitions for the drop v
 internal/controller — Package controller implements Kubernetes reconcilers for the drop CRDs (one per Kind).
 imports: api/v1alpha1, internal/discovery, internal/metrics, internal/pacing, internal/podbuilder
 internal/discovery — Package discovery implements image discovery from registries and Prometheus metrics.
+imports: api/v1alpha1
 internal/metrics — Package metrics registers Prometheus metrics for the drop operator.
 internal/pacing — Package pacing implements the shared rate-limiting engine for image pull scheduling.
 imports: api/v1alpha1, internal/podbuilder
62 changes: 55 additions & 7 deletions api/v1alpha1/discoverypolicy_types.go
@@ -53,6 +53,40 @@ type DiscoverySource struct {
 SecretRef *corev1.LocalObjectReference `json:"secretRef,omitempty"`
 }
 
+// AggregationMethod defines how range query values are aggregated into a score.
+// +kubebuilder:validation:Enum=sum;count;avg;max
+type AggregationMethod string
+
+const (
+// AggregationSum adds all data-point values over the lookback window.
+// Use when the query returns a gauge/counter and the total magnitude matters
+// (e.g., total memory usage across the window).
+AggregationSum AggregationMethod = "sum"
+// AggregationCount counts the number of non-zero data points over the lookback window.
+// Use when you want to rank by how frequently an image appears
+// (e.g., number of sample intervals where the image was running).
+AggregationCount AggregationMethod = "count"
+// AggregationAvg computes the arithmetic mean of all data-point values.
+// Use when you want the average magnitude regardless of how many samples exist.
+AggregationAvg AggregationMethod = "avg"
+// AggregationMax takes the highest single data-point value.
+// Use when peak usage is more relevant than cumulative usage.
+AggregationMax AggregationMethod = "max"
+)
+
+// QueryType defines how the Prometheus query is executed.
+// +kubebuilder:validation:Enum=range;instant
+type QueryType string
+
+const (
+// QueryTypeRange uses /api/v1/query_range with a time window defined by lookback.
+// Returns multiple data points which are aggregated using the aggregationMethod.
+QueryTypeRange QueryType = "range"
+// QueryTypeInstant uses /api/v1/query for a single point-in-time result.
+// The returned value is used directly as the score.
+QueryTypeInstant QueryType = "instant"
+)
+
 // PrometheusSource defines Prometheus query configuration for image discovery.
 type PrometheusSource struct {
 // Endpoint is the Prometheus-compatible API URL (Prometheus, Thanos, Mimir, VictoriaMetrics).
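The `AggregationMethod` enum above describes five behaviours once the nil case is counted. The sketch below shows one way to reduce a single series' samples (oldest first) to a score. It is an illustration, not the operator's code: the `aggregate` name and the `[]float64` input are assumptions. The commit list mentions an `aggregateRangeValues` function, but its signature is not part of this diff. `count` follows the `AggregationCount` comment by counting only non-zero samples. `max` is seeded with the first sample so that all-negative series work, in line with the "correctly track max for negative values" commit.

```go
package main

import "fmt"

// AggregationMethod mirrors the enum added in api/v1alpha1.
type AggregationMethod string

const (
	AggregationSum   AggregationMethod = "sum"
	AggregationCount AggregationMethod = "count"
	AggregationAvg   AggregationMethod = "avg"
	AggregationMax   AggregationMethod = "max"
)

// aggregate (hypothetical) reduces one series' samples, ordered oldest
// first, to a single score. A nil method returns the last sample unchanged.
func aggregate(values []float64, method *AggregationMethod) (float64, error) {
	if len(values) == 0 {
		return 0, nil
	}
	if method == nil {
		// The PromQL already aggregates; trust the most recent data point.
		return values[len(values)-1], nil
	}
	switch *method {
	case AggregationSum, AggregationAvg:
		total := 0.0
		for _, v := range values {
			total += v
		}
		if *method == AggregationAvg {
			return total / float64(len(values)), nil
		}
		return total, nil
	case AggregationCount:
		// Count only non-zero samples, as the AggregationCount comment states.
		n := 0
		for _, v := range values {
			if v != 0 {
				n++
			}
		}
		return float64(n), nil
	case AggregationMax:
		// Seed with the first sample rather than 0 so that a series of
		// negative values still yields its true maximum.
		peak := values[0]
		for _, v := range values[1:] {
			if v > peak {
				peak = v
			}
		}
		return peak, nil
	default:
		return 0, fmt.Errorf("unknown aggregation method %q", *method)
	}
}

func main() {
	samples := []float64{2, 0, 3, 1}
	for _, m := range []AggregationMethod{AggregationSum, AggregationCount, AggregationAvg, AggregationMax} {
		score, err := aggregate(samples, &m)
		fmt.Printf("%s=%g err=%v\n", m, score, err)
	}
	last, _ := aggregate(samples, nil)
	fmt.Printf("last=%g\n", last)
}
```

With the samples `2, 0, 3, 1`, this prints sum=6, count=3, avg=1.5, max=3, and last=1. The nil case is the only one that ignores every sample except the newest.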
@@ -65,18 +99,32 @@ type PrometheusSource struct {
 // Example: count(container_memory_working_set_bytes{container!="",container!="POD",namespace="gitlab-runner"}) by (image)
 // +kubebuilder:validation:MinLength=1
 Query string `json:"query"`
-// Lookback is the time window for aggregation. When set, the operator uses query_range
-// (start=now-lookback, end=now) and sums all returned values per image to produce a score.
-// When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score.
+// QueryType controls how the Prometheus query is executed.
+// "range" uses /api/v1/query_range with a time window defined by lookback.
+// "instant" uses /api/v1/query for a single point-in-time result.
+// Default: "range".
+// +kubebuilder:default="range"
+// +optional
+QueryType QueryType `json:"queryType,omitempty"`
+// Lookback is the time window for range queries. When queryType is "range",
+// the operator queries (start=now-lookback, end=now) and aggregates all returned values per image.
+// The aggregation function is controlled by the aggregationMethod field.
+// Required when queryType is "range". Ignored when queryType is "instant".
 // Example: "168h" (7 days), "24h", "72h"
 // +optional
 Lookback *metav1.Duration `json:"lookback,omitempty"`
+// AggregationMethod controls how data points from a range query are combined into a single score.
+// Only used when queryType is "range". Ignored for instant queries.
+// When not set (nil), Drop uses the last data-point value directly — use this when your PromQL
+// already contains aggregation functions (e.g., count_over_time, topk).
+// Options: "sum", "count", "avg", "max"
+// +optional
+AggregationMethod *AggregationMethod `json:"aggregationMethod,omitempty"`
 // Step is the resolution step for range queries (only used when lookback is set).
-// Smaller steps = more data points = more accurate sums but higher Prometheus load.
-// Default: "5m". Example: "1m", "15m"
-// +kubebuilder:default="5m"
+// Smaller steps = more data points = more accurate aggregation but higher Prometheus load.
+// Default: 5m. Example: "1m", "15m"
 // +optional
-Step string `json:"step,omitempty"`
+Step *metav1.Duration `json:"step,omitempty"`
 }
 
 // RegistrySource defines OCI registry tag listing configuration for image discovery.
10 changes: 10 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go


36 changes: 30 additions & 6 deletions config/crd/bases/drop.corewire.io_discoverypolicies.yaml
@@ -86,6 +86,19 @@ spec:
 prometheus:
 description: Prometheus contains the configuration when type=prometheus.
 properties:
+aggregationMethod:
+description: |-
+AggregationMethod controls how data points from a range query are combined into a single score.
+Only used when queryType is "range". Ignored for instant queries.
+When not set (nil), Drop uses the last data-point value directly — use this when your PromQL
+already contains aggregation functions (e.g., count_over_time, topk).
+Options: "sum", "count", "avg", "max"
+enum:
+- sum
+- count
+- avg
+- max
+type: string
 endpoint:
 description: |-
 Endpoint is the Prometheus-compatible API URL (Prometheus, Thanos, Mimir, VictoriaMetrics).
@@ -94,9 +107,10 @@ spec:
 type: string
 lookback:
 description: |-
-Lookback is the time window for aggregation. When set, the operator uses query_range
-(start=now-lookback, end=now) and sums all returned values per image to produce a score.
-When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score.
+Lookback is the time window for range queries. When queryType is "range",
+the operator queries (start=now-lookback, end=now) and aggregates all returned values per image.
+The aggregation function is controlled by the aggregationMethod field.
+Required when queryType is "range". Ignored when queryType is "instant".
 Example: "168h" (7 days), "24h", "72h"
 type: string
 query:
@@ -107,12 +121,22 @@ spec:
 Example: count(container_memory_working_set_bytes{container!="",container!="POD",namespace="gitlab-runner"}) by (image)
 minLength: 1
 type: string
+queryType:
+default: range
+description: |-
+QueryType controls how the Prometheus query is executed.
+"range" uses /api/v1/query_range with a time window defined by lookback.
+"instant" uses /api/v1/query for a single point-in-time result.
+Default: "range".
+enum:
+- range
+- instant
+type: string
 step:
-default: 5m
 description: |-
 Step is the resolution step for range queries (only used when lookback is set).
-Smaller steps = more data points = more accurate sums but higher Prometheus load.
-Default: "5m". Example: "1m", "15m"
+Smaller steps = more data points = more accurate aggregation but higher Prometheus load.
+Default: 5m. Example: "1m", "15m"
 type: string
 required:
 - endpoint
39 changes: 35 additions & 4 deletions docs/content/docs/discovery.md
@@ -66,7 +66,27 @@ count(container_memory_working_set_bytes{
 
 Hand-maintained image lists do not keep up in environments where automation (for example Renovate) ships new image versions every day. A practical pattern is to rank images by observed CI usage over a rolling window.
 
-The `lookback` field tells Drop to use Prometheus `query_range` API over that time window and sum all returned values per image to produce a total usage score:
+The `queryType` field controls whether Drop sends an instant or range query (default: `range`). When set to `range`, the `lookback` field defines the time window and `aggregationMethod` controls how the returned data points are combined into a single score per image.
 
+#### Query Types
+
+{{< figure src="/drop/images/query-type-range.svg" alt="Range query: multiple data points over a lookback window" >}}
+
+{{< figure src="/drop/images/query-type-instant.svg" alt="Instant query: single point-in-time value used as score" >}}
+
+#### Aggregation Methods
+
+When using `queryType: range`, the `aggregationMethod` field determines how the returned data points are reduced into a single score:
+
+{{< figure src="/drop/images/aggregation-methods.svg" alt="Aggregation methods: nil (last value), sum, count, avg, max" >}}
+
+| Method | Behavior | Use when |
+|--------|----------|----------|
+| *(not set)* | Uses the last data-point value directly | Your PromQL already aggregates (e.g. `count_over_time`, `topk`) |
+| `sum` | Adds all data-point values over the window | Total cumulative usage matters (e.g. total memory consumed) |
+| `count` | Counts the number of data points returned | You want to rank by how frequently an image appears |
+| `avg` | Arithmetic mean of all data-point values | Average magnitude matters regardless of sample count |
+| `max` | Highest single data-point value | Peak usage is more relevant than cumulative |
+
 ```yaml
 apiVersion: drop.corewire.io/v1alpha1
@@ -80,8 +100,10 @@ spec:
 - type: prometheus
 prometheus:
 endpoint: https://mimir.example.com
+queryType: range # default — use query_range API
 lookback: 168h # 7 days
 step: 5m
+aggregationMethod: sum # rank by total usage over 7 days (omit to use last value directly)
 query: |
 count(
 container_memory_working_set_bytes{
@@ -95,7 +117,9 @@ Use this when you want DiscoveryPolicy to continuously follow what your GitLab r
 
 #### Field-by-field explanation
 
-- `lookback: 168h` — Drop uses `query_range` with start=now-7d, end=now, and sums all returned values per image to rank by total usage over the window.
+- `queryType: range` — tells Drop to use the Prometheus `query_range` API. This is the default. Set to `instant` for a single point-in-time query.
+- `lookback: 168h` — defines the time window for range queries (start=now-7d, end=now). Required when `queryType` is `range`.
+- `aggregationMethod: sum` — sums all data-point values to rank by total usage. When omitted (nil), the last value is used directly — ideal for self-contained PromQL queries. Other options: `count` to rank by number of appearances, `avg` for average magnitude, or `max` for peak value.
 - `step: 5m` — resolution step for the range query (controls how many data points Prometheus returns).
 - `count(...) by (image)` — counts the number of running containers per image to rank by popularity.
 - `container_memory_working_set_bytes{...}` — source metric used to observe running containers.
@@ -108,9 +132,15 @@ Use this when you want DiscoveryPolicy to continuously follow what your GitLab r
 
 For each unique `image` label, Drop uses the Prometheus query result value as the score.
 
-When `lookback` is not set (the default), Drop sends an instant query (`/api/v1/query`) and uses the returned value directly. When `lookback` is set (e.g. `lookback: 168h`), Drop uses a range query (`/api/v1/query_range`) over that window and **sums all returned values** to produce the score. This means images that appear more frequently over the window get a higher score.
+When `queryType` is `instant`, Drop sends an instant query (`/api/v1/query`) and uses the returned value directly. When `queryType` is `range` (the default), Drop uses a range query (`/api/v1/query_range`) over the `lookback` window and combines the data points according to `aggregationMethod`:
 
+- *(not set)*: uses the last data-point value — ideal when your PromQL already contains aggregation functions like `count_over_time` or `topk`
+- `sum`: adds all data-point values — images with higher cumulative usage score higher
+- `count`: counts the number of data points — images that appear more frequently score higher
+- `avg`: averages data-point values — images with higher average value score higher
+- `max`: takes the peak value — images with the highest single observation score higher
+
-The example above uses `lookback: 168h` so Drop handles the 7-day windowing via the API — no need to embed `[7d]` in PromQL.
+The example above uses `queryType: range` with `lookback: 168h` so Drop handles the 7-day windowing via the API — no need to embed `[7d]` in PromQL.
 
 If Prometheus returns:
 
@@ -156,6 +186,7 @@ spec:
 - type: prometheus
 prometheus:
 endpoint: https://mimir.example.com
+queryType: instant
 query: |
 count(container_memory_working_set_bytes{
 container!="", container!="POD",
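The discovery docs above state that Drop produces one score per unique `image` label. The sketch below turns a `query_range` response into ranked scores using `aggregationMethod: sum`, as in the example policy. It assumes the standard Prometheus HTTP API JSON shape: `status`, then `data.resultType` of `matrix`, then one `values` list of `[unix_timestamp, "value"]` pairs per series.

`scoreImages` and `imageScore` are hypothetical names. Series without an `image` label are skipped. Series that share an image are summed together.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
)

// matrixResponse models the subset of a query_range response used here.
type matrixResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Values [][2]any          `json:"values"` // [unix_ts, "value"]
		} `json:"result"`
	} `json:"data"`
}

type imageScore struct {
	Image string
	Score float64
}

// scoreImages (hypothetical) sums every sample per "image" label and
// returns images ranked by descending score.
func scoreImages(body []byte) ([]imageScore, error) {
	var resp matrixResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" || resp.Data.ResultType != "matrix" {
		return nil, fmt.Errorf("unexpected response: status=%q resultType=%q", resp.Status, resp.Data.ResultType)
	}
	totals := map[string]float64{}
	for _, series := range resp.Data.Result {
		image := series.Metric["image"]
		if image == "" {
			continue // the query must return an "image" label
		}
		for _, pair := range series.Values {
			raw, ok := pair[1].(string)
			if !ok {
				return nil, fmt.Errorf("non-string sample value for %s", image)
			}
			v, err := strconv.ParseFloat(raw, 64)
			if err != nil {
				return nil, err
			}
			totals[image] += v
		}
	}
	ranked := make([]imageScore, 0, len(totals))
	for img, s := range totals {
		ranked = append(ranked, imageScore{Image: img, Score: s})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Score != ranked[j].Score {
			return ranked[i].Score > ranked[j].Score
		}
		return ranked[i].Image < ranked[j].Image // deterministic tie-break
	})
	return ranked, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"matrix","result":[
	  {"metric":{"image":"golang:1.22"},"values":[[1700000000,"2"],[1700000300,"3"]]},
	  {"metric":{"image":"alpine:3.20"},"values":[[1700000000,"1"],[1700000300,"1"]]}]}}`)
	ranked, err := scoreImages(body)
	fmt.Println(ranked, err)
}
```

This prints `[{golang:1.22 5} {alpine:3.20 2}] <nil>`. Prometheus encodes sample values as strings in its JSON responses, so each value is parsed with `strconv.ParseFloat` before summing.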
1 change: 1 addition & 0 deletions docs/content/docs/reference/_generated_architecture.md
@@ -30,6 +30,7 @@ graph LR
 internal/controller --> internal/metrics
 internal/controller --> internal/pacing
 internal/controller --> internal/podbuilder
+internal/discovery --> api/v1alpha1
 internal/pacing --> api/v1alpha1
 internal/pacing --> internal/podbuilder
 internal/podbuilder --> api/v1alpha1
6 changes: 4 additions & 2 deletions docs/content/docs/reference/_generated_crds.md
@@ -207,8 +207,10 @@ PrometheusSource defines Prometheus query configuration for image discovery.
 |-------|------|----------|---------|-------------|
 | `endpoint` | `string` | Yes | — | Endpoint is the Prometheus-compatible API URL (Prometheus, Thanos, Mimir, VictoriaMetrics). Example: "http://prometheus.monitoring.svc:9090", "https://mimir.example.com" |
 | `query` | `string` | Yes | — | Query is the PromQL expression. It MUST return results with an "image" label — that label value is used as the discovered image reference. The query result value is used as the ranking score (higher = more relevant). Example: count(container_memory_working_set_bytes{container!="",container!="POD",namespace="gitlab-runner"}) by (image) |
-| `lookback` | `*metav1.Duration` | No | — | Lookback is the time window for aggregation. When set, the operator uses query_range (start=now-lookback, end=now) and sums all returned values per image to produce a score. When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score. Example: "168h" (7 days), "24h", "72h" |
-| `step` | `string` | No | 5m | Step is the resolution step for range queries (only used when lookback is set). Smaller steps = more data points = more accurate sums but higher Prometheus load. Default: "5m". Example: "1m", "15m" |
+| `queryType` | `QueryType` | No | range | QueryType controls how the Prometheus query is executed. "range" uses /api/v1/query_range with a time window defined by lookback. "instant" uses /api/v1/query for a single point-in-time result. Default: "range". |
+| `lookback` | `*metav1.Duration` | No | — | Lookback is the time window for range queries. When queryType is "range", the operator queries (start=now-lookback, end=now) and aggregates all returned values per image. The aggregation function is controlled by the aggregationMethod field. Required when queryType is "range". Ignored when queryType is "instant". Example: "168h" (7 days), "24h", "72h" |
+| `aggregationMethod` | `*AggregationMethod` | No | — | AggregationMethod controls how data points from a range query are combined into a single score. Only used when queryType is "range". Ignored for instant queries. When not set (nil), Drop uses the last data-point value directly — use this when your PromQL already contains aggregation functions (e.g., count_over_time, topk). Options: "sum", "count", "avg", "max" |
+| `step` | `*metav1.Duration` | No | — | Step is the resolution step for range queries (only used when lookback is set). Smaller steps = more data points = more accurate aggregation but higher Prometheus load. Default: 5m. Example: "1m", "15m" |
 
 ### RegistrySource
 
2 changes: 0 additions & 2 deletions docs/go.mod
@@ -1,5 +1,3 @@
 module github.com/corewire/drop/docs
 
 go 1.26.0
-
-require github.com/imfing/hextra v0.12.3 // indirect
2 changes: 0 additions & 2 deletions docs/go.sum
@@ -1,2 +0,0 @@
-github.com/imfing/hextra v0.12.3 h1:DZHY2rUWYteyzjlHi9r4n7Bb5e2Q+6LXe4C1Dqn0ZjM=
-github.com/imfing/hextra v0.12.3/go.mod h1:vi+yhpq8YPp/aghvJlNKVnJKcPJ/VyAEcfC1BSV9ARo=