diff --git a/packaging/src/kubernetes/README.md b/packaging/src/kubernetes/README.md index 1fc11623240c..f52da4db6eb6 100644 --- a/packaging/src/kubernetes/README.md +++ b/packaging/src/kubernetes/README.md @@ -505,19 +505,527 @@ kubectl get hiveclusters kubectl describe hivecluster hive ``` +--- + +## Autoscaling + +The operator supports metric-based autoscaling for all four Hive components using +[KEDA](https://keda.sh/) ScaledObjects and Kubernetes-native HPA. Autoscaling is +opt-in per component and designed for **zero query failures** during scale-down. + +### Prerequisites + +- [KEDA](https://keda.sh/) installed in the cluster +- [Prometheus](https://prometheus.io/) scraping Hive pod metrics (for HS2, HMS, LLAP custom metrics) +- Kubernetes metrics-server (for CPU-based triggers on Tez AM) +- [KEDA HTTP Add-on](https://github.com/kedacore/http-add-on) — **required for `minReplicas: 0`**, enables automatic wake-from-zero for HS2 + +### Installing KEDA + +KEDA must be installed **before** enabling autoscaling on any Hive component. +The operator creates KEDA `ScaledObject` custom resources which require the KEDA +CRDs to be present on the cluster. + +```bash +# Add the KEDA Helm repo +helm repo add kedacore https://kedacore.github.io/charts +helm install keda kedacore/keda --namespace keda --create-namespace --wait +``` + +Verify KEDA is running: + +```bash +kubectl get pods -n keda +# Expected: keda-operator, keda-metrics-apiserver, keda-admission-webhooks +kubectl get crd | grep keda +# Expected: scaledobjects.keda.sh, scaledjobs.keda.sh, triggerauthentications.keda.sh, etc. +``` + +**For HS2 scale-to-zero** (`minReplicas: 0`), install the KEDA HTTP Add-on: + +```bash +helm install http-add-on kedacore/keda-add-ons-http \ + --namespace keda --wait +``` + +Verify the interceptor is running: + +```bash +kubectl get pods -n keda -l app=keda-add-ons-http-interceptor-proxy +# Expected: keda-add-ons-http-interceptor-proxy-... 
Running +``` + +> **Note:** The HTTP Add-on is required when `minReplicas: 0`. The operator creates +> an `InterceptorRoute` CRD that configures the interceptor proxy to route traffic +> to HS2. When HS2 has zero pods, the interceptor holds incoming requests and triggers +> scale-up via an `external-push` trigger on the HS2 ScaledObject. The first request +> takes ~30-60s while the pod starts. + +**For Prometheus-based triggers** (HS2, HMS, LLAP), install Prometheus: + +```bash +helm repo add prometheus-community https://prometheus-community.github.io/helm-charts +helm install prometheus prometheus-community/prometheus \ + --namespace monitoring --create-namespace --wait +``` + +> **Note:** If autoscaling is enabled in the HiveCluster spec but KEDA is not +> installed, the operator will fail to reconcile with errors like +> `"Could not find the metadata for the given apiVersion and kind"`. +> Always install KEDA before setting `autoscaling.enabled: true`. + +### Graceful Scale-Down Architecture + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ Scale Down Flow │ +├─────────────────────────────────────────────────────────────────────┤ +│ 1. KEDA reduces desired replicas (cooldown elapsed, metric below │ +│ threshold) │ +│ 2. PodDisruptionBudget ensures minAvailable=1 (at least one pod │ +│ always running) │ +│ 3. Kubernetes sends SIGTERM to selected pod │ +│ 4. preStop hook runs: │ +│ - HS2: deregisters from ZK, drains open sessions, kills JVM │ +│ - HMS: kills JVM (stateless HTTP — no drain needed) │ +│ - LLAP: waits until all executors become idle, kills JVM │ +│ - TezAM: waits for current DAG completion, kills JVM │ +│ 5. terminationGracePeriodSeconds = gracePeriodSeconds (safety cap) │ +│ 6. 
Pod terminates immediately once drain completes (does NOT wait │ +│ the full grace period - it's only the upper safety bound) │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +> **Note:** Shell entrypoints (PID 1) in containers don't forward SIGTERM to child +> processes. The preStop hook explicitly sends SIGTERM to the Hive/Tez Java process +> after drain completes, ensuring prompt shutdown without waiting for the grace period +> to expire. + +### Scaling Timers + +The autoscaling system uses four independent timing controls: + +| Timer | Config Field | Default | Purpose | +|-------|-------------|---------|---------| +| **Metrics scrape interval** | `metricsScrapeIntervalSeconds` | `10` | How often Prometheus scrapes the pod's metrics endpoint. This is the **biggest bottleneck** for autoscaling reaction time: KEDA cannot detect metric changes faster than the scrape interval. | +| **Scale-up stabilization** | `scaleUpStabilizationSeconds` | `60` | HPA window: picks the lowest recommendation within this period before scaling up, so a short spike does not add pods. Prevents flapping when metrics oscillate. Set to `0` for LLAP and TezAM (reactive dependents). | +| **Scale-down stabilization** | `scaleDownStabilizationSeconds` | `300` | HPA window: picks the highest recommendation within this period before scaling down. Prevents premature removal of pods during temporary load dips. | +| **KEDA cooldown** | `cooldownSeconds` | `300-900` | Time after **all** KEDA triggers go inactive (metric = 0) before KEDA scales from 1→0. Only relevant when `minReplicas: 0`. HPA handles N→1 transitions using the stabilization window.
| + +**How they interact:** +- Load spike detected → Prometheus scrapes metric within `metricsScrapeIntervalSeconds` → HPA waits `scaleUpStabilizationSeconds` then scales up +- Load drops → HPA waits `scaleDownStabilizationSeconds` then scales down (N→1) +- All triggers inactive → KEDA waits `cooldownSeconds` then scales 1→0 + +**Tuning reaction time:** With defaults (`metricsScrapeIntervalSeconds: 10`, `scaleUpStabilizationSeconds: 0` for LLAP/TezAM), scale-up latency is ~15s (one scrape + KEDA polling). For HS2 with `scaleUpStabilizationSeconds: 60`, expect ~70s. Reducing `metricsScrapeIntervalSeconds` below 10 gives diminishing returns and increases Prometheus load. + +### Per-Component Scaling Logic + +| Component | Scale-Up Trigger | Scale-Down Trigger | Native Metric | +|-----------|-----------------|-------------------|---------------| +| **HiveServer2** | `sum(hs2_open_sessions)` > scaleUpThreshold **OR** CPU > targetCpuValue | Sessions below threshold **AND** CPU below activationCpuValue | `hs2_open_sessions` (sum across pods) | +| **Metastore** | API request rate > scaleUpThreshold **OR** CPU > targetCpuValue | Request rate below threshold **AND** CPU below activationCpuValue | `api_*_total` (manual delta for Prometheus 3.x compatibility) | +| **LLAP** | Total busy slots > scaleUpThreshold (queued + busy executors) | All executors idle + no HS2 sessions | `NumQueuedRequests`, `NumExecutorsConfigured`, `NumExecutorsAvailable` | +| **Tez AM** | max(`sum(hs2_open_sessions)`, `count(HS2 pods)` x `sessions.per.default.queue`) | All HS2 sessions closed | `hs2_open_sessions` (demand-driven, no CPU trigger) | + +**TezAM Scaling Model:** TezAM uses demand-driven scaling with two KEDA triggers (max wins): +1. **Session demand** — `sum(hs2_open_sessions)`: scales to match the total number of + concurrent sessions across all HS2 pods (each session needs its own exclusive TezAM). +2. 
**Pre-warm** — `count(HS2 pods) × hive.server2.tez.sessions.per.default.queue` (default 1): + ensures every HS2 pod has enough TezAM sessions pre-claimed from ZooKeeper before queries arrive. + +KEDA takes the maximum desired replicas across both triggers. This ensures TezAM capacity +is always sufficient for both current demand and eager session pre-warming. No CPU-based +trigger is used — TezAM scaling is purely demand-driven from HS2 metrics. + +### Scale-to-Zero Architecture + +When `minReplicas: 0` is configured (default for HS2, LLAP, TezAM), the cluster +scales down to zero pods when completely idle. The operator uses a **unified +ScaledObject + InterceptorRoute** architecture — a single KEDA ScaledObject per +component handles both Prometheus-based scaling and wake-from-zero, while an +`InterceptorRoute` (from the KEDA HTTP Add-on) provides routing-only configuration +without creating a conflicting second ScaledObject. + +``` + Scale-to-Zero (Idle Detection) + + 1. No open sessions/queries for cooldownPeriod seconds + → KEDA detects all triggers inactive + → scales HS2 to 0 (idleReplicaCount) + + 2. LLAP/TezAM ScaledObjects see hs2_open_sessions = 0 + → activation triggers inactive for cooldownPeriod + → scale LLAP and TezAM to 0 + + 3. HMS stays at minReplicas=1 (always available) + +``` + +``` + Wake-from-Zero (with KEDA HTTP Add-on) + + 1. Beeline connects → KEDA HTTP interceptor proxy queues the + request and triggers HS2 scale-up via external-push trigger + + 2. HS2 pod starts, reports hs2_open_sessions > 0 to Prometheus + + 3. KEDA detects cross-component activation trigger: + - LLAP ScaledObject sees hs2_open_sessions > 0 → scales up + - TezAM ScaledObject sees hs2_open_sessions > 0 → scales up + + 4. 
Query executes once LLAP/TezAM pods are ready + +``` + +The HS2 ScaledObject combines three trigger types in a single resource: +- **Prometheus trigger** (`sum(hs2_open_sessions)`) — session-aware scaling using total + session count across all pods (`sum()` prevents premature scale-down of pods with + active sessions; `desired = ceil(sum / threshold)`) +- **CPU trigger** (`AverageValue` in millicores) — load-based scaling when `targetCpuValue` is configured +- **external-push trigger** — wake-from-zero via the KEDA HTTP Add-on interceptor + +**Session protection:** The HS2 Service uses `sessionAffinity: ClientIP` to ensure +beeline clients always reach the same pod. The preStop hook deregisters the pod from +ZooKeeper (preventing new sessions) and waits for `hs2_open_sessions` to drain to 0 +before terminating. The `gracePeriodSeconds` (default 3600s) is a safety cap — the pod +terminates immediately once sessions drain, not after the full grace period. + +The `InterceptorRoute` CRD (`http.keda.sh/v1beta1`) configures only the interceptor +routing (host matching, backend target) without auto-creating a ScaledObject — this +avoids the dual-HPA conflict that `HTTPScaledObject` would cause. + +> **Important:** Automatic wake-from-zero requires the KEDA HTTP Add-on. Traffic +> must flow through the interceptor proxy (via Ingress or port-forward). Without the +> HTTP Add-on, HS2 must be manually woken (`kubectl scale deployment/hive-hiveserver2 --replicas=1`). +> LLAP and TezAM wake automatically once HS2 reports open sessions. See +> [Connect to HiveServer2 > Connecting with Scale-to-Zero](#connecting-with-scale-to-zero-minreplicas--0) +> for setup instructions. 
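As a rough illustration of the replica arithmetic described in this section, the sketch below reproduces `desired = ceil(sum / threshold)` and the TezAM "max wins" rule in plain bash. The helper names `desired_replicas` and `tezam_desired` and the sample numbers are hypothetical. The sketch also ignores `minReplicas`, the `replicas` ceiling, the stabilization windows, and the KEDA cooldown, all of which the real KEDA/HPA pipeline applies on top.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; these helpers are hypothetical and are not part of
# the operator, KEDA, or the HPA.

# desired = ceil(sum / threshold), using integer ceiling division.
desired_replicas() {
  local total=$1       # e.g. the current value of sum(hs2_open_sessions)
  local threshold=$2   # scaleUpThreshold
  echo $(( (total + threshold - 1) / threshold ))
}

# TezAM: the larger of session demand and pre-warm demand ("max wins").
tezam_desired() {
  local sessions=$1    # sum(hs2_open_sessions)
  local hs2_pods=$2    # count(HS2 pods)
  local per_queue=$3   # hive.server2.tez.sessions.per.default.queue
  local prewarm=$(( hs2_pods * per_queue ))
  if [ "$sessions" -gt "$prewarm" ]; then
    echo "$sessions"
  else
    echo "$prewarm"
  fi
}

desired_replicas 161 80   # prints 3
desired_replicas 80 80    # prints 1
desired_replicas 0 80     # prints 0
tezam_desired 5 2 1       # prints 5 (session demand wins)
tezam_desired 1 3 1       # prints 3 (pre-warm wins)
```

A result of 0 only means that the trigger reports no demand. Whether the component actually reaches zero pods still depends on `minReplicas: 0` and on `cooldownSeconds` elapsing.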
+ +**Component-specific behavior:** + +| Component | minReplicas | Scale-to-Zero Trigger | Wake Trigger | +|-----------|-------------|----------------------|--------------| +| **HS2** | 0 | `hs2_open_sessions = 0` for cooldown | HTTP request via KEDA interceptor (`external-push`) | +| **HMS** | 1 | Never (always running) | N/A | +| **LLAP** | 0 | `hs2_open_sessions = 0` for cooldown | `hs2_open_sessions > 0` (cross-component) | +| **TezAM** | 0 | No HS2 pods with open sessions | `hs2_open_sessions > 0` (cross-component, demand-driven) | + +### Enabling Autoscaling + +**CLI (with Ozone storage backend):** + +Each component has sensible per-component defaults (see [Configuration Reference](#configuration-reference)). +Only `enabled=true` is needed to turn on autoscaling: + +```bash +helm install hive ./helm/hive-operator \ + --set cluster.database.type=postgres \ + --set cluster.database.url="jdbc:postgresql://postgres-postgresql:5432/metastore" \ + --set cluster.database.driver="org.postgresql.Driver" \ + --set cluster.database.username=hive \ + --set cluster.database.passwordSecretRef.name=hive-db-secret \ + --set cluster.database.passwordSecretRef.key=password \ + --set cluster.database.driverJarUrl="https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.5/postgresql-42.7.5.jar" \ + --set cluster.zookeeper.quorum="zookeeper:2181" \ + --set cluster.storage.coreSiteOverrides."fs\.defaultFS"="s3a://hive" \ + --set cluster.storage.coreSiteOverrides."fs\.s3a\.endpoint"="http://ozone-s3g-rest:9878" \ + --set-string cluster.storage.coreSiteOverrides."fs\.s3a\.path\.style\.access"=true \ + --set 'cluster.storage.envVars[0].name=HADOOP_OPTIONAL_TOOLS' \ + --set 'cluster.storage.envVars[0].value=hadoop-aws' \ + --set 'cluster.storage.envVars[1].name=AWS_ACCESS_KEY_ID' \ + --set 'cluster.storage.envVars[1].value=ozone' \ + --set 'cluster.storage.envVars[2].name=AWS_SECRET_ACCESS_KEY' \ + --set 'cluster.storage.envVars[2].value=ozone' \ + --set 
cluster.hiveServer2.autoscaling.enabled=true \ + --set cluster.metastore.autoscaling.enabled=true \ + --set cluster.llap.autoscaling.enabled=true \ + --set cluster.tezAm.autoscaling.enabled=true +``` + +**Values file (for customizing beyond defaults):** + +```yaml +# values-autoscaling.yaml — only override what you need +cluster: + database: + type: postgres + url: "jdbc:postgresql://postgres-postgresql:5432/metastore" + driver: "org.postgresql.Driver" + username: hive + passwordSecretRef: + name: hive-db-secret + key: password + driverJarUrl: "https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.5/postgresql-42.7.5.jar" + + zookeeper: + quorum: "zookeeper:2181" + + storage: + coreSiteOverrides: + fs.defaultFS: "s3a://hive" + fs.s3a.endpoint: "http://ozone-s3g-rest:9878" + fs.s3a.path.style.access: "true" + envVars: + - name: HADOOP_OPTIONAL_TOOLS + value: "hadoop-aws" + - name: AWS_ACCESS_KEY_ID + value: "ozone" + - name: AWS_SECRET_ACCESS_KEY + value: "ozone" + + hiveServer2: + replicas: 10 # Acts as maxReplicas when autoscaling is enabled + autoscaling: + enabled: true + # minReplicas: 0 # default — scale to zero when idle (requires KEDA HTTP Add-on) + # scaleUpThreshold: 80 # default — avg open sessions per pod triggering scale-up + # cooldownSeconds: 600 # default — KEDA 1→0 cooldown after all triggers inactive + # scaleUpStabilizationSeconds: 60 # default — HPA scale-up window + # scaleDownStabilizationSeconds: 300 # default — HPA scale-down window + # metricsScrapeIntervalSeconds: 10 # default — Prometheus scrape interval (lower = faster reaction) + + metastore: + replicas: 6 # Acts as maxReplicas when autoscaling is enabled + autoscaling: + enabled: true + # minReplicas: 0 # default — scale to zero when no connections + # scaleUpThreshold: 75 # default — API request rate (req/s) triggering scale-up + # cooldownSeconds: 300 # default — KEDA 1→0 cooldown after all triggers inactive + # scaleUpStabilizationSeconds: 60 # default — HPA scale-up window + 
# scaleDownStabilizationSeconds: 300 # default — HPA scale-down window + # gracePeriodSeconds: 60 # default — fast drain (HMS is stateless) + # metricsScrapeIntervalSeconds: 10 # default — Prometheus scrape interval + + llap: + replicas: 8 # Acts as maxReplicas when autoscaling is enabled + autoscaling: + enabled: true + # minReplicas: 0 # default — scale to zero when no HS2 sessions + # scaleUpThreshold: 1 # default — total busy slots (queued+running) triggering scale-up + # cooldownSeconds: 900 # default — KEDA 1→0 cooldown (scaling down destroys cache) + # scaleUpStabilizationSeconds: 60 # default — HPA scale-up window + # scaleDownStabilizationSeconds: 300 # default — HPA scale-down window + # gracePeriodSeconds: 600 # default — 10 min drain for in-flight fragments + # metricsScrapeIntervalSeconds: 10 # default — Prometheus scrape interval (lower = faster reaction) + + tezAm: + replicas: 10 # Acts as maxReplicas when autoscaling is enabled + autoscaling: + enabled: true + # minReplicas: 0 # default — scale to zero when no HS2 sessions + # scaleUpThreshold: 1 # default — threshold for demand metric (1 = match HS2 pod count) + # scaleUpStabilizationSeconds: 60 # default — HPA scale-up window + # scaleDownStabilizationSeconds: 300 # default — HPA scale-down window + # gracePeriodSeconds: 120 # default — 2 min drain for DAG completion + # metricsScrapeIntervalSeconds: 10 # default — Prometheus scrape interval (lower = faster reaction) +``` + +```bash +helm install hive ./helm/hive-operator -f values-autoscaling.yaml +``` + +When autoscaling is enabled, the operator automatically: +- Deploys the Prometheus JMX Exporter agent sidecar (port 9404, `/metrics`) +- Enables `hive.server2.metrics.enabled` / `metastore.metrics.enabled` (JMX reporter) +- Adds Prometheus scrape annotations to pods (including `prometheus.io/scrape-interval` for fast reaction) +- Creates KEDA ScaledObjects with the configured thresholds +- Creates PodDisruptionBudgets (minAvailable: 1) +- 
Configures preStop lifecycle hooks for graceful drain +- Sets `terminationGracePeriodSeconds` to the configured grace period +- Adds cross-component activation triggers for LLAP/TezAM (wake when HS2 has open sessions) + +**Exported Prometheus Metrics (per component):** + +| Component | Key Metrics | Purpose | +|-----------|---------|---------| +| **HiveServer2** | `hs2_open_sessions` | Session count — used by HS2 ScaledObject (sum for scale-up protection) and TezAM ScaledObject (demand-driven scaling) | +| **Metastore** | `api_*_total` | API call counters (manual delta for Prometheus 3.x compatibility) | +| **LLAP** | `hadoop_llapdaemon_executornumqueuedrequests`, `hadoop_llapdaemon_executornumexecutorsconfigured`, `hadoop_llapdaemon_executornumexecutorsavailable` | Total busy slots = queued + configured - available (scaling trigger) | +| **Tez AM** | N/A (scales on HS2 metrics) | TezAM scaling is demand-driven from `hs2_open_sessions` — no TezAM-specific metrics needed | + +### CPU-Based Scaling + +The operator can include a **CPU trigger** in the ScaledObject for HS2 and Metastore. +The trigger uses KEDA's `AverageValue` metric type with **absolute millicore targets** that +you specify directly. This handles burstable QoS pods correctly — unlike `Utilization` +(which measures against the CPU request), `AverageValue` uses actual CPU consumption in +absolute terms, so pods with a small request but high limit won't show perpetual >100% +utilization that prevents scale-down. + +**The CPU trigger is opt-in:** it is only added to the ScaledObject when you explicitly set +both `targetCpuValue` and `activationCpuValue` in the autoscaling config. If omitted, the +operator relies solely on the Prometheus-based trigger (sessions, connections, etc.). 
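One way to confirm that the opt-in CPU trigger was actually added is to list the trigger types on the ScaledObject and look for `cpu`. The sketch below is a hedged example: `has_cpu_trigger` is a hypothetical helper, and the ScaledObject name `hive-hiveserver2` in the commented `kubectl` line is an assumption based on the Deployment name used elsewhere in this README, so adjust it to whatever `kubectl get scaledobjects` shows in your cluster.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; has_cpu_trigger is a hypothetical helper.
# Succeeds if the space-separated list of trigger types contains "cpu".
has_cpu_trigger() {
  local trigger_type
  for trigger_type in $1; do
    if [ "$trigger_type" = "cpu" ]; then
      return 0
    fi
  done
  return 1
}

# Against a live cluster (the resource name is an assumption):
#   types=$(kubectl get scaledobject hive-hiveserver2 \
#             -o jsonpath='{.spec.triggers[*].type}')
#   has_cpu_trigger "$types" && echo "CPU trigger present"

# Offline demonstration with sample trigger lists:
if has_cpu_trigger "prometheus cpu external-push"; then
  echo "CPU trigger present"
fi
if ! has_cpu_trigger "prometheus external-push"; then
  echo "CPU trigger absent"
fi
```

If the trigger is absent even though both values are set, the missing-`resources` case described in this section is a likely cause worth checking in the operator logs.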
+ +**How it works:** + +- `targetCpuValue` — the average CPU per pod (e.g., `"1500m"` or `"1"`) that triggers scale-up +- `activationCpuValue` — below this CPU value, the trigger is completely inactive + (doesn't participate in scaling decisions at all) +- Both the CPU trigger and the Prometheus-based trigger are evaluated independently — + if **either** exceeds its threshold, the component scales up (OR logic) +- Scale-down only happens when **both** triggers agree load is low +- The component must also have `resources` defined on its pods; if `targetCpuValue` is set + but `resources` is missing, the operator logs a warning and skips the CPU trigger + +**Example:** With `targetCpuValue: "1600m"` and `activationCpuValue: "400m"`, KEDA scales up +when average pod CPU exceeds 1600m and considers the trigger inactive below 400m. + +To enable both Prometheus and CPU-based scaling: + +```yaml +cluster: + hiveServer2: + resources: + requestsCpu: "500m" + limitsCpu: "2" + requestsMemory: "2Gi" + autoscaling: + enabled: true + scaleUpThreshold: 1 # scale up when total sessions > 1 + targetCpuValue: "1600m" # scale up when avg CPU > 1600m per pod + activationCpuValue: "400m" # CPU trigger inactive below 400m + + metastore: + resources: + requestsCpu: "500m" + limitsCpu: "1" + requestsMemory: "1Gi" + autoscaling: + enabled: true + targetCpuValue: "750m" + activationCpuValue: "200m" +``` + +| Setting | Effect on CPU trigger | +|---------|----------------------| +| `targetCpuValue` | Absolute CPU target (e.g., `"1500m"` or `"1"`). **Required** to enable CPU trigger. | +| `activationCpuValue` | CPU below which trigger is inactive. **Required** with targetCpuValue. | +| `resources` | Pod resources must be defined — operator warns and skips CPU trigger otherwise. | + +> **Note:** LLAP and TezAM scaling use only Prometheus-based triggers and do not +> include CPU triggers. LLAP scales on total busy slots (queued + running executors). 
+> TezAM scales on demand: the larger of `sum(hs2_open_sessions)` and the number of active HS2 pods multiplied by +> `hive.server2.tez.sessions.per.default.queue` (default 1). + +### Helm Values Reference (Autoscaling) + +| Value | Default | Description | +|-------|---------|-------------| +| `cluster.<component>.replicas` | `1-2` | Static replica count, or max replicas ceiling when autoscaling is enabled | +| `cluster.<component>.autoscaling.enabled` | `false` | Enable KEDA-based autoscaling | +| `cluster.<component>.autoscaling.minReplicas` | `0` (HS2/LLAP/TezAM), `1` (HMS) | Minimum replica count. Set to 0 for scale-to-zero | +| `cluster.<component>.autoscaling.scaleUpThreshold` | varies | Metric threshold triggering scale-up | +| `cluster.<component>.autoscaling.scaleDownThreshold` | varies | Metric threshold triggering scale-down | +| `cluster.<component>.autoscaling.cooldownSeconds` | varies | KEDA cooldown: seconds after all triggers go inactive before scaling 1→0 | +| `cluster.<component>.autoscaling.scaleUpStabilizationSeconds` | `60` | HPA stabilization window for scale-up (picks lowest recommendation in window) | +| `cluster.<component>.autoscaling.scaleDownStabilizationSeconds` | `300` | HPA stabilization window for scale-down (picks highest recommendation in window) | +| `cluster.<component>.autoscaling.gracePeriodSeconds` | `3600` | Safety cap: max drain time before forced termination. Pod exits immediately once sessions/connections drain to 0. | + +--- + ## Connect to HiveServer2 +HiveServer2 runs in **HTTP transport mode** by default (recommended for Kubernetes +environments as it works well with load balancers, ingress controllers, and proxies).
+ +### Standard Connection (minReplicas >= 1) + +When HS2 always has at least one pod running, connect directly to the service: + ```bash -kubectl exec -it deployment/hive-hiveserver2 -- beeline -u "jdbc:hive2://hive-hiveserver2:10000/" +kubectl exec -it deployment/hive-hiveserver2 -- beeline -u "jdbc:hive2://hive-hiveserver2:10001/;transportMode=http;httpPath=cliservice" ``` Or via port-forward: ```bash -kubectl port-forward svc/hive-hiveserver2 10000:10000 -beeline -u "jdbc:hive2://localhost:10000/" +kubectl port-forward svc/hive-hiveserver2 10001:10001 +beeline -u "jdbc:hive2://localhost:10001/;transportMode=http;httpPath=cliservice" +``` + +### Connecting with Scale-to-Zero (minReplicas = 0) + +When HS2 is configured with `minReplicas: 0`, the deployment starts with zero pods. +Connections go through the **KEDA HTTP interceptor proxy** which automatically wakes +HS2 when a request arrives (first request takes ~30-60s while the pod starts). + +``` +Traffic flow: +Client → KEDA HTTP Interceptor → (if 0 pods: scale up, wait) → HS2 Service → HS2 Pod +``` + +**Via kubectl exec (no local Hive install needed):** + +The Metastore pod is always running (`minReplicas=1`) and has beeline pre-installed. 
+Connecting through the interceptor wakes HS2 from zero automatically: + +```bash +kubectl exec -it deploy/hive-metastore -- beeline -u "jdbc:hive2://keda-add-ons-http-interceptor-proxy.keda.svc:8080/;transportMode=http;httpPath=cliservice" +``` + +Or connect directly when HS2 is already running: + +```bash +kubectl exec -it deploy/hive-metastore -- beeline -u "jdbc:hive2://hive-hiveserver2:10001/;transportMode=http;httpPath=cliservice" +``` + +**Via port-forward (local development):** + +```bash +# Port-forward the KEDA HTTP interceptor proxy +kubectl port-forward -n keda svc/keda-add-ons-http-interceptor-proxy 8080:8080 + +# Connect — interceptor auto-wakes HS2 (first request may take 30-60s) +beeline -u "jdbc:hive2://localhost:8080/;transportMode=http;httpPath=cliservice" +``` + +**Via Ingress:** + +Create an Ingress that routes to the KEDA interceptor. Uses [nip.io](https://nip.io) +wildcard DNS so no `/etc/hosts` editing is needed — `hive.127.0.0.1.nip.io` resolves +to `127.0.0.1` automatically: + +```bash +kubectl create ingress hive-interceptor -n keda --class=nginx \ + --rule="hive.127.0.0.1.nip.io/*=keda-add-ons-http-interceptor-proxy:8080" \ + --annotation="nginx.ingress.kubernetes.io/upstream-vhost=hive-hiveserver2.default.svc.cluster.local" +``` + +> The `upstream-vhost` annotation rewrites the Host header to the internal service +> name so the KEDA interceptor can match and route the request. + +Connect via beeline using the Ingress: + +```bash +beeline -u "jdbc:hive2://hive.127.0.0.1.nip.io:80/;transportMode=http;httpPath=cliservice" +``` + +> For production, replace `hive.127.0.0.1.nip.io` with your actual domain +> (e.g., `hive.example.com`) and ensure DNS points to your ingress controller. 
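The beeline URLs in this section all share one shape and differ only in host and port. As a small convenience sketch, the hypothetical helper `hs2_jdbc_url` below builds that URL for a given host, which can help when swapping `hive.127.0.0.1.nip.io` for a real domain. The default port of 80 simply mirrors the Ingress example and is an assumption about your ingress setup.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; hs2_jdbc_url is a hypothetical helper.
# Builds the HTTP-mode JDBC URL used throughout this section.
hs2_jdbc_url() {
  local host=$1
  local port=${2:-80}   # 80 mirrors the Ingress example; pass 8080 or 10001 otherwise
  echo "jdbc:hive2://${host}:${port}/;transportMode=http;httpPath=cliservice"
}

hs2_jdbc_url hive.example.com
# prints jdbc:hive2://hive.example.com:80/;transportMode=http;httpPath=cliservice

# Usage with beeline (needs a reachable Ingress or port-forward):
#   beeline -u "$(hs2_jdbc_url hive.example.com)"
#   beeline -u "$(hs2_jdbc_url localhost 8080)"
```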
+ +**Manual wake (fallback without HTTP Add-on):** + +```bash +kubectl scale deployment/hive-hiveserver2 --replicas=1 +kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=hiveserver2 --timeout=120s +kubectl exec -it deployment/hive-hiveserver2 -- beeline -u "jdbc:hive2://hive-hiveserver2:10001/;transportMode=http;httpPath=cliservice" ``` +> **Note:** The operator sets `hive.server2.transport.mode=http`, +> `hive.server2.thrift.http.port=10001`, and +> `hive.server2.thrift.http.path=cliservice` by default. The binary Thrift +> port (10000) is still exposed for backward compatibility but HTTP mode +> is the primary transport. To override, use `configOverrides` in the +> HiveServer2 spec. + +> **Metastore HTTP Mode:** The operator configures HMS in HTTP transport mode +> (`metastore.server.thrift.transport.mode=http`) and sets the matching client +> config (`hive.metastore.client.thrift.transport.mode=http`) on HS2 and TezAM. +> HTTP mode makes Metastore connections stateless: each RPC is an independent +> HTTP request, so Metastore pods can scale down safely without breaking active +> connections from HiveServer2. The port remains 9083 (same as binary mode). + --- ## Helm Values Reference @@ -620,6 +1128,22 @@ beeline -u "jdbc:hive2://localhost:10000/" | `cluster.tezAm.extraVolumes` | `[]` | Additional volumes for TezAM pods | | `cluster.tezAm.extraVolumeMounts` | `[]` | Additional volume mounts for TezAM containers | +### Autoscaling (per component) + +| Value | Default | Description | +|-------|---------|-------------| +| `cluster.<component>.autoscaling.enabled` | `false` | Enable KEDA-based autoscaling for this component | +| `cluster.<component>.autoscaling.minReplicas` | `0` | Floor replica count.
0 enables scale-to-zero (HS2 requires KEDA HTTP Add-on) | +| `cluster.<component>.autoscaling.scaleUpThreshold` | `80` | Metric threshold triggering scale-up (total sessions for HS2, request rate for HMS, busy slots for LLAP, demand per HS2 pod for TezAM) | +| `cluster.<component>.autoscaling.scaleDownThreshold` | `30` | Prometheus metric threshold for scale-down (component-specific) | +| `cluster.<component>.autoscaling.targetCpuValue` | none | Absolute CPU target for scale-up (e.g., `1500m`). Omit to disable CPU trigger. | +| `cluster.<component>.autoscaling.activationCpuValue` | none | CPU value below which CPU trigger is inactive. Required with targetCpuValue. | +| `cluster.<component>.autoscaling.cooldownSeconds` | `300-900` | KEDA cooldown: seconds after all triggers go inactive before scaling 1→0 | +| `cluster.<component>.autoscaling.scaleUpStabilizationSeconds` | `60` | HPA stabilization window for scale-up decisions (prevents flapping) | +| `cluster.<component>.autoscaling.scaleDownStabilizationSeconds` | `300` | HPA stabilization window for scale-down decisions (prevents premature scale-down) | +| `cluster.<component>.autoscaling.gracePeriodSeconds` | `3600` | Safety cap (seconds): the pod terminates immediately once drain completes; this is only the upper bound | +| `cluster.<component>.autoscaling.metricsScrapeIntervalSeconds` | `10` | Prometheus scrape interval override for this component's pods. Lower values make autoscaling react faster but increase Prometheus load. Applied via `prometheus.io/scrape-interval` pod annotation.
| + --- ## Upgrade and Uninstall @@ -659,12 +1183,21 @@ helm install hive ./helm/hive-operator -f my-values.yaml ### Remove Everything (including dependencies) ```bash -helm uninstall hive -kubectl delete crd hiveclusters.hive.apache.org -helm uninstall ozone postgres zookeeper --ignore-not-found -kubectl delete pvc data-zookeeper-0 --ignore-not-found -kubectl delete pvc data-postgres-postgresql-0 --ignore-not-found +kubectl delete hivecluster --all -A --wait=false --ignore-not-found +kubectl delete ingress hive-interceptor -n keda --ignore-not-found +helm uninstall hive --ignore-not-found +kubectl delete crd hiveclusters.hive.apache.org --wait=false --ignore-not-found +kubectl delete crd --wait=false --ignore-not-found scaledobjects.keda.sh scaledjobs.keda.sh triggerauthentications.keda.sh clustertriggerauthentications.keda.sh httpscaledobjects.http.keda.sh interceptorroutes.http.keda.sh +helm uninstall http-add-on -n keda --ignore-not-found +helm uninstall keda -n keda --ignore-not-found +helm uninstall prometheus -n monitoring --ignore-not-found +helm uninstall ozone --ignore-not-found +helm uninstall postgres --ignore-not-found +helm uninstall zookeeper --ignore-not-found +kubectl delete pvc data-zookeeper-0 data-postgres-postgresql-0 --ignore-not-found kubectl delete secret hive-db-secret --ignore-not-found +kubectl delete namespace keda --wait=false --ignore-not-found +kubectl delete namespace monitoring --wait=false --ignore-not-found ``` --- diff --git a/packaging/src/kubernetes/helm/hive-operator/crds/hiveclusters.hive.apache.org-v1.yml b/packaging/src/kubernetes/helm/hive-operator/crds/hiveclusters.hive.apache.org-v1.yml index 99768633a128..1e496830f21b 100644 --- a/packaging/src/kubernetes/helm/hive-operator/crds/hiveclusters.hive.apache.org-v1.yml +++ b/packaging/src/kubernetes/helm/hive-operator/crds/hiveclusters.hive.apache.org-v1.yml @@ -44,6 +44,70 @@ spec: hiveServer2: description: HiveServer2 component configuration properties: + autoscaling: + 
description: Autoscaling configuration (requires KEDA installed + in the cluster) + properties: + activationCpuValue: + description: CPU average value below which the trigger is + inactive. Required if targetCpuValue is set. + type: string + cooldownSeconds: + default: 600 + description: Cooldown period in seconds after all KEDA triggers + are inactive before scaling from 1 to 0 (scale-to-zero delay) + type: integer + enabled: + default: false + description: Whether autoscaling is enabled for this component + type: boolean + gracePeriodSeconds: + default: 3600 + description: Maximum time in seconds to wait for graceful + drain during scale-down before the pod is forcibly terminated. + The pod terminates immediately once sessions/connections + drain to 0; this value is only the upper safety cap. + type: integer + metricsScrapeIntervalSeconds: + default: 10 + description: Prometheus scrape interval in seconds for this + component's metrics. Lower values make autoscaling react + faster but increase Prometheus load. + type: integer + minReplicas: + default: 0 + description: Minimum number of replicas (floor for scale-down). + Set to 0 for scale-to-zero (HS2 requires KEDA HTTP Add-on + for wake-from-zero) + type: integer + scaleDownStabilizationSeconds: + default: 300 + description: Stabilization window in seconds for scale-down + decisions. HPA picks the highest recommendation within this + window to prevent premature scale-down. + type: integer + scaleDownThreshold: + default: 20 + description: Threshold that triggers scale-down for Prometheus-based + metrics + type: integer + scaleUpStabilizationSeconds: + default: 60 + description: Stabilization window in seconds for scale-up + decisions. HPA picks the highest recommendation within this + window to prevent flapping. 
+ type: integer + scaleUpThreshold: + default: 80 + description: "Threshold that triggers scale-up (component-specific:\ + \ sessions for HS2, connections for HMS, queue depth for\ + \ LLAP, pending tasks for TezAM)" + type: integer + targetCpuValue: + description: "Target CPU average value for scaling (e.g.,\ + \ '1500m' or '1'). If omitted, CPU scaling is disabled." + type: string + type: object configOverrides: additionalProperties: type: string @@ -152,6 +216,70 @@ spec: llap: description: LLAP daemon configuration. Enabled by default. properties: + autoscaling: + description: Autoscaling configuration (requires KEDA installed + in the cluster) + properties: + activationCpuValue: + description: CPU average value below which the trigger is + inactive. Required if targetCpuValue is set. + type: string + cooldownSeconds: + default: 600 + description: Cooldown period in seconds after all KEDA triggers + are inactive before scaling from 1 to 0 (scale-to-zero delay) + type: integer + enabled: + default: false + description: Whether autoscaling is enabled for this component + type: boolean + gracePeriodSeconds: + default: 3600 + description: Maximum time in seconds to wait for graceful + drain during scale-down before the pod is forcibly terminated. + The pod terminates immediately once sessions/connections + drain to 0; this value is only the upper safety cap. + type: integer + metricsScrapeIntervalSeconds: + default: 10 + description: Prometheus scrape interval in seconds for this + component's metrics. Lower values make autoscaling react + faster but increase Prometheus load. + type: integer + minReplicas: + default: 0 + description: Minimum number of replicas (floor for scale-down). + Set to 0 for scale-to-zero (HS2 requires KEDA HTTP Add-on + for wake-from-zero) + type: integer + scaleDownStabilizationSeconds: + default: 300 + description: Stabilization window in seconds for scale-down + decisions. 
HPA picks the highest recommendation within this + window to prevent premature scale-down. + type: integer + scaleDownThreshold: + default: 20 + description: Threshold that triggers scale-down for Prometheus-based + metrics + type: integer + scaleUpStabilizationSeconds: + default: 60 + description: Stabilization window in seconds for scale-up + decisions. HPA picks the lowest recommendation within this + window to prevent flapping. + type: integer + scaleUpThreshold: + default: 80 + description: "Threshold that triggers scale-up (component-specific:\ + \ sessions for HS2, connections for HMS, queue depth for\ + \ LLAP, pending tasks for TezAM)" + type: integer + targetCpuValue: + description: "Target CPU average value for scaling (e.g.,\ + \ '1500m' or '1'). If omitted, CPU scaling is disabled." + type: string + type: object configOverrides: additionalProperties: type: string @@ -235,6 +363,70 @@ spec: metastore: description: Metastore component configuration properties: + autoscaling: + description: Autoscaling configuration (requires KEDA installed + in the cluster) + properties: + activationCpuValue: + description: CPU average value below which the trigger is + inactive. Required if targetCpuValue is set. + type: string + cooldownSeconds: + default: 600 + description: Cooldown period in seconds after all KEDA triggers + are inactive before scaling from 1 to 0 (scale-to-zero delay) + type: integer + enabled: + default: false + description: Whether autoscaling is enabled for this component + type: boolean + gracePeriodSeconds: + default: 3600 + description: Maximum time in seconds to wait for graceful + drain during scale-down before the pod is forcibly terminated. + The pod terminates immediately once sessions/connections + drain to 0; this value is only the upper safety cap. + type: integer + metricsScrapeIntervalSeconds: + default: 10 + description: Prometheus scrape interval in seconds for this + component's metrics. 
Lower values make autoscaling react + faster but increase Prometheus load. + type: integer + minReplicas: + default: 0 + description: Minimum number of replicas (floor for scale-down). + Set to 0 for scale-to-zero (HS2 requires KEDA HTTP Add-on + for wake-from-zero) + type: integer + scaleDownStabilizationSeconds: + default: 300 + description: Stabilization window in seconds for scale-down + decisions. HPA picks the highest recommendation within this + window to prevent premature scale-down. + type: integer + scaleDownThreshold: + default: 20 + description: Threshold that triggers scale-down for Prometheus-based + metrics + type: integer + scaleUpStabilizationSeconds: + default: 60 + description: Stabilization window in seconds for scale-up + decisions. HPA picks the lowest recommendation within this + window to prevent flapping. + type: integer + scaleUpThreshold: + default: 80 + description: "Threshold that triggers scale-up (component-specific:\ + \ sessions for HS2, connections for HMS, queue depth for\ + \ LLAP, pending tasks for TezAM)" + type: integer + targetCpuValue: + description: "Target CPU average value for scaling (e.g.,\ + \ '1500m' or '1'). If omitted, CPU scaling is disabled." + type: string + type: object configOverrides: additionalProperties: type: string @@ -371,6 +563,70 @@ spec: tezAm: description: Tez Application Master configuration. Enabled by default. properties: + autoscaling: + description: Autoscaling configuration (requires KEDA installed + in the cluster) + properties: + activationCpuValue: + description: CPU average value below which the trigger is + inactive. Required if targetCpuValue is set. 
+ type: string + cooldownSeconds: + default: 600 + description: Cooldown period in seconds after all KEDA triggers + are inactive before scaling from 1 to 0 (scale-to-zero delay) + type: integer + enabled: + default: false + description: Whether autoscaling is enabled for this component + type: boolean + gracePeriodSeconds: + default: 3600 + description: Maximum time in seconds to wait for graceful + drain during scale-down before the pod is forcibly terminated. + The pod terminates immediately once sessions/connections + drain to 0; this value is only the upper safety cap. + type: integer + metricsScrapeIntervalSeconds: + default: 10 + description: Prometheus scrape interval in seconds for this + component's metrics. Lower values make autoscaling react + faster but increase Prometheus load. + type: integer + minReplicas: + default: 0 + description: Minimum number of replicas (floor for scale-down). + Set to 0 for scale-to-zero (HS2 requires KEDA HTTP Add-on + for wake-from-zero) + type: integer + scaleDownStabilizationSeconds: + default: 300 + description: Stabilization window in seconds for scale-down + decisions. HPA picks the highest recommendation within this + window to prevent premature scale-down. + type: integer + scaleDownThreshold: + default: 20 + description: Threshold that triggers scale-down for Prometheus-based + metrics + type: integer + scaleUpStabilizationSeconds: + default: 60 + description: Stabilization window in seconds for scale-up + decisions. HPA picks the lowest recommendation within this + window to prevent flapping. + type: integer + scaleUpThreshold: + default: 80 + description: "Threshold that triggers scale-up (component-specific:\ + \ sessions for HS2, connections for HMS, queue depth for\ + \ LLAP, pending tasks for TezAM)" + type: integer + targetCpuValue: + description: "Target CPU average value for scaling (e.g.,\ + \ '1500m' or '1'). If omitted, CPU scaling is disabled."
+ type: string + type: object configOverrides: additionalProperties: type: string diff --git a/packaging/src/kubernetes/helm/hive-operator/templates/clusterrole.yaml b/packaging/src/kubernetes/helm/hive-operator/templates/clusterrole.yaml index d27e1fea8c6f..d3df4a5a7868 100644 --- a/packaging/src/kubernetes/helm/hive-operator/templates/clusterrole.yaml +++ b/packaging/src/kubernetes/helm/hive-operator/templates/clusterrole.yaml @@ -50,3 +50,15 @@ rules: - apiGroups: [""] resources: ["pods"] verbs: ["get", "list", "watch"] + # PodDisruptionBudgets for graceful autoscaling + - apiGroups: ["policy"] + resources: ["poddisruptionbudgets"] + verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] + # KEDA ScaledObjects for autoscaling + - apiGroups: ["keda.sh"] + resources: ["scaledobjects", "triggerauthentications"] + verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] + # KEDA HTTP Add-on for scale-to-zero (wake-from-zero on HTTP request) + - apiGroups: ["http.keda.sh"] + resources: ["httpscaledobjects", "interceptorroutes"] + verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] diff --git a/packaging/src/kubernetes/helm/hive-operator/templates/hivecluster.yaml b/packaging/src/kubernetes/helm/hive-operator/templates/hivecluster.yaml index 091ecefb3cb0..9ed5269db04c 100644 --- a/packaging/src/kubernetes/helm/hive-operator/templates/hivecluster.yaml +++ b/packaging/src/kubernetes/helm/hive-operator/templates/hivecluster.yaml @@ -67,6 +67,23 @@ spec: extraVolumeMounts: {{- toYaml .Values.cluster.metastore.extraVolumeMounts | nindent 6 }} {{- end }} + {{- if and .Values.cluster.metastore.autoscaling .Values.cluster.metastore.autoscaling.enabled }} + autoscaling: + enabled: true + minReplicas: {{ .Values.cluster.metastore.autoscaling.minReplicas }} + scaleUpThreshold: {{ .Values.cluster.metastore.autoscaling.scaleUpThreshold }} + scaleDownThreshold: {{ .Values.cluster.metastore.autoscaling.scaleDownThreshold }} + {{- if 
.Values.cluster.metastore.autoscaling.targetCpuValue }} + targetCpuValue: {{ .Values.cluster.metastore.autoscaling.targetCpuValue | quote }} + {{- end }} + {{- if .Values.cluster.metastore.autoscaling.activationCpuValue }} + activationCpuValue: {{ .Values.cluster.metastore.autoscaling.activationCpuValue | quote }} + {{- end }} + cooldownSeconds: {{ .Values.cluster.metastore.autoscaling.cooldownSeconds }} + scaleUpStabilizationSeconds: {{ .Values.cluster.metastore.autoscaling.scaleUpStabilizationSeconds }} + scaleDownStabilizationSeconds: {{ .Values.cluster.metastore.autoscaling.scaleDownStabilizationSeconds }} + gracePeriodSeconds: {{ .Values.cluster.metastore.autoscaling.gracePeriodSeconds }} + {{- end }} {{- else }} {{- if .Values.cluster.metastore.externalUri }} externalUri: {{ .Values.cluster.metastore.externalUri | quote }} @@ -96,6 +113,23 @@ spec: extraVolumeMounts: {{- toYaml .Values.cluster.hiveServer2.extraVolumeMounts | nindent 6 }} {{- end }} + {{- if and .Values.cluster.hiveServer2.autoscaling .Values.cluster.hiveServer2.autoscaling.enabled }} + autoscaling: + enabled: true + minReplicas: {{ .Values.cluster.hiveServer2.autoscaling.minReplicas }} + scaleUpThreshold: {{ .Values.cluster.hiveServer2.autoscaling.scaleUpThreshold }} + scaleDownThreshold: {{ .Values.cluster.hiveServer2.autoscaling.scaleDownThreshold }} + {{- if .Values.cluster.hiveServer2.autoscaling.targetCpuValue }} + targetCpuValue: {{ .Values.cluster.hiveServer2.autoscaling.targetCpuValue | quote }} + {{- end }} + {{- if .Values.cluster.hiveServer2.autoscaling.activationCpuValue }} + activationCpuValue: {{ .Values.cluster.hiveServer2.autoscaling.activationCpuValue | quote }} + {{- end }} + cooldownSeconds: {{ .Values.cluster.hiveServer2.autoscaling.cooldownSeconds }} + scaleUpStabilizationSeconds: {{ .Values.cluster.hiveServer2.autoscaling.scaleUpStabilizationSeconds }} + scaleDownStabilizationSeconds: {{ .Values.cluster.hiveServer2.autoscaling.scaleDownStabilizationSeconds }} + 
gracePeriodSeconds: {{ .Values.cluster.hiveServer2.autoscaling.gracePeriodSeconds }} + {{- end }} llap: enabled: {{ .Values.cluster.llap.enabled }} @@ -120,6 +154,17 @@ spec: extraVolumeMounts: {{- toYaml .Values.cluster.llap.extraVolumeMounts | nindent 6 }} {{- end }} + {{- if and .Values.cluster.llap.autoscaling .Values.cluster.llap.autoscaling.enabled }} + autoscaling: + enabled: true + minReplicas: {{ .Values.cluster.llap.autoscaling.minReplicas }} + scaleUpThreshold: {{ .Values.cluster.llap.autoscaling.scaleUpThreshold }} + scaleDownThreshold: {{ .Values.cluster.llap.autoscaling.scaleDownThreshold }} + cooldownSeconds: {{ .Values.cluster.llap.autoscaling.cooldownSeconds }} + scaleUpStabilizationSeconds: {{ .Values.cluster.llap.autoscaling.scaleUpStabilizationSeconds }} + scaleDownStabilizationSeconds: {{ .Values.cluster.llap.autoscaling.scaleDownStabilizationSeconds }} + gracePeriodSeconds: {{ .Values.cluster.llap.autoscaling.gracePeriodSeconds }} + {{- end }} {{- end }} tezAm: @@ -146,6 +191,23 @@ spec: extraVolumeMounts: {{- toYaml .Values.cluster.tezAm.extraVolumeMounts | nindent 6 }} {{- end }} + {{- if and .Values.cluster.tezAm.autoscaling .Values.cluster.tezAm.autoscaling.enabled }} + autoscaling: + enabled: true + minReplicas: {{ .Values.cluster.tezAm.autoscaling.minReplicas }} + scaleUpThreshold: {{ .Values.cluster.tezAm.autoscaling.scaleUpThreshold }} + scaleDownThreshold: {{ .Values.cluster.tezAm.autoscaling.scaleDownThreshold }} + {{- if .Values.cluster.tezAm.autoscaling.targetCpuValue }} + targetCpuValue: {{ .Values.cluster.tezAm.autoscaling.targetCpuValue | quote }} + {{- end }} + {{- if .Values.cluster.tezAm.autoscaling.activationCpuValue }} + activationCpuValue: {{ .Values.cluster.tezAm.autoscaling.activationCpuValue | quote }} + {{- end }} + cooldownSeconds: {{ .Values.cluster.tezAm.autoscaling.cooldownSeconds }} + scaleUpStabilizationSeconds: {{ .Values.cluster.tezAm.autoscaling.scaleUpStabilizationSeconds }} + scaleDownStabilizationSeconds: 
{{ .Values.cluster.tezAm.autoscaling.scaleDownStabilizationSeconds }} + gracePeriodSeconds: {{ .Values.cluster.tezAm.autoscaling.gracePeriodSeconds }} + {{- end }} {{- end }} zookeeper: diff --git a/packaging/src/kubernetes/helm/hive-operator/values.yaml b/packaging/src/kubernetes/helm/hive-operator/values.yaml index b7d75930c5b2..85bb02c1277f 100644 --- a/packaging/src/kubernetes/helm/hive-operator/values.yaml +++ b/packaging/src/kubernetes/helm/hive-operator/values.yaml @@ -112,6 +112,19 @@ cluster: configOverrides: {} extraVolumes: [] extraVolumeMounts: [] + # Autoscaling (requires KEDA + Prometheus in the cluster) + # When enabled, 'replicas' above acts as the max replica ceiling + autoscaling: + enabled: false + minReplicas: 1 + scaleUpThreshold: 75 + scaleDownThreshold: 30 + # targetCpuValue: "750m" # Uncomment to enable CPU-based scaling (AverageValue) + # activationCpuValue: "200m" # CPU trigger inactive below this value + cooldownSeconds: 300 + scaleUpStabilizationSeconds: 60 + scaleDownStabilizationSeconds: 300 + gracePeriodSeconds: 60 # Set to use an external Metastore instead of deploying one: # enabled: false # externalUri: "thrift://external-metastore:9083" @@ -127,6 +140,20 @@ cluster: externalJars: [] extraVolumes: [] extraVolumeMounts: [] + # Autoscaling (requires KEDA + Prometheus + KEDA HTTP Add-on in the cluster) + # minReplicas: 0 enables scale-to-zero — beeline HTTP connects wake HS2 via KEDA HTTP interceptor + # When enabled, 'replicas' above acts as the max replica ceiling + autoscaling: + enabled: false + minReplicas: 0 + scaleUpThreshold: 80 + scaleDownThreshold: 20 + # targetCpuValue: "1600m" # Uncomment to enable CPU-based scaling (AverageValue) + # activationCpuValue: "400m" # CPU trigger inactive below this value + cooldownSeconds: 600 + scaleUpStabilizationSeconds: 60 + scaleDownStabilizationSeconds: 300 + gracePeriodSeconds: 300 # --------------------------------------------------------------------------- # LLAP — enabled by default 
for full-HA @@ -141,6 +168,18 @@ cluster: configOverrides: {} extraVolumes: [] extraVolumeMounts: [] + # Autoscaling (requires KEDA + Prometheus in the cluster) + # minReplicas: 0 enables scale-to-zero — scales up immediately when queries need LLAP + # When enabled, 'replicas' above acts as the max replica ceiling + autoscaling: + enabled: false + minReplicas: 0 + scaleUpThreshold: 1 + scaleDownThreshold: 0 + cooldownSeconds: 900 + scaleUpStabilizationSeconds: 60 + scaleDownStabilizationSeconds: 300 + gracePeriodSeconds: 600 # --------------------------------------------------------------------------- # TEZ AM — enabled by default for full-HA @@ -154,3 +193,18 @@ cluster: configOverrides: {} extraVolumes: [] extraVolumeMounts: [] + # Autoscaling (requires KEDA + Prometheus in the cluster) + # minReplicas: 0 enables scale-to-zero — wakes when HS2 receives queries + # When enabled, 'replicas' above acts as the max replica ceiling + # scaleUpThreshold: pending tasks per AM (e.g., 5 = scale when 5+ tasks waiting) + autoscaling: + enabled: false + minReplicas: 0 + scaleUpThreshold: 5 + scaleDownThreshold: 10 + # targetCpuValue: "600m" # Uncomment to enable CPU-based scaling (AverageValue) + # activationCpuValue: "100m" # CPU trigger inactive below this value + cooldownSeconds: 600 + scaleUpStabilizationSeconds: 60 + scaleDownStabilizationSeconds: 300 + gracePeriodSeconds: 120 diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/HiveOperatorMain.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/HiveOperatorMain.java index 55bd3372a40d..d02f08fff038 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/HiveOperatorMain.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/HiveOperatorMain.java @@ -19,7 +19,11 @@ package org.apache.hive.kubernetes.operator; import io.javaoperatorsdk.operator.Operator; +import 
io.javaoperatorsdk.operator.api.config.ControllerConfiguration; +import io.javaoperatorsdk.operator.api.config.ResolvedControllerConfiguration; +import org.apache.hive.kubernetes.operator.model.HiveCluster; import org.apache.hive.kubernetes.operator.reconciler.HiveClusterReconciler; +import org.apache.hive.kubernetes.operator.reconciler.HiveWorkflowSpec; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -36,7 +40,16 @@ private HiveOperatorMain() { public static void main(String[] args) { LOG.info("Starting Hive Kubernetes Operator"); Operator operator = new Operator(); - operator.register(new HiveClusterReconciler()); + HiveClusterReconciler reconciler = new HiveClusterReconciler(); + // Get the annotation-derived base config, then inject our programmatic workflow spec. + ControllerConfiguration baseConfig = + operator.getConfigurationService().getConfigurationFor(reconciler); + HiveWorkflowSpec workflowSpec = new HiveWorkflowSpec(); + ((ResolvedControllerConfiguration) baseConfig) + .setWorkflowSpec(workflowSpec); + LOG.info("Registered workflow with {} dependent resource specs", + workflowSpec.getDependentResourceSpecs().size()); + operator.register(reconciler, baseConfig); operator.start(); LOG.info("Hive Kubernetes Operator started successfully"); } diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HadoopConfigMapDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HadoopConfigMapDependent.java deleted file mode 100644 index 6c0f9308dbc1..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HadoopConfigMapDependent.java +++ /dev/null @@ -1,67 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. 
The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.hive.kubernetes.operator.dependent; - -import java.util.Map; - -import io.fabric8.kubernetes.api.model.ConfigMap; -import io.fabric8.kubernetes.api.model.ConfigMapBuilder; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.config.informer.Informer; -import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; -import org.apache.hive.kubernetes.operator.model.HiveCluster; -import org.apache.hive.kubernetes.operator.util.HadoopXmlBuilder; -import org.apache.hive.kubernetes.operator.util.HiveConfigBuilder; -import org.apache.hive.kubernetes.operator.util.Labels; - -/** Manages the Hadoop core-site.xml ConfigMap for filesystem configuration. 
*/ -@KubernetesDependent( - informer = @Informer(labelSelector = "app.kubernetes.io/component=hadoop-config," - + "app.kubernetes.io/managed-by=hive-kubernetes-operator") -) -public class HadoopConfigMapDependent - extends HiveDependentResource { - - public static final String COMPONENT = "hadoop-config"; - - public HadoopConfigMapDependent() { - super(ConfigMap.class); - } - - @Override - protected ConfigMap desired(HiveCluster hiveCluster, - Context context) { - Map props = - HiveConfigBuilder.getHadoopCoreSite(hiveCluster.getSpec()); - - return new ConfigMapBuilder() - .withNewMetadata() - .withName(resourceName(hiveCluster)) - .withNamespace(hiveCluster.getMetadata().getNamespace()) - .withLabels(Labels.forComponent(hiveCluster, COMPONENT)) - .endMetadata() - .addToData("core-site.xml", HadoopXmlBuilder.buildXml(props)) - .build(); - } - - /** Returns the ConfigMap resource name for this HiveCluster. */ - public static String resourceName(HiveCluster hiveCluster) { - return hiveCluster.getMetadata().getName() + "-hadoop-config"; - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveConfigMapDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveConfigMapDependent.java new file mode 100644 index 000000000000..935b47e094cb --- /dev/null +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveConfigMapDependent.java @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hive.kubernetes.operator.dependent; + +import io.fabric8.kubernetes.api.model.ConfigMap; +import io.fabric8.kubernetes.api.model.ConfigMapBuilder; +import io.javaoperatorsdk.operator.api.reconciler.Context; +import io.javaoperatorsdk.operator.api.config.informer.Informer; +import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; + +import org.apache.hive.kubernetes.operator.model.HiveCluster; +import org.apache.hive.kubernetes.operator.model.HiveClusterSpec; +import org.apache.hive.kubernetes.operator.util.HadoopXmlBuilder; +import org.apache.hive.kubernetes.operator.util.HiveConfigBuilder; +import org.apache.hive.kubernetes.operator.util.Labels; + +/** + * Unified ConfigMap dependent resource for all Hive component configurations. + * Subclassed per component to define the specific XML data and label selector. 
+ */ +public abstract class HiveConfigMapDependent extends HiveDependentResource { + + private final String component; + private final String suffix; + + protected HiveConfigMapDependent(String component, String suffix) { + super(ConfigMap.class); + this.component = component; + this.suffix = suffix; + } + + @Override + protected String getSecondaryResourceName(HiveCluster primary, Context context) { + return primary.getMetadata().getName() + "-" + suffix; + } + + @Override + protected ConfigMap desired(HiveCluster hiveCluster, Context context) { + ConfigMapBuilder builder = + new ConfigMapBuilder().withNewMetadata().withName(hiveCluster.getMetadata().getName() + "-" + suffix) + .withNamespace(hiveCluster.getMetadata().getNamespace()) + .withLabels(Labels.forComponent(hiveCluster, component)).endMetadata(); + addData(builder, hiveCluster); + return builder.build(); + } + + /** + * Subclasses add their specific XML data entries. + */ + protected abstract void addData(ConfigMapBuilder builder, HiveCluster hiveCluster); + + /** + * Hadoop core-site.xml ConfigMap for filesystem configuration. + */ + @KubernetesDependent(informer = @Informer(labelSelector = "app.kubernetes.io/component=hadoop-config," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator")) + public static class Hadoop extends HiveConfigMapDependent { + public Hadoop() { + super("hadoop-config", "hadoop-config"); + } + + @Override + protected void addData(ConfigMapBuilder builder, HiveCluster hiveCluster) { + builder.addToData("core-site.xml", + HadoopXmlBuilder.buildXml(HiveConfigBuilder.getHadoopCoreSite(hiveCluster.getSpec()))); + } + + public static String resourceName(HiveCluster hiveCluster) { + return hiveCluster.getMetadata().getName() + "-hadoop-config"; + } + } + + /** + * Metastore metastore-site.xml ConfigMap. 
+ */ + @KubernetesDependent(informer = @Informer(labelSelector = "app.kubernetes.io/component=metastore," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator")) + public static class Metastore extends HiveConfigMapDependent { + public Metastore() { + super("metastore", "metastore-config"); + } + + @Override + protected void addData(ConfigMapBuilder builder, HiveCluster hiveCluster) { + builder.addToData("metastore-site.xml", + HadoopXmlBuilder.buildXml(HiveConfigBuilder.getMetastoreSite(hiveCluster.getSpec()))); + } + + public static String resourceName(HiveCluster hiveCluster) { + return hiveCluster.getMetadata().getName() + "-metastore-config"; + } + } + + /** + * HiveServer2 hive-site.xml + tez-site.xml ConfigMap. + */ + @KubernetesDependent(informer = @Informer(labelSelector = "app.kubernetes.io/component=hiveserver2," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator")) + public static class HiveServer2 extends HiveConfigMapDependent { + public HiveServer2() { + super("hiveserver2", "hiveserver2-config"); + } + + @Override + protected void addData(ConfigMapBuilder builder, HiveCluster hiveCluster) { + HiveClusterSpec spec = hiveCluster.getSpec(); + builder.addToData("hive-site.xml", + HadoopXmlBuilder.buildXml(HiveConfigBuilder.getHiveServer2HiveSite(hiveCluster, spec))); + builder.addToData("tez-site.xml", HadoopXmlBuilder.buildXml(HiveConfigBuilder.getTezSite(spec))); + } + + public static String resourceName(HiveCluster hiveCluster) { + return hiveCluster.getMetadata().getName() + "-hiveserver2-config"; + } + } + + /** + * LLAP llap-daemon-site.xml ConfigMap. 
+ */ + @KubernetesDependent(informer = @Informer(labelSelector = "app.kubernetes.io/component=llap," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator")) + public static class Llap extends HiveConfigMapDependent { + public Llap() { + super("llap", "llap-config"); + } + + @Override + protected void addData(ConfigMapBuilder builder, HiveCluster hiveCluster) { + builder.addToData("llap-daemon-site.xml", + HadoopXmlBuilder.buildXml(HiveConfigBuilder.getLlapDaemonSite(hiveCluster.getSpec()))); + } + + public static String resourceName(HiveCluster hiveCluster) { + return hiveCluster.getMetadata().getName() + "-llap-config"; + } + } +} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveDependentResource.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveDependentResource.java index cc2eb0de6de0..3a47e4e114b4 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveDependentResource.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveDependentResource.java @@ -24,6 +24,7 @@ import java.util.List; import java.util.Map; import java.util.Optional; +import java.util.Set; import io.fabric8.kubernetes.api.model.AffinityBuilder; import io.fabric8.kubernetes.api.model.Container; import io.fabric8.kubernetes.api.model.ContainerBuilder; @@ -45,11 +46,13 @@ import io.javaoperatorsdk.operator.processing.dependent.Matcher; import io.javaoperatorsdk.operator.processing.dependent.kubernetes.CRUDKubernetesDependentResource; import org.apache.hive.kubernetes.operator.model.HiveCluster; +import org.apache.hive.kubernetes.operator.model.spec.AutoscalingSpec; import org.apache.hive.kubernetes.operator.model.spec.DatabaseConfig; import org.apache.hive.kubernetes.operator.model.spec.ResourceRequirementsSpec; import org.apache.hive.kubernetes.operator.model.spec.SecretKeyRef; import 
org.apache.hive.kubernetes.operator.model.spec.ProbeSpec; +import org.apache.hive.kubernetes.operator.util.ConfigUtils; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -78,32 +81,41 @@ protected HiveDependentResource(Class resourceType) { super(resourceType); } + /** - * Catches 409 AlreadyExists during resource creation caused by - * informer lag — the resource exists on the API server but - * the informer cache hasn't indexed it yet, so JOSDK calls - * create directly. + * Returns the expected Kubernetes resource name for this dependent. + * Used to disambiguate when multiple dependents share the same resource + * type (e.g., multiple ConfigMap or Service dependents). Subclasses that + * share a resource type MUST override this method. + * + * @throws IllegalStateException if not overridden and disambiguation is needed */ - @Override - protected R handleCreate(R desired, P primary, Context
<P>
context) { - try { - return super.handleCreate(desired, primary, context); - } catch (KubernetesClientException e) { - if (e.getCode() == 409) { - LOG.info("Resource {} already exists (informer lag), " - + "will reconcile on next event", - desired.getMetadata().getName()); - return desired; - } - throw e; - } + protected String getSecondaryResourceName(P primary, Context
<P>
context) { + throw new IllegalStateException( + getClass().getSimpleName() + " must override getSecondaryResourceName() " + + "when multiple dependents share the same resource type"); } @Override public Optional getSecondaryResource(P primary, Context
<P>
context) { return eventSource() - .flatMap(es -> es.getSecondaryResource(primary)); + .flatMap(es -> { + Set resources = es.getSecondaryResources(primary); + if (resources.isEmpty()) { + return Optional.empty(); + } + // Always filter by expected name — even when only one resource + // is in the cache. Without this, a single Deployment (e.g. + // metastore) would be handed to HiveServer2's matcher, causing + // a cross-component update loop. + String expectedName = getSecondaryResourceName(primary, + context); + return resources.stream() + .filter(r -> expectedName.equals( + r.getMetadata().getName())) + .findFirst(); + }); } /** @@ -125,6 +137,185 @@ public Matcher.Result match(R actualResource, R desired, return super.match(actualResource, desired, primary, context); } + /** + * Handles 409 Conflict errors during resource creation caused by informer + * cache lag. When the operator creates a resource but the informer hasn't + * yet received the creation event, the framework may attempt to create it + * again. Kubernetes rejects the duplicate with 409 — this handler absorbs + * that expected race and lets the next reconciliation pick up the resource + * from the updated cache. + */ + @Override + protected R handleCreate(R desired, P primary, Context
<P>
context) { + try { + return super.handleCreate(desired, primary, context); + } catch (KubernetesClientException e) { + if (e.getCode() == 409) { + LOG.info("Resource {} already exists (informer lag), " + + "will reconcile on next event", + desired.getMetadata().getName()); + return desired; + } + throw e; + } + } + + /** + * Resolves the replica count to set in the desired workload spec. + * When autoscaling is enabled and the workload already exists, the current + * replica count is preserved (KEDA/HPA manages it). On initial creation + * the provided fallback is used. + * + * @param primary the HiveCluster primary resource + * @param context the reconciliation context + * @param autoscaling autoscaling spec for this component (may be null) + * @param staticReplicas replica count from the spec (used when autoscaling is off) + * @param initialReplicas replica count on first creation when autoscaling is on + */ + @SuppressWarnings("unchecked") + protected Integer resolveReplicaCount(P primary, Context
<P>
context, + AutoscalingSpec autoscaling, int staticReplicas, int initialReplicas) { + if (autoscaling == null || !autoscaling.isEnabled()) { + return staticReplicas; + } + return getSecondaryResource(primary, context) + .map(existing -> { + if (existing instanceof io.fabric8.kubernetes.api.model.apps.Deployment d) { + return d.getSpec().getReplicas(); + } else if (existing instanceof io.fabric8.kubernetes.api.model.apps.StatefulSet s) { + return s.getSpec().getReplicas(); + } + return initialReplicas; + }) + .orElse(initialReplicas); + } + + /** + * Builds a preStop drain script that polls a single Prometheus metric + * (from the JMX Exporter at localhost:9404/metrics) until the value + * reaches zero, then exits to allow graceful pod termination. + * + * @param startupMessage logged at the start (e.g. "Waiting for open connections to drain") + * @param metricName Prometheus metric name (used in grep and log messages) + * @param varName shell variable name for the extracted value (e.g. "CONNS") + * @param idleMessage logged when idle condition is met (e.g. "All connections drained. Shutting down.") + * @param sleepSeconds polling interval in seconds + * @param maxRetries max consecutive curl failures before giving up + * @param prefixCommands optional commands to run before the polling loop (may be null) + */ + protected static String buildDrainScript( + String startupMessage, String metricName, String varName, + String idleMessage, int sleepSeconds, int maxRetries, + List prefixCommands) { + List lines = new ArrayList<>(); + lines.add("#!/bin/bash"); + if (prefixCommands != null) { + lines.addAll(prefixCommands); + } + lines.add("echo '[preStop] " + startupMessage + + " (polling localhost:9404/metrics)...'"); + lines.add("RETRIES=0"); + lines.add("while true; do"); + lines.add(" RESPONSE=$(curl -sf http://localhost:9404/metrics)"); + lines.add(" if [ $? 
-ne 0 ]; then"); + lines.add(" RETRIES=$((RETRIES+1))"); + lines.add(" echo \"[preStop] ERROR: JMX Exporter unreachable on port 9404 (attempt $RETRIES)\""); + lines.add(" if [ $RETRIES -ge " + maxRetries + " ]; then"); + lines.add(" echo '[preStop] JMX Exporter not responding after " + + (maxRetries * sleepSeconds) + "s. Proceeding with shutdown.'"); + lines.add(" break"); + lines.add(" fi"); + lines.add(" sleep " + sleepSeconds + "; continue"); + lines.add(" fi"); + lines.add(" " + varName + "=$(echo \"$RESPONSE\" | grep '^" + + metricName + " ' | awk '{print $2}')"); + lines.add(" if [ -z \"$" + varName + "\" ]; then"); + lines.add(" echo '[preStop] WARNING: " + metricName + + " metric not found. JMX Exporter may not be configured.'"); + lines.add(" break"); + lines.add(" fi"); + lines.add(" if [ \"${" + varName + "%.*}\" -le 0 ] 2>/dev/null; then"); + lines.add(" echo '[preStop] " + idleMessage + "'"); + lines.add(" break"); + lines.add(" fi"); + lines.add(" echo \"[preStop] " + metricName + "=$" + varName + " - waiting...\""); + lines.add(" RETRIES=0"); + lines.add(" sleep " + sleepSeconds); + lines.add("done"); + // Send SIGTERM directly to the Java process. Shell entrypoint scripts + // (PID 1) often don't forward signals, so K8s SIGTERM never reaches + // the JVM, causing a full grace-period wait before SIGKILL. + // pgrep -f matches full command lines, and this script's own command line + // contains the pattern text, so use the '[j]ava' bracket form: it still + // matches the JVM but not the bash process running this script. + lines.add("echo '[preStop] Sending SIGTERM to Java process...'"); + lines.add("kill $(pgrep -f '[j]ava.*org.apache') 2>/dev/null"); + lines.add("exit 0"); + return String.join("\n", lines); + } + + /** + * Builds a preStop drain script that polls two Prometheus metrics and + * waits until available >= total (all executors idle). Used by LLAP. + * + * @param startupMessage logged at the start + * @param metricGrepA grep pattern for the first metric (e.g. includes trailing '{') + * @param varNameA shell variable for the first metric value (e.g. 
"AVAILABLE") + * @param metricGrepB grep pattern for the second metric + * @param varNameB shell variable for the second metric value (e.g. "TOTAL") + * @param notFoundWarning warning message when metrics are not found + * @param idleMessage logged when idle condition is met + * @param waitingFormat format for waiting log (with shell variable references) + * @param sleepSeconds polling interval in seconds + * @param maxRetries max consecutive curl failures before giving up + */ + protected static String buildDualMetricDrainScript( + String startupMessage, + String metricGrepA, String varNameA, + String metricGrepB, String varNameB, + String notFoundWarning, String idleMessage, + String waitingFormat, int sleepSeconds, int maxRetries) { + List lines = new ArrayList<>(); + lines.add("#!/bin/bash"); + lines.add("echo '[preStop] " + startupMessage + + " (polling localhost:9404/metrics)...'"); + lines.add("RETRIES=0"); + lines.add("while true; do"); + lines.add(" RESPONSE=$(curl -sf http://localhost:9404/metrics)"); + lines.add(" if [ $? -ne 0 ]; then"); + lines.add(" RETRIES=$((RETRIES+1))"); + lines.add(" echo \"[preStop] ERROR: JMX Exporter unreachable on port 9404 (attempt $RETRIES)\""); + lines.add(" if [ $RETRIES -ge " + maxRetries + " ]; then"); + lines.add(" echo '[preStop] JMX Exporter not responding after " + + (maxRetries * sleepSeconds) + "s. 
Proceeding with shutdown.'"); + lines.add(" break"); + lines.add(" fi"); + lines.add(" sleep " + sleepSeconds + "; continue"); + lines.add(" fi"); + lines.add(" " + varNameA + "=$(echo \"$RESPONSE\" | grep '^" + + metricGrepA + "' | awk '{print $2}')"); + lines.add(" " + varNameB + "=$(echo \"$RESPONSE\" | grep '^" + + metricGrepB + "' | awk '{print $2}')"); + lines.add(" if [ -z \"$" + varNameA + "\" ] || [ -z \"$" + varNameB + "\" ]; then"); + lines.add(" echo '[preStop] WARNING: " + notFoundWarning + "'"); + lines.add(" break"); + lines.add(" fi"); + lines.add(" if [ \"${" + varNameA + "%.*}\" -ge \"${" + varNameB + "%.*}\" ] 2>/dev/null; then"); + lines.add(" echo '[preStop] " + idleMessage + "'"); + lines.add(" break"); + lines.add(" fi"); + lines.add(" echo \"[preStop] " + waitingFormat + "\""); + lines.add(" RETRIES=0"); + lines.add(" sleep " + sleepSeconds); + lines.add("done"); + // Send SIGTERM directly to the Java process. Shell entrypoint scripts + // (PID 1) often don't forward signals, so K8s SIGTERM never reaches + // the JVM, causing a full grace-period wait before SIGKILL. + // pgrep -f matches full command lines, and this script's own command line + // contains the pattern text, so use the '[j]ava' bracket form: it still + // matches the JVM but not the bash process running this script. + lines.add("echo '[preStop] Sending SIGTERM to Java process...'"); + lines.add("kill $(pgrep -f '[j]ava.*org.apache') 2>/dev/null"); + lines.add("exit 0"); + return String.join("\n", lines); + } + /** * Computes a SHA-256 hash of the given input strings. * Used to annotate pod templates so that config changes trigger rolling updates. @@ -235,8 +426,8 @@ protected static void buildMetastoreVolumes( .withMountPath(CONF_MOUNT_PATH).build()); volumes.add(buildProjectedConfigVolume("hive-config", - MetastoreConfigMapDependent.resourceName(hiveCluster), - HadoopConfigMapDependent.resourceName(hiveCluster))); + HiveConfigMapDependent.Metastore.resourceName(hiveCluster), + HiveConfigMapDependent.Hadoop.resourceName(hiveCluster))); } /** Builds Kubernetes ResourceRequirements from the operator's spec. 
*/ @@ -422,4 +613,214 @@ protected static Probe buildTcpProbe(int port, ProbeSpec spec, int defaultInitia return builder.build(); } + /** + * Applies the autoscaling lifecycle to a workload's pod template: sets a preStop + * exec lifecycle hook, terminationGracePeriodSeconds, and Prometheus scrape annotations. + * + * @param podSpec the pod spec of the workload (Deployment or StatefulSet) + * @param podMetadata the pod template metadata (for annotations) + * @param preStopScript the shell script to run in the preStop hook + * @param gracePeriodSeconds termination grace period + */ + protected static void applyAutoscalingLifecycle( + io.fabric8.kubernetes.api.model.PodSpec podSpec, + io.fabric8.kubernetes.api.model.ObjectMeta podMetadata, + String preStopScript, int gracePeriodSeconds, + int metricsScrapeIntervalSeconds) { + io.fabric8.kubernetes.api.model.Lifecycle lifecycle = + new io.fabric8.kubernetes.api.model.LifecycleBuilder() + .withNewPreStop() + .withNewExec() + .withCommand("/bin/bash", "-c", preStopScript) + .endExec() + .endPreStop() + .build(); + podSpec.getContainers().get(0).setLifecycle(lifecycle); + podSpec.setTerminationGracePeriodSeconds((long) gracePeriodSeconds); + applyPrometheusScrapeAnnotations(podMetadata, metricsScrapeIntervalSeconds); + } + + /** + * Adds Prometheus scrape annotations to a pod template so that + * the JMX Exporter metrics endpoint is discovered by Prometheus. 
+ */ + private static void applyPrometheusScrapeAnnotations( + io.fabric8.kubernetes.api.model.ObjectMeta podMetadata, + int scrapeIntervalSeconds) { + podMetadata.getAnnotations().put("prometheus.io/scrape", "true"); + podMetadata.getAnnotations().put("prometheus.io/port", + String.valueOf(ConfigUtils.PROMETHEUS_JMX_EXPORTER_PORT)); + podMetadata.getAnnotations().put("prometheus.io/path", "/metrics"); + podMetadata.getAnnotations().put("prometheus.io/scrape-interval", + scrapeIntervalSeconds + "s"); + } + + /** + * Appends user-provided volumes and volume mounts to a workload's pod template. + * Handles both global (spec-level) and component-specific extras. + * + * @param podSpec the pod spec + * @param globalVolumes spec.volumes() (may be null) + * @param globalVolumeMounts spec.volumeMounts() (may be null) + * @param extraVolumes component-specific extraVolumes (may be null) + * @param extraVolumeMounts component-specific extraVolumeMounts (may be null) + */ + protected static void appendUserVolumes( + io.fabric8.kubernetes.api.model.PodSpec podSpec, + List globalVolumes, + List globalVolumeMounts, + List extraVolumes, + List extraVolumeMounts) { + if (globalVolumes != null) { + podSpec.getVolumes().addAll(globalVolumes); + } + if (globalVolumeMounts != null) { + podSpec.getContainers().get(0).getVolumeMounts().addAll(globalVolumeMounts); + } + if (extraVolumes != null) { + podSpec.getVolumes().addAll(extraVolumes); + } + if (extraVolumeMounts != null) { + podSpec.getContainers().get(0).getVolumeMounts().addAll(extraVolumeMounts); + } + } + + /** Path where the JMX Exporter agent JAR is stored inside the pod. 
*/ + protected static final String JMX_EXPORTER_DIR = "/opt/jmx-exporter"; + protected static final String JMX_EXPORTER_JAR = JMX_EXPORTER_DIR + "/jmx_prometheus_javaagent.jar"; + protected static final String JMX_EXPORTER_CONFIG = JMX_EXPORTER_DIR + "/config.yaml"; + + /** + * Adds the Prometheus JMX Exporter agent infrastructure to a pod spec when + * autoscaling is enabled. This includes: + *

<ul>
+ *   <li>An emptyDir volume for the JMX exporter JAR and config</li>
+ *   <li>An init container that downloads the agent JAR and writes a config file</li>
+ *   <li>A volume mount on the main container</li>
+ *   <li>A container port for the metrics endpoint (9404)</li>
+ *   <li>The javaagent JVM argument appended to SERVICE_OPTS (LLAP_DAEMON_OPTS for LLAP)</li>
+ * </ul>
+ * + * @param image the container image (used for the init container) + * @param component the Hive component name (for JMX bean pattern matching) + * @param initContainers list to add the download init container to + * @param volumeMounts list to add the jmx-exporter mount to (main container) + * @param volumes list to add the emptyDir volume to + * @param envVars list of env vars — SERVICE_OPTS will be updated with the javaagent flag + * @param ports list to add the metrics port to + */ + protected static void addJmxExporter( + String image, String component, + List initContainers, + List volumeMounts, + List volumes, + List envVars, + List ports) { + + // Volume for the JMX exporter JAR + config + volumes.add(new VolumeBuilder() + .withName("jmx-exporter") + .withNewEmptyDir().endEmptyDir().build()); + VolumeMount exporterMount = new VolumeMountBuilder() + .withName("jmx-exporter") + .withMountPath(JMX_EXPORTER_DIR).build(); + volumeMounts.add(exporterMount); + + // JMX exporter config: export all beans in a catch-all pattern + // The agent exposes metrics in Prometheus text format at /metrics + String jmxConfig = buildJmxExporterConfig(component); + + // Init container: download JAR + write config + String downloadCmd = String.format( + "wget -q --tries=3 --waitretry=5 -O %s '%s' && " + + "cat > %s << 'JMXEOF'\n%s\nJMXEOF", + JMX_EXPORTER_JAR, ConfigUtils.JMX_EXPORTER_JAR_URL, + JMX_EXPORTER_CONFIG, jmxConfig); + initContainers.add(new ContainerBuilder() + .withName("jmx-exporter-init") + .withImage(image) + .withCommand("/bin/bash", "-c", downloadCmd) + .withVolumeMounts(exporterMount) + .build()); + + // Expose the metrics port + ports.add(new io.fabric8.kubernetes.api.model.ContainerPortBuilder() + .withName("metrics") + .withContainerPort(ConfigUtils.PROMETHEUS_JMX_EXPORTER_PORT) + .withProtocol("TCP").build()); + + // Add javaagent flag to the appropriate JVM opts env var. + // LLAP uses LLAP_DAEMON_OPTS (its startup script ignores SERVICE_OPTS). 
+ String agentArg = String.format("-javaagent:%s=%d:%s", + JMX_EXPORTER_JAR, ConfigUtils.PROMETHEUS_JMX_EXPORTER_PORT, JMX_EXPORTER_CONFIG); + String optsEnvVar = "llap".equals(component) ? "LLAP_DAEMON_OPTS" : "SERVICE_OPTS"; + boolean found = false; + for (int i = 0; i < envVars.size(); i++) { + if (optsEnvVar.equals(envVars.get(i).getName())) { + String existing = envVars.get(i).getValue(); + envVars.set(i, new EnvVar(optsEnvVar, + existing + " " + agentArg, null)); + found = true; + break; + } + } + if (!found) { + envVars.add(new EnvVar(optsEnvVar, agentArg, null)); + } + } + + /** + * Builds the JMX Exporter YAML config for a Hive component. + * Uses broad patterns to export all Hive/Hadoop metrics relevant to autoscaling. + */ + private static String buildJmxExporterConfig(String component) { + StringBuilder sb = new StringBuilder(); + sb.append("lowercaseOutputName: true\n"); + sb.append("lowercaseOutputLabelNames: true\n"); + sb.append("rules:\n"); + + switch (component) { + case "hiveserver2": + // HS2 session and operation metrics + sb.append("- pattern: 'metrics<>Value'\n"); + sb.append(" name: hs2_$1\n"); + sb.append(" type: GAUGE\n"); + sb.append("- pattern: 'metrics<>Count'\n"); + sb.append(" name: hs2_active_calls_$1\n"); + sb.append(" type: GAUGE\n"); + // Tez session pool metrics (pending tasks, backlog ratio, running tasks) + sb.append("- pattern: 'metrics<>Value'\n"); + sb.append(" name: tez_session_$1\n"); + sb.append(" type: GAUGE\n"); + break; + case "metastore": + // HMS API call metrics + sb.append("- pattern: 'metrics<>Count'\n"); + sb.append(" name: api_$1_total\n"); + sb.append(" type: COUNTER\n"); + sb.append("- pattern: 'metrics<>Count'\n"); + sb.append(" name: hive_metastore_open_connections\n"); + sb.append(" type: GAUGE\n"); + break; + case "llap": + // LLAP uses its own MetricsSystem (not DefaultMetricsSystem). 
+ // Default JMX exporter pattern (.*) exports Hadoop Metrics2 MBeans as: + // hadoop_llapdaemon_{name=""} + // e.g., hadoop_llapdaemon_executornumqueuedrequests{name="LlapDaemonExecutorMetrics-..."} + // No custom rules needed — the default naming is usable directly. + sb.append("- pattern: '.*'\n"); + break; + case "tezam": + // TezAM DAG execution metrics + sb.append("- pattern: 'Hadoop<>(.+)'\n"); + sb.append(" name: tez_am_$1\n"); + sb.append(" type: GAUGE\n"); + break; + default: + sb.append("- pattern: '.*'\n"); + break; + } + return sb.toString(); + } + } diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveGenericDependentResource.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveGenericDependentResource.java new file mode 100644 index 000000000000..de9fb6351824 --- /dev/null +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveGenericDependentResource.java @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hive.kubernetes.operator.dependent; + +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.Optional; +import java.util.Set; + + +import io.fabric8.kubernetes.api.model.GenericKubernetesResource; +import io.javaoperatorsdk.operator.api.config.informer.InformerEventSourceConfiguration; +import io.javaoperatorsdk.operator.api.reconciler.Context; +import io.javaoperatorsdk.operator.api.reconciler.EventSourceContext; +import io.javaoperatorsdk.operator.api.reconciler.dependent.GarbageCollected; +import io.javaoperatorsdk.operator.processing.GroupVersionKind; +import io.javaoperatorsdk.operator.processing.dependent.Creator; +import io.javaoperatorsdk.operator.processing.dependent.Updater; +import io.javaoperatorsdk.operator.processing.dependent.kubernetes.GenericKubernetesDependentResource; +import org.apache.hive.kubernetes.operator.model.HiveCluster; + +/** + * Base class for dependent resources that manage custom resources via + * {@link GenericKubernetesResource} (e.g. KEDA ScaledObject, HTTPScaledObject). + *

+ * Extends {@link GenericKubernetesDependentResource} which properly configures + * the informer with the specified GroupVersionKind, avoiding the fabric8 + * "resources cannot be called with a generic type" error. + *

+ * Also overrides {@link #getSecondaryResource} to use the dependent's own + * event source (same pattern as {@link HiveDependentResource}) so multiple + * GenericKubernetesResource dependents don't collide in the type-based lookup. + */ +public abstract class HiveGenericDependentResource + extends GenericKubernetesDependentResource + implements Creator, + Updater, + GarbageCollected { + + protected HiveGenericDependentResource(GroupVersionKind gvk) { + super(gvk); + } + + /** + * Adds a generation-aware update filter so that KEDA/controller status + * patches (which don't increment metadata.generation) do not trigger + * unnecessary reconciliation loops. + */ + @Override + protected InformerEventSourceConfiguration.Builder + informerConfigurationBuilder(EventSourceContext context) { + return super.informerConfigurationBuilder(context) + .withOnUpdateFilter((newResource, oldResource) -> { + Long newGen = newResource.getMetadata().getGeneration(); + Long oldGen = oldResource.getMetadata().getGeneration(); + return !Objects.equals(newGen, oldGen); + }); + } + + /** + * Returns the expected Kubernetes resource name for this dependent given the primary. + * Used to discriminate between multiple secondary resources of the same GVK + * (e.g. multiple ScaledObjects owned by the same HiveCluster). + */ + protected abstract String getResourceName(HiveCluster hiveCluster); + + @Override + public Optional getSecondaryResource( + HiveCluster primary, Context context) { + String expectedName = getResourceName(primary); + Set secondaries = eventSource() + .map(es -> es.getSecondaryResources(primary)) + .orElse(Set.of()); + return secondaries.stream() + .filter(r -> expectedName.equals(r.getMetadata().getName())) + .findFirst(); + } + + /** + * Builds the nested "advanced" HPA behavior configuration for a KEDA ScaledObject. + * + * @param scaleDownStabilization stabilizationWindowSeconds for scale-down + * @param scaleDownPolicyType policy type (e.g. 
"Pods", "Percent") + * @param scaleDownValue policy value + * @param scaleDownPeriod policy periodSeconds + * @param scaleUpStabilization stabilizationWindowSeconds for scale-up + * @param scaleUpPolicyType policy type (e.g. "Pods", "Percent") + * @param scaleUpValue policy value + * @param scaleUpPeriod policy periodSeconds + */ + protected static Map buildHpaBehavior( + int scaleDownStabilization, String scaleDownPolicyType, + int scaleDownValue, int scaleDownPeriod, + int scaleUpStabilization, String scaleUpPolicyType, + int scaleUpValue, int scaleUpPeriod) { + return Map.of( + "horizontalPodAutoscalerConfig", Map.of( + "behavior", Map.of( + "scaleDown", Map.of( + "stabilizationWindowSeconds", scaleDownStabilization, + "policies", List.of(Map.of( + "type", scaleDownPolicyType, + "value", scaleDownValue, + "periodSeconds", scaleDownPeriod + )) + ), + "scaleUp", Map.of( + "stabilizationWindowSeconds", scaleUpStabilization, + "policies", List.of(Map.of( + "type", scaleUpPolicyType, + "value", scaleUpValue, + "periodSeconds", scaleUpPeriod + )) + ) + ) + ) + ); + } + + /** + * Builds the HS2 cross-component activation trigger used by LLAP and TezAM. + * Uses {@code (max(hs2_open_sessions{...}) > bool 0) or vector(0)} so the + * result is always 0 or 1, preventing zombie sessions from driving proportional scaling. + * Threshold is set to maxReplicas so desired = ceil(1/max) = 1 (activation only). 
+ * + * @param namespace the Kubernetes namespace + * @param hs2TargetName the HS2 deployment name (for pod label matching) + * @param maxReplicas the max replicas of the component (used as threshold) + */ + protected static Map buildHs2ActivationTrigger( + String namespace, String hs2TargetName, int maxReplicas) { + return buildPrometheusTrigger( + "hs2_open_sessions_activation", + String.format( + "(max(hs2_open_sessions{namespace=\"%s\",pod=~\"%s-.*\"}) > bool 0) or vector(0)", + namespace, hs2TargetName), + String.valueOf(maxReplicas)); + } + + /** + * Builds a KEDA Prometheus trigger entry. + * + * @param metricName the KEDA metric name + * @param query the PromQL query + * @param threshold the scaling threshold value + */ + protected static Map buildPrometheusTrigger( + String metricName, String query, String threshold) { + return Map.of( + "type", "prometheus", + "metadata", Map.of( + "serverAddress", "http://prometheus-server.monitoring.svc.cluster.local", + "metricName", metricName, + "query", query, + "threshold", threshold, + "activationThreshold", "0" + ) + ); + } + + /** + * Builds a KEDA CPU AverageValue trigger if both targetCpuValue and + * activationCpuValue are configured. Returns null if CPU scaling is + * not configured, or if resources are missing (logs a warning). + * + * @param autoscaling the autoscaling spec + * @param resources the pod resource spec (null means not set) + * @param componentName component name for the warning message + * @param log the logger to use for warnings + */ + protected static Map buildCpuTrigger( + org.apache.hive.kubernetes.operator.model.spec.AutoscalingSpec autoscaling, + Object resources, String componentName, + org.slf4j.Logger log) { + if (autoscaling.targetCpuValue() == null || autoscaling.activationCpuValue() == null) { + return null; + } + if (resources == null) { + log.warn("targetCpuValue is set for {}, but no pod resources are defined. 
" + + "Skipping CPU trigger to prevent erratic scaling.", componentName); + return null; + } + return Map.of( + "type", "cpu", + "metricType", "AverageValue", + "metadata", Map.of( + "value", autoscaling.targetCpuValue(), + "activationValue", autoscaling.activationCpuValue() + ) + ); + } +} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HivePdbDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HivePdbDependent.java new file mode 100644 index 000000000000..2942a5b674bf --- /dev/null +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HivePdbDependent.java @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hive.kubernetes.operator.dependent; + +import io.fabric8.kubernetes.api.model.IntOrString; +import io.fabric8.kubernetes.api.model.policy.v1.PodDisruptionBudget; +import io.fabric8.kubernetes.api.model.policy.v1.PodDisruptionBudgetBuilder; +import io.javaoperatorsdk.operator.api.reconciler.Context; +import io.javaoperatorsdk.operator.api.config.informer.Informer; +import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; +import org.apache.hive.kubernetes.operator.model.HiveCluster; +import org.apache.hive.kubernetes.operator.util.Labels; + +/** + * Unified PodDisruptionBudget dependent resource for all Hive components. + * Ensures at least one pod remains available during voluntary disruptions + * (scale-down, node drain, rolling updates). + *

+ * Subclassed per component (HS2, Metastore, LLAP, TezAM) only to satisfy + * JOSDK's requirement for distinct no-arg-constructible classes in the workflow. + */ +public abstract class HivePdbDependent + extends HiveDependentResource { + + private final String component; + + protected HivePdbDependent(String component) { + super(PodDisruptionBudget.class); + this.component = component; + } + + @Override + protected String getSecondaryResourceName(HiveCluster primary, + Context context) { + return primary.getMetadata().getName() + "-" + component + "-pdb"; + } + + @Override + protected PodDisruptionBudget desired(HiveCluster hiveCluster, + Context context) { + return new PodDisruptionBudgetBuilder() + .withNewMetadata() + .withName(hiveCluster.getMetadata().getName() + "-" + component + "-pdb") + .withNamespace(hiveCluster.getMetadata().getNamespace()) + .withLabels(Labels.forComponent(hiveCluster, component)) + .endMetadata() + .withNewSpec() + .withMinAvailable(new IntOrString(1)) + .withNewSelector() + .withMatchLabels(Labels.selectorForComponent(hiveCluster, component)) + .endSelector() + .endSpec() + .build(); + } + + @KubernetesDependent( + informer = @Informer(labelSelector = "app.kubernetes.io/component=hiveserver2," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator") + ) + public static class HiveServer2 extends HivePdbDependent { + public HiveServer2() { super("hiveserver2"); } + } + + @KubernetesDependent( + informer = @Informer(labelSelector = "app.kubernetes.io/component=metastore," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator") + ) + public static class Metastore extends HivePdbDependent { + public Metastore() { super("metastore"); } + } + + @KubernetesDependent( + informer = @Informer(labelSelector = "app.kubernetes.io/component=llap," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator") + ) + public static class Llap extends HivePdbDependent { + public Llap() { super("llap"); } + } + + @KubernetesDependent( + informer 
= @Informer(labelSelector = "app.kubernetes.io/component=tezam," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator") + ) + public static class TezAm extends HivePdbDependent { + public TezAm() { super("tezam"); } + } +} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveScaledObjectDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveScaledObjectDependent.java new file mode 100644 index 000000000000..52a3624639ca --- /dev/null +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveScaledObjectDependent.java @@ -0,0 +1,373 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hive.kubernetes.operator.dependent; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import io.fabric8.kubernetes.api.model.GenericKubernetesResource; +import io.fabric8.kubernetes.api.model.GenericKubernetesResourceBuilder; +import io.javaoperatorsdk.operator.api.reconciler.Context; +import io.javaoperatorsdk.operator.processing.GroupVersionKind; +import org.apache.hive.kubernetes.operator.model.HiveCluster; +import org.apache.hive.kubernetes.operator.model.spec.AutoscalingSpec; +import org.apache.hive.kubernetes.operator.util.ConfigUtils; +import org.apache.hive.kubernetes.operator.util.Labels; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Unified KEDA ScaledObject dependent resource for metric-based autoscaling. + * Subclassed per component to define component-specific triggers, HPA behavior, + * and target workload kind. + *

+ * Note: When HS2 minReplicas is 0, the ScaledObject includes an external-push + * trigger from the KEDA HTTP Add-on (via InterceptorRoute) for wake-from-zero. + */ +public abstract class HiveScaledObjectDependent extends HiveGenericDependentResource { + + private static final Logger LOG = LoggerFactory.getLogger(HiveScaledObjectDependent.class); + + private final String component; + private final String targetKind; + + protected HiveScaledObjectDependent(String component, String targetKind) { + super(new GroupVersionKind("keda.sh", "v1alpha1", "ScaledObject")); + this.component = component; + this.targetKind = targetKind; + } + + @Override + protected GenericKubernetesResource desired(HiveCluster hiveCluster, + Context context) { + AutoscalingSpec autoscaling = getAutoscalingSpec(hiveCluster); + int maxReplicas = getMaxReplicas(hiveCluster); + String targetName = hiveCluster.getMetadata().getName() + "-" + component; + + Map spec = new HashMap<>(); + spec.put("scaleTargetRef", Map.of( + "apiVersion", "apps/v1", + "kind", targetKind, + "name", targetName + )); + spec.put("minReplicaCount", autoscaling.minReplicas()); + spec.put("maxReplicaCount", maxReplicas); + spec.put("cooldownPeriod", autoscaling.cooldownSeconds()); + spec.put("pollingInterval", getPollingInterval()); + spec.put("advanced", getAdvanced(hiveCluster, autoscaling, maxReplicas)); + spec.put("triggers", getTriggers(hiveCluster, autoscaling, maxReplicas, targetName)); + + return new GenericKubernetesResourceBuilder() + .withApiVersion("keda.sh/v1alpha1") + .withKind("ScaledObject") + .withNewMetadata() + .withName(targetName + "-scaledobject") + .withNamespace(hiveCluster.getMetadata().getNamespace()) + .withLabels(Labels.forComponent(hiveCluster, component)) + .endMetadata() + .withAdditionalProperties(Map.of("spec", spec)) + .build(); + } + + @Override + protected String getResourceName(HiveCluster hiveCluster) { + return hiveCluster.getMetadata().getName() + "-" + component + "-scaledobject"; + } 
+ + /** Returns the autoscaling spec for the component. */ + protected abstract AutoscalingSpec getAutoscalingSpec(HiveCluster hiveCluster); + + /** Returns max replicas (typically the static replicas count from spec). */ + protected abstract int getMaxReplicas(HiveCluster hiveCluster); + + /** Returns the KEDA polling interval in seconds. */ + protected abstract int getPollingInterval(); + + /** Returns the "advanced" section (HPA behavior configuration). */ + protected abstract Map<String, Object> getAdvanced( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, int maxReplicas); + + /** Returns the list of KEDA triggers. */ + protected abstract List<Map<String, Object>> getTriggers( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, + int maxReplicas, String targetName); + + /** + * HiveServer2 ScaledObject: scales on hs2_open_sessions + CPU, plus an + * external-push trigger when minReplicas is 0. + */ + public static class HiveServer2 extends HiveScaledObjectDependent { + public HiveServer2() { + super("hiveserver2", "Deployment"); + } + + @Override + protected AutoscalingSpec getAutoscalingSpec(HiveCluster hiveCluster) { + return hiveCluster.getSpec().hiveServer2().autoscaling(); + } + + @Override + protected int getMaxReplicas(HiveCluster hiveCluster) { + return hiveCluster.getSpec().hiveServer2().replicas(); + } + + @Override + protected int getPollingInterval() { + return 30; + } + + @Override + protected Map<String, Object> getAdvanced( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, int maxReplicas) { + return buildHpaBehavior( + autoscaling.scaleDownStabilizationSeconds(), "Pods", 1, 60, + autoscaling.scaleUpStabilizationSeconds(), "Percent", 100, 60); + } + + @Override + protected List<Map<String, Object>> getTriggers( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, + int maxReplicas, String targetName) { + List<Map<String, Object>> triggers = new ArrayList<>(); + // Use sum() so KEDA computes desired replicas from total session count. + // desired = ceil(sum / threshold). With sum=2, threshold=1: desired=2 + // → prevents premature scale-down while sessions are active. 
+ // avg() would divide across pods, hiding load and causing scale-down + // of pods with sessions. + triggers.add(buildPrometheusTrigger( + "hs2_open_sessions", + String.format( + "sum(hs2_open_sessions{namespace=\"%s\",pod=~\"%s-.*\"}) or vector(0)", + hiveCluster.getMetadata().getNamespace(), targetName), + String.valueOf(autoscaling.scaleUpThreshold()))); + Map cpuTrigger = buildCpuTrigger( + autoscaling, hiveCluster.getSpec().hiveServer2().resources(), + "HiveServer2", LOG); + if (cpuTrigger != null) { + triggers.add(cpuTrigger); + } + // When scale-to-zero is enabled, add KEDA HTTP Add-on external-push + // trigger to wake HS2 from 0 when requests arrive at the interceptor. + if (autoscaling.minReplicas() == 0) { + String routeName = HiveServer2InterceptorRouteDependent.resourceName(hiveCluster); + triggers.add(Map.of( + "type", "external-push", + "metadata", Map.of( + "scalerAddress", + "keda-add-ons-http-external-scaler.keda:9090", + "interceptorRoute", routeName + ) + )); + } + return triggers; + } + } + + /** + * Metastore ScaledObject: scales on aggregate API request rate + CPU. 
+ */ + public static class Metastore extends HiveScaledObjectDependent { + public Metastore() { + super("metastore", "Deployment"); + } + + @Override + protected AutoscalingSpec getAutoscalingSpec(HiveCluster hiveCluster) { + return hiveCluster.getSpec().metastore().autoscaling(); + } + + @Override + protected int getMaxReplicas(HiveCluster hiveCluster) { + return hiveCluster.getSpec().metastore().replicas(); + } + + @Override + protected int getPollingInterval() { + return 30; + } + + @Override + protected Map getAdvanced( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, int maxReplicas) { + return buildHpaBehavior( + autoscaling.scaleDownStabilizationSeconds(), "Pods", 1, 60, + autoscaling.scaleUpStabilizationSeconds(), "Percent", 50, 60); + } + + @Override + protected List> getTriggers( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, + int maxReplicas, String targetName) { + List> triggers = new ArrayList<>(); + // HMS runs in HTTP transport mode — connections are per-request (stateless), + // so open_connections is always ~0. Use aggregate API request rate instead. + // Note: Prometheus 3.x rejects rate() on __name__ regex selectors, so we + // compute rate manually as (sum(counters) - sum(counters offset 2m)) / 120. + triggers.add(buildPrometheusTrigger( + "hive_metastore_api_rate", + String.format( + "(sum({__name__=~\"api_.+_total\",namespace=\"%s\",pod=~\"%s-.*\"})" + + " - sum({__name__=~\"api_.+_total\",namespace=\"%s\",pod=~\"%s-.*\"} offset 2m))" + + " / 120 or vector(0)", + hiveCluster.getMetadata().getNamespace(), targetName, + hiveCluster.getMetadata().getNamespace(), targetName), + String.valueOf(autoscaling.scaleUpThreshold()))); + Map cpuTrigger = buildCpuTrigger( + autoscaling, hiveCluster.getSpec().metastore().resources(), + "Metastore", LOG); + if (cpuTrigger != null) { + triggers.add(cpuTrigger); + } + return triggers; + } + } + + /** + * LLAP ScaledObject: scales on NumQueuedRequests + HS2 activation trigger. 
+ * Scale-down is slow (preserves in-memory cache). + */ + public static class Llap extends HiveScaledObjectDependent { + public Llap() { + super("llap", "StatefulSet"); + } + + @Override + protected AutoscalingSpec getAutoscalingSpec(HiveCluster hiveCluster) { + return hiveCluster.getSpec().llap().autoscaling(); + } + + @Override + protected int getMaxReplicas(HiveCluster hiveCluster) { + return hiveCluster.getSpec().llap().replicas(); + } + + @Override + protected int getPollingInterval() { + return 5; + } + + @Override + protected Map getAdvanced( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, int maxReplicas) { + // Scale-up stabilization=0: LLAP is a reactive dependent that must + // track HS2 immediately — no delay on scale-up. + return buildHpaBehavior( + autoscaling.scaleDownStabilizationSeconds(), "Pods", 1, autoscaling.scaleDownStabilizationSeconds(), + 0, "Pods", maxReplicas, 15); + } + + @Override + protected List> getTriggers( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, + int maxReplicas, String targetName) { + String hs2TargetName = hiveCluster.getMetadata().getName() + "-hiveserver2"; + String namespace = hiveCluster.getMetadata().getNamespace(); + return List.of( + buildPrometheusTrigger( + "llap_total_busy_slots", + String.format( + "avg(" + + "hadoop_llapdaemon_executornumqueuedrequests{namespace=\"%1$s\",pod=~\"%2$s-.*\"}" + + " + on(pod) hadoop_llapdaemon_executornumexecutorsconfigured{namespace=\"%1$s\",pod=~\"%2$s-.*\"}" + + " - on(pod) hadoop_llapdaemon_executornumexecutorsavailable{namespace=\"%1$s\",pod=~\"%2$s-.*\"}" + + ") or vector(0)", + namespace, targetName), + String.valueOf(autoscaling.scaleUpThreshold())), + buildHs2ActivationTrigger(namespace, hs2TargetName, maxReplicas) + ); + } + } + + /** + * TezAM ScaledObject: scales on HS2 session demand. + * Each HS2 pod claims {@code sessions.per.default.queue} TezAM sessions + * (exclusive binding). Demand = active HS2 pods × sessions per queue. 
+ * Primary trigger: count of HS2 pods with open sessions × sessions_per_queue. + */ + public static class TezAm extends HiveScaledObjectDependent { + public TezAm() { + super("tezam", "StatefulSet"); + } + + @Override + protected AutoscalingSpec getAutoscalingSpec(HiveCluster hiveCluster) { + return hiveCluster.getSpec().tezAm().autoscaling(); + } + + @Override + protected int getMaxReplicas(HiveCluster hiveCluster) { + return hiveCluster.getSpec().tezAm().replicas(); + } + + @Override + protected int getPollingInterval() { + return 5; + } + + @Override + protected Map getAdvanced( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, int maxReplicas) { + // Scale-up stabilization=0: TezAM is a reactive dependent that must + // track HS2 sessions immediately — no delay on scale-up. + return buildHpaBehavior( + autoscaling.scaleDownStabilizationSeconds(), "Pods", 1, 60, + 0, "Pods", maxReplicas, 15); + } + + @Override + protected List> getTriggers( + HiveCluster hiveCluster, AutoscalingSpec autoscaling, + int maxReplicas, String targetName) { + String hs2TargetName = hiveCluster.getMetadata().getName() + "-hiveserver2"; + String namespace = hiveCluster.getMetadata().getNamespace(); + + // Read sessions.per.default.queue from HS2 configOverrides (default 1). + // Each HS2 pod pre-warms this many TezAM sessions in its pool. + int sessionsPerQueue = ConfigUtils.getInt( + hiveCluster.getSpec().hiveServer2().configOverrides(), + ConfigUtils.HIVE_SERVER2_TEZ_SESSIONS_PER_QUEUE_KEY, + null, ConfigUtils.HIVE_SERVER2_TEZ_SESSIONS_PER_QUEUE_DEFAULT); + + List> triggers = new ArrayList<>(); + // Trigger 1: Concurrent demand — total open sessions across all HS2 pods. + // Each session may run a query needing its own TezAM. + // threshold=1 → desired = total open sessions. 
+ triggers.add(buildPrometheusTrigger( + "hs2_tezam_session_demand", + String.format( + "sum(hs2_open_sessions{namespace=\"%s\",pod=~\"%s-.*\"}) or vector(0)", + namespace, hs2TargetName), + "1")); + // Trigger 2: Pre-warm — each running HS2 pod needs sessions_per_queue TezAMs + // in its pool (claimed eagerly at startup by default). + // threshold=1 → desired = HS2_pod_count × sessions_per_queue. + triggers.add(buildPrometheusTrigger( + "hs2_tezam_prewarm", + String.format( + "count(hs2_open_sessions{namespace=\"%s\",pod=~\"%s-.*\"}) * %d or vector(0)", + namespace, hs2TargetName, sessionsPerQueue), + "1")); + // KEDA uses max(trigger1, trigger2) → ensures enough TezAMs for both + // concurrent queries AND per-HS2 pre-warm pools. + triggers.add(buildHs2ActivationTrigger(namespace, hs2TargetName, maxReplicas)); + return triggers; + } + } +} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2ConfigMapDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2ConfigMapDependent.java deleted file mode 100644 index 9bb0597cc960..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2ConfigMapDependent.java +++ /dev/null @@ -1,72 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.hive.kubernetes.operator.dependent; - -import java.util.Map; - -import io.fabric8.kubernetes.api.model.ConfigMap; -import io.fabric8.kubernetes.api.model.ConfigMapBuilder; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.config.informer.Informer; -import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; -import org.apache.hive.kubernetes.operator.model.HiveCluster; -import org.apache.hive.kubernetes.operator.model.HiveClusterSpec; -import org.apache.hive.kubernetes.operator.util.HadoopXmlBuilder; -import org.apache.hive.kubernetes.operator.util.HiveConfigBuilder; -import org.apache.hive.kubernetes.operator.util.Labels; - -/** Manages the hive-site.xml ConfigMap for HiveServer2. 
*/ -@KubernetesDependent( - informer = @Informer(labelSelector = "app.kubernetes.io/component=hiveserver2," - + "app.kubernetes.io/managed-by=hive-kubernetes-operator") -) -public class HiveServer2ConfigMapDependent - extends HiveDependentResource { - - public static final String COMPONENT = "hiveserver2"; - - public HiveServer2ConfigMapDependent() { - super(ConfigMap.class); - } - - @Override - protected ConfigMap desired(HiveCluster hiveCluster, - Context context) { - HiveClusterSpec spec = hiveCluster.getSpec(); - - Map props = - HiveConfigBuilder.getHiveServer2HiveSite(hiveCluster, spec); - Map tezProps = HiveConfigBuilder.getTezSite(spec); - - return new ConfigMapBuilder() - .withNewMetadata() - .withName(resourceName(hiveCluster)) - .withNamespace(hiveCluster.getMetadata().getNamespace()) - .withLabels(Labels.forComponent(hiveCluster, COMPONENT)) - .endMetadata() - .addToData("hive-site.xml", HadoopXmlBuilder.buildXml(props)) - .addToData("tez-site.xml", HadoopXmlBuilder.buildXml(tezProps)) - .build(); - } - - /** Returns the ConfigMap resource name for this HiveCluster. 
*/ - public static String resourceName(HiveCluster hiveCluster) { - return hiveCluster.getMetadata().getName() + "-hiveserver2-config"; - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2DeploymentDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2DeploymentDependent.java index ccb3048dea98..20d27a46b0f7 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2DeploymentDependent.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2DeploymentDependent.java @@ -34,6 +34,7 @@ import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; import org.apache.hive.kubernetes.operator.model.HiveCluster; import org.apache.hive.kubernetes.operator.model.HiveClusterSpec; +import org.apache.hive.kubernetes.operator.model.spec.AutoscalingSpec; import org.apache.hive.kubernetes.operator.model.spec.HiveServer2Spec; import org.apache.hive.kubernetes.operator.util.ConfigUtils; import org.apache.hive.kubernetes.operator.util.HadoopXmlBuilder; @@ -55,6 +56,12 @@ public HiveServer2DeploymentDependent() { super(Deployment.class); } + @Override + protected String getSecondaryResourceName(HiveCluster primary, + Context context) { + return resourceName(primary); + } + @Override protected Deployment desired(HiveCluster hiveCluster, Context context) { @@ -125,21 +132,28 @@ protected Deployment desired(HiveCluster hiveCluster, hs2.configOverrides(), ConfigUtils.HIVE_SERVER2_THRIFT_PORT_KEY, null, ConfigUtils.HIVE_SERVER2_THRIFT_PORT_DEFAULT); + int hs2HttpPort = ConfigUtils.getInt( + hs2.configOverrides(), + ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_KEY, + null, ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_DEFAULT); int hs2WebUiPort = ConfigUtils.getInt( hs2.configOverrides(), ConfigUtils.HIVE_SERVER2_WEBUI_PORT_KEY, null, 
ConfigUtils.HIVE_SERVER2_WEBUI_PORT_DEFAULT); - List ports = List.of( - new ContainerPortBuilder() - .withName("thrift") - .withContainerPort(hs2ThriftPort).build(), - new ContainerPortBuilder() - .withName("webui") - .withContainerPort(hs2WebUiPort).build() - ); + List ports = new ArrayList<>(); + ports.add(new ContainerPortBuilder() + .withName("thrift") + .withContainerPort(hs2ThriftPort).withProtocol("TCP").build()); + ports.add(new ContainerPortBuilder() + .withName("http") + .withContainerPort(hs2HttpPort).withProtocol("TCP").build()); + ports.add(new ContainerPortBuilder() + .withName("webui") + .withContainerPort(hs2WebUiPort).withProtocol("TCP").build()); - Probe readinessProbe = buildTcpProbe(hs2ThriftPort, hs2.readinessProbe(), 15, 10, 3); - Probe livenessProbe = buildTcpProbe(hs2ThriftPort, hs2.livenessProbe(), 120, 30, 10); + // Probes target the HTTP transport port (default mode) + Probe readinessProbe = buildTcpProbe(hs2HttpPort, hs2.readinessProbe(), 15, 10, 3); + Probe livenessProbe = buildTcpProbe(hs2HttpPort, hs2.livenessProbe(), 120, 30, 10); boolean tezAmEnabled = spec.tezAm().isEnabled(); @@ -155,8 +169,8 @@ protected Deployment desired(HiveCluster hiveCluster, List volumes = new ArrayList<>(); volumes.add(buildProjectedConfigVolume("hive-config", - HiveServer2ConfigMapDependent.resourceName(hiveCluster), - HadoopConfigMapDependent.resourceName(hiveCluster))); + HiveConfigMapDependent.HiveServer2.resourceName(hiveCluster), + HiveConfigMapDependent.Hadoop.resourceName(hiveCluster))); if (tezAmEnabled) { volumeMounts.add( @@ -185,6 +199,13 @@ protected Deployment desired(HiveCluster hiveCluster, replaceConfMountWithSubPaths(volumeMounts, "hive-config", "hive-site.xml", "tez-site.xml", "core-site.xml"); + // Add Prometheus JMX Exporter when autoscaling is enabled + AutoscalingSpec autoscaling = hs2.autoscaling(); + if (autoscaling.isEnabled()) { + addJmxExporter(spec.image(), COMPONENT, + initContainers, volumeMounts, volumes, envVars, ports); + 
} + // Pre-compute config hash for the pod template annotation. // This ensures the Deployment is created with the correct hash // from the start (single ReplicaSet) and triggers rolling @@ -194,6 +215,13 @@ protected Deployment desired(HiveCluster hiveCluster, HadoopXmlBuilder.buildXml(HiveConfigBuilder.getTezSite(spec)), HadoopXmlBuilder.buildXml(HiveConfigBuilder.getHadoopCoreSite(spec))); + // When autoscaling is enabled, preserve current replica count (KEDA/HPA manages it). + AutoscalingSpec hs2Autoscaling = hs2.autoscaling(); + int initialReplicas = hs2Autoscaling != null && hs2Autoscaling.minReplicas() == 0 + ? 0 : hs2.replicas(); + Integer replicas = resolveReplicaCount( + hiveCluster, context, hs2Autoscaling, hs2.replicas(), initialReplicas); + Deployment deployment = new DeploymentBuilder() .withNewMetadata() .withName(resourceName(hiveCluster)) @@ -201,7 +229,7 @@ protected Deployment desired(HiveCluster hiveCluster, .withLabels(Labels.forComponent(hiveCluster, COMPONENT)) .endMetadata() .withNewSpec() - .withReplicas(hs2.replicas()) + .withReplicas(replicas) .withNewSelector() .withMatchLabels(selectorLabels) .endSelector() @@ -233,21 +261,28 @@ protected Deployment desired(HiveCluster hiveCluster, applySpreadAffinityIfAbsent( deployment.getSpec().getTemplate().getSpec(), selectorLabels); - if (spec.volumes() != null) { - deployment.getSpec().getTemplate().getSpec().getVolumes().addAll(spec.volumes()); - } - if (spec.volumeMounts() != null) { - deployment.getSpec().getTemplate().getSpec().getContainers().get(0).getVolumeMounts() - .addAll(spec.volumeMounts()); - } - if (hs2.extraVolumes() != null) { - deployment.getSpec().getTemplate().getSpec().getVolumes().addAll(hs2.extraVolumes()); - } - if (hs2.extraVolumeMounts() != null) { - deployment.getSpec().getTemplate().getSpec().getContainers().get(0).getVolumeMounts() - .addAll(hs2.extraVolumeMounts()); + // Graceful scale-down: deregister from ZK, then poll JMX Exporter (port 9404) for sessions. 
+ if (autoscaling.isEnabled()) { + List zkDeregister = List.of( + "echo '[preStop] Deregistering HiveServer2 from ZooKeeper...'", + "hive --service hiveserver2 --deregister $(hive --service version 2>/dev/null | head -1 || echo '4.0.0')" + + " || echo '[preStop] WARNING: ZK deregister failed'"); + String preStopScript = buildDrainScript( + "Waiting for open sessions to drain", + "hs2_open_sessions", "SESSIONS", + "All sessions drained. Shutting down.", + 5, 6, zkDeregister); + applyAutoscalingLifecycle( + deployment.getSpec().getTemplate().getSpec(), + deployment.getSpec().getTemplate().getMetadata(), + preStopScript, autoscaling.gracePeriodSeconds(), + autoscaling.metricsScrapeIntervalSeconds()); } + appendUserVolumes(deployment.getSpec().getTemplate().getSpec(), + spec.volumes(), spec.volumeMounts(), + hs2.extraVolumes(), hs2.extraVolumeMounts()); + return deployment; } diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2HttpScaledObjectDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2HttpScaledObjectDependent.java new file mode 100644 index 000000000000..055bd878d2f3 --- /dev/null +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2HttpScaledObjectDependent.java @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hive.kubernetes.operator.dependent; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import io.fabric8.kubernetes.api.model.GenericKubernetesResource; +import io.fabric8.kubernetes.api.model.GenericKubernetesResourceBuilder; +import io.javaoperatorsdk.operator.api.reconciler.Context; +import io.javaoperatorsdk.operator.processing.GroupVersionKind; +import org.apache.hive.kubernetes.operator.model.HiveCluster; +import org.apache.hive.kubernetes.operator.model.spec.AutoscalingSpec; +import org.apache.hive.kubernetes.operator.util.ConfigUtils; +import org.apache.hive.kubernetes.operator.util.Labels; + +/** + * Manages a KEDA HTTPScaledObject for HiveServer2 scale-to-zero. + *

<p>
+ * Requires the KEDA HTTP Add-on to be installed in the cluster.
+ * The HTTP Add-on creates an interceptor proxy that:
+ * <ul>
+ *   <li>Sits in front of the HS2 Service</li>
+ *   <li>Queues incoming beeline/HTTP requests when HS2 has 0 pods</li>
+ *   <li>Triggers KEDA to scale HS2 from 0 to 1</li>
+ *   <li>Forwards the queued request once a pod is ready</li>
+ * </ul>
+ * <p>
+ * This dependent is activated ONLY when minReplicas == 0 (scale-to-zero mode). + * When minReplicas > 0, the regular ScaledObject (Prometheus-based) is used instead. + */ +public class HiveServer2HttpScaledObjectDependent extends HiveGenericDependentResource { + + public HiveServer2HttpScaledObjectDependent() { + super(new GroupVersionKind("http.keda.sh", "v1alpha1", "HTTPScaledObject")); + } + + @Override + protected GenericKubernetesResource desired(HiveCluster hiveCluster, + Context context) { + AutoscalingSpec autoscaling = hiveCluster.getSpec().hiveServer2().autoscaling(); + int maxReplicas = hiveCluster.getSpec().hiveServer2().replicas(); + String clusterName = hiveCluster.getMetadata().getName(); + String namespace = hiveCluster.getMetadata().getNamespace(); + String deploymentName = clusterName + "-hiveserver2"; + String serviceName = clusterName + "-hiveserver2"; + + int httpPort = ConfigUtils.getInt( + hiveCluster.getSpec().hiveServer2().configOverrides(), + ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_KEY, + null, ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_DEFAULT); + + Map spec = new HashMap<>(); + + // Hosts the interceptor matches for routing. + // Includes: internal service FQDN, short name, interceptor proxy name + // (for in-cluster kubectl exec), and localhost (for port-forward). + spec.put("hosts", List.of( + serviceName + "." 
+ namespace + ".svc.cluster.local", + serviceName, + "keda-add-ons-http-interceptor-proxy.keda.svc", + "localhost" + )); + spec.put("pathPrefixes", List.of("/")); + + // Target deployment and service + spec.put("scaleTargetRef", Map.of( + "name", deploymentName, + "kind", "Deployment", + "apiVersion", "apps/v1", + "service", serviceName, + "port", httpPort + )); + + // Replica bounds + spec.put("replicas", Map.of( + "min", 0, + "max", maxReplicas + )); + + // Scaling metric: scale up when there are pending requests + spec.put("scalingMetric", Map.of( + "requestRate", Map.of( + "granularity", "1s", + "targetValue", autoscaling.scaleUpThreshold(), + "window", "1m" + ) + )); + + // Cooldown before scaling back to 0 + spec.put("scaledownPeriod", autoscaling.cooldownSeconds()); + + return new GenericKubernetesResourceBuilder() + .withApiVersion("http.keda.sh/v1alpha1") + .withKind("HTTPScaledObject") + .withNewMetadata() + .withName(resourceName(hiveCluster)) + .withNamespace(namespace) + .withLabels(Labels.forComponent(hiveCluster, "hiveserver2")) + .endMetadata() + .withAdditionalProperties(Map.of("spec", spec)) + .build(); + } + + @Override + protected String getResourceName(HiveCluster hiveCluster) { + return resourceName(hiveCluster); + } + + public static String resourceName(HiveCluster hiveCluster) { + return hiveCluster.getMetadata().getName() + "-hiveserver2-httpso"; + } +} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2InterceptorRouteDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2InterceptorRouteDependent.java new file mode 100644 index 000000000000..de6e3bb71d5c --- /dev/null +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2InterceptorRouteDependent.java @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hive.kubernetes.operator.dependent; + +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import io.fabric8.kubernetes.api.model.GenericKubernetesResource; +import io.fabric8.kubernetes.api.model.GenericKubernetesResourceBuilder; +import io.javaoperatorsdk.operator.api.reconciler.Context; +import io.javaoperatorsdk.operator.processing.GroupVersionKind; +import org.apache.hive.kubernetes.operator.model.HiveCluster; +import org.apache.hive.kubernetes.operator.model.spec.AutoscalingSpec; +import org.apache.hive.kubernetes.operator.util.ConfigUtils; +import org.apache.hive.kubernetes.operator.util.Labels; + +/** + * Manages a KEDA InterceptorRoute for HiveServer2 scale-to-zero routing. + *

<p> + * Unlike HTTPScaledObject, InterceptorRoute only configures interceptor + * routing without auto-creating a ScaledObject. This allows us to manage + * scaling entirely through a single Prometheus-based ScaledObject that + * combines session/CPU awareness with the HTTP interceptor wake-from-zero + * trigger. + * <p>

+ * Requires the KEDA HTTP Add-on to be installed in the cluster. + */ +public class HiveServer2InterceptorRouteDependent extends HiveGenericDependentResource { + + public HiveServer2InterceptorRouteDependent() { + super(new GroupVersionKind("http.keda.sh", "v1beta1", "InterceptorRoute")); + } + + @Override + protected GenericKubernetesResource desired(HiveCluster hiveCluster, + Context context) { + AutoscalingSpec autoscaling = hiveCluster.getSpec().hiveServer2().autoscaling(); + String clusterName = hiveCluster.getMetadata().getName(); + String namespace = hiveCluster.getMetadata().getNamespace(); + String serviceName = clusterName + "-hiveserver2"; + + int httpPort = ConfigUtils.getInt( + hiveCluster.getSpec().hiveServer2().configOverrides(), + ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_KEY, + null, ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_DEFAULT); + + // Hosts the interceptor matches for routing + List hosts = new ArrayList<>(List.of( + serviceName + "." + namespace + ".svc.cluster.local", + serviceName, + "keda-add-ons-http-interceptor-proxy.keda.svc", + "localhost" + )); + + Map spec = new HashMap<>(); + + // Target backend service + spec.put("target", Map.of( + "service", serviceName, + "port", httpPort + )); + + // Routing rules + spec.put("rules", List.of( + Map.of( + "hosts", hosts, + "paths", List.of(Map.of("value", "/")) + ) + )); + + // Scaling metric (required field, used by interceptor for queue management) + spec.put("scalingMetric", Map.of( + "concurrency", Map.of( + "targetValue", autoscaling.scaleUpThreshold() + ) + )); + + return new GenericKubernetesResourceBuilder() + .withApiVersion("http.keda.sh/v1beta1") + .withKind("InterceptorRoute") + .withNewMetadata() + .withName(resourceName(hiveCluster)) + .withNamespace(namespace) + .withLabels(Labels.forComponent(hiveCluster, "hiveserver2")) + .endMetadata() + .withAdditionalProperties(Map.of("spec", spec)) + .build(); + } + + @Override + protected String getResourceName(HiveCluster hiveCluster) { 
+ return resourceName(hiveCluster); + } + + public static String resourceName(HiveCluster hiveCluster) { + return hiveCluster.getMetadata().getName() + "-hiveserver2-route"; + } +} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2ServiceDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2ServiceDependent.java deleted file mode 100644 index a9707ac0dfa6..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServer2ServiceDependent.java +++ /dev/null @@ -1,79 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.hive.kubernetes.operator.dependent; - -import io.fabric8.kubernetes.api.model.IntOrString; -import io.fabric8.kubernetes.api.model.Service; -import io.fabric8.kubernetes.api.model.ServiceBuilder; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.config.informer.Informer; -import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; -import org.apache.hive.kubernetes.operator.model.HiveCluster; -import org.apache.hive.kubernetes.operator.model.spec.HiveServer2Spec; -import org.apache.hive.kubernetes.operator.util.ConfigUtils; -import org.apache.hive.kubernetes.operator.util.Labels; - -/** Manages the Kubernetes Service for HiveServer2 (Thrift and WebUI ports). */ -@KubernetesDependent( - informer = @Informer(labelSelector = "app.kubernetes.io/component=hiveserver2," - + "app.kubernetes.io/managed-by=hive-kubernetes-operator") -) -public class HiveServer2ServiceDependent - extends HiveDependentResource { - - public HiveServer2ServiceDependent() { - super(Service.class); - } - - @Override - protected Service desired(HiveCluster hiveCluster, - Context context) { - HiveServer2Spec hs2 = hiveCluster.getSpec().hiveServer2(); - int thriftPort = ConfigUtils.getInt(hs2.configOverrides(), - ConfigUtils.HIVE_SERVER2_THRIFT_PORT_KEY, - null, ConfigUtils.HIVE_SERVER2_THRIFT_PORT_DEFAULT); - int webUiPort = ConfigUtils.getInt(hs2.configOverrides(), - ConfigUtils.HIVE_SERVER2_WEBUI_PORT_KEY, - null, ConfigUtils.HIVE_SERVER2_WEBUI_PORT_DEFAULT); - - return new ServiceBuilder() - .withNewMetadata() - .withName(hiveCluster.getMetadata().getName() + "-hiveserver2") - .withNamespace(hiveCluster.getMetadata().getNamespace()) - .withLabels(Labels.forComponent(hiveCluster, - HiveServer2DeploymentDependent.COMPONENT)) - .endMetadata() - .withNewSpec() - .withType(hs2.serviceType()) - .withSelector(Labels.selectorForComponent(hiveCluster, - HiveServer2DeploymentDependent.COMPONENT)) - 
.addNewPort() - .withName("thrift") - .withPort(thriftPort) - .withTargetPort(new IntOrString(thriftPort)) - .endPort() - .addNewPort() - .withName("webui") - .withPort(webUiPort) - .withTargetPort(new IntOrString(webUiPort)) - .endPort() - .endSpec() - .build(); - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServiceDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServiceDependent.java new file mode 100644 index 000000000000..edd048e8a322 --- /dev/null +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/HiveServiceDependent.java @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hive.kubernetes.operator.dependent; + +import io.fabric8.kubernetes.api.model.IntOrString; +import io.fabric8.kubernetes.api.model.Service; +import io.fabric8.kubernetes.api.model.ServiceBuilder; +import io.javaoperatorsdk.operator.api.reconciler.Context; +import io.javaoperatorsdk.operator.api.config.informer.Informer; +import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; +import org.apache.hive.kubernetes.operator.model.HiveCluster; +import org.apache.hive.kubernetes.operator.util.ConfigUtils; +import org.apache.hive.kubernetes.operator.util.Labels; + +/** + * Unified Kubernetes Service dependent for all Hive components. + * Subclassed per component to define component-specific service type and ports. + */ +public abstract class HiveServiceDependent + extends HiveDependentResource { + + private final String component; + + protected HiveServiceDependent(String component) { + super(Service.class); + this.component = component; + } + + @Override + protected String getSecondaryResourceName(HiveCluster primary, + Context context) { + return primary.getMetadata().getName() + "-" + component; + } + + @Override + protected Service desired(HiveCluster hiveCluster, + Context context) { + ServiceBuilder builder = new ServiceBuilder() + .withNewMetadata() + .withName(hiveCluster.getMetadata().getName() + "-" + component) + .withNamespace(hiveCluster.getMetadata().getNamespace()) + .withLabels(Labels.forComponent(hiveCluster, component)) + .endMetadata() + .withNewSpec() + .withSelector(Labels.selectorForComponent(hiveCluster, component)) + .endSpec(); + customizeSpec(builder, hiveCluster); + return builder.build(); + } + + /** Subclasses override to set service type and add ports. */ + protected abstract void customizeSpec(ServiceBuilder builder, HiveCluster hiveCluster); + + /** HiveServer2 Service: configurable type, thrift + http + webui ports. 
*/ + @KubernetesDependent( + informer = @Informer(labelSelector = "app.kubernetes.io/component=hiveserver2," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator") + ) + public static class HiveServer2 extends HiveServiceDependent { + public HiveServer2() { + super("hiveserver2"); + } + + @Override + protected void customizeSpec(ServiceBuilder builder, HiveCluster hiveCluster) { + var hs2 = hiveCluster.getSpec().hiveServer2(); + int thriftPort = ConfigUtils.getInt(hs2.configOverrides(), + ConfigUtils.HIVE_SERVER2_THRIFT_PORT_KEY, + null, ConfigUtils.HIVE_SERVER2_THRIFT_PORT_DEFAULT); + int httpPort = ConfigUtils.getInt(hs2.configOverrides(), + ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_KEY, + null, ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_DEFAULT); + int webUiPort = ConfigUtils.getInt(hs2.configOverrides(), + ConfigUtils.HIVE_SERVER2_WEBUI_PORT_KEY, + null, ConfigUtils.HIVE_SERVER2_WEBUI_PORT_DEFAULT); + builder.editSpec() + .withType(hs2.serviceType()) + .withSessionAffinity("ClientIP") + .addNewPort().withName("thrift").withProtocol("TCP") + .withPort(thriftPort).withTargetPort(new IntOrString(thriftPort)).endPort() + .addNewPort().withName("http").withProtocol("TCP") + .withPort(httpPort).withTargetPort(new IntOrString(httpPort)).endPort() + .addNewPort().withName("webui").withProtocol("TCP") + .withPort(webUiPort).withTargetPort(new IntOrString(webUiPort)).endPort() + .endSpec(); + } + } + + /** Metastore Service: ClusterIP, thrift + rest ports. 
*/ + @KubernetesDependent( + informer = @Informer(labelSelector = "app.kubernetes.io/component=metastore," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator") + ) + public static class Metastore extends HiveServiceDependent { + public Metastore() { + super("metastore"); + } + + @Override + protected void customizeSpec(ServiceBuilder builder, HiveCluster hiveCluster) { + int thriftPort = ConfigUtils.getInt( + hiveCluster.getSpec().metastore().configOverrides(), + ConfigUtils.METASTORE_THRIFT_PORT_KEY, + ConfigUtils.METASTORE_THRIFT_PORT_HIVE_KEY, + ConfigUtils.METASTORE_THRIFT_PORT_DEFAULT); + builder.editSpec() + .withType("ClusterIP") + .addNewPort().withName("thrift").withProtocol("TCP") + .withPort(thriftPort).withTargetPort(new IntOrString(thriftPort)).endPort() + .addNewPort().withName("rest").withProtocol("TCP") + .withPort(9001).withTargetPort(new IntOrString(9001)).endPort() + .endSpec(); + } + } + + /** LLAP headless Service: required by StatefulSet for stable DNS. */ + @KubernetesDependent( + informer = @Informer(labelSelector = "app.kubernetes.io/component=llap," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator") + ) + public static class Llap extends HiveServiceDependent { + public Llap() { + super("llap"); + } + + @Override + protected void customizeSpec(ServiceBuilder builder, HiveCluster hiveCluster) { + builder.editSpec() + .withClusterIP("None") + .addNewPort().withName("management").withProtocol("TCP") + .withPort(15004).withTargetPort(new IntOrString(15004)).endPort() + .addNewPort().withName("shuffle").withProtocol("TCP") + .withPort(15551).withTargetPort(new IntOrString(15551)).endPort() + .addNewPort().withName("web").withProtocol("TCP") + .withPort(15002).withTargetPort(new IntOrString(15002)).endPort() + .endSpec(); + } + } + + /** TezAM headless Service: required by StatefulSet for stable DNS. 
*/ + @KubernetesDependent( + informer = @Informer(labelSelector = "app.kubernetes.io/component=tezam," + + "app.kubernetes.io/managed-by=hive-kubernetes-operator") + ) + public static class TezAm extends HiveServiceDependent { + public TezAm() { + super("tezam"); + } + + @Override + protected void customizeSpec(ServiceBuilder builder, HiveCluster hiveCluster) { + builder.editSpec() + .withClusterIP("None") + .endSpec(); + } + } +} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapConfigMapDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapConfigMapDependent.java deleted file mode 100644 index 2ad6955dadb8..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapConfigMapDependent.java +++ /dev/null @@ -1,68 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.hive.kubernetes.operator.dependent; - -import java.util.Map; - -import io.fabric8.kubernetes.api.model.ConfigMap; -import io.fabric8.kubernetes.api.model.ConfigMapBuilder; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.config.informer.Informer; -import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; -import org.apache.hive.kubernetes.operator.model.HiveCluster; -import org.apache.hive.kubernetes.operator.util.HadoopXmlBuilder; -import org.apache.hive.kubernetes.operator.util.HiveConfigBuilder; -import org.apache.hive.kubernetes.operator.util.Labels; - -/** Manages the llap-daemon-site.xml ConfigMap for LLAP daemons. */ -@KubernetesDependent( - informer = @Informer(labelSelector = "app.kubernetes.io/component=llap," - + "app.kubernetes.io/managed-by=hive-kubernetes-operator") -) -public class LlapConfigMapDependent - extends HiveDependentResource { - - public static final String COMPONENT = "llap"; - - public LlapConfigMapDependent() { - super(ConfigMap.class); - } - - @Override - protected ConfigMap desired(HiveCluster hiveCluster, - Context context) { - Map props = - HiveConfigBuilder.getLlapDaemonSite(hiveCluster.getSpec()); - - return new ConfigMapBuilder() - .withNewMetadata() - .withName(resourceName(hiveCluster)) - .withNamespace(hiveCluster.getMetadata().getNamespace()) - .withLabels(Labels.forComponent(hiveCluster, COMPONENT)) - .endMetadata() - .addToData("llap-daemon-site.xml", - HadoopXmlBuilder.buildXml(props)) - .build(); - } - - /** Returns the ConfigMap resource name for this HiveCluster. 
*/ - public static String resourceName(HiveCluster hiveCluster) { - return hiveCluster.getMetadata().getName() + "-llap-config"; - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapServiceDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapServiceDependent.java deleted file mode 100644 index 108f29347a97..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapServiceDependent.java +++ /dev/null @@ -1,77 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.hive.kubernetes.operator.dependent; - -import io.fabric8.kubernetes.api.model.IntOrString; -import io.fabric8.kubernetes.api.model.Service; -import io.fabric8.kubernetes.api.model.ServiceBuilder; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.config.informer.Informer; -import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; -import org.apache.hive.kubernetes.operator.model.HiveCluster; -import org.apache.hive.kubernetes.operator.util.Labels; - -/** - * Manages the headless Kubernetes Service for LLAP daemons. - * Required by the StatefulSet for stable DNS entries and ZooKeeper registration. - */ -@KubernetesDependent( - informer = @Informer(labelSelector = "app.kubernetes.io/component=llap," - + "app.kubernetes.io/managed-by=hive-kubernetes-operator") -) -public class LlapServiceDependent - extends HiveDependentResource { - - public LlapServiceDependent() { - super(Service.class); - } - - @Override - protected Service desired(HiveCluster hiveCluster, - Context context) { - return new ServiceBuilder() - .withNewMetadata() - .withName(hiveCluster.getMetadata().getName() + "-llap") - .withNamespace(hiveCluster.getMetadata().getNamespace()) - .withLabels(Labels.forComponent(hiveCluster, - LlapStatefulSetDependent.COMPONENT)) - .endMetadata() - .withNewSpec() - .withClusterIP("None") - .withSelector(Labels.selectorForComponent(hiveCluster, - LlapStatefulSetDependent.COMPONENT)) - .addNewPort() - .withName("management") - .withPort(15004) - .withTargetPort(new IntOrString(15004)) - .endPort() - .addNewPort() - .withName("shuffle") - .withPort(15551) - .withTargetPort(new IntOrString(15551)) - .endPort() - .addNewPort() - .withName("web") - .withPort(15002) - .withTargetPort(new IntOrString(15002)) - .endPort() - .endSpec() - .build(); - } -} diff --git 
a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapStatefulSetDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapStatefulSetDependent.java index c8c044d22ce9..171766e6f341 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapStatefulSetDependent.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapStatefulSetDependent.java @@ -34,6 +34,7 @@ import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; import org.apache.hive.kubernetes.operator.model.HiveCluster; import org.apache.hive.kubernetes.operator.model.HiveClusterSpec; +import org.apache.hive.kubernetes.operator.model.spec.AutoscalingSpec; import org.apache.hive.kubernetes.operator.model.spec.LlapSpec; import org.apache.hive.kubernetes.operator.util.HadoopXmlBuilder; import org.apache.hive.kubernetes.operator.util.HiveConfigBuilder; @@ -56,6 +57,12 @@ public LlapStatefulSetDependent() { super(StatefulSet.class); } + @Override + protected String getSecondaryResourceName(HiveCluster primary, + Context context) { + return resourceName(primary); + } + @Override protected StatefulSet desired(HiveCluster hiveCluster, Context context) { @@ -81,16 +88,15 @@ protected StatefulSet desired(HiveCluster hiveCluster, envVars.addAll(spec.envVars()); } - List ports = List.of( - new ContainerPortBuilder() - .withName("management").withContainerPort(15004).build(), - new ContainerPortBuilder() - .withName("shuffle").withContainerPort(15551).build(), - new ContainerPortBuilder() - .withName("web").withContainerPort(15002).build(), - new ContainerPortBuilder() - .withName("output").withContainerPort(15003).build() - ); + List ports = new ArrayList<>(); + ports.add(new ContainerPortBuilder() + .withName("management").withContainerPort(15004).withProtocol("TCP").build()); + ports.add(new ContainerPortBuilder() + 
.withName("shuffle").withContainerPort(15551).withProtocol("TCP").build()); + ports.add(new ContainerPortBuilder() + .withName("web").withContainerPort(15002).withProtocol("TCP").build()); + ports.add(new ContainerPortBuilder() + .withName("output").withContainerPort(15003).withProtocol("TCP").build()); Probe readinessProbe = buildTcpProbe(15004, llap.readinessProbe(), 15, 10, 3); @@ -106,8 +112,8 @@ protected StatefulSet desired(HiveCluster hiveCluster, List volumes = new ArrayList<>(); volumes.add(buildProjectedConfigVolume("llap-config", - LlapConfigMapDependent.resourceName(hiveCluster), - HadoopConfigMapDependent.resourceName(hiveCluster))); + HiveConfigMapDependent.Llap.resourceName(hiveCluster), + HiveConfigMapDependent.Hadoop.resourceName(hiveCluster))); List initContainers = new ArrayList<>(); addExternalJars(spec.image(), spec.externalJars(), @@ -115,11 +121,25 @@ protected StatefulSet desired(HiveCluster hiveCluster, replaceConfMountWithSubPaths(volumeMounts, "llap-config", "llap-daemon-site.xml", "core-site.xml"); + // Add Prometheus JMX Exporter when autoscaling is enabled + AutoscalingSpec autoscaling = llap.autoscaling(); + if (autoscaling.isEnabled()) { + addJmxExporter(spec.image(), COMPONENT, + initContainers, volumeMounts, volumes, envVars, ports); + } + // Pre-compute config hash for the pod template annotation. String configHash = sha256( HadoopXmlBuilder.buildXml(HiveConfigBuilder.getLlapDaemonSite(spec)), HadoopXmlBuilder.buildXml(HiveConfigBuilder.getHadoopCoreSite(spec))); + // When autoscaling is enabled, preserve current replica count (KEDA/HPA manages it). + AutoscalingSpec llapAutoscaling = llap.autoscaling(); + int initialReplicas = llapAutoscaling != null && llapAutoscaling.minReplicas() == 0 + ? 
0 : llap.replicas(); + Integer replicas = resolveReplicaCount( + hiveCluster, context, llapAutoscaling, llap.replicas(), initialReplicas); + StatefulSet statefulSet = new StatefulSetBuilder() .withNewMetadata() .withName(resourceName(hiveCluster)) @@ -127,7 +147,7 @@ protected StatefulSet desired(HiveCluster hiveCluster, .withLabels(Labels.forComponent(hiveCluster, COMPONENT)) .endMetadata() .withNewSpec() - .withReplicas(llap.replicas()) + .withReplicas(replicas) .withServiceName(headlessServiceName) .withNewSelector() .withMatchLabels(selectorLabels) @@ -159,20 +179,26 @@ protected StatefulSet desired(HiveCluster hiveCluster, applySpreadAffinityIfAbsent( statefulSet.getSpec().getTemplate().getSpec(), selectorLabels); - if (spec.volumes() != null) { - statefulSet.getSpec().getTemplate().getSpec().getVolumes().addAll(spec.volumes()); - } - if (spec.volumeMounts() != null) { - statefulSet.getSpec().getTemplate().getSpec().getContainers().get(0).getVolumeMounts() - .addAll(spec.volumeMounts()); - } - if (llap.extraVolumes() != null) { - statefulSet.getSpec().getTemplate().getSpec().getVolumes().addAll(llap.extraVolumes()); - } - if (llap.extraVolumeMounts() != null) { - statefulSet.getSpec().getTemplate().getSpec().getContainers().get(0).getVolumeMounts() - .addAll(llap.extraVolumeMounts()); + // Graceful scale-down: poll JMX Exporter (port 9404) until all executors idle. + if (autoscaling.isEnabled()) { + String preStopScript = buildDualMetricDrainScript( + "Waiting for LLAP executors to become idle", + "hadoop_llapdaemon_executornumexecutorsavailable{", "AVAILABLE", + "hadoop_llapdaemon_executornumexecutors{", "TOTAL", + "LLAP executor metrics not found. JMX Exporter may not be configured.", + "All executors idle. 
Shutting down.", + "Executors available=$AVAILABLE / total=$TOTAL \u2014 waiting...", + 10, 6); + applyAutoscalingLifecycle( + statefulSet.getSpec().getTemplate().getSpec(), + statefulSet.getSpec().getTemplate().getMetadata(), + preStopScript, autoscaling.gracePeriodSeconds(), + autoscaling.metricsScrapeIntervalSeconds()); } + + appendUserVolumes(statefulSet.getSpec().getTemplate().getSpec(), + spec.volumes(), spec.volumeMounts(), + llap.extraVolumes(), llap.extraVolumeMounts()); return statefulSet; } diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreConfigMapDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreConfigMapDependent.java deleted file mode 100644 index b429335f76e0..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreConfigMapDependent.java +++ /dev/null @@ -1,67 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.hive.kubernetes.operator.dependent; - -import java.util.Map; - -import io.fabric8.kubernetes.api.model.ConfigMap; -import io.fabric8.kubernetes.api.model.ConfigMapBuilder; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.config.informer.Informer; -import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; -import org.apache.hive.kubernetes.operator.model.HiveCluster; -import org.apache.hive.kubernetes.operator.util.HadoopXmlBuilder; -import org.apache.hive.kubernetes.operator.util.HiveConfigBuilder; -import org.apache.hive.kubernetes.operator.util.Labels; - -/** Manages the metastore-site.xml ConfigMap for the Hive Metastore. */ -@KubernetesDependent( - informer = @Informer(labelSelector = "app.kubernetes.io/component=metastore," - + "app.kubernetes.io/managed-by=hive-kubernetes-operator") -) -public class MetastoreConfigMapDependent - extends HiveDependentResource { - - public static final String COMPONENT = "metastore"; - - public MetastoreConfigMapDependent() { - super(ConfigMap.class); - } - - @Override - protected ConfigMap desired(HiveCluster hiveCluster, - Context context) { - Map props = - HiveConfigBuilder.getMetastoreSite(hiveCluster.getSpec()); - - return new ConfigMapBuilder() - .withNewMetadata() - .withName(resourceName(hiveCluster)) - .withNamespace(hiveCluster.getMetadata().getNamespace()) - .withLabels(Labels.forComponent(hiveCluster, COMPONENT)) - .endMetadata() - .addToData("metastore-site.xml", HadoopXmlBuilder.buildXml(props)) - .build(); - } - - /** Returns the ConfigMap resource name for this HiveCluster. 
*/ - public static String resourceName(HiveCluster hiveCluster) { - return hiveCluster.getMetadata().getName() + "-metastore-config"; - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreDeploymentDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreDeploymentDependent.java index 46a95426c969..4b39d0341a51 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreDeploymentDependent.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreDeploymentDependent.java @@ -36,6 +36,7 @@ import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; import org.apache.hive.kubernetes.operator.model.HiveCluster; import org.apache.hive.kubernetes.operator.model.HiveClusterSpec; +import org.apache.hive.kubernetes.operator.model.spec.AutoscalingSpec; import org.apache.hive.kubernetes.operator.model.spec.DatabaseConfig; import org.apache.hive.kubernetes.operator.util.ConfigUtils; import org.apache.hive.kubernetes.operator.util.HadoopXmlBuilder; @@ -56,6 +57,12 @@ public MetastoreDeploymentDependent() { super(Deployment.class); } + @Override + protected String getSecondaryResourceName(HiveCluster primary, + Context context) { + return resourceName(primary); + } + @Override protected Deployment desired(HiveCluster hiveCluster, Context context) { @@ -77,12 +84,11 @@ protected Deployment desired(HiveCluster hiveCluster, ConfigUtils.METASTORE_THRIFT_PORT_KEY, ConfigUtils.METASTORE_THRIFT_PORT_HIVE_KEY, ConfigUtils.METASTORE_THRIFT_PORT_DEFAULT); - List ports = List.of( - new ContainerPortBuilder() - .withName("thrift").withContainerPort(thriftPort).build(), - new ContainerPortBuilder() - .withName("rest").withContainerPort(9001).build() - ); + List ports = new ArrayList<>(); + ports.add(new ContainerPortBuilder() + 
.withName("thrift").withContainerPort(thriftPort).withProtocol("TCP").build()); + ports.add(new ContainerPortBuilder() + .withName("rest").withContainerPort(9001).withProtocol("TCP").build()); Probe readinessProbe = buildTcpProbe(thriftPort, spec.metastore().readinessProbe(), 15, 10, 3); Probe livenessProbe = buildTcpProbe(thriftPort, spec.metastore().livenessProbe(), 60, 30, 5); @@ -107,6 +113,13 @@ protected Deployment desired(HiveCluster hiveCluster, replaceConfMountWithSubPaths(volumeMounts, "hive-config", "metastore-site.xml", "core-site.xml"); + // Add Prometheus JMX Exporter when autoscaling is enabled + AutoscalingSpec autoscaling = spec.metastore().autoscaling(); + if (autoscaling.isEnabled()) { + addJmxExporter(spec.image(), COMPONENT, + initContainers, volumeMounts, volumes, envVars, ports); + } + // Pre-compute config hash for the pod template annotation. // This ensures the Deployment is created with the correct hash // from the start (single ReplicaSet) and triggers rolling @@ -115,6 +128,13 @@ protected Deployment desired(HiveCluster hiveCluster, HadoopXmlBuilder.buildXml(HiveConfigBuilder.getMetastoreSite(spec)), HadoopXmlBuilder.buildXml(HiveConfigBuilder.getHadoopCoreSite(spec))); + // When autoscaling is enabled, preserve current replica count (KEDA/HPA manages it). + AutoscalingSpec msAutoscaling = spec.metastore().autoscaling(); + int initialReplicas = msAutoscaling != null + ? 
Math.max(1, msAutoscaling.minReplicas()) : spec.metastore().replicas(); + Integer replicas = resolveReplicaCount( + hiveCluster, context, msAutoscaling, spec.metastore().replicas(), initialReplicas); + Deployment deployment = new DeploymentBuilder() .withNewMetadata() .withName(resourceName(hiveCluster)) @@ -122,7 +142,7 @@ protected Deployment desired(HiveCluster hiveCluster, .withLabels(Labels.forComponent(hiveCluster, COMPONENT)) .endMetadata() .withNewSpec() - .withReplicas(spec.metastore().replicas()) + .withReplicas(replicas) .withNewSelector() .withMatchLabels(selectorLabels) .endSelector() @@ -155,20 +175,25 @@ protected Deployment desired(HiveCluster hiveCluster, applySpreadAffinityIfAbsent( deployment.getSpec().getTemplate().getSpec(), selectorLabels); - if (spec.volumes() != null) { - deployment.getSpec().getTemplate().getSpec().getVolumes().addAll(spec.volumes()); - } - if (spec.volumeMounts() != null) { - deployment.getSpec().getTemplate().getSpec().getContainers().get(0).getVolumeMounts() - .addAll(spec.volumeMounts()); - } - if (spec.metastore().extraVolumes() != null) { - deployment.getSpec().getTemplate().getSpec().getVolumes().addAll(spec.metastore().extraVolumes()); - } - if (spec.metastore().extraVolumeMounts() != null) { - deployment.getSpec().getTemplate().getSpec().getContainers().get(0).getVolumeMounts() - .addAll(spec.metastore().extraVolumeMounts()); + // HMS uses HTTP transport mode — connections are stateless, so no session + // drain is needed. The preStop hook simply sends SIGTERM directly to the + // JVM (the shell entrypoint doesn't forward signals from K8s). 
+ if (autoscaling.isEnabled()) { + String preStopScript = String.join("\n", + "#!/bin/bash", + "echo '[preStop] Sending SIGTERM to Metastore Java process...'", + "kill $(pgrep -f 'java.*org.apache') 2>/dev/null", + "exit 0"); + applyAutoscalingLifecycle( + deployment.getSpec().getTemplate().getSpec(), + deployment.getSpec().getTemplate().getMetadata(), + preStopScript, autoscaling.gracePeriodSeconds(), + autoscaling.metricsScrapeIntervalSeconds()); } + + appendUserVolumes(deployment.getSpec().getTemplate().getSpec(), + spec.volumes(), spec.volumeMounts(), + spec.metastore().extraVolumes(), spec.metastore().extraVolumeMounts()); return deployment; } diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreServiceDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreServiceDependent.java deleted file mode 100644 index 2620a24e01d7..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/MetastoreServiceDependent.java +++ /dev/null @@ -1,75 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.hive.kubernetes.operator.dependent; - -import io.fabric8.kubernetes.api.model.IntOrString; -import io.fabric8.kubernetes.api.model.Service; -import io.fabric8.kubernetes.api.model.ServiceBuilder; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.config.informer.Informer; -import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; -import org.apache.hive.kubernetes.operator.model.HiveCluster; -import org.apache.hive.kubernetes.operator.util.ConfigUtils; -import org.apache.hive.kubernetes.operator.util.Labels; - -/** Manages the Kubernetes Service for the Hive Metastore (Thrift + REST ports). */ -@KubernetesDependent( - informer = @Informer(labelSelector = "app.kubernetes.io/component=metastore," - + "app.kubernetes.io/managed-by=hive-kubernetes-operator") -) -public class MetastoreServiceDependent - extends HiveDependentResource { - - public MetastoreServiceDependent() { - super(Service.class); - } - - @Override - protected Service desired(HiveCluster hiveCluster, - Context context) { - int thriftPort = ConfigUtils.getInt( - hiveCluster.getSpec().metastore().configOverrides(), - ConfigUtils.METASTORE_THRIFT_PORT_KEY, - ConfigUtils.METASTORE_THRIFT_PORT_HIVE_KEY, - ConfigUtils.METASTORE_THRIFT_PORT_DEFAULT); - return new ServiceBuilder() - .withNewMetadata() - .withName(hiveCluster.getMetadata().getName() + "-metastore") - .withNamespace(hiveCluster.getMetadata().getNamespace()) - .withLabels(Labels.forComponent(hiveCluster, - MetastoreDeploymentDependent.COMPONENT)) - .endMetadata() - .withNewSpec() - .withType("ClusterIP") - .withSelector(Labels.selectorForComponent(hiveCluster, - MetastoreDeploymentDependent.COMPONENT)) - .addNewPort() - .withName("thrift") - .withPort(thriftPort) - .withTargetPort(new IntOrString(thriftPort)) - .endPort() - .addNewPort() - .withName("rest") - .withPort(9001) - .withTargetPort(new IntOrString(9001)) - .endPort() - .endSpec() - 
.build(); - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/SchemaInitJobDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/SchemaInitJobDependent.java index a23c0c477436..25d0eb39a0f9 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/SchemaInitJobDependent.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/SchemaInitJobDependent.java @@ -53,6 +53,12 @@ public SchemaInitJobDependent() { super(Job.class); } + @Override + protected String getSecondaryResourceName(HiveCluster primary, + Context context) { + return resourceName(primary); + } + @Override protected Job desired(HiveCluster hiveCluster, Context context) { diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/ScratchPvcDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/ScratchPvcDependent.java index 6a645f043574..230ba47edd13 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/ScratchPvcDependent.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/ScratchPvcDependent.java @@ -55,6 +55,12 @@ public ScratchPvcDependent() { super(PersistentVolumeClaim.class); } + @Override + protected String getSecondaryResourceName(HiveCluster primary, + Context context) { + return resourceName(primary); + } + @Override protected PersistentVolumeClaim desired(HiveCluster hiveCluster, Context context) { diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/TezAmServiceDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/TezAmServiceDependent.java deleted file mode 100644 index 781685286038..000000000000 --- 
a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/TezAmServiceDependent.java +++ /dev/null @@ -1,62 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.hive.kubernetes.operator.dependent; - -import io.fabric8.kubernetes.api.model.Service; -import io.fabric8.kubernetes.api.model.ServiceBuilder; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.config.informer.Informer; -import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; -import org.apache.hive.kubernetes.operator.model.HiveCluster; -import org.apache.hive.kubernetes.operator.util.Labels; - -/** - * Manages the headless Kubernetes Service for Tez Application Master. - * Required by the StatefulSet for stable DNS entries so that - * HiveServer2 can resolve TezAM pod hostnames for RPC communication. 
- */ -@KubernetesDependent( - informer = @Informer(labelSelector = "app.kubernetes.io/component=tezam," - + "app.kubernetes.io/managed-by=hive-kubernetes-operator") -) -public class TezAmServiceDependent - extends HiveDependentResource { - - public TezAmServiceDependent() { - super(Service.class); - } - - @Override - protected Service desired(HiveCluster hiveCluster, - Context context) { - return new ServiceBuilder() - .withNewMetadata() - .withName(hiveCluster.getMetadata().getName() + "-tezam") - .withNamespace(hiveCluster.getMetadata().getNamespace()) - .withLabels(Labels.forComponent(hiveCluster, - TezAmStatefulSetDependent.COMPONENT)) - .endMetadata() - .withNewSpec() - .withClusterIP("None") - .withSelector(Labels.selectorForComponent(hiveCluster, - TezAmStatefulSetDependent.COMPONENT)) - .endSpec() - .build(); - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/TezAmStatefulSetDependent.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/TezAmStatefulSetDependent.java index 5cc7a3f800f3..ac83286e346c 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/TezAmStatefulSetDependent.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/TezAmStatefulSetDependent.java @@ -23,6 +23,7 @@ import java.util.Map; import io.fabric8.kubernetes.api.model.Container; +import io.fabric8.kubernetes.api.model.ContainerPort; import io.fabric8.kubernetes.api.model.EnvVar; import io.fabric8.kubernetes.api.model.apps.StatefulSet; import io.fabric8.kubernetes.api.model.apps.StatefulSetBuilder; @@ -31,6 +32,7 @@ import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent; import org.apache.hive.kubernetes.operator.model.HiveCluster; import org.apache.hive.kubernetes.operator.model.HiveClusterSpec; +import org.apache.hive.kubernetes.operator.model.spec.AutoscalingSpec; import 
org.apache.hive.kubernetes.operator.model.spec.TezAmSpec; import org.apache.hive.kubernetes.operator.util.HadoopXmlBuilder; import org.apache.hive.kubernetes.operator.util.HiveConfigBuilder; @@ -57,6 +59,12 @@ public TezAmStatefulSetDependent() { super(StatefulSet.class); } + @Override + protected String getSecondaryResourceName(HiveCluster primary, + Context context) { + return resourceName(primary); + } + @Override protected StatefulSet desired(HiveCluster hiveCluster, Context context) { @@ -98,8 +106,8 @@ protected StatefulSet desired(HiveCluster hiveCluster, List volumes = new ArrayList<>(); volumes.add(buildProjectedConfigVolume("hive-config", - HiveServer2ConfigMapDependent.resourceName(hiveCluster), - HadoopConfigMapDependent.resourceName(hiveCluster))); + HiveConfigMapDependent.HiveServer2.resourceName(hiveCluster), + HiveConfigMapDependent.Hadoop.resourceName(hiveCluster))); volumes.add(new io.fabric8.kubernetes.api.model.VolumeBuilder() .withName("scratch") .withNewPersistentVolumeClaim() @@ -107,12 +115,20 @@ protected StatefulSet desired(HiveCluster hiveCluster, .endPersistentVolumeClaim() .build()); + List ports = new ArrayList<>(); List initContainers = new ArrayList<>(); addExternalJars(spec.image(), spec.externalJars(), initContainers, volumeMounts, volumes, envVars); replaceConfMountWithSubPaths(volumeMounts, "hive-config", "hive-site.xml", "tez-site.xml", "core-site.xml"); + // Add Prometheus JMX Exporter when autoscaling is enabled + AutoscalingSpec autoscaling = tezAm.autoscaling(); + if (autoscaling.isEnabled()) { + addJmxExporter(spec.image(), COMPONENT, + initContainers, volumeMounts, volumes, envVars, ports); + } + // Pre-compute config hash for the pod template annotation. // TezAM uses the same ConfigMaps as HS2 (hive-site.xml + tez-site.xml + core-site.xml). 
String configHash = sha256( @@ -120,6 +136,13 @@ protected StatefulSet desired(HiveCluster hiveCluster, HadoopXmlBuilder.buildXml(HiveConfigBuilder.getTezSite(spec)), HadoopXmlBuilder.buildXml(HiveConfigBuilder.getHadoopCoreSite(spec))); + // When autoscaling is enabled, preserve current replica count (KEDA/HPA manages it). + AutoscalingSpec tezAmAutoscaling = tezAm.autoscaling(); + int initialReplicas = tezAmAutoscaling != null && tezAmAutoscaling.minReplicas() == 0 + ? 0 : tezAm.replicas(); + Integer replicas = resolveReplicaCount( + hiveCluster, context, tezAmAutoscaling, tezAm.replicas(), initialReplicas); + StatefulSet statefulSet = new StatefulSetBuilder() .withNewMetadata() .withName(resourceName(hiveCluster)) @@ -127,7 +150,7 @@ protected StatefulSet desired(HiveCluster hiveCluster, .withLabels(Labels.forComponent(hiveCluster, COMPONENT)) .endMetadata() .withNewSpec() - .withReplicas(tezAm.replicas()) + .withReplicas(replicas) .withServiceName(headlessServiceName) .withNewSelector() .withMatchLabels(selectorLabels) @@ -145,6 +168,7 @@ protected StatefulSet desired(HiveCluster hiveCluster, .withImage(spec.image()) .withImagePullPolicy(spec.imagePullPolicy()) .withEnv(envVars) + .withPorts(ports) .withResources(buildResources(tezAm.resources())) .withVolumeMounts(volumeMounts) .endContainer() @@ -157,20 +181,23 @@ protected StatefulSet desired(HiveCluster hiveCluster, applySpreadAffinityIfAbsent( statefulSet.getSpec().getTemplate().getSpec(), selectorLabels); - if (spec.volumes() != null) { - statefulSet.getSpec().getTemplate().getSpec().getVolumes().addAll(spec.volumes()); - } - if (spec.volumeMounts() != null) { - statefulSet.getSpec().getTemplate().getSpec().getContainers().get(0).getVolumeMounts() - .addAll(spec.volumeMounts()); - } - if (tezAm.extraVolumes() != null) { - statefulSet.getSpec().getTemplate().getSpec().getVolumes().addAll(tezAm.extraVolumes()); - } - if (tezAm.extraVolumeMounts() != null) { - 
statefulSet.getSpec().getTemplate().getSpec().getContainers().get(0).getVolumeMounts() - .addAll(tezAm.extraVolumeMounts()); + // Graceful scale-down: poll JMX Exporter (port 9404) for DAGsRunning to reach 0. + if (autoscaling.isEnabled()) { + String preStopScript = buildDrainScript( + "Waiting for active DAGs to complete", + "tez_am_dagsrunning", "DAGS", + "No active DAGs. Safe to terminate Tez AM.", + 10, 6, null); + applyAutoscalingLifecycle( + statefulSet.getSpec().getTemplate().getSpec(), + statefulSet.getSpec().getTemplate().getMetadata(), + preStopScript, autoscaling.gracePeriodSeconds(), + autoscaling.metricsScrapeIntervalSeconds()); } + + appendUserVolumes(statefulSet.getSpec().getTemplate().getSpec(), + spec.volumes(), spec.volumeMounts(), + tezAm.extraVolumes(), tezAm.extraVolumeMounts()); return statefulSet; } diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/HiveServer2Precondition.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/HiveServer2Precondition.java deleted file mode 100644 index a36002dbf886..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/HiveServer2Precondition.java +++ /dev/null @@ -1,53 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.hive.kubernetes.operator.dependent.condition; - -import io.fabric8.kubernetes.api.model.apps.Deployment; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.reconciler.dependent.DependentResource; -import io.javaoperatorsdk.operator.processing.dependent.workflow.Condition; -import org.apache.hive.kubernetes.operator.model.HiveCluster; - -/** - * Precondition for HiveServer2 Deployment. - * If Metastore is external, proceed immediately. - * If managed, wait for Metastore pods to be ready. - */ -public class HiveServer2Precondition implements Condition { - - @Override - public boolean isMet( - DependentResource dependentResource, - HiveCluster primary, - Context context) { - - if (!primary.getSpec().metastore().isEnabled()) { - return true; - } - - int desiredReplicas = primary.getSpec().metastore().replicas(); - return context.getSecondaryResources(Deployment.class).stream() - .filter(d -> d.getMetadata().getName().equals(primary.getMetadata().getName() + "-metastore")) - .findFirst() - .map(deployment -> deployment.getStatus() != null - && deployment.getStatus().getReadyReplicas() != null - && deployment.getStatus().getReadyReplicas() >= desiredReplicas) - .orElse(false); - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/LlapEnabledCondition.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/LlapEnabledCondition.java deleted file mode 100644 index a113c50efbff..000000000000 
--- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/LlapEnabledCondition.java +++ /dev/null @@ -1,41 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.hive.kubernetes.operator.dependent.condition; - -import io.fabric8.kubernetes.api.model.HasMetadata; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.reconciler.dependent.DependentResource; -import io.javaoperatorsdk.operator.processing.dependent.workflow.Condition; -import org.apache.hive.kubernetes.operator.model.HiveCluster; - -/** - * Activation condition for LLAP dependent resources. - * Returns true only when spec.llap.enabled is true. 
- */ -public class LlapEnabledCondition - implements Condition { - - @Override - public boolean isMet( - DependentResource dependentResource, - HiveCluster primary, - Context context) { - return primary.getSpec().llap().isEnabled(); - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/MetastoreEnabledCondition.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/MetastoreEnabledCondition.java deleted file mode 100644 index b1cb4139ac96..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/MetastoreEnabledCondition.java +++ /dev/null @@ -1,39 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.hive.kubernetes.operator.dependent.condition; - -import io.fabric8.kubernetes.api.model.HasMetadata; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.reconciler.dependent.DependentResource; -import io.javaoperatorsdk.operator.processing.dependent.workflow.Condition; -import org.apache.hive.kubernetes.operator.model.HiveCluster; - -/** - * Activation condition for Metastore dependent resources. 
- * Returns true only when spec.metastore.enabled is true. - */ -public class MetastoreEnabledCondition implements Condition { - @Override - public boolean isMet( - DependentResource dependentResource, - HiveCluster primary, - Context context) { - return primary.getSpec().metastore().isEnabled(); - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/MetastoreReadyCondition.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/MetastoreReadyCondition.java deleted file mode 100644 index 7b3169f32043..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/MetastoreReadyCondition.java +++ /dev/null @@ -1,49 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.hive.kubernetes.operator.dependent.condition; - -import io.fabric8.kubernetes.api.model.apps.Deployment; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.reconciler.dependent.DependentResource; -import io.javaoperatorsdk.operator.processing.dependent.workflow.Condition; -import org.apache.hive.kubernetes.operator.model.HiveCluster; - -/** - * Ready condition that checks whether the Metastore Deployment has the - * desired number of ready replicas. Used to gate HiveServer2 Deployment. - */ -public class MetastoreReadyCondition - implements Condition { - - @Override - public boolean isMet( - DependentResource dependentResource, - HiveCluster primary, - Context context) { - if (!primary.getSpec().metastore().isEnabled()) { - return true; - } - int desiredReplicas = primary.getSpec().metastore().replicas(); - return dependentResource.getSecondaryResource(primary, context) - .map(deployment -> deployment.getStatus() != null - && deployment.getStatus().getReadyReplicas() != null - && deployment.getStatus().getReadyReplicas() >= desiredReplicas) - .orElse(false); - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/SchemaJobCompletedCondition.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/SchemaJobCompletedCondition.java deleted file mode 100644 index 1b0b44318596..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/SchemaJobCompletedCondition.java +++ /dev/null @@ -1,48 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. 
The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.hive.kubernetes.operator.dependent.condition; - -import io.fabric8.kubernetes.api.model.batch.v1.Job; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.reconciler.dependent.DependentResource; -import io.javaoperatorsdk.operator.processing.dependent.workflow.Condition; -import org.apache.hive.kubernetes.operator.model.HiveCluster; - -/** - * Ready condition that checks whether the schema initialization Job - * has completed successfully. Used to gate Metastore Deployment creation. 
- */ -public class SchemaJobCompletedCondition - implements Condition { - - @Override - public boolean isMet( - DependentResource dependentResource, - HiveCluster primary, - Context context) { - if (!primary.getSpec().metastore().isEnabled()) { - return true; - } - return dependentResource.getSecondaryResource(primary, context) - .map(job -> job.getStatus() != null - && job.getStatus().getSucceeded() != null - && job.getStatus().getSucceeded() >= 1) - .orElse(false); - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/TezAmEnabledCondition.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/TezAmEnabledCondition.java deleted file mode 100644 index 85ae7e45dbdb..000000000000 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/condition/TezAmEnabledCondition.java +++ /dev/null @@ -1,41 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.hive.kubernetes.operator.dependent.condition; - -import io.fabric8.kubernetes.api.model.HasMetadata; -import io.javaoperatorsdk.operator.api.reconciler.Context; -import io.javaoperatorsdk.operator.api.reconciler.dependent.DependentResource; -import io.javaoperatorsdk.operator.processing.dependent.workflow.Condition; -import org.apache.hive.kubernetes.operator.model.HiveCluster; - -/** - * Activation condition for Tez AM dependent resources. - * Returns true only when spec.tezAm.enabled is true. - */ -public class TezAmEnabledCondition - implements Condition { - - @Override - public boolean isMet( - DependentResource dependentResource, - HiveCluster primary, - Context context) { - return primary.getSpec().tezAm().isEnabled(); - } -} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/HiveClusterSpec.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/HiveClusterSpec.java index 40dd8a771203..1897582bd18e 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/HiveClusterSpec.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/HiveClusterSpec.java @@ -78,6 +78,14 @@ public record HiveClusterSpec( public HiveClusterSpec { Objects.requireNonNull(zookeeper, "zookeeper must be provided in the HiveCluster spec"); + metastore = metastore != null ? metastore : new MetastoreSpec( + 1, null, null, null, null, null, null, true, null, null, null, null); + hiveServer2 = hiveServer2 != null ? hiveServer2 : new HiveServer2Spec( + 1, null, null, null, null, null, null, null, null, null); + llap = llap != null ? llap : new LlapSpec( + 1, null, null, null, null, true, null, null, null, null, null); + tezAm = tezAm != null ? tezAm : new TezAmSpec( + 1, null, null, null, null, true, null, null, null); envVars = envVars != null ? envVars : List.of(); externalJars = externalJars != null ? 
externalJars : List.of(); volumes = volumes != null ? volumes : List.of(); diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/AutoscalingSpec.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/AutoscalingSpec.java new file mode 100644 index 000000000000..ab02949d7f25 --- /dev/null +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/AutoscalingSpec.java @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hive.kubernetes.operator.model.spec; + +import com.fasterxml.jackson.annotation.JsonPropertyDescription; +import io.fabric8.generator.annotation.Default; + +/** Autoscaling configuration for a Hive component. Uses KEDA ScaledObjects for metric-based scaling. */ +public record AutoscalingSpec( + @JsonPropertyDescription("Whether autoscaling is enabled for this component") + @Default("false") + Boolean enabled, + @JsonPropertyDescription("Minimum number of replicas (floor for scale-down). 
" + + "Set to 0 for scale-to-zero (HS2 requires KEDA HTTP Add-on for wake-from-zero)") + @Default("0") + Integer minReplicas, + @JsonPropertyDescription("Threshold that triggers scale-up (component-specific: " + + "sessions for HS2, connections for HMS, queue depth for LLAP, " + + "pending tasks for TezAM)") + @Default("80") + Integer scaleUpThreshold, + @JsonPropertyDescription("Threshold that triggers scale-down for Prometheus-based metrics") + @Default("20") + Integer scaleDownThreshold, + @JsonPropertyDescription("Target CPU average value for scaling (e.g., '1500m' or '1'). " + + "If omitted, CPU scaling is disabled.") + String targetCpuValue, + @JsonPropertyDescription("CPU average value below which the trigger is inactive. " + + "Required if targetCpuValue is set.") + String activationCpuValue, + @JsonPropertyDescription("Cooldown period in seconds after all KEDA triggers are inactive " + + "before scaling from 1 to 0 (scale-to-zero delay)") + @Default("600") + Integer cooldownSeconds, + @JsonPropertyDescription("Stabilization window in seconds for scale-up decisions. " + + "HPA picks the lowest recommendation within this window to prevent flapping.") + @Default("60") + Integer scaleUpStabilizationSeconds, + @JsonPropertyDescription("Stabilization window in seconds for scale-down decisions. " + + "HPA picks the highest recommendation within this window to prevent premature scale-down.") + @Default("300") + Integer scaleDownStabilizationSeconds, + @JsonPropertyDescription("Maximum time in seconds to wait for graceful drain " + + "during scale-down before the pod is forcibly terminated. " + + "The pod terminates immediately once sessions/connections drain to 0; " + + "this value is only the upper safety cap.") + @Default("3600") + Integer gracePeriodSeconds, + @JsonPropertyDescription("Prometheus scrape interval in seconds for this component's metrics. 
" + + "Lower values make autoscaling react faster but increase Prometheus load.") + @Default("10") + Integer metricsScrapeIntervalSeconds) { + + public AutoscalingSpec { + enabled = enabled != null ? enabled : false; + minReplicas = minReplicas != null ? minReplicas : 0; + scaleUpThreshold = scaleUpThreshold != null ? scaleUpThreshold : 80; + scaleDownThreshold = scaleDownThreshold != null ? scaleDownThreshold : 20; + cooldownSeconds = cooldownSeconds != null ? cooldownSeconds : 600; + scaleUpStabilizationSeconds = scaleUpStabilizationSeconds != null ? scaleUpStabilizationSeconds : 60; + scaleDownStabilizationSeconds = scaleDownStabilizationSeconds != null ? scaleDownStabilizationSeconds : 300; + gracePeriodSeconds = gracePeriodSeconds != null ? gracePeriodSeconds : 3600; + metricsScrapeIntervalSeconds = metricsScrapeIntervalSeconds != null ? metricsScrapeIntervalSeconds : 10; + } + + public boolean isEnabled() { + return enabled; + } +} diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/HiveServer2Spec.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/HiveServer2Spec.java index 78164fb32de6..b81ad83b41b7 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/HiveServer2Spec.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/HiveServer2Spec.java @@ -51,7 +51,9 @@ public record HiveServer2Spec( @JsonPropertyDescription("Readiness probe configuration") ProbeSpec readinessProbe, @JsonPropertyDescription("Liveness probe configuration") - ProbeSpec livenessProbe) { + ProbeSpec livenessProbe, + @JsonPropertyDescription("Autoscaling configuration (requires KEDA installed in the cluster)") + AutoscalingSpec autoscaling) { public HiveServer2Spec { replicas = replicas != null ? replicas : 1; @@ -59,5 +61,7 @@ public record HiveServer2Spec( extraVolumes = extraVolumes != null ? 
extraVolumes : List.of(); extraVolumeMounts = extraVolumeMounts != null ? extraVolumeMounts : List.of(); externalJars = externalJars != null ? externalJars : List.of(); + autoscaling = autoscaling != null ? autoscaling : new AutoscalingSpec( + false, 0, 80, 20, null, null, 600, 60, 300, 300, 10); } } diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/LlapSpec.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/LlapSpec.java index 17ff5967ff9a..c24bac5a1116 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/LlapSpec.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/LlapSpec.java @@ -55,7 +55,9 @@ public record LlapSpec( @JsonPropertyDescription("LLAP service hosts identifier for ZooKeeper registration") String serviceHosts, @JsonPropertyDescription("Readiness probe configuration") - ProbeSpec readinessProbe) { + ProbeSpec readinessProbe, + @JsonPropertyDescription("Autoscaling configuration (requires KEDA installed in the cluster)") + AutoscalingSpec autoscaling) { public LlapSpec { replicas = replicas != null ? replicas : 1; @@ -65,6 +67,8 @@ public record LlapSpec( serviceHosts = serviceHosts != null ? serviceHosts : "@llap0"; extraVolumes = extraVolumes != null ? extraVolumes : List.of(); extraVolumeMounts = extraVolumeMounts != null ? extraVolumeMounts : List.of(); + autoscaling = autoscaling != null ? 
autoscaling : new AutoscalingSpec( + false, 0, 1, 0, null, null, 900, 60, 300, 600, 10); } public boolean isEnabled() { diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/MetastoreSpec.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/MetastoreSpec.java index 307c17221ee7..61c2cf0635e2 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/MetastoreSpec.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/MetastoreSpec.java @@ -56,7 +56,9 @@ public record MetastoreSpec( @JsonPropertyDescription("Readiness probe configuration") ProbeSpec readinessProbe, @JsonPropertyDescription("Liveness probe configuration") - ProbeSpec livenessProbe) { + ProbeSpec livenessProbe, + @JsonPropertyDescription("Autoscaling configuration (requires KEDA installed in the cluster)") + AutoscalingSpec autoscaling) { public MetastoreSpec { replicas = replicas != null ? replicas : 1; @@ -66,6 +68,8 @@ public record MetastoreSpec( enabled = enabled != null ? enabled : true; extraVolumes = extraVolumes != null ? extraVolumes : List.of(); extraVolumeMounts = extraVolumeMounts != null ? extraVolumeMounts : List.of(); + autoscaling = autoscaling != null ? 
autoscaling : new AutoscalingSpec( + false, 1, 75, 30, null, null, 300, 60, 300, 60, 10); } public boolean isEnabled() { diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/TezAmSpec.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/TezAmSpec.java index a0494c2c5e73..716e4025c50d 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/TezAmSpec.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/model/spec/TezAmSpec.java @@ -52,7 +52,9 @@ public record TezAmSpec( String scratchStorageSize, @JsonPropertyDescription("StorageClass for the shared scratch PVC. " + "Must support ReadWriteMany access. If null, uses cluster default.") - String scratchStorageClassName) { + String scratchStorageClassName, + @JsonPropertyDescription("Autoscaling configuration (requires KEDA installed in the cluster)") + AutoscalingSpec autoscaling) { public TezAmSpec { replicas = replicas != null ? replicas : 1; @@ -60,6 +62,8 @@ public record TezAmSpec( scratchStorageSize = scratchStorageSize != null ? scratchStorageSize : "1Gi"; extraVolumes = extraVolumes != null ? extraVolumes : List.of(); extraVolumeMounts = extraVolumeMounts != null ? extraVolumeMounts : List.of(); + autoscaling = autoscaling != null ? 
autoscaling : new AutoscalingSpec( + false, 0, 5, 10, null, null, 600, 60, 300, 120, 10); } public boolean isEnabled() { diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/reconciler/HiveClusterReconciler.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/reconciler/HiveClusterReconciler.java index 20332cb4127c..71453ef0335b 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/reconciler/HiveClusterReconciler.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/reconciler/HiveClusterReconciler.java @@ -35,28 +35,6 @@ import io.javaoperatorsdk.operator.api.reconciler.ErrorStatusUpdateControl; import io.javaoperatorsdk.operator.api.reconciler.Reconciler; import io.javaoperatorsdk.operator.api.reconciler.UpdateControl; -import io.javaoperatorsdk.operator.api.reconciler.Workflow; -import io.javaoperatorsdk.operator.api.reconciler.dependent.Dependent; -import org.apache.hive.kubernetes.operator.dependent.HadoopConfigMapDependent; -import org.apache.hive.kubernetes.operator.dependent.HiveServer2ConfigMapDependent; -import org.apache.hive.kubernetes.operator.dependent.HiveServer2DeploymentDependent; -import org.apache.hive.kubernetes.operator.dependent.HiveServer2ServiceDependent; -import org.apache.hive.kubernetes.operator.dependent.LlapConfigMapDependent; -import org.apache.hive.kubernetes.operator.dependent.LlapServiceDependent; -import org.apache.hive.kubernetes.operator.dependent.LlapStatefulSetDependent; -import org.apache.hive.kubernetes.operator.dependent.MetastoreConfigMapDependent; -import org.apache.hive.kubernetes.operator.dependent.MetastoreDeploymentDependent; -import org.apache.hive.kubernetes.operator.dependent.MetastoreServiceDependent; -import org.apache.hive.kubernetes.operator.dependent.SchemaInitJobDependent; -import org.apache.hive.kubernetes.operator.dependent.ScratchPvcDependent; -import 
org.apache.hive.kubernetes.operator.dependent.TezAmServiceDependent; -import org.apache.hive.kubernetes.operator.dependent.TezAmStatefulSetDependent; -import org.apache.hive.kubernetes.operator.dependent.condition.HiveServer2Precondition; -import org.apache.hive.kubernetes.operator.dependent.condition.LlapEnabledCondition; -import org.apache.hive.kubernetes.operator.dependent.condition.MetastoreEnabledCondition; -import org.apache.hive.kubernetes.operator.dependent.condition.MetastoreReadyCondition; -import org.apache.hive.kubernetes.operator.dependent.condition.SchemaJobCompletedCondition; -import org.apache.hive.kubernetes.operator.dependent.condition.TezAmEnabledCondition; import org.apache.hive.kubernetes.operator.model.HiveCluster; import org.apache.hive.kubernetes.operator.model.HiveClusterStatus; import org.apache.hive.kubernetes.operator.model.status.ComponentStatus; @@ -68,54 +46,21 @@ * Orchestrates all dependent resources with proper dependency ordering. */ @ControllerConfiguration -@Workflow(dependents = { - // --- ConfigMap dependents --- - @Dependent(name = "hadoop-configmap", type = HadoopConfigMapDependent.class), - @Dependent(name = "metastore-configmap", type = MetastoreConfigMapDependent.class, - activationCondition = MetastoreEnabledCondition.class), - @Dependent(name = "hiveserver2-configmap", type = HiveServer2ConfigMapDependent.class), - // --- Job dependents --- - @Dependent(name = "schema-init-job", type = SchemaInitJobDependent.class, dependsOn = {"metastore-configmap", - "hadoop-configmap"}, readyPostcondition = SchemaJobCompletedCondition.class, - activationCondition = MetastoreEnabledCondition.class), - // --- Deployment dependents --- - @Dependent(name = "metastore-deployment", type = MetastoreDeploymentDependent.class, dependsOn = { - "schema-init-job"}, readyPostcondition = MetastoreReadyCondition.class, - activationCondition = MetastoreEnabledCondition.class), - // --- Service dependents --- - @Dependent(name = "metastore-service", 
type = MetastoreServiceDependent.class, dependsOn = { - "metastore-configmap"}, activationCondition = MetastoreEnabledCondition.class), - @Dependent(name = "hiveserver2-deployment", type = HiveServer2DeploymentDependent.class, dependsOn = { - "hiveserver2-configmap", "hadoop-configmap"}, reconcilePrecondition = HiveServer2Precondition.class), - @Dependent(name = "hiveserver2-service", type = HiveServer2ServiceDependent.class, dependsOn = { - "hiveserver2-configmap"}), - // --- LLAP (conditional) --- - @Dependent(name = "llap-configmap", type = LlapConfigMapDependent.class, - activationCondition = LlapEnabledCondition.class), - @Dependent(name = "llap-statefulset", type = LlapStatefulSetDependent.class, dependsOn = {"llap-configmap", - "hadoop-configmap"}, activationCondition = LlapEnabledCondition.class), - @Dependent(name = "llap-service", type = LlapServiceDependent.class, - activationCondition = LlapEnabledCondition.class), - // --- TezAM (conditional) --- - @Dependent(name = "scratch-pvc", type = ScratchPvcDependent.class, - activationCondition = TezAmEnabledCondition.class), - @Dependent(name = "tezam-service", type = TezAmServiceDependent.class, - activationCondition = TezAmEnabledCondition.class), - @Dependent(name = "tezam-statefulset", type = TezAmStatefulSetDependent.class, dependsOn = {"hiveserver2-configmap", - "hadoop-configmap", "tezam-service", "scratch-pvc"}, activationCondition = TezAmEnabledCondition.class)}) public class HiveClusterReconciler implements Reconciler { private static final Logger LOG = LoggerFactory.getLogger(HiveClusterReconciler.class); @Override public UpdateControl reconcile(HiveCluster resource, Context context) { - LOG.debug("Reconciling HiveCluster: {}/{}", resource.getMetadata().getNamespace(), - resource.getMetadata().getName()); + LOG.debug("Reconciling HiveCluster: {}/{} generation={}", + resource.getMetadata().getNamespace(), + resource.getMetadata().getName(), + resource.getMetadata().getGeneration()); HiveClusterStatus 
existingStatus = resource.getStatus(); HiveClusterStatus newStatus = buildStatus(resource, context, existingStatus); - if (Objects.equals(existingStatus, newStatus)) { + if (statusEqualsIgnoringTimestamps(existingStatus, newStatus)) { return UpdateControl.noUpdate(); } @@ -126,8 +71,8 @@ public UpdateControl reconcile(HiveCluster resource, Context updateErrorStatus(HiveCluster resource, Context context, Exception e) { - LOG.error("Error reconciling HiveCluster: {}/{}", resource.getMetadata().getNamespace(), - resource.getMetadata().getName(), e); + LOG.error("Error reconciling HiveCluster: {}/{} - {}", resource.getMetadata().getNamespace(), + resource.getMetadata().getName(), e.getMessage(), e); HiveClusterStatus status = resource.getStatus() != null ? resource.getStatus() : new HiveClusterStatus(); @@ -172,9 +117,13 @@ private HiveClusterStatus buildStatus(HiveCluster resource, // Metastore status boolean metastoreReady; if (resource.getSpec().metastore().isEnabled()) { + // When autoscaling, desired = minReplicas (KEDA manages beyond that) + int metastoreDesired = resource.getSpec().metastore().autoscaling().isEnabled() + ? Math.max(1, resource.getSpec().metastore().autoscaling().minReplicas()) + : resource.getSpec().metastore().replicas(); ComponentStatus metastoreStatus = buildComponentStatus(context, Deployment.class, resource.getMetadata().getName() + "-metastore", - resource.getSpec().metastore().replicas(), + metastoreDesired, d -> d.getStatus() != null && d.getStatus().getReadyReplicas() != null ? d.getStatus().getReadyReplicas() : 0); @@ -192,15 +141,17 @@ private HiveClusterStatus buildStatus(HiveCluster resource, existingConditions)); } - // HiveServer2 status + // HiveServer2 status — when scale-to-zero, 0/0 is a valid "ready" state (idle) + int hs2Desired = resource.getSpec().hiveServer2().autoscaling().isEnabled() + ? 
resource.getSpec().hiveServer2().autoscaling().minReplicas() + : resource.getSpec().hiveServer2().replicas(); ComponentStatus hs2Status = buildComponentStatus(context, Deployment.class, resource.getMetadata().getName() + "-hiveserver2", - resource.getSpec().hiveServer2().replicas(), + hs2Desired, d -> d.getStatus() != null && d.getStatus().getReadyReplicas() != null ? d.getStatus().getReadyReplicas() : 0); status.setHiveServer2(hs2Status); - boolean hs2Ready = - hs2Status.getReadyReplicas() >= hs2Status.getDesiredReplicas() && hs2Status.getDesiredReplicas() > 0; + boolean hs2Ready = hs2Status.getReadyReplicas() >= hs2Status.getDesiredReplicas(); conditions.add(buildCondition("HiveServer2Ready", hs2Ready ? "True" : "False", hs2Ready ? "DeploymentReady" : "DeploymentNotReady", hs2Ready ? "HiveServer2 is ready" : "HiveServer2 not yet ready", @@ -208,17 +159,23 @@ private HiveClusterStatus buildStatus(HiveCluster resource, // LLAP status (optional) if (resource.getSpec().llap().isEnabled()) { + int llapDesired = resource.getSpec().llap().autoscaling().isEnabled() + ? resource.getSpec().llap().autoscaling().minReplicas() + : resource.getSpec().llap().replicas(); status.setLlap(buildComponentStatus(context, StatefulSet.class, resource.getMetadata().getName() + "-llap", - resource.getSpec().llap().replicas(), + llapDesired, s -> s.getStatus() != null && s.getStatus().getReadyReplicas() != null ? s.getStatus().getReadyReplicas() : 0)); } // TezAM status (optional) if (resource.getSpec().tezAm().isEnabled()) { + int tezAmDesired = resource.getSpec().tezAm().autoscaling().isEnabled() + ? resource.getSpec().tezAm().autoscaling().minReplicas() + : resource.getSpec().tezAm().replicas(); status.setTezAm(buildComponentStatus(context, StatefulSet.class, resource.getMetadata().getName() + "-tezam", - resource.getSpec().tezAm().replicas(), + tezAmDesired, s -> s.getStatus() != null && s.getStatus().getReadyReplicas() != null ? 
s.getStatus().getReadyReplicas() : 0)); } @@ -265,14 +222,86 @@ private Condition buildCondition(String type, String conditionStatus, condition.setReason(reason); condition.setMessage(message); - // Preserve lastTransitionTime when the condition status has not changed + // Preserve lastTransitionTime from ANY existing condition of this type + // (regardless of status) to avoid generating new timestamps on every + // reconcile which would cause an infinite status-patch loop. String preservedTime = existingConditions.stream() - .filter(c -> type.equals(c.getType()) && conditionStatus.equals(c.getStatus())) + .filter(c -> type.equals(c.getType())) .map(Condition::getLastTransitionTime) .findFirst() .orElse(null); - condition.setLastTransitionTime(preservedTime != null ? preservedTime : Instant.now().toString()); + if (preservedTime != null) { + // Only update the timestamp if the status actually changed + String oldStatus = existingConditions.stream() + .filter(c -> type.equals(c.getType())) + .map(Condition::getStatus) + .findFirst() + .orElse(null); + if (conditionStatus.equals(oldStatus)) { + condition.setLastTransitionTime(preservedTime); + } else { + condition.setLastTransitionTime(Instant.now().toString()); + } + } else { + condition.setLastTransitionTime(Instant.now().toString()); + } return condition; } + + /** + * Compares two HiveClusterStatus objects ignoring condition timestamps. + * This prevents infinite reconciliation loops caused by informer cache lag: + * after a status patch, the informer may still have the old status, causing + * the next reconcile to see a "different" status (new timestamp vs old) and + * patch again, perpetuating the loop. 
+ */ + private boolean statusEqualsIgnoringTimestamps(HiveClusterStatus a, HiveClusterStatus b) { + if (a == b) { + return true; + } + if (a == null || b == null) { + return false; + } + if (!Objects.equals(a.getObservedGeneration(), b.getObservedGeneration())) { + return false; + } + if (!Objects.equals(a.getMetastore(), b.getMetastore())) { + return false; + } + if (!Objects.equals(a.getHiveServer2(), b.getHiveServer2())) { + return false; + } + if (!Objects.equals(a.getLlap(), b.getLlap())) { + return false; + } + if (!Objects.equals(a.getTezAm(), b.getTezAm())) { + return false; + } + // Compare conditions by type+status+reason+message, ignoring lastTransitionTime + return conditionsEqualIgnoringTime(a.getConditions(), b.getConditions()); + } + + private boolean conditionsEqualIgnoringTime(List a, List b) { + if (a == b) { + return true; + } + if (a == null || b == null) { + return a == null && b == null; + } + if (a.size() != b.size()) { + return false; + } + for (int i = 0; i < a.size(); i++) { + Condition ca = a.get(i); + Condition cb = b.get(i); + if (!Objects.equals(ca.getType(), cb.getType()) + || !Objects.equals(ca.getStatus(), cb.getStatus()) + || !Objects.equals(ca.getReason(), cb.getReason()) + || !Objects.equals(ca.getMessage(), cb.getMessage())) { + return false; + } + } + return true; + } } diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/reconciler/HiveWorkflowSpec.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/reconciler/HiveWorkflowSpec.java new file mode 100644 index 000000000000..46aa53890573 --- /dev/null +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/reconciler/HiveWorkflowSpec.java @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hive.kubernetes.operator.reconciler; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.Set; + +import io.fabric8.kubernetes.api.model.apps.Deployment; +import io.javaoperatorsdk.operator.api.config.dependent.DependentResourceSpec; +import io.javaoperatorsdk.operator.api.config.workflow.WorkflowSpec; +import io.javaoperatorsdk.operator.processing.dependent.workflow.Condition; +import org.apache.hive.kubernetes.operator.dependent.HiveConfigMapDependent; +import org.apache.hive.kubernetes.operator.dependent.HiveServer2DeploymentDependent; +import org.apache.hive.kubernetes.operator.dependent.HiveServer2InterceptorRouteDependent; +import org.apache.hive.kubernetes.operator.dependent.HivePdbDependent; +import org.apache.hive.kubernetes.operator.dependent.HiveScaledObjectDependent; +import org.apache.hive.kubernetes.operator.dependent.HiveServiceDependent; +import org.apache.hive.kubernetes.operator.dependent.LlapStatefulSetDependent; +import org.apache.hive.kubernetes.operator.dependent.MetastoreDeploymentDependent; +import org.apache.hive.kubernetes.operator.dependent.SchemaInitJobDependent; +import org.apache.hive.kubernetes.operator.dependent.ScratchPvcDependent; +import org.apache.hive.kubernetes.operator.dependent.TezAmStatefulSetDependent; +import org.apache.hive.kubernetes.operator.model.HiveCluster; + +/** 
+ * Programmatic workflow specification for the Hive Kubernetes Operator. + * Replaces the annotation-based {@code @Workflow} on the reconciler with + * explicit {@link DependentResourceSpec} entries and inline lambda conditions. + * This eliminates 12 single-method condition wrapper classes. + */ +public final class HiveWorkflowSpec implements WorkflowSpec { + + private static final Condition METASTORE_ENABLED = + (dr, primary, ctx) -> primary.getSpec().metastore().isEnabled(); + + private static final Condition LLAP_ENABLED = + (dr, primary, ctx) -> primary.getSpec().llap().isEnabled(); + + private static final Condition TEZAM_ENABLED = + (dr, primary, ctx) -> primary.getSpec().tezAm().isEnabled(); + + private static final Condition METASTORE_AUTOSCALING = + (dr, primary, ctx) -> primary.getSpec().metastore().isEnabled() + && primary.getSpec().metastore().autoscaling().isEnabled(); + + private static final Condition LLAP_AUTOSCALING = + (dr, primary, ctx) -> primary.getSpec().llap().isEnabled() + && primary.getSpec().llap().autoscaling().isEnabled(); + + private static final Condition TEZAM_AUTOSCALING = + (dr, primary, ctx) -> primary.getSpec().tezAm().isEnabled() + && primary.getSpec().tezAm().autoscaling().isEnabled(); + + private static final Condition HS2_AUTOSCALING = + (dr, primary, ctx) -> primary.getSpec().hiveServer2().autoscaling().isEnabled(); + + private static final Condition HS2_SCALE_TO_ZERO = + (dr, primary, ctx) -> primary.getSpec().hiveServer2().autoscaling().isEnabled() + && primary.getSpec().hiveServer2().autoscaling().minReplicas() == 0; + + // SPECS must be declared AFTER all conditions to avoid static init order issues. 
+ private static final List SPECS = buildSpecs(); + + @SuppressWarnings({"rawtypes", "unchecked"}) + private static List buildSpecs() { + List specs = new ArrayList<>(); + + // --- ConfigMap dependents --- + specs.add(new DependentResourceSpec( + HiveConfigMapDependent.Hadoop.class, "hadoop-configmap", + Set.of(), null, null, null, null, null)); + + specs.add(new DependentResourceSpec( + HiveConfigMapDependent.Metastore.class, "metastore-configmap", + Set.of(), null, null, null, METASTORE_ENABLED, null)); + + specs.add(new DependentResourceSpec( + HiveConfigMapDependent.HiveServer2.class, "hiveserver2-configmap", + Set.of(), null, null, null, null, null)); + + // --- Job dependents --- + specs.add(new DependentResourceSpec( + SchemaInitJobDependent.class, "schema-init-job", + Set.of("metastore-configmap", "hadoop-configmap"), + schemaJobCompleted(), null, null, METASTORE_ENABLED, null)); + + // --- Deployment dependents --- + specs.add(new DependentResourceSpec( + MetastoreDeploymentDependent.class, "metastore-deployment", + Set.of("schema-init-job"), + metastoreReady(), null, null, METASTORE_ENABLED, null)); + + // --- Service dependents --- + specs.add(new DependentResourceSpec( + HiveServiceDependent.Metastore.class, "metastore-service", + Set.of("metastore-configmap"), + null, null, null, METASTORE_ENABLED, null)); + + specs.add(new DependentResourceSpec( + HiveServer2DeploymentDependent.class, "hiveserver2-deployment", + Set.of("hiveserver2-configmap", "hadoop-configmap"), + null, hs2Precondition(), null, null, null)); + + specs.add(new DependentResourceSpec( + HiveServiceDependent.HiveServer2.class, "hiveserver2-service", + Set.of("hiveserver2-configmap"), + null, null, null, null, null)); + + // --- LLAP (conditional) --- + specs.add(new DependentResourceSpec( + HiveConfigMapDependent.Llap.class, "llap-configmap", + Set.of(), null, null, null, LLAP_ENABLED, null)); + + specs.add(new DependentResourceSpec( + LlapStatefulSetDependent.class, "llap-statefulset", 
+ Set.of("llap-configmap", "hadoop-configmap"), + null, null, null, LLAP_ENABLED, null)); + + specs.add(new DependentResourceSpec( + HiveServiceDependent.Llap.class, "llap-service", + Set.of(), null, null, null, LLAP_ENABLED, null)); + + // --- TezAM (conditional) --- + specs.add(new DependentResourceSpec( + ScratchPvcDependent.class, "scratch-pvc", + Set.of(), null, null, null, TEZAM_ENABLED, null)); + + specs.add(new DependentResourceSpec( + HiveServiceDependent.TezAm.class, "tezam-service", + Set.of(), null, null, null, TEZAM_ENABLED, null)); + + specs.add(new DependentResourceSpec( + TezAmStatefulSetDependent.class, "tezam-statefulset", + Set.of("hiveserver2-configmap", "hadoop-configmap", "tezam-service", "scratch-pvc"), + null, null, null, TEZAM_ENABLED, null)); + + specs.add(new DependentResourceSpec( + HiveScaledObjectDependent.HiveServer2.class, "hs2-scaledobject", + Set.of("hiveserver2-deployment"), + null, HS2_AUTOSCALING, null, null, null)); + + specs.add(new DependentResourceSpec( + HiveServer2InterceptorRouteDependent.class, "hs2-interceptor-route", + Set.of("hiveserver2-deployment"), + null, HS2_SCALE_TO_ZERO, null, null, null)); + + specs.add(new DependentResourceSpec( + HiveScaledObjectDependent.Metastore.class, "metastore-scaledobject", + Set.of("metastore-deployment"), + null, METASTORE_AUTOSCALING, null, null, null)); + + specs.add(new DependentResourceSpec( + HiveScaledObjectDependent.Llap.class, "llap-scaledobject", + Set.of("llap-statefulset", "hs2-scaledobject"), + null, LLAP_AUTOSCALING, null, null, null)); + + specs.add(new DependentResourceSpec( + HiveScaledObjectDependent.TezAm.class, "tezam-scaledobject", + Set.of("tezam-statefulset", "hs2-scaledobject"), + null, TEZAM_AUTOSCALING, null, null, null)); + + // --- Autoscaling: PodDisruptionBudgets (conditional) --- + specs.add(new DependentResourceSpec( + HivePdbDependent.HiveServer2.class, "hs2-pdb", + Set.of("hiveserver2-deployment"), + null, HS2_AUTOSCALING, null, null, null)); + + 
specs.add(new DependentResourceSpec( + HivePdbDependent.Metastore.class, "metastore-pdb", + Set.of("metastore-deployment"), + null, METASTORE_AUTOSCALING, null, null, null)); + + specs.add(new DependentResourceSpec( + HivePdbDependent.Llap.class, "llap-pdb", + Set.of("llap-statefulset"), + null, LLAP_AUTOSCALING, null, null, null)); + + specs.add(new DependentResourceSpec( + HivePdbDependent.TezAm.class, "tezam-pdb", + Set.of("tezam-statefulset"), + null, TEZAM_AUTOSCALING, null, null, null)); + + return Collections.unmodifiableList(specs); + } + + /** + * Ready postcondition: schema initialization Job must complete successfully + * before the Metastore Deployment is created. + */ + private static Condition schemaJobCompleted() { + return (dependentResource, primary, context) -> { + if (!primary.getSpec().metastore().isEnabled()) { + return true; + } + return dependentResource.getSecondaryResource(primary, context) + .map(job -> { + var j = (io.fabric8.kubernetes.api.model.batch.v1.Job) job; + return j.getStatus() != null + && j.getStatus().getSucceeded() != null + && j.getStatus().getSucceeded() >= 1; + }) + .orElse(false); + }; + } + + /** + * Ready postcondition: Metastore Deployment must have the desired number + * of ready replicas before downstream dependents proceed. 
+ */ + private static Condition metastoreReady() { + return (dependentResource, primary, context) -> { + if (!primary.getSpec().metastore().isEnabled()) { + return true; + } + int desiredReplicas; + if (primary.getSpec().metastore().autoscaling().isEnabled()) { + desiredReplicas = Math.max(1, primary.getSpec().metastore().autoscaling().minReplicas()); + } else { + desiredReplicas = primary.getSpec().metastore().replicas(); + } + return dependentResource.getSecondaryResource(primary, context) + .map(resource -> { + var deployment = (Deployment) resource; + return deployment.getStatus() != null + && deployment.getStatus().getReadyReplicas() != null + && deployment.getStatus().getReadyReplicas() >= desiredReplicas; + }) + .orElse(false); + }; + } + + /** + * Reconcile precondition for HiveServer2: if Metastore is managed, + * wait for it to be ready before reconciling HS2. + */ + private static Condition hs2Precondition() { + return (dependentResource, primary, context) -> { + if (!primary.getSpec().metastore().isEnabled()) { + return true; + } + int desiredReplicas; + if (primary.getSpec().metastore().autoscaling().isEnabled()) { + desiredReplicas = Math.max(1, primary.getSpec().metastore().autoscaling().minReplicas()); + } else { + desiredReplicas = primary.getSpec().metastore().replicas(); + } + return context.getSecondaryResources(Deployment.class).stream() + .filter(d -> d.getMetadata().getName().equals( + primary.getMetadata().getName() + "-metastore")) + .findFirst() + .map(deployment -> deployment.getStatus() != null + && deployment.getStatus().getReadyReplicas() != null + && deployment.getStatus().getReadyReplicas() >= desiredReplicas) + .orElse(false); + }; + } + + @Override + public List getDependentResourceSpecs() { + return SPECS; + } + + @Override + public boolean isExplicitInvocation() { + return false; + } + + @Override + public boolean handleExceptionsInReconciler() { + return true; + } +} diff --git 
a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/util/ConfigUtils.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/util/ConfigUtils.java index 0f86201817e7..95e66bd91979 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/util/ConfigUtils.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/util/ConfigUtils.java @@ -73,14 +73,49 @@ private ConfigUtils() { public static final String HIVE_LLAP_DAEMON_NUM_EXECUTORS_KEY = "hive.llap.daemon.num.executors"; + public static final String METASTORE_SERVER_TRANSPORT_MODE_KEY = "metastore.server.thrift.transport.mode"; + public static final String METASTORE_SERVER_TRANSPORT_MODE_DEFAULT = "http"; + + public static final String METASTORE_SERVER_HTTP_PATH_KEY = "metastore.server.thrift.http.path"; + public static final String METASTORE_SERVER_HTTP_PATH_DEFAULT = "metastore"; + + public static final String METASTORE_CLIENT_TRANSPORT_MODE_KEY = "hive.metastore.client.thrift.transport.mode"; + public static final String METASTORE_CLIENT_TRANSPORT_MODE_DEFAULT = "http"; + + public static final String METASTORE_CLIENT_HTTP_PATH_KEY = "metastore.client.thrift.http.path"; + public static final String METASTORE_CLIENT_HTTP_PATH_DEFAULT = "metastore"; + + public static final String METASTORE_SERVER_MAX_THREADS_KEY = "metastore.server.max.threads"; + public static final String METASTORE_SERVER_MAX_THREADS_HIVE_KEY = "hive.metastore.server.max.threads"; + public static final int METASTORE_SERVER_MAX_THREADS_DEFAULT = 1000; + public static final String HIVE_METASTORE_URIS_KEY = "hive.metastore.uris"; + public static final String HIVE_SERVER2_TEZ_SESSIONS_PER_QUEUE_KEY = "hive.server2.tez.sessions.per.default.queue"; + public static final int HIVE_SERVER2_TEZ_SESSIONS_PER_QUEUE_DEFAULT = 1; + public static final String HIVE_SERVER2_THRIFT_PORT_KEY = "hive.server2.thrift.port"; public static final int HIVE_SERVER2_THRIFT_PORT_DEFAULT 
= 10000; + public static final String HIVE_SERVER2_THRIFT_HTTP_PORT_KEY = "hive.server2.thrift.http.port"; + public static final int HIVE_SERVER2_THRIFT_HTTP_PORT_DEFAULT = 10001; + + public static final String HIVE_SERVER2_THRIFT_HTTP_PATH_KEY = "hive.server2.thrift.http.path"; + public static final String HIVE_SERVER2_THRIFT_HTTP_PATH_DEFAULT = "cliservice"; + + public static final String HIVE_SERVER2_TRANSPORT_MODE_KEY = "hive.server2.transport.mode"; + public static final String HIVE_SERVER2_TRANSPORT_MODE_DEFAULT = "http"; + public static final String HIVE_SERVER2_WEBUI_PORT_KEY = "hive.server2.webui.port"; public static final int HIVE_SERVER2_WEBUI_PORT_DEFAULT = 10002; + /** Port for the Prometheus JMX Exporter agent (serves /metrics in text format). */ + public static final int PROMETHEUS_JMX_EXPORTER_PORT = 9404; + + /** Default URL for the Prometheus JMX Exporter javaagent JAR. */ + public static final String JMX_EXPORTER_JAR_URL = + "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/1.0.1/jmx_prometheus_javaagent-1.0.1.jar"; + public static final String TEZ_AM_SESSION_MODE_KEY = "tez.am.mode.session"; public static final String TEZ_IGNORE_LIB_URIS_KEY = "tez.ignore.lib.uris"; diff --git a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/util/HiveConfigBuilder.java b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/util/HiveConfigBuilder.java index 5db24e95d3f3..f046b685f653 100644 --- a/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/util/HiveConfigBuilder.java +++ b/packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/util/HiveConfigBuilder.java @@ -57,9 +57,20 @@ public static Map getHiveServer2HiveSite( if (metastoreUri != null && !metastoreUri.isEmpty()) { props.put(ConfigUtils.METASTORE_URIS_KEY, metastoreUri); } + // Client-side HTTP transport mode to match metastore server config. 
+ props.put(ConfigUtils.METASTORE_CLIENT_TRANSPORT_MODE_KEY, + ConfigUtils.METASTORE_CLIENT_TRANSPORT_MODE_DEFAULT); + props.put(ConfigUtils.METASTORE_CLIENT_HTTP_PATH_KEY, + ConfigUtils.METASTORE_CLIENT_HTTP_PATH_DEFAULT); props.put(ConfigUtils.HIVE_METASTORE_WAREHOUSE_KEY, spec.metastore().warehouseDir()); props.put(ConfigUtils.HIVE_SERVER2_ENABLE_DOAS_KEY, "false"); + props.put(ConfigUtils.HIVE_SERVER2_TRANSPORT_MODE_KEY, + ConfigUtils.HIVE_SERVER2_TRANSPORT_MODE_DEFAULT); + props.put(ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_KEY, + String.valueOf(ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PORT_DEFAULT)); + props.put(ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PATH_KEY, + ConfigUtils.HIVE_SERVER2_THRIFT_HTTP_PATH_DEFAULT); props.put(ConfigUtils.HIVE_TEZ_EXEC_INPLACE_PROGRESS_KEY, "false"); props.put(ConfigUtils.HIVE_TEZ_EXEC_SUMMARY_KEY, "true"); props.put(ConfigUtils.HIVE_JAR_DIRECTORY_KEY, "/tmp"); @@ -95,6 +106,14 @@ public static Map getHiveServer2HiveSite( props.put("mapreduce.framework.name", "local"); } + // Enable JMX metrics when autoscaling is active. + // The Prometheus JMX Exporter agent (added by the operator) reads JMX MBeans + // and exposes them in Prometheus text format at /metrics on the metrics port. + if (spec.hiveServer2().autoscaling().isEnabled()) { + props.put("hive.server2.metrics.enabled", "true"); + props.put("hive.server2.metrics.reporter", "JMX"); + } + if (spec.hiveServer2().configOverrides() != null) { props.putAll(spec.hiveServer2().configOverrides()); } @@ -149,6 +168,13 @@ public static Map getMetastoreSite(HiveClusterSpec spec) { MetastoreSpec metastore = spec.metastore(); Map props = new LinkedHashMap<>(); + // HTTP transport mode: stateless connections allow safe scale-down + // without breaking active client connections. 
+ props.put(ConfigUtils.METASTORE_SERVER_TRANSPORT_MODE_KEY, + ConfigUtils.METASTORE_SERVER_TRANSPORT_MODE_DEFAULT); + props.put(ConfigUtils.METASTORE_SERVER_HTTP_PATH_KEY, + ConfigUtils.METASTORE_SERVER_HTTP_PATH_DEFAULT); + props.put(ConfigUtils.METASTORE_WAREHOUSE_KEY, metastore.warehouseDir()); @@ -165,6 +191,14 @@ public static Map getMetastoreSite(HiveClusterSpec spec) { } } + // Enable JMX metrics when autoscaling is active. + // The Prometheus JMX Exporter agent reads JMX MBeans and exposes them + // in Prometheus text format at /metrics on the metrics port. + if (metastore.autoscaling().isEnabled()) { + props.put("metastore.metrics.enabled", "true"); + props.put("metastore.metrics.reporter", "JMX"); + } + if (metastore.configOverrides() != null) { props.putAll(metastore.configOverrides()); } diff --git a/ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/TestVectorMapJoinOuterGenerateResultOperator.java b/ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/TestVectorMapJoinOuterGenerateResultOperator.java index 35553d9cb445..85e6882d4d68 100644 --- a/ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/TestVectorMapJoinOuterGenerateResultOperator.java +++ b/ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/TestVectorMapJoinOuterGenerateResultOperator.java @@ -54,7 +54,6 @@ */ class TestVectorMapJoinOuterGenerateResultOperator { - /** Concrete subclass that exposes the generateOuterNulls* methods to tests. */ private static final class TestableOuterOp extends VectorMapJoinOuterGenerateResultOperator { @Override protected String getLoggingPrefix() {