[DPE-10235] fix: bypass HTTP proxy for intra-cluster Patroni API calls #1715

Draft
marceloneppel wants to merge 9 commits into 16/edge from fix/proxy-bypass-patroni-api

Conversation

@marceloneppel marceloneppel commented May 26, 2026

Member

Issue

When deployed behind an HTTP proxy (configured via http-proxy/https-proxy model config and cloudinit-userdata), the charm gets stuck in "awaiting start of the primary" and get_primary() returns None.

Root cause: httpx.AsyncClient (used by parallel_patroni_get_request for multi-node Patroni queries) picks up HTTPS_PROXY from /etc/environment and routes intra-cluster API calls through the proxy. Unlike requests/urllib3, httpx does not honor CIDR ranges in no_proxy (e.g. 10.0.0.0/8), so Squid receives the CONNECT request and rejects it with 403 Forbidden, which surfaces as httpx.ProxyError. Because cluster_status() then never completes, _start_primary never sets cluster_initialised, and the replicas remain stuck.
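To illustrate why a CIDR entry in no_proxy only helps when the client understands CIDR notation, here is a minimal standard-library sketch. Both functions are hypothetical and written for this explanation only; they do not reproduce the actual matching code in httpx or requests.

```python
import ipaddress


def naive_no_proxy_match(host: str, no_proxy: str) -> bool:
    """Exact or domain-suffix matching only (hypothetical helper).

    A CIDR entry such as 10.0.0.0/8 is treated as an opaque string,
    so it never matches an IP address inside that range.
    """
    for entry in (e.strip() for e in no_proxy.split(",")):
        if not entry:
            continue
        if host == entry or host.endswith("." + entry.lstrip(".")):
            return True
    return False


def cidr_aware_no_proxy_match(host: str, no_proxy: str) -> bool:
    """Same as above, but also checks CIDR entries (hypothetical helper)."""
    if naive_no_proxy_match(host, no_proxy):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # host is a name, not an IP address
    for entry in (e.strip() for e in no_proxy.split(",")):
        if "/" not in entry:
            continue
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # not a valid CIDR range, skip it
        if address in network:
            return True
    return False
```

With no_proxy set to `localhost,10.0.0.0/8`, the naive matcher would still send a request for a unit at 10.12.0.5 to the proxy, while the CIDR-aware one would bypass it.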

Solution

Ignore proxy environment variables for all intra-cluster Patroni communication by setting trust_env=False on the HTTP clients (requests.Session and httpx.AsyncClient).

Testing

Add an integration test (tests/integration/test_proxy.py) that deploys a 3-unit cluster with both model-config proxy settings and cloudinit-userdata writing proxy vars to /etc/environment, then asserts the units reach active, the Patroni REST API is reachable on every unit, and the get-primary action works.

The spread task (tests/spread/test_proxy.py/task.yaml) stands up a real Squid proxy on the LXD bridge in prepare and tears it down in restore. It runs the test against the shared, arch-constrained testing model, so it passes on both amd64 and arm64.

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

Closes #1714.

When deployed behind an HTTP proxy, httpx and requests route intra-cluster
Patroni API calls through the proxy, causing get_primary() to return None
and leaving the charm stuck in "awaiting start of the primary".

Set trust_env=False on all HTTP clients (requests.Session and
httpx.AsyncClient) so proxy environment variables are ignored for
Patroni communication.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…atroni-api

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@github-actions github-actions bot added the "Libraries: Out of sync" label (The charm libs used are out-of-sync) May 27, 2026
…atroni-api

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Validates that a 3-node deployment behind an HTTP proxy reaches
active status, with proxy env vars injected via cloudinit-userdata
into /etc/environment. Covers Patroni API reachability and primary
election through proxied environments where httpx would otherwise
fail with ProxyError due to CIDR-based no_proxy being ignored.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…atroni-api

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel marceloneppel changed the title from "fix: bypass HTTP proxy for Patroni REST API calls" to "fix: bypass HTTP proxy for intra-cluster Patroni API calls" May 28, 2026
@marceloneppel marceloneppel added the "bug" label (Something isn't working as expected) May 28, 2026
… test

The proxy integration test requires a real Squid proxy on the LXD bridge
for containers to route through. Add prepare/restore steps to task.yaml
to install, configure, and tear down Squid automatically. Replace
run_command_on_unit (which uses bare juju exec without -m) with
juju.ssh() to ensure commands target the correct model.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
The proxy integration test created its own Juju model via
temp_model_fixture, which carried no architecture constraint. Juju
defaults new models to arch=amd64, so on the arm64-only CI cloud the
deploy failed with "invalid constraint value: arch=amd64".

Run the test against the shared `testing` model instead, which already
gets `arch=$(dpkg --print-architecture)` from the spread prepare-each
step. The spread task now passes `--model testing`, and the proxy
config is applied to that model with set_config before the deploy so
cloudinit-userdata still reaches the units' machines.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…atroni-api

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
@marceloneppel marceloneppel changed the title from "fix: bypass HTTP proxy for intra-cluster Patroni API calls" to "[DPE-10235] fix: bypass HTTP proxy for intra-cluster Patroni API calls" Jun 1, 2026
@marceloneppel marceloneppel deleted the fix/proxy-bypass-patroni-api branch June 2, 2026 12:18
@marceloneppel marceloneppel restored the fix/proxy-bypass-patroni-api branch June 2, 2026 19:53
@marceloneppel marceloneppel reopened this Jun 2, 2026