[DPE-10235] fix: bypass HTTP proxy for intra-cluster Patroni API calls #1715
Draft
marceloneppel wants to merge 9 commits into
When deployed behind an HTTP proxy, httpx and requests route intra-cluster Patroni API calls through the proxy, causing get_primary() to return None and leaving the charm stuck in "awaiting start of the primary". Set trust_env=False on all HTTP clients (requests.Session and httpx.AsyncClient) so proxy environment variables are ignored for Patroni communication. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
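A minimal sketch of the effect of `trust_env=False` on a `requests.Session`, not the charm's actual code. The helper names (`make_patroni_session`, `effective_proxies`), the unit address, the port, and the proxy URL are all hypothetical. It resolves the proxies requests would use for a URL without sending any traffic:

```python
import os

import requests

# Simulate a proxied host. The proxy URL is a placeholder, not a real endpoint.
os.environ["HTTPS_PROXY"] = "http://proxy.invalid:3128"
os.environ.pop("NO_PROXY", None)
os.environ.pop("no_proxy", None)

# Hypothetical unit address and port, used only for proxy resolution.
PATRONI_URL = "https://10.12.34.56:8008/cluster"


def make_patroni_session() -> requests.Session:
    """Return a session that ignores proxy-related environment variables."""
    session = requests.Session()
    session.trust_env = False
    return session


def effective_proxies(session: requests.Session, url: str) -> dict:
    """Resolve the proxies requests would use for url, without sending anything."""
    settings = session.merge_environment_settings(url, {}, None, None, None)
    return dict(settings["proxies"])


default_session = requests.Session()
patroni_session = make_patroni_session()

# The default session picks up the proxy from the environment.
print(effective_proxies(default_session, PATRONI_URL))
# The trust_env=False session resolves to no proxies at all.
print(effective_proxies(patroni_session, PATRONI_URL))
```

Note that in requests, `trust_env=False` also disables the other environment-derived settings (such as `.netrc` authentication and the `REQUESTS_CA_BUNDLE` lookup), so it is best confined to a session dedicated to intra-cluster calls.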
…atroni-api Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…atroni-api Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Validates that a 3-node deployment behind an HTTP proxy reaches active status, with proxy env vars injected via cloudinit-userdata into /etc/environment. Covers Patroni API reachability and primary election through proxied environments where httpx would otherwise fail with ProxyError due to CIDR-based no_proxy being ignored. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
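To illustrate why a CIDR entry in `no_proxy` can be silently ignored, here is a simplified model of two matching strategies. It is not the code of httpx or requests; both functions and the unit address are hypothetical. A matcher that only compares hostnames and domain suffixes never matches an IP against `10.0.0.0/8`, while a CIDR-aware matcher does:

```python
import ipaddress


def bypass_by_suffix(host: str, no_proxy: str) -> bool:
    """Hostname-style matching: exact match or domain-suffix match only."""
    for entry in (e.strip().lstrip(".") for e in no_proxy.split(",")):
        if entry and (host == entry or host.endswith("." + entry)):
            return True
    return False


def bypass_with_cidr(host: str, no_proxy: str) -> bool:
    """Suffix matching, plus CIDR membership checks for IP-literal hosts."""
    if bypass_by_suffix(host, no_proxy):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal, so CIDR entries cannot apply
    for entry in (e.strip() for e in no_proxy.split(",")):
        if "/" not in entry:
            continue
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # entry contains "/" but is not a valid network
        if address in network:
            return True
    return False


NO_PROXY = "localhost,127.0.0.1,10.0.0.0/8"
UNIT_IP = "10.12.34.56"  # hypothetical unit address inside the CIDR range

print(bypass_by_suffix(UNIT_IP, NO_PROXY))  # False: the request goes to the proxy
print(bypass_with_cidr(UNIT_IP, NO_PROXY))  # True: the request bypasses the proxy
```

Under the first strategy, the only way to exempt cluster traffic through `no_proxy` is to list every unit IP explicitly, which is why ignoring the proxy environment for Patroni calls is the more robust fix.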
…atroni-api Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
… test The proxy integration test requires a real Squid proxy on the LXD bridge for containers to route through. Add prepare/restore steps to task.yaml to install, configure, and tear down Squid automatically. Replace run_command_on_unit (which uses bare juju exec without -m) with juju.ssh() to ensure commands target the correct model. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
The proxy integration test created its own Juju model via temp_model_fixture, which carried no architecture constraint. Juju defaults new models to arch=amd64, so on the arm64-only CI cloud the deploy failed with "invalid constraint value: arch=amd64". Run the test against the shared `testing` model instead, which already gets `arch=$(dpkg --print-architecture)` from the spread prepare-each step. The spread task now passes `--model testing`, and the proxy config is applied to that model with set_config before the deploy so cloudinit-userdata still reaches the units' machines. Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
…atroni-api Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Issue
When deployed behind an HTTP proxy (configured via `http-proxy`/`https-proxy` model config and `cloudinit-userdata`), the charm gets stuck in "awaiting start of the primary" and `get_primary()` returns `None`.

Root cause: `httpx.AsyncClient` (used by `parallel_patroni_get_request` for multi-node Patroni queries) picks up `HTTPS_PROXY` from `/etc/environment` and routes intra-cluster API calls through the proxy. Unlike `requests`/`urllib3`, httpx does not honor CIDR ranges in `no_proxy` (e.g. `10.0.0.0/8`), so Squid receives the CONNECT request and rejects it, which httpx raises as `httpx.ProxyError: 403 Forbidden`. This prevents `cluster_status()` from completing, so `_start_primary` never sets `cluster_initialised`, and replicas remain stuck.

Solution
Ignore proxy environment variables for all intra-cluster Patroni communication:
- `src/cluster.py`: route every Patroni REST API call through a shared `requests.Session` with `trust_env=False`.
- `scripts/cluster_topology_observer.py`: set `trust_env=False` on the `httpx.AsyncClient`.
- Bump `postgresql-charms-single-kernel` to `16.2.3` ([DPE-10235] fix: bypass HTTP proxy for intra-cluster Patroni API calls, postgresql-single-kernel-library#132), which carries the matching `trust_env=False` fix for the library's `httpx.AsyncClient` used by `parallel_patroni_get_request`.

Testing
Add an integration test (`tests/integration/test_proxy.py`) that deploys a 3-unit cluster with both model-config proxy settings and `cloudinit-userdata` writing proxy vars to `/etc/environment`, then asserts that the units reach `active`, the Patroni REST API is reachable on every unit, and the `get-primary` action works.

The spread task (`tests/spread/test_proxy.py/task.yaml`) stands up a real Squid proxy on the LXD bridge in `prepare` and tears it down in `restore`, and runs the test against the shared, arch-constrained `testing` model (so it passes on both amd64 and arm64).

Checklist
Closes #1714.