Fix/dpe 10284 standby create extension #1765

Draft
marceloneppel wants to merge 4 commits into 16/edge from fix/dpe-10284-standby-create-extension

@marceloneppel

Issue

Solution

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

…ster

A standby cluster's PostgreSQL runs as a read-only hot standby, so its
predefined roles, system users, default database and extensions are
provisioned on the primary cluster and arrive through streaming
replication. After a full-cluster outage and restart, the standby
leader's deferred start event re-enters the bootstrap path and runs that
write DDL (CREATE EXTENSION ...) against the read-only instance, which
PostgreSQL rejects with ReadOnlySqlTransaction. The start hook then fails
on every retry, leaving the standby leader stuck in error even though
replication is healthy.

Skip user/role setup when the unit belongs to a standby cluster. Standby
detection is derived from the async-replication relation state, which
survives a cold reboot and does not depend on the Patroni API being up
(unlike a leader-role lookup), so it holds in exactly the
all-units-rebooting window where this bug occurs.
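
The guard described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the charm's actual code: only the `is_standby_cluster` name comes from this PR, while `FakeCharm`, the `"role"` relation key, `setup_users_and_extensions`, and `example_extension` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


class ReadOnlySqlTransaction(Exception):
    """Stand-in for the error PostgreSQL raises on writes to a hot standby."""


@dataclass
class FakeCharm:
    # Hypothetical stand-in for async-replication relation data, which
    # persists across a cold reboot of every unit.
    relation_data: dict = field(default_factory=dict)
    # True when the local PostgreSQL instance is a read-only hot standby.
    read_only: bool = False
    # Statements that were actually executed, kept for inspection.
    executed: list = field(default_factory=list)

    @property
    def is_standby_cluster(self) -> bool:
        # Derived only from stored relation state (assumed key: "role"),
        # so it does not need the Patroni API to be reachable.
        return self.relation_data.get("role") == "standby"

    def _run_ddl(self, statement: str) -> None:
        # Simulate PostgreSQL rejecting write DDL on a read-only instance.
        if self.read_only:
            raise ReadOnlySqlTransaction(statement)
        self.executed.append(statement)

    def setup_users_and_extensions(self) -> bool:
        # On a standby cluster these objects arrive through streaming
        # replication, so skip the write DDL entirely.
        if self.is_standby_cluster:
            return False
        self._run_ddl("CREATE EXTENSION IF NOT EXISTS example_extension")
        return True
```

With this shape, a standby unit whose instance is read-only returns early instead of raising, while a primary unit still runs the DDL.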

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
Standby-cluster detection now lives on the charm as is_standby_cluster, so
the backups handler no longer needs its own delegating wrapper. Call the
charm property directly to keep this check in a single place and drop the
dead indirection.
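
A rough sketch of what dropping the wrapper might look like, under assumptions: `Charm` and `Backups` are simplified stand-ins, and `can_create_backup` (including the idea that it refuses on a standby) is a hypothetical caller invented for illustration. Only `is_standby_cluster` is named in this PR.

```python
class Charm:
    """Simplified stand-in for the charm class."""

    def __init__(self, standby: bool):
        self._standby = standby

    @property
    def is_standby_cluster(self) -> bool:
        # Single place where standby detection lives.
        return self._standby


class Backups:
    """Simplified stand-in for the backups handler."""

    def __init__(self, charm: Charm):
        self.charm = charm

    def can_create_backup(self) -> bool:
        # Hypothetical caller. Previously a private method on this class
        # would simply forward to the charm; now the property is read
        # directly, so there is no second definition to keep in sync.
        return not self.charm.is_standby_cluster
```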

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
github-actions bot added the "Libraries: Out of sync" label (The charm libs used are out-of-sync) on Jun 11, 2026
…dby-create-extension

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
The existing async-replication and stereo-mode suites never restart an entire
standby cluster, so the DPE-10284 regression (the standby leader crashing its
start hook with ReadOnlySqlTransaction after a full outage) had no integration
coverage. Add a dedicated test that deploys two clusters, establishes async
replication, force-stops and restarts every unit on both sides, and asserts the
standby leader recovers to an active standby instead of getting stuck in error.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>