Skip to content

P1 security hardening for #385: pause-cooldown snapshot + distributor per-root cap#446

Closed
seongyun-ko wants to merge 1 commit into
pankaj/feat/security-upgradesfrom
security/p1-halt-coverage-hardening
Closed

P1 security hardening for #385: pause-cooldown snapshot + distributor per-root cap#446
seongyun-ko wants to merge 1 commit into
pankaj/feat/security-upgradesfrom
security/p1-halt-coverage-hardening

Conversation

@seongyun-ko
Copy link
Copy Markdown
Contributor

@seongyun-ko seongyun-ko commented May 29, 2026

P1 security hardening for #385

Two follow-on fixes, each small and tested. Base: pankaj/feat/security-upgrades.


1. Fix the pause "cooldown" so it can't be silently shrunk or block the first pause (PausableUntil)

Background: after someone pauses a contract, they have to wait out a cooldown before they can pause again (anti-spam). The cooldown was being recalculated live from the current pause-duration setting, which caused two bugs.

Bug 1 — the cooldown can be shrunk after the fact. It's computed from the duration setting as it is right now, not as it was when the pause happened.

Example: Alice pauses with the duration set to 30 days, so her cooldown should run ~37 days. Someone later lowers the duration setting to 8 hours. Alice's cooldown is now recomputed as ~7 days and she can pause again far earlier than intended — a different role just shortened her cooldown without touching her pause.

Bug 2 — the very first pause can be blocked on a fresh chain. A never-paused account is treated as "last paused at timestamp 0", so the check reads 0 + duration + cooldown > now. On mainnet now is huge so it's fine, but on a new chain with a low starting timestamp the emergency pause is dead for the first ~37 days.

Example: we deploy these contracts to a new L2 whose clock starts near 0. The Guardian tries the emergency pause on day 1 — it reverts with "cooldown active", even though nobody has ever paused.

The fix: snapshot each pauser's cooldown end at the moment they pause (a new cooldownUntil value) instead of recomputing it. A later settings change can't move it (fixes bug 1), and a never-paused account has a cooldown of 0 so the first pause always works (fixes bug 2). Behaviour is identical to before whenever the duration setting isn't changed between pauses — all existing tests pass unchanged.


2. Cap how much one rewards batch can pay out (CumulativeMerkleRewardsDistributor)

Background: rewards are published as a Merkle root (a signed list of "who can claim how much"). Publishing a root is done by an operations key. Today, claims are checked against a whitelist and against double-claiming — but there's no ceiling on the total a single root can pay out.

Example: an attacker who steals the operations key publishes a root that says they're owed the contract's entire balance, waits out the 2-day delay, and drains it. The whitelist doesn't help if the attacker also controls a whitelisted address.

The fix: an admin (the 2-day timelock) can set a per-batch ceiling per token, e.g. "never pay out more than 500 ETH under any one rewards root." Claims that would push a root's total past the ceiling revert. It's opt-in — a ceiling of 0 means "no limit" (current behaviour), so nothing changes until governance sets one. Once set, a stolen operations key can't drain beyond the approved amount.

  • Tests: a 150-ETH cap lets a 100-ETH claim through, then blocks a claim that would push the batch total to 300; and a 0 cap behaves exactly as before.

One item we tried and deliberately dropped: pausing the withdrawal claim step

The review suggested gating claimWithdraw with the pause too. We implemented it — and it broke three existing tests named test_claimWithdraw_succeedsWhenPaused. Those tests are there on purpose: claiming already-finalized withdrawals is meant to keep working during a pause, so a halt doesn't trap users' money that the protocol already owes them. We reverted it.

If you do want the claim step halted during a confirmed exploit (e.g. corrupted finalization), that's a deliberate trade-off and would mean updating those tests — flagging it rather than deciding for you.

Three token-transfer items left for a design decision (not in this PR)

These change behaviour the team wrote on purpose and would touch many token tests, so I'm flagging rather than forcing:

  • Dust-spam griefing: a per-address rate limit is charged on both sender and recipient, so anyone can spam tiny transfers to a rate-limited address and freeze its incoming transfers. → charge sender-only, or skip (don't revert) when the recipient's limit is empty.
  • Blacklisting a router freezes everyone: the transfer check also blocks the caller (msg.sender), so blacklisting a shared router/pool (Curve, Pendle, Aave…) breaks it for all honest users. → use a separate, multisig-only "spender" list.
  • No protection for our own addresses: nothing stops a rate-limit/blacklist being applied to internal addresses (LP, RedemptionManager, token proxies), which would freeze the whole system. → exempt known internal addresses.

Test status

  • New tests pass: PausableUntilTest 32/32, CumulativeMerkleRewardsDistributorTest 29/29.
  • Pre-existing latest-block fork failures in WithdrawRequestNFTTest / EtherFiOracleTest reproduce on the untouched base branch and are unrelated.

- PausableUntil: snapshot each pauser's cooldown at pause time (cooldownUntil mapping)
  instead of recomputing from lastPauseTimestamp + current pauseUntilDuration. Makes the
  cooldown immune to a later setPauseUntilDuration (M1) and lets a never-paused pauser
  fire the first pause on any chain regardless of block.timestamp (M2).
- CumulativeMerkleRewardsDistributor: add an opt-in, Operation-Timelock-set per-root
  payout ceiling (maxClaimablePerRoot) enforced in claim(), bounding a compromised
  EXECUTOR_OPERATIONS key that publishes a malicious root (M9). 0 = uncapped (default).

Tests: PausableUntil cooldown-snapshot + zero-baseline; distributor per-root cap. All pass.

Note: claim-leg whenNotPaused (M10) was intentionally NOT included — PriorityWithdrawalQueue
has explicit tests asserting claims succeed while paused; finalized-claim availability
during a halt is a deliberate design choice, documented in the PR for the team to revisit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant