refactor: derive last_accessed_at from claimed_at to unify timestamp tracking#15
refactor: derive last_accessed_at from claimed_at to unify timestamp tracking#15jolovicdev wants to merge 1 commit into
Conversation
…tracking The SQLite store maintained two independent timestamps (claimed_at and last_accessed_at) that were updated through different code paths. This led to subtle inconsistencies: - find_by_fingerprint did a side-effect UPDATE on every read, adding write contention and making reads non-deterministic. - put_commit passed a separate 'now' for last_accessed_at, which could drift from claimed_at if the commit was modified elsewhere. Changes: - put_commit now uses commit.claimed_at as the source of truth for last_accessed_at. A commit is 'accessed' when it is claimed. - find_by_fingerprint is now side-effect-free; callers that need to touch a commit already call put_commit on cache hits. - Redis _encode_commit aligned to use commit.claimed_at for last_accessed_at for cross-backend consistency. Tests verify the invariant last_accessed_at == claimed_at and that repeated cache hits no longer shift the access timestamp.
|
@ds-review review |
|
Recreating to fix GitHub UI visibility. |
There was a problem hiding this comment.
PR Review
PR: refactor: derive last_accessed_at from claimed_at to unify timestamp tracking
Important
Verdict: Request changes - 4 actionable findings, highest severity P2.
Findings (4)
<img alt="P2 Medium" src="https://img.shields.io/badge/P2-Medium-ca8a04?style=flat-square" height="20" align="absmiddle"
src/cashet/redis_store.py:123
Redis find_by_fingerprint still touches access time; backfill uses stale claimed_at
The PR removes the side-effect UPDATE in SQLite's find_by_fingerprint, but the Redis variant still calls _touch_commit which updates the access sorted set with current time, violating the intended backend consistency. Moreover, the backfill function _backfill_access_index reads last_accessed_at from the JSON, now set to claimed_at (which may be older than the original access score). During a backfill, it overwrites the access sorted set entry with this older timestamp, causing LRU eviction to consider still-active entries as old.
src/cashet/store.py:275
put_commit no longer advances last_accessed_at and may fail on None claimed_at
put_commit now uses commit.claimed_at.isoformat() instead of current time, so callers relying on put_commit for cache-hit promotion or heartbeats will not refresh the access timestamp, potentially causing premature eviction. Additionally, if commit.claimed_at is None (e.g., unclaimed commit), this will raise AttributeError. Previously, the timestamp was computed independently.
src/cashet/store.py:331
Dead code: now_iso assignment no longer used
The assignment to now_iso is no longer used because the subsequent UPDATE statement was removed. This wastes a system call and can mislead maintainers.
tests/test_store.py:1109
Tests use private _connect() to check timestamps
The new tests use client.store._connect() to directly query the database. This depends on internal implementation details (private method _connect) and may break if the store connection mechanism changes. Using a public API to retrieve the commit and check timestamps would be more robust.
How To Recheck
Reply @ds-review recheck under the relevant inline finding after pushing a fix.
| now = datetime.now(UTC).isoformat() | ||
| try: | ||
| self._put_commit_row(conn, commit, now) | ||
| self._put_commit_row(conn, commit, commit.claimed_at.isoformat()) |
There was a problem hiding this comment.
put_commit no longer advances last_accessed_at and may fail on None claimed_at
put_commit now uses commit.claimed_at.isoformat() instead of current time, so callers relying on put_commit for cache-hit promotion or heartbeats will not refresh the access timestamp, potentially causing premature eviction. Additionally, if commit.claimed_at is None (e.g., unclaimed commit), this will raise AttributeError. Previously, the timestamp was computed independently.
| self._put_commit_row(conn, commit, commit.claimed_at.isoformat()) | |
| self._put_commit_row(conn, commit, (commit.claimed_at or datetime.now(UTC)).isoformat()) |
| "created_at": commit.created_at.isoformat(), | ||
| "claimed_at": commit.claimed_at.isoformat(), | ||
| "last_accessed_at": datetime.now(UTC).isoformat(), | ||
| "last_accessed_at": commit.claimed_at.isoformat(), |
There was a problem hiding this comment.
Redis find_by_fingerprint still touches access time; backfill uses stale claimed_at
The PR removes the side-effect UPDATE in SQLite's find_by_fingerprint, but the Redis variant still calls _touch_commit which updates the access sorted set with current time, violating the intended backend consistency. Moreover, the backfill function _backfill_access_index reads last_accessed_at from the JSON, now set to claimed_at (which may be older than the original access score). During a backfill, it overwrites the access sorted set entry with this older timestamp, causing LRU eviction to consider still-active entries as old.
| conn = self._connect() | ||
| now = datetime.now(UTC) | ||
| now_iso = now.isoformat() | ||
| now_iso = datetime.now(UTC).isoformat() |
| ref = client.submit(work) | ||
| assert ref.load() == 42 | ||
|
|
||
| conn = client.store._connect() |
There was a problem hiding this comment.
Tests use private _connect() to check timestamps
The new tests use client.store._connect() to directly query the database. This depends on internal implementation details (private method _connect) and may break if the store connection mechanism changes. Using a public API to retrieve the commit and check timestamps would be more robust.
Summary
Unifies
last_accessed_atwithclaimed_atacross SQLite and Redis backends, eliminating a redundant timestamp and the side-effect write infind_by_fingerprint.Problem
The SQLite store tracked two independent timestamps:
claimed_at— set when a worker acquired the claimlast_accessed_at— updated on everyfind_by_fingerprintreadThis caused two issues:
find_by_fingerprintdid anUPDATE commits SET last_accessed_at = ?, turning every cache lookup into a write.put_commitpassed a freshnowforlast_accessed_at, which could diverge fromclaimed_atif the commit row was updated by a heartbeat or status change elsewhere.Changes
src/cashet/store.pyput_commitnow deriveslast_accessed_atfromcommit.claimed_at.isoformat()instead of a separatenow.find_by_fingerprintis now side-effect-free; theUPDATE last_accessed_athas been removed.src/cashet/redis_store.py_encode_commitusescommit.claimed_atforlast_accessed_atto keep both backends consistent.tests/test_store.py&tests/test_async_client.pytest_last_accessed_at_derived_from_claimed_at— verifies the invariant after task completion.test_cache_hit_does_not_shift_last_accessed_at— verifies that repeated cache hits no longer move the access timestamp.Why this is safe
Callers that need to "touch" a commit (cache-hit promotion, heartbeat) already call
put_commit, which writes the row with the currentclaimed_at. Removing the implicit UPDATE insidefind_by_fingerprintjust makes the behavior explicit and idempotent.Testing
All 298 tests pass (45 skipped for Redis).