refactor: derive last_accessed_at from claimed_at to unify timestamp tracking#16
refactor: derive last_accessed_at from claimed_at to unify timestamp tracking#16jolovicdev wants to merge 1 commit into
Conversation
…tracking The SQLite store maintained two independent timestamps (claimed_at and last_accessed_at) that were updated through different code paths. This led to subtle inconsistencies: - find_by_fingerprint did a side-effect UPDATE on every read, adding write contention and making reads non-deterministic. - put_commit passed a separate 'now' for last_accessed_at, which could drift from claimed_at if the commit was modified elsewhere. Changes: - put_commit now uses commit.claimed_at as the source of truth for last_accessed_at. A commit is 'accessed' when it is claimed. - find_by_fingerprint is now side-effect-free; callers that need to touch a commit already call put_commit on cache hits. - Redis _encode_commit aligned to use commit.claimed_at for last_accessed_at for cross-backend consistency. Tests verify the invariant last_accessed_at == claimed_at and that repeated cache hits no longer shift the access timestamp.
There was a problem hiding this comment.
PR Review
PR: refactor: derive last_accessed_at from claimed_at to unify timestamp tracking
Important
Verdict: Request changes - 7 actionable findings, highest severity P1.
Findings (7)
Redis find_by_fingerprint still updates access index via _touch_commit
src/cashet/redis_store.py:395
The SQLite store no longer updates any timestamp on cache hits, but Redis still calls _touch_commit (line 395) which updates the access index sorted set. This creates inconsistent eviction behavior between backends. Remove the _touch_commit call from find_by_fingerprint for consistency.
_index_commit_commands sets access index to current time on every put_commit
src/cashet/redis_store.py:249
Line 249 writes now_ts to the access index for every put_commit call, even for updates that are not cache hits (e.g., status changes). This unintentionally defers eviction for commits that are updated but not accessed. Use commit.claimed_at or do not update the access index on put_commit at all.
_touch_commit updates access index but not commit's last_accessed_at field
src/cashet/redis_store.py:626-628
The _touch_commit method only updates the sorted set cashet:index:last_accessed with a new timestamp, but does not update the last_accessed_at field inside the commit JSON stored at cashet:commit:{hash}. Any code reading the commit's own field will see the original claimed_at instead of the touch time. Either update the commit data or remove _touch_commit entirely.
<img alt="P2 Medium" src="https://img.shields.io/badge/P2-Medium-ca8a04?style=flat-square" height="20" align="absmiddle"
src/cashet/store.py:275
Possible AttributeError if commit.claimed_at is None
The call commit.claimed_at.isoformat() will raise AttributeError if claimed_at is None. While claimed_at is typically set before put_commit, defensive code should verify it is not None to prevent crashes from malformed Commit objects.
src/cashet/store.py:275
Missing documentation or changelog for changed last_accessed_at semantics
The PR changes last_accessed_at to always equal claimed_at, and removes the side-effect update on reads. This is a breaking change for users relying on the old behavior (e.g., for monitoring or custom eviction logic). Add a note to the changelog and update the README or API docs to explain the new semantics.
src/cashet/store.py:275
Potential timezone inconsistency with claimed_at.isoformat()
The call commit.claimed_at.isoformat() may include a timezone offset if claimed_at is timezone-aware with a non-UTC offset. Other timestamps in the store use datetime.now(UTC).isoformat() which always appends +00:00. This could cause comparison issues if the two ISO strings are compared lexicographically. Ensure claimed_at is always in UTC before calling isoformat().
src/cashet/store.py:331
Dead variable now_iso in find_by_fingerprint
After removing the UPDATE last_accessed_at, the variable now_iso computed on line 332 is no longer used. This wastes a datetime formatting call and may trigger linting warnings. Remove the line.
Warning
Some findings could not be anchored to changed diff lines, so they were kept here instead of
being posted as plain timeline comments.
How To Recheck
Reply @ds-review recheck under the relevant inline finding after pushing a fix.
| now = datetime.now(UTC).isoformat() | ||
| try: | ||
| self._put_commit_row(conn, commit, now) | ||
| self._put_commit_row(conn, commit, commit.claimed_at.isoformat()) |
There was a problem hiding this comment.
Possible AttributeError if commit.claimed_at is None
The call commit.claimed_at.isoformat() will raise AttributeError if claimed_at is None. While claimed_at is typically set before put_commit, defensive code should verify it is not None to prevent crashes from malformed Commit objects.
| self._put_commit_row(conn, commit, commit.claimed_at.isoformat()) | |
| assert commit.claimed_at is not None, "claimed_at must not be None" | |
| self._put_commit_row(conn, commit, commit.claimed_at.isoformat()) |
| conn = self._connect() | ||
| now = datetime.now(UTC) | ||
| now_iso = now.isoformat() | ||
| now_iso = datetime.now(UTC).isoformat() |
| now = datetime.now(UTC).isoformat() | ||
| try: | ||
| self._put_commit_row(conn, commit, now) | ||
| self._put_commit_row(conn, commit, commit.claimed_at.isoformat()) |
There was a problem hiding this comment.
Missing documentation or changelog for changed last_accessed_at semantics
The PR changes last_accessed_at to always equal claimed_at, and removes the side-effect update on reads. This is a breaking change for users relying on the old behavior (e.g., for monitoring or custom eviction logic). Add a note to the changelog and update the README or API docs to explain the new semantics.
| now = datetime.now(UTC).isoformat() | ||
| try: | ||
| self._put_commit_row(conn, commit, now) | ||
| self._put_commit_row(conn, commit, commit.claimed_at.isoformat()) |
There was a problem hiding this comment.
Potential timezone inconsistency with claimed_at.isoformat()
The call commit.claimed_at.isoformat() may include a timezone offset if claimed_at is timezone-aware with a non-UTC offset. Other timestamps in the store use datetime.now(UTC).isoformat() which always appends +00:00. This could cause comparison issues if the two ISO strings are compared lexicographically. Ensure claimed_at is always in UTC before calling isoformat().
| self._put_commit_row(conn, commit, commit.claimed_at.isoformat()) | |
| # Normalize to UTC before converting | |
| claimed_utc = commit.claimed_at.astimezone(UTC) | |
| self._put_commit_row(conn, commit, claimed_utc.isoformat()) |
|
@ds-review review |
Summary
Unifies
last_accessed_atwithclaimed_atacross SQLite and Redis backends, eliminating a redundant timestamp and the side-effect write infind_by_fingerprint.Problem
The SQLite store tracked two independent timestamps:
claimed_at— set when a worker acquired the claimlast_accessed_at— updated on everyfind_by_fingerprintreadThis caused two issues:
find_by_fingerprintdid anUPDATE commits SET last_accessed_at = ?, turning every cache lookup into a write.put_commitpassed a freshnowforlast_accessed_at, which could diverge fromclaimed_atif the commit row was updated by a heartbeat or status change elsewhere.Changes
src/cashet/store.pyput_commitnow deriveslast_accessed_atfromcommit.claimed_at.isoformat()instead of a separatenow.find_by_fingerprintis now side-effect-free; theUPDATE last_accessed_athas been removed.src/cashet/redis_store.py_encode_commitusescommit.claimed_atforlast_accessed_atto keep both backends consistent.tests/test_store.py&tests/test_async_client.pytest_last_accessed_at_derived_from_claimed_at— verifies the invariant after task completion.test_cache_hit_does_not_shift_last_accessed_at— verifies that repeated cache hits no longer move the access timestamp.Why this is safe
Callers that need to "touch" a commit (cache-hit promotion, heartbeat) already call
put_commit, which writes the row with the currentclaimed_at. Removing the implicit UPDATE insidefind_by_fingerprintjust makes the behavior explicit and idempotent.Testing
All 298 tests pass (45 skipped for Redis).