
refactor: derive last_accessed_at from claimed_at to unify timestamp tracking #15

Closed
jolovicdev wants to merge 1 commit into master from test/consolidate-timestamp-tracking

@jolovicdev
Owner

Summary

Unifies last_accessed_at with claimed_at across SQLite and Redis backends, eliminating a redundant timestamp and the side-effect write in find_by_fingerprint.

Problem

The SQLite store tracked two independent timestamps:

  • claimed_at — set when a worker acquired the claim
  • last_accessed_at — updated on every find_by_fingerprint read

This caused two issues:

  1. Write contention on reads: find_by_fingerprint did an UPDATE commits SET last_accessed_at = ?, turning every cache lookup into a write.
  2. Drift: put_commit passed a fresh now for last_accessed_at, which could diverge from claimed_at if the commit row was updated by a heartbeat or status change elsewhere.
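
The first issue can be sketched against an in-memory SQLite database. This is a minimal illustration, not the project's code: the three-column commits schema and the helper names make_db, find_with_touch, and find_read_only are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone


def make_db() -> sqlite3.Connection:
    # Hypothetical, simplified schema; the real commits table is not shown in this PR.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commits ("
        "fingerprint TEXT PRIMARY KEY, claimed_at TEXT, last_accessed_at TEXT)"
    )
    stamp = "2026-01-01T00:00:00+00:00"
    conn.execute("INSERT INTO commits VALUES (?, ?, ?)", ("fp1", stamp, stamp))
    conn.commit()
    return conn


def find_with_touch(conn, fingerprint):
    """Old pattern: the lookup also rewrites last_accessed_at."""
    row = conn.execute(
        "SELECT fingerprint, claimed_at FROM commits WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE commits SET last_accessed_at = ? WHERE fingerprint = ?",
            (datetime.now(timezone.utc).isoformat(), fingerprint),
        )
        conn.commit()
    return row


def find_read_only(conn, fingerprint):
    """New pattern: a plain SELECT with no side effects."""
    return conn.execute(
        "SELECT fingerprint, claimed_at FROM commits WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
```

Comparing conn.total_changes before and after each call makes the difference visible: the read-only lookup leaves the counter unchanged, while the touching lookup increments it on every hit.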

Changes

  • src/cashet/store.py
    • put_commit now derives last_accessed_at from commit.claimed_at.isoformat() instead of a separate now.
    • find_by_fingerprint is now side-effect-free; the UPDATE last_accessed_at has been removed.
  • src/cashet/redis_store.py
    • _encode_commit uses commit.claimed_at for last_accessed_at to keep both backends consistent.
  • tests/test_store.py & tests/test_async_client.py
    • test_last_accessed_at_derived_from_claimed_at — verifies the invariant after task completion.
    • test_cache_hit_does_not_shift_last_accessed_at — verifies that repeated cache hits no longer move the access timestamp.

Why this is safe

Callers that need to "touch" a commit (cache-hit promotion, heartbeat) already call put_commit, which writes the row with the current claimed_at. Removing the implicit UPDATE inside find_by_fingerprint just makes the behavior explicit and idempotent.
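
The claimed behavior can be modeled with a small in-memory stand-in. This is a sketch under stated assumptions: the Commit dataclass and InMemoryStore below are hypothetical and only mirror the method names from the PR description. Note that in this model the access timestamp moves only when the caller passes a commit with a newer claimed_at.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Commit:
    # Hypothetical stand-in for the project's commit object.
    fingerprint: str
    claimed_at: datetime


class InMemoryStore:
    """Toy store that derives last_accessed_at from claimed_at."""

    def __init__(self):
        self.rows = {}

    def put_commit(self, commit):
        stamp = commit.claimed_at.isoformat()
        self.rows[commit.fingerprint] = {
            "claimed_at": stamp,
            "last_accessed_at": stamp,
        }

    def find_by_fingerprint(self, fingerprint):
        # Side-effect-free: returns a copy and writes nothing.
        row = self.rows.get(fingerprint)
        return dict(row) if row is not None else None


store = InMemoryStore()
t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
commit = Commit("fp1", t0)
store.put_commit(commit)

# Repeated reads and repeated puts of the same commit leave the row unchanged.
snapshot = store.find_by_fingerprint("fp1")
store.put_commit(commit)

# An explicit "touch" requires the caller to supply a newer claimed_at.
touched = replace(commit, claimed_at=t0 + timedelta(hours=1))
store.put_commit(touched)
```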

Testing

All 298 tests pass (45 skipped for Redis).

…tracking

The SQLite store maintained two independent timestamps (claimed_at and
last_accessed_at) that were updated through different code paths. This
led to subtle inconsistencies:

- find_by_fingerprint did a side-effect UPDATE on every read, adding
  write contention and making reads non-deterministic.
- put_commit passed a separate 'now' for last_accessed_at, which could
  drift from claimed_at if the commit was modified elsewhere.

Changes:
- put_commit now uses commit.claimed_at as the source of truth for
  last_accessed_at. A commit is 'accessed' when it is claimed.
- find_by_fingerprint is now side-effect-free; callers that need to
  touch a commit already call put_commit on cache hits.
- Redis _encode_commit aligned to use commit.claimed_at for
  last_accessed_at for cross-backend consistency.

Tests verify the invariant last_accessed_at == claimed_at and that
repeated cache hits no longer shift the access timestamp.

@ds-review ds-review Bot left a comment

Automated PR review encountered an error and could not complete.

@jolovicdev
Owner Author

@ds-review review

@jolovicdev
Owner Author

Recreating to fix GitHub UI visibility.

@jolovicdev jolovicdev closed this May 7, 2026

@ds-review ds-review Bot left a comment

PR Review

PR: refactor: derive last_accessed_at from claimed_at to unify timestamp tracking

Important

Verdict: Request changes - 4 actionable findings, highest severity P2.

Findings (4)

src/cashet/redis_store.py:123

P2 Medium Redis find_by_fingerprint still touches access time; backfill uses stale claimed_at

The PR removes the side-effect UPDATE in SQLite's find_by_fingerprint, but the Redis variant still calls _touch_commit which updates the access sorted set with current time, violating the intended backend consistency. Moreover, the backfill function _backfill_access_index reads last_accessed_at from the JSON, now set to claimed_at (which may be older than the original access score). During a backfill, it overwrites the access sorted set entry with this older timestamp, causing LRU eviction to consider still-active entries as old.
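
The backfill concern can be illustrated with a plain dict standing in for the Redis access sorted set (member to score). This is a sketch only: backfill_naive and backfill_keep_newest are hypothetical names, and the actual _backfill_access_index implementation is not shown in this PR. On a real Redis server (6.2 or later), the GT option of ZADD gives the same "only raise the score" behavior.

```python
def backfill_naive(access_index, commits):
    """Overwrite every score with the stored last_accessed_at."""
    for key, commit in commits.items():
        access_index[key] = commit["last_accessed_at"]


def backfill_keep_newest(access_index, commits):
    """Only add missing members or raise an existing score, never lower it."""
    for key, commit in commits.items():
        candidate = commit["last_accessed_at"]
        current = access_index.get(key)
        if current is None or candidate > current:
            access_index[key] = candidate


# Scores are epoch seconds. "commit:a" was touched at t=200, but its stored
# last_accessed_at now equals its older claimed_at (t=100).
index = {"commit:a": 200.0}
commits = {
    "commit:a": {"last_accessed_at": 100.0},
    "commit:b": {"last_accessed_at": 50.0},
}

naive = dict(index)
backfill_naive(naive, commits)          # "commit:a" drops back to 100.0

guarded = dict(index)
backfill_keep_newest(guarded, commits)  # "commit:a" stays at 200.0
```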

src/cashet/store.py:275

P1 High put_commit no longer advances last_accessed_at and may fail on None claimed_at

put_commit now uses commit.claimed_at.isoformat() instead of current time, so callers relying on put_commit for cache-hit promotion or heartbeats will not refresh the access timestamp, potentially causing premature eviction. Additionally, if commit.claimed_at is None (e.g., unclaimed commit), this will raise AttributeError. Previously, the timestamp was computed independently.
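
The second half of this finding can be reproduced in isolation. The sketch below assumes, as the finding does, that claimed_at may be None for an unclaimed commit; the Commit dataclass and the access_stamp helper are hypothetical and not part of the project.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Commit:
    # Hypothetical stand-in; assumes claimed_at can be None before a claim.
    claimed_at: Optional[datetime] = None


def access_stamp(commit: Commit, now: Optional[datetime] = None) -> str:
    """Return claimed_at as ISO text, falling back to the current time."""
    fallback = now if now is not None else datetime.now(timezone.utc)
    return (commit.claimed_at or fallback).isoformat()


unclaimed = Commit()
try:
    unclaimed.claimed_at.isoformat()  # the unguarded call from the diff
except AttributeError as exc:
    failure = exc  # 'NoneType' object has no attribute 'isoformat'

stamp = access_stamp(unclaimed)  # falls back to the current UTC time
```

The fallback avoids the crash but does not address the first half of the finding: a claimed commit still reports its original claimed_at, so the access timestamp does not advance on later puts.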

src/cashet/store.py:331

P3 Low Dead code: now_iso assignment no longer used

The assignment to now_iso is no longer used because the subsequent UPDATE statement was removed. This wastes a system call and can mislead maintainers.

tests/test_store.py:1109

P3 Low Tests use private _connect() to check timestamps

The new tests use client.store._connect() to directly query the database. This depends on internal implementation details (private method _connect) and may break if the store connection mechanism changes. Using a public API to retrieve the commit and check timestamps would be more robust.

How To Recheck

Reply @ds-review recheck under the relevant inline finding after pushing a fix.

Comment thread src/cashet/store.py
now = datetime.now(UTC).isoformat()
try:
- self._put_commit_row(conn, commit, now)
+ self._put_commit_row(conn, commit, commit.claimed_at.isoformat())
P1 High put_commit no longer advances last_accessed_at and may fail on None claimed_at

Suggested change
- self._put_commit_row(conn, commit, commit.claimed_at.isoformat())
+ self._put_commit_row(conn, commit, (commit.claimed_at or datetime.now(UTC)).isoformat())

Comment thread src/cashet/redis_store.py
"created_at": commit.created_at.isoformat(),
"claimed_at": commit.claimed_at.isoformat(),
- "last_accessed_at": datetime.now(UTC).isoformat(),
+ "last_accessed_at": commit.claimed_at.isoformat(),
P2 Medium Redis find_by_fingerprint still touches access time; backfill uses stale claimed_at

Comment thread src/cashet/store.py
conn = self._connect()
now = datetime.now(UTC)
- now_iso = now.isoformat()
+ now_iso = datetime.now(UTC).isoformat()
P3 Low Dead code: now_iso assignment no longer used

Comment thread tests/test_store.py
ref = client.submit(work)
assert ref.load() == 42

conn = client.store._connect()
P3 Low Tests use private _connect() to check timestamps

@jolovicdev jolovicdev deleted the test/consolidate-timestamp-tracking branch May 10, 2026 23:26