fix: Keep named request queues across runs #2015
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #2015 +/- ##
==========================================
- Coverage 93.35% 93.33% -0.02%
==========================================
Files 179 179
Lines 12482 12488 +6
==========================================
+ Hits 11652 11656 +4
- Misses 830 832 +2
Pull request overview
Fixes BasicCrawler.run() so the default implicit purge on consecutive runs does not wipe a user-supplied named RequestQueue (including when wrapped by ThrottlingRequestManager), aligning run() behavior with the “named storages are persistent” contract used elsewhere in the storage layer.
Changes:
- Update BasicCrawler.run() purge logic to skip purging when the effective queue is a named RequestQueue (with special handling for ThrottlingRequestManager).
- Clarify the purge_request_queue docstring to document the named-queue exemption.
- Add two regression tests covering consecutive runs with a named queue directly and with a named queue wrapped by ThrottlingRequestManager.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/crawlee/crawlers/_basic/_basic_crawler.py | Skips implicit purge for named request queues (including when wrapped in ThrottlingRequestManager). |
| tests/unit/crawlers/_basic/test_basic_crawler.py | Adds regression tests ensuring named queues survive consecutive run() calls. |
Problem
A second crawler.run() with the default purge_request_queue=True purged the request manager unconditionally, including a user-supplied named RequestQueue. Named storages are documented as persistent, and StorageClient._purge_if_needed already exempts them from implicit purging.
Changes
The implicit purge in run() now skips named queues, including a named queue wrapped in a ThrottlingRequestManager.
Verification
Two regression tests that fail without the fix: a named queue survives a second run(), and the same holds when the named queue is wrapped in a ThrottlingRequestManager.
Behavior change
This intentionally changes observable behavior, since the old behavior was the bug: named request queues survive repeated runs.
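For readers skimming the thread, here is a minimal, self-contained sketch of the purge decision described above. It is not the code from this PR and does not import crawlee: FakeRequestQueue, FakeThrottlingManager, should_purge and start_run are hypothetical stand-ins, and the `inner` and `name` attributes are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class FakeRequestQueue:
    """Hypothetical stand-in for a request queue (not crawlee's RequestQueue)."""

    name: Optional[str] = None  # None means an unnamed (default) queue
    requests: List[str] = field(default_factory=list)

    def purge(self) -> None:
        self.requests.clear()


@dataclass
class FakeThrottlingManager:
    """Hypothetical wrapper that delegates to an inner queue."""

    inner: FakeRequestQueue

    def purge(self) -> None:
        self.inner.purge()


Manager = Union[FakeRequestQueue, FakeThrottlingManager]


def should_purge(manager: Manager, purge_request_queue: bool = True) -> bool:
    """Decide whether the implicit purge applies to this manager."""
    if not purge_request_queue:
        return False
    # Look through the wrapper to find the queue that is effectively used.
    effective = manager.inner if isinstance(manager, FakeThrottlingManager) else manager
    # Named queues are treated as persistent and are exempt from the purge.
    return effective.name is None


def start_run(manager: Manager, purge_request_queue: bool = True) -> None:
    """Simplified stand-in for the purge step at the start of a repeated run."""
    if should_purge(manager, purge_request_queue):
        manager.purge()
```

Under these assumptions, a queue created with a name keeps its requests across start_run() calls, whether passed directly or wrapped, while an unnamed queue is emptied.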