## Summary
Three sites in this repo perform a "look up a resource by some identifier" operation by issuing a single GET against a CSAPI list endpoint and iterating the returned page client-side. None of them follow the `next` HATEOAS link that the OGC API — Connected Systems pagination contract requires. This is a latent correctness bug in the bootstrap idempotency layer that becomes a real correctness bug whenever the server (a) does not honor the query filter (`?uid=`, `?outputName=`, etc.) and (b) holds more items than fit on the first page.

The current mitigation — `&limit=1000`, added in commit `92f584b5` — papers over the symptom for fleets with ≤1000 items per collection. It is documented in the commit message itself as a Go-server-pagination workaround. It does not fix the underlying issue and silently breaks at scale.

This issue captures the bug, the failure modes, the affected sites, and a recommended direction. It does not prescribe an implementation — that's the maintainer's call.
## Background — why this matters in a publisher context
OSHConnect-Python is, primarily, a publisher fleet: long-running services that POST observations into a CSAPI server, fronted by an idempotent bootstrap phase that ensures procedures, systems, datastreams, and deployments exist before publishers start. The bootstrap is meant to be safely re-runnable on every deploy / `docker compose up`. That safety hinges on `find_by_uid` (and its siblings) correctly answering "does this resource already exist?".

When `find_by_uid` returns a false negative — "no, the resource doesn't exist" — for a resource that in fact exists, the `ensure_*` family attempts to recreate it. On a strict server this returns HTTP 409 and `api_post` raises `RuntimeError`, aborting bootstrap. On a tolerant server it silently creates a duplicate UID and corrupts the deployment. Either outcome breaks the idempotency contract that the publisher fleet's deploy automation depends on.

This is therefore a deploy-time correctness bug in publisher infrastructure, not a read-side display bug. That distinction matters: the consequence of getting it wrong is duplicated systems / orphaned datastreams / non-deterministic re-deploys, not a missing row in some UI.
## Affected sites — three places, one shape

| # | File | Function | Endpoint pattern | Single page? | Filter relied on |
|---|------|----------|------------------|--------------|------------------|
| 1 | `publishers/bootstrap_helpers.py` | `find_by_uid(base_url, auth, collection, uid)` | `{collection}?uid={uid}&limit=1000` | Yes — single GET, client-side filter loop | `?uid=` |
| 2 | `publishers/bootstrap_helpers.py` | `find_datastream(system_id, output_name)` | `systems/{id}/datastreams` | Yes — single GET, iterates `result["items"]` | `?outputName=` (not used) |
| 3 | `src/oshconnect/base.py` | `_discover_system_ds(...)` | `retrieve_resource(APIResourceTypes.SYSTEM, ...)` items | Yes — walks `raw_res.json().get("items", [])` once | none |
All three sites:

- Issue exactly one HTTP GET.
- Iterate the returned page client-side to find the matching item.
- Return `None` / raise not-found if the item isn't in that page.
- Do not read or follow `links[?(@.rel=='next')]`.

A repo-wide grep for `next` / `rel="next"` / `rel='next'` / `paginate` / pagination-link traversal finds zero matches in any code path. The codebase has no concept of pagination today.
## `find_by_uid` — verbatim current implementation

`publishers/bootstrap_helpers.py`:

```python
_uid_cache: dict[str, str] = {}

def find_by_uid(base_url: str, auth: str, collection: str, uid: str) -> str | None:
    """Find a resource by UID in a collection. Returns server ID or None."""
    cache_key = f"{collection}:{uid}"
    if cache_key in _uid_cache:
        return _uid_cache[cache_key]
    result = api_get(base_url, f"{collection}?uid={uid}&limit=1000", auth)
    if result:
        # Support both GeoJSON (features) and flat JSON (items) collections
        items = result.get("items", []) or result.get("features", [])
        for item in items:
            props = item.get("properties", item)
            if props.get("uid") == uid:
                item_id = item.get("id") or props.get("id")
                if item_id:
                    _uid_cache[cache_key] = str(item_id)
                    return str(item_id)
    return None
```
The `?uid={uid}` filter is intended to make the server return at most one match (in which case pagination is irrelevant). The `&limit=1000` is the safety net for when the server ignores `?uid=`. Both assumptions can fail simultaneously.
## Failure-mode matrix

| Server honors `?uid=` filter? | Collection size | Result |
|---|---|---|
| Yes | any | ✅ Works correctly. Filter narrows to 0/1 items; pagination is moot. |
| No | ≤ 1000 items | ✅ Works because of the workaround. Current state of fleets running against the Go CSAPI server. |
| No | > 1000 items | ❌ Silent false negative. `find_by_uid` returns `None` for resources that exist on the server. `ensure_*` then tries to recreate the resource, leading to either HTTP 409 → `RuntimeError` (strict server) or silent duplicate-UID creation (tolerant server). Bootstrap idempotency contract broken. |
| Yes, but server ignores it under load / for nested collections | any | ❌ Same as above. |
The third row is dormant in current production deployments because no fleet has crossed 1000 items per collection. It is not absent — the publisher fleet pattern is designed to scale (Fort Huachuca v2.3 scenarios, multi-tenant deployments, bigger sensor manifests). The bug fires the moment a collection grows past the magic number.

The same matrix applies, mutatis mutandis, to `find_datastream` (collection: per-system datastreams; failure when a system has many outputs) and `_discover_system_ds` (collection: top-level systems; failure on busy multi-tenant servers).
## How the workaround was introduced

Commit `92f584b5` — "fix: add limit=1000 to find_by_uid for Go server pagination" — 2026-04-17. Diff: +1 / −1, a single line. The commit message is candid that the change is a workaround for a server-pagination behavior, not a correctness fix. This issue exists to record that fact and propose closing the gap properly.
## Defense-in-depth — independent of connected-systems-go#5

A related server-side issue (connected-systems-go#5 — "Go server ignores `?uid=`") covers the immediate trigger of the `find_by_uid` failure on the new Go CSAPI server. If/when that lands, `find_by_uid` becomes correct again for collections of any size on that one server, because the filter narrows to 0/1 items.
The right fix on the Python side is still to walk `next` links, for two reasons:

- Filter quirks are a per-server reality. Some other CSAPI server tomorrow will have its own filter coverage gap, throttling, partial filter-honoring under load, or simply different parsing of `?uid=`. Without server-side filtering, pagination is the spec-defined path.
- The OGC pagination contract is the same regardless of filtering. OGC 23-001 §7.6 defines `limit` as optional with a server-defined default and `next` HATEOAS links as the conformance-required mechanism for retrieving subsequent pages. A correct OGC client walks links; it does not assume a single page.

So this fix is not contingent on the Go server fix. They're complementary; both should land, and either one alone is insufficient for full correctness.
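For concreteness, a paged CSAPI list response carries its continuation in a `links` array, and the client's job is to follow `rel == "next"` until it is absent. A sketch with an illustrative payload (the `href` values are made up, not captured from a real server):

```python
# Illustrative page body in the shape the spec describes.
page = {
    "items": [{"id": "abc123", "properties": {"uid": "urn:sys:1"}}],
    "links": [
        {"rel": "self", "href": "https://example.com/api/systems?limit=10"},
        {"rel": "next", "href": "https://example.com/api/systems?limit=10&offset=10"},
    ],
}

# The spec-defined continuation: take the first rel == "next" link,
# stop when there is none (last page).
next_href = next(
    (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
    None,
)
print(next_href)  # the next-page URL, or None on the last page
```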
## Recommended direction

The shape of the fix is the maintainer's call. What follows is one direction that fits the existing module structure with minimal surface change.

Add a small page-iteration helper to `publishers/bootstrap_helpers.py` (and a sibling to `src/oshconnect/base.py` if the library should not depend on the publisher module — currently they don't share an HTTP layer; `bootstrap_helpers.py` uses stdlib `urllib`, `src/oshconnect/api_helpers.py` uses `requests`).
Sketch — `urllib`-side, for `bootstrap_helpers.py`:

```python
def _iter_pages(base_url: str, path: str, auth: str, *, max_pages: int = 100):
    """
    Yield items from a CSAPI list endpoint, walking `next` HATEOAS links.

    Yields items one at a time across all pages. Caller is responsible for
    early termination once the desired item is found.

    Args:
        base_url: Server base URL.
        path: Collection path (e.g. 'systems?uid=foo').
        auth: Basic-auth header value.
        max_pages: Safety cap against pathological circular link chains.

    Raises:
        RuntimeError: If max_pages is exceeded.
    """
    url = path  # api_get composes with base_url
    pages_seen = 0
    seen_urls: set[str] = set()
    while url:
        if pages_seen >= max_pages:
            raise RuntimeError(
                f"_iter_pages exceeded {max_pages} pages for {path}; "
                "possible circular `next` chain"
            )
        if url in seen_urls:
            raise RuntimeError(f"_iter_pages saw a circular `next` link at {url}")
        seen_urls.add(url)
        result = api_get(base_url, url, auth)
        if not result:
            return
        items = result.get("items", []) or result.get("features", [])
        for item in items:
            yield item
        pages_seen += 1
        # Find the `next` link.
        next_link = next(
            (link for link in (result.get("links") or []) if link.get("rel") == "next"),
            None,
        )
        if not next_link or not next_link.get("href"):
            return
        # `next` href may be absolute or path-relative; normalize to a path the
        # existing api_get can consume.
        url = _normalize_next_url(base_url, next_link["href"])
```
Then `find_by_uid` collapses to:

```python
def find_by_uid(base_url: str, auth: str, collection: str, uid: str) -> str | None:
    cache_key = f"{collection}:{uid}"
    if cache_key in _uid_cache:
        return _uid_cache[cache_key]
    # Keep `?uid={uid}` so a filter-aware server can short-circuit;
    # walk pages so a filter-ignoring server still works.
    for item in _iter_pages(base_url, f"{collection}?uid={uid}", auth):
        props = item.get("properties", item)
        if props.get("uid") == uid:
            item_id = item.get("id") or props.get("id")
            if item_id:
                _uid_cache[cache_key] = str(item_id)
                return str(item_id)
    return None
```
And `find_datastream` similarly switches to `_iter_pages`. The library-side `_discover_system_ds` either uses a `requests`-based sibling helper or is refactored to share a thin wrapper.
Notes on the sketch:

- Drops the magic `limit=1000` entirely. The server's default page size is fine; iteration handles whatever it returns.
- `max_pages` and `seen_urls` are defense against pathological servers (circular `next` chains have been observed in non-OGC paginated APIs; cheap insurance).
- The caller iterates lazily and can break out as soon as the target item is found, so for a filter-honoring server the cost is one HTTP request.
- Negative-result caching: `_uid_cache` currently caches only successful lookups. Worth a comment that this is intentional — caching `None` would be incorrect across redeploys where the resource is created out-of-band.
## Other things worth touching while we're here (optional)

- `find_datastream`: same fix shape, same module, costs almost nothing extra to do in the same PR. Recommend doing it together so all three sites are consistent.
- `_discover_system_ds`: library-side sibling; uses `requests`, not `urllib`. Either (a) live with two `_iter_pages` implementations (one per HTTP layer) or (b) take this opportunity to move the publisher fleet onto `requests` (modest dependency change; the library already takes a `requests` dep). Either is defensible; (a) is the smaller diff.
- A comment on `_uid_cache` semantics explaining why only positive results are cached, so future contributors don't "fix" it.
- A bootstrap-test fixture: consider adding a fixture / fake server (or a recorded HTTP cassette via `vcrpy` / `responses`) that returns multi-page responses so the iteration logic is exercised in unit tests. Without this, the bug is invisible to CI on a small fixture corpus — exactly how it slipped past in the first place.
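The multi-page fixture need not be a real server; a dict-backed fake of the transport is enough to exercise link-walking. A sketch — `fake_api_get` and the pared-down `iter_pages` loop are hypothetical test scaffolding, not repo code:

```python
# Two-page fake: page 1 links to page 2 via rel="next"; page 2 has no next.
PAGES = {
    "systems?uid=urn:sys:x": {
        "items": [{"id": "a", "properties": {"uid": "urn:sys:other"}}],
        "links": [{"rel": "next", "href": "systems?uid=urn:sys:x&offset=1"}],
    },
    "systems?uid=urn:sys:x&offset=1": {
        "items": [{"id": "b", "properties": {"uid": "urn:sys:x"}}],
        "links": [],
    },
}

def fake_api_get(path: str) -> dict:
    return PAGES[path]

def iter_pages(path: str):
    # Minimal link-walking loop, same shape as the _iter_pages sketch.
    url = path
    while url:
        page = fake_api_get(url)
        yield from page["items"]
        url = next(
            (l["href"] for l in page.get("links", []) if l.get("rel") == "next"),
            None,
        )

found = [i["id"] for i in iter_pages("systems?uid=urn:sys:x")
         if i["properties"]["uid"] == "urn:sys:x"]
print(found)  # the match lives on page 2 and is reached only via `next`
```

A unit test over a fixture like this fails against the current single-GET implementation and passes once link-walking lands, which is exactly the regression coverage the bullet above asks for.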
## What's intentionally NOT in scope for this issue

- ❌ Adopting a new HTTP client (e.g. `httpx`) — out of scope; orthogonal architectural choice.
- ❌ Adding async support to the publisher fleet — out of scope.
- ❌ Auto-retry / exponential backoff at the page-walk level — `api_get` already retries via `_with_retry`; pagination is a separate concern.
- ❌ A general-purpose CSAPI Python client library — the existing `src/oshconnect` is what it is; this issue only fixes the three concrete bugs.
- ❌ Changes to the publishers themselves (`iss_publisher.py`, etc.) — they consume the bootstrap output; once bootstrap is correct, they're unaffected.
## Reproduction / how to confirm

1. Stand up (or point at) a CSAPI server that does not honor `?uid=` filtering. The current Go CSAPI server fits — see connected-systems-go#5.
2. Pre-populate the `systems` collection with > 1000 systems (or temporarily set the server's default `limit` to a small value, e.g. 10, and pre-populate > 10).
3. Run `python -m publishers.iss.bootstrap_iss` against it. Observe that `find_by_uid` returns `None` for systems that exist beyond the first page, and `ensure_system` then either fails with HTTP 409 or silently creates a duplicate.

Once the fix lands, the same scenario should bootstrap idempotently with no duplicates and no 409s.
## Severity / risk

Medium. Currently latent — the workaround papers over it at current fleet sizes — but:
- Three sites of the same shape, suggesting a missing concept rather than a one-off bug.
- The workaround is documented as a workaround in the commit message itself.
- The failure mode (silent duplicates / failed redeploys) is in deploy automation, where silent failures are especially scary.
- Trivial to fix relative to consequence at scale.
## References

| # | Source | What it provides |
|---|--------|------------------|
| 1 | `publishers/bootstrap_helpers.py` — `find_by_uid` | Site #1 |
| 2 | `publishers/bootstrap_helpers.py` — `find_datastream` | Site #2 |
| 3 | `src/oshconnect/base.py` — `_discover_system_ds` | Site #3 |
| 4 | Commit `92f584b5` | Origin of the `limit=1000` workaround |
| 5 | connected-systems-go#5 | Server-side complement: Go CSAPI server ignores `?uid=` |
| 6 | OGC 23-001 §7.6 | OGC API — Connected Systems pagination contract: `limit` is server-default; `next` link is the conformance-required mechanism |
| 7 | OS4CSAPI/ogc-client-CSAPI_2#167 | TypeScript client companion: list methods will document the pagination contract in JSDoc |
| 8 | OS4CSAPI/ogc-client-CSAPI_2#170 | TypeScript client deferred enhancement: async-iterator helper that walks `next` links — analog of the `_iter_pages` sketch above, in TypeScript |