Add Sourcepoint first-party CMP integration#625
Add Sourcepoint first-party CMP integration#625ChristianPavilonis wants to merge 18 commits intomainfrom
Conversation
… through first-party proxy
… URLs to first-party paths
…visibility, improve docs
aram356
left a comment
There was a problem hiding this comment.
Summary
Adds a well-structured Sourcepoint CMP first-party proxy with four rewriting layers (HTML attributes, JS body, runtime config trap, client-side DOM guard). The implementation follows existing integration patterns closely, has 14 Rust + 6 JS tests, and ships disabled by default. A few items need attention before merge.
Blocking
🔧 wrench
take_body_str()panics on non-UTF-8 upstream responses:response.take_body_str()at line 400 will panic if the upstream CDN returns invalid UTF-8 with a JS Content-Type. Read bytes and attempt conversion with a fallback instead. (sourcepoint.rs:400)- CI failure —
format-typescript: Prettier check is failing ontest/integrations/sourcepoint/script_guard.test.ts. Must be fixed before merge.
❓ question
- Redirect
Locationheaders not rewritten: The JS body rewrite is gated onstatus == 200. If the CDN returns a 3xx redirect with aLocationpointing tocdn.privacy-mgmt.com, the browser follows it back to the third-party CDN, defeating the proxy. (sourcepoint.rs:394)
Non-blocking
🤔 thinking
- Upstream response headers forwarded verbatim:
Set-Cookieand other headers fromcdn.privacy-mgmt.compass through to the client on the non-rewritten path. Same as Didomi, but worth confirming this is intended. (sourcepoint.rs:418) - Regex matches mismatched quotes: The CDN URL pattern can match
'url"(single open, double close). Extremely unlikely in minified JS but could produce malformed output. (sourcepoint.rs:65)
♻️ refactor
- Redundant
rewrite_sdkguard: Checked in bothhandles_attribute()andrewrite()— the second is unreachable. (sourcepoint.rs:437) - Head injector config property names: Hardcoded Sourcepoint config keys could go stale. A test asserting known property names appear in the generated script would catch omissions. (
sourcepoint.rs:467)
🌱 seedling
- No body size limit: The body is read entirely into memory with no upper bound. A
Content-Lengthcheck would prevent OOM from unexpected large responses. (sourcepoint.rs:400) - Missing validation test for invalid
cdn_origin: No test proves thatregister()fails whencdn_originis not a valid URL. (sourcepoint.rs:153)
⛏ nitpick
- Manual
[sourcepoint]log prefix: Other integrations rely on thelogcrate's module path rather than a manual bracketed prefix. (sourcepoint.rs:355)
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- format-typescript: FAIL (Prettier issue in
script_guard.test.ts) - format-docs: PASS
- browser integration tests: PASS
- integration tests: PASS
- CodeQL: FAIL (likely infra)
prk-Jr
left a comment
There was a problem hiding this comment.
Reviewed commit a211eb0. Three blocking issues require fixes before merge; the rest are non-blocking.
Blocking (must fix before merge)
B1 — CI: format-typescript still failing on current HEAD
The latest commit (a211eb02) re-introduced a Prettier violation in crates/js/lib/test/integrations/sourcepoint/script_guard.test.ts. CI's format-typescript check will fail on this commit. Fix with:
cd crates/js/lib && npm run format -- --write
then commit the result.
B2 — SSRF via cdn_origin (inline comment on line 96)
See inline comment. A syntactically valid but host-unrestricted cdn_origin allows proxying to any IP including 169.254.169.254 (cloud metadata). Needs a custom host validator.
B3 — PUT/PATCH unreachable (inline comment on line 337)
routes() only registers GET and POST; the PUT/PATCH arms in handle() are dead code. Either register the methods or remove the arms.
Important (non-blocking)
I1 — Single-quoted '/unified/' missed by SP_ORIGIN_UNIFIED_PATTERN (inline on line 81) — fix is a one-character change to the regex character class.
I2 — BackendConfig::from_url per request (inline on line 357) — pre-compute in new() following the pattern in aps.rs / prebid.rs.
I3 — PR checklist says "Uses tracing macros" but codebase uses log crate
The checklist item reads "Uses tracing macros (not println!)" but CLAUDE.md specifies log. The source correctly uses log::info! throughout. The checklist entry should read "Uses log macros (not println!)" to avoid confusion for future contributors.
I4 — JS-rewrite path drops all upstream headers (inline on line 403) — Response::new() discards Vary, CORS, and security headers. Start from response.clone_without_body() instead.
I5 — No test for build_target_url with a path-bearing cdn_origin (inline on line 337) — set_path silently drops any path prefix in cdn_origin. Should be tested and either fixed or documented as an explicit constraint.
Suggestions (non-blocking)
S1 — URL normalisation logic duplicated across Rust and TypeScript
parse_sourcepoint_url in sourcepoint.rs and normalizeSourcepointUrl in script_guard.ts implement identical protocol-relative URL normalisation in parallel. Not a defect, but a cross-reference comment in each file would prevent future fixes being applied to only one side.
S2 — Redundant rewrite_sdk guard in rewrite() (inline on line 437) — unreachable because handles_attribute already gates on it. Either remove with a comment or document the intentional belt-and-suspenders.
S3 — No test for single-quoted CDN URLs (inline on line 81)
S4 — Unchecked WASM build in PR checklist
The test plan has an unchecked - [ ] WASM build item, but CI's WASM build passes. The checkbox should be checked to accurately reflect build status.
S5 — register missing # Examples doc section (inline on line 314) — CLAUDE.md requires this for all public API functions.
- Replace #[validate(url)] with a custom validator that restricts cdn_origin to *.privacy-mgmt.com hosts, preventing SSRF via arbitrary origins (e.g. cloud metadata endpoints). - Remove unreachable PUT/PATCH arms from the request body match since routes() only registers GET and POST. - Fix Prettier formatting in script_guard.test.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace take_body_str() with take_body_bytes() + String::from_utf8() to avoid panicking on non-UTF-8 upstream responses. - Rewrite Location headers on 3xx redirects that point to cdn.privacy-mgmt.com so browsers stay on the first-party proxy. - Preserve upstream CORS headers on the JS-rewrite path instead of discarding them when building a fresh Response. - Extend SP_ORIGIN_UNIFIED_PATTERN to match both single- and double-quoted "/unified/" chunk paths, preserving the original quote character in the replacement. - Normalise log prefixes from [sourcepoint] to Sourcepoint: for consistency with APS/Prebid style. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove redundant rewrite_sdk check from rewrite() since handles_attribute() already gates on it; update test to verify the guard at the handles_attribute level. - Add # Examples section to register() per documentation standards. - Add tests for cdn_origin validation (rejects non-privacy-mgmt.com hosts, accepts valid origins). - Add test for single-quoted origin+'/unified/' rewrite pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review feedback addressedAll blocking and important findings have been fixed across three commits. Replies posted on each inline thread. Blocking (fixed)
Important non-blocking (fixed)
Suggestions (fixed)
Intentionally skipped (with rationale in thread replies)
All CI checks pass locally: |
Refactor normalizeSourcepointUrl to remove the bare-domain startsWith check that triggered CodeQL "Incomplete URL substring sanitization" alerts. The host === exact match was already the security boundary; now the normalization layer no longer references the CDN hostname at all, eliminating the static analysis finding. Add a Content-Length guard (5 MB) before reading upstream response bodies into memory for JavaScript rewriting, preventing unbounded memory consumption from unexpectedly large responses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aram356
left a comment
There was a problem hiding this comment.
Summary
Well-executed revision — every prior blocker is addressed and the four-layer rewriting architecture (HTML attrs / JS body / window._sp_ trap / client DOM guard) is cleanly separated. Two remaining correctness issues in the upstream response handling need fixes before merge: the body-size guard is bypassed when Content-Length is absent, and the UTF-8 fallback path silently drops upstream headers. A few non-blocking refinements around redirect rewriting, config scope, and test coverage follow.
Blocking
🔧 wrench
- Body-size guard bypassed on chunked/unknown-length responses —
MAX_REWRITE_BODY_SIZEonly applies whenContent-Lengthis present; otherwisetake_body_bytes()reads unbounded (sourcepoint.rs:456-470) - UTF-8 fallback drops upstream headers — the fallback constructs a fresh
Responsewith only status + Content-Type, discarding Set-Cookie, CORS, Cache-Control, Vary, etc., diverging from the passthrough path (sourcepoint.rs:480-488)
Non-blocking
♻️ refactor
- Redirect
Locationrewrite is a literalString::replace— misses protocol-relative//cdn.privacy-mgmt.com/...values (sourcepoint.rs:430-438) validate_cdn_origindoes not check scheme —ftp://cdn.privacy-mgmt.compasses validation and only fails at the first proxied request (sourcepoint.rs:135-150)
🤔 thinking
- Config/rewriter host scope mismatch —
validate_cdn_originaccepts*.privacy-mgmt.combut rewriters hardcodecdn.privacy-mgmt.com; configuring a different subdomain silently half-enables the integration (sourcepoint.rs:143, 69, 197, 598) - Rewritten-JS path hard-codes
Cache-Control— diverges from the passthrough path'sapply_cache_headerspolicy without explanation (sourcepoint.rs:498-501) - Property trap only catches reassignment of
window._sp_— nested mutation (window._sp_.config.X = ...) is not patched; alsos.replace()in the JS helper is first-occurrence only (sourcepoint.rs:568-600)
🌱 seedling
copy_headersforwardsAuthorization— credential-leak footgun if publishers reuse bearer tokens across origins (sourcepoint.rs:222-233)- No direct test for
handle()— redirect rewriting, UTF-8 fallback, Content-Length guard, Content-Type gating all lack unit coverage (sourcepoint.rs:372-522)
👍 praise
- Solid response to the prior review — every prior 🔧/❓ addressed with well-chosen fixes; four-layer rewriting architecture is cleanly separated and well documented
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test: PASS
- vitest: PASS
- format-typescript: PASS
- format-docs: PASS
- browser integration tests: PASS
- CodeQL: PASS
…t rewrite - Skip JS rewrite when Content-Length is absent (chunked/unknown) to prevent unbounded memory reads - Reuse original response on UTF-8 fallback to preserve upstream headers - Extract redirect Location rewriting into a URL-parsing helper that handles absolute, protocol-relative, and relative redirects - Tighten cdn_origin validation to exactly cdn.privacy-mgmt.com and reject non-HTTP(S) schemes - Drop Authorization from forwarded headers to prevent credential leakage - Document Cache-Control divergence and property trap limitations - Add 7 new tests for validation and redirect rewriting
aram356
left a comment
There was a problem hiding this comment.
Summary
Round-3 blockers are all correctly addressed — UTF-8 fallback now reuses the original response (preserving headers), Content-Length-missing short-circuits the rewrite before take_body_bytes, rewrite_redirect_location handles absolute / protocol-relative / relative via Url::join, validate_cdn_origin checks scheme and pins the host, and Authorization is excluded from forwarded headers. No remaining 🔧/❓ findings. Submitting as change-request per author request; the items below are refinements (🤔/♻️/🌱/⛏) rather than correctness blockers — each can be resolved or deferred to a follow-up at the author's discretion.
Non-blocking
♻️ refactor
apply_cache_headersinvoked at five return sites — redirect branch, CL-too-large branch, CL-missing branch, UTF-8-fallback branch, and the fallthrough (sourcepoint.rs:476, 505, 513, 528, 567). Restructuringhandle()around a single exit point (or wrapping the proxiedResponsein a post-handle adapter) would remove the per-branch bookkeeping. Low priority.
🌱 seedling
- No direct test for
handle()(sourcepoint.rs:414-569) — redirect rewriting, UTF-8 fallback, CL-missing skip, CL-too-large skip, and non-JS passthrough branches all lack unit coverage. Acknowledged in the round-3 response; tracking here for follow-up. - CSP interaction not documented (
docs/guide/integrations/sourcepoint.md) — publishers withContent-Security-Policy: script-src https://cdn.privacy-mgmt.comwill see rewritten first-party scripts blocked the moment they enable the integration. A note under "HTML Rewriting" calling out that CSP must allow the first-party host would save an incident report.
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test: PASS
- vitest: PASS
- format-typescript: PASS
- format-docs: PASS
- browser integration tests: PASS
- CodeQL: PASS
|
Implemented Sourcepoint follow-up updates and pushed commit. What I changed:
I also resolved remaining open review threads and re-requested review from @prk-Jr. Validation:
Commit: e2fdb0c |
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
The Sourcepoint integration covers the expected rewrite layers, but I found two blocking issues around cookie semantics plus one follow-up consistency concern in the JS guard.
Blocking
🔧 wrench
-
Sourcepoint cookie state never makes the round trip: the proxy does not forward any
Cookieheader upstream, so Sourcepoint cookies previously set on the first-party endpoint are never sent back to Sourcepoint. That breaks the mainbaseEndpoint/ CNAME flow and authenticated-consent-style cookie-backed state. (crates/trusted-server-core/src/integrations/sourcepoint.rs:256) -
Cookie-setting Sourcepoint responses are still marked public-cacheable: both the passthrough path and the rewritten-JS path can return
Cache-Control: public, max-age=...even when upstream sendsSet-Cookie. If Sourcepoint uses server-set cookies on the first-party endpoint, that risks caching user-specific consent state. (crates/trusted-server-core/src/integrations/sourcepoint.rs:269,crates/trusted-server-core/src/integrations/sourcepoint.rs:377)
Non-blocking
🤔 thinking
- JS and Rust host checks diverge on explicit ports: the client-side DOM guard compares
URL.host, while the Rust rewriter compares hostname-only. That creates inconsistent behavior for URLs likehttps://cdn.privacy-mgmt.com:443/.... (crates/js/lib/src/integrations/sourcepoint/script_guard.ts:35,crates/trusted-server-core/src/integrations/sourcepoint.rs:228)
CI Status
- fmt: PASS
- clippy: FAIL
- rust tests: FAIL
- js tests: PASS
- integration artifact prep: FAIL
|
Addressed the latest review round in commit 1a40868 and pushed to feature/sourcepoint-integration. What changed:
Validation:
I replied to and resolved the remaining review threads and re-requested review from @prk-Jr and @aram356. |
…ab/trusted-server into feature/sourcepoint-integration
aram356
left a comment
There was a problem hiding this comment.
Summary
Four-layer rewriting architecture (HTML attrs / JS body regex / window._sp_ trap / client DOM guard) is well-structured, and most prior 🔧 / ❓ findings have been addressed across rounds 1–4 (SSRF guard, UTF-8 safety, redirect rewriting, cookie allowlist, no-store on cookie responses, hostname-only host checks, scheme/host/path/query/fragment validation). However, a stale merge from main introduced a hard build break that fails three CI jobs.
Blocking
🔧 wrench
- Build break:
IntegrationProxy::handlesignature drifted from the trait —#581addedservices: &RuntimeServicesto the trait; the merge from main did not update Sourcepoint's impl.cargo fmt,cargo test, andprepare integration artifactsall fail withE0050. (sourcepoint.rs:540-549) - Use platform-agnostic
client_ipfromRuntimeServices—copy_headerscalls Fastly-specificoriginal_req.get_client_ip_addr(). WithRuntimeServicesnow plumbed through, switch toservices.client_info().client_ipto match Didomi's pattern and keeptrusted-server-coreportable. (sourcepoint.rs:269-272)
Non-blocking
🤔 thinking
- Property-trap config field allowlist is hardcoded and silently goes stale — only
baseEndpoint,mmsDomain,wrapperAPIOrigin,cmpOrigin, andmetricUrlare patched. New Sourcepoint fields would silently leak. A snapshot test asserting the exact patched set would force review when the list changes. (sourcepoint.rs:735-742) auth_cookie_nameis unvalidated — empty string, whitespace, or;/=characters are accepted. Empty string especially: a malformed cookie pair=valueparses to name"", whichauth_cookie_name = ""would forward. Add#[validate](non-empty trimmed, length ≤ 64,[A-Za-z0-9_-]). (sourcepoint.rs:124-129)
♻️ refactor
is_likely_javascript_pathmatches non-JS API paths —/wrapper/v2/*is JSON but is treated as JS-likely, forcingAccept-Encoding: identityfor every CMP API hit. Tightening the heuristic (e.g. only.jsand/unified/) preserves the optimisation only where applicable. (sourcepoint.rs:431-433)
📌 out of scope (carryover from earlier rounds, still open)
- No direct unit test for
handle()— redirect rewriting, UTF-8 fallback,Content-Lengthmissing/too-large branches still lack coverage. apply_cache_headersinvoked at five return sites inhandle()— restructuring around a single exit point would remove the per-branch bookkeeping.- CSP interaction undocumented — publishers with
Content-Security-Policy: script-src https://cdn.privacy-mgmt.comwill see rewritten first-party scripts blocked the moment they enable rewriting. A note indocs/guide/integrations/sourcepoint.mdunder "HTML Rewriting" would save an incident report.
CI Status
- cargo fmt: FAIL (caused by trait-signature drift)
- cargo test: FAIL (caused by trait-signature drift)
- prepare integration artifacts: FAIL (caused by trait-signature drift)
- vitest: PASS
- format-typescript: PASS
- format-docs: PASS
- CodeQL: PASS
- Analyze (rust / actions / javascript-typescript): PASS
| let method = req.get_method().clone(); | ||
| let target_path = Self::strip_cdn_prefix(&path).ok_or_else(|| { | ||
| Report::new(Self::error(format!("Unknown Sourcepoint route: {path}"))) | ||
| })?; |
There was a problem hiding this comment.
🔧 wrench — Build break: IntegrationProxy::handle signature drifted from the trait.
#581 (Add PlatformHttpClient and PlatformBackend traits) landed on main while this PR was open and added a services: &RuntimeServices parameter to IntegrationProxy::handle. The merge from main on e296d627 did not update Sourcepoint's impl, leaving 3 params where 4 are required.
error[E0050]: method `handle` has 3 parameters but the declaration in trait
`IntegrationProxy::handle` has 4
--> crates/trusted-server-core/src/integrations/sourcepoint.rs:541:9
Three CI jobs (cargo fmt, cargo test, prepare integration artifacts) all fail on this. Fix:
async fn handle(
&self,
_settings: &Settings,
services: &RuntimeServices,
req: Request,
) -> Result<Response, Report<TrustedServerError>> {and use crate::platform::RuntimeServices; at the top.
| fn copy_headers(&self, original_req: &Request, proxy_req: &mut Request) { | ||
| if let Some(client_ip) = original_req.get_client_ip_addr() { | ||
| proxy_req.set_header("X-Forwarded-For", client_ip.to_string()); | ||
| } |
There was a problem hiding this comment.
🔧 wrench — Use platform-agnostic client_ip from RuntimeServices (companion to the trait-signature fix).
copy_headers calls original_req.get_client_ip_addr(), which is a Fastly-specific API. Now that RuntimeServices is plumbed through handle, switch to services.client_info().client_ip and pass it as a parameter to copy_headers, matching the pattern Didomi adopted in didomi.rs:200-226. This keeps trusted-server-core portable across platform backends.
// In handle():
self.copy_headers(services.client_info().client_ip, &req, &mut proxy_req);
// In copy_headers signature:
fn copy_headers(&self, client_ip: Option<IpAddr>, original_req: &Request, proxy_req: &mut Request) {
if let Some(ip) = client_ip {
proxy_req.set_header("X-Forwarded-For", ip.to_string());
}
// ...
}| "if(typeof o.config.cmpOrigin===\"string\")o.config.cmpOrigin=r(o.config.cmpOrigin);", | ||
| "}}", | ||
| "if(typeof o.metricUrl===\"string\")o.metricUrl=r(o.metricUrl);", | ||
| "return o", |
There was a problem hiding this comment.
🤔 thinking — Property-trap config field allowlist is hardcoded and silently goes stale.
The p() patcher rewrites only config.{baseEndpoint, mmsDomain, wrapperAPIOrigin, cmpOrigin} and metricUrl. If Sourcepoint adds a new URL field (e.g. events.endpoint, a cdnRoot), the rewrite silently misses it and the page leaks third-party requests. Given that this reaches into a vendor SDK we don't control, a snapshot test asserting the exact set of patched properties would force a code review whenever the list changes — re-raising aram356's earlier point. Consider also documenting how a publisher reports a missed field.
| /// Sourcepoint's standard cookie set is allowlisted automatically. | ||
| /// Configure this only when the CMP uses a custom `authCookie` name and | ||
| /// that cookie must round-trip through the first-party proxy. | ||
| pub auth_cookie_name: Option<String>, |
There was a problem hiding this comment.
🤔 thinking — auth_cookie_name is unvalidated.
Empty string, whitespace, or ; / = characters in auth_cookie_name would all be accepted at config load. Empty string in particular is problematic: a malformed cookie pair like =value parses to a name of "", which should_forward_sourcepoint_cookie("") would forward when auth_cookie_name = "". Suggest adding a #[validate] rule (non-empty after trim, length ≤ 64, [A-Za-z0-9_-]). Low likelihood but cheap to lock down.
| /// `Accept-Encoding: identity` optimisation for that request. | ||
| fn is_likely_javascript_path(path: &str) -> bool { | ||
| path.ends_with(".js") || path.starts_with("/unified/") || path.starts_with("/wrapper/") | ||
| } |
There was a problem hiding this comment.
♻️ refactor — is_likely_javascript_path matches non-JS API paths.
/wrapper/v2/messages is a JSON API but is treated as JS-likely, causing Accept-Encoding: identity for the upstream request. The runtime check (is_javascript_response based on Content-Type) correctly skips the rewrite, so this is not a correctness bug — but uncompressed JSON for /wrapper/v2/* is a mild perf cost on every CMP hit. Tightening the heuristic (e.g. only .js and /unified/) preserves the optimisation only where it can be applied.
Summary
cdn.privacy-mgmt.comandgeo.privacymanager.iotraffic through Trusted Server, eliminating third-party requests for consent management assets.Changes
crates/trusted-server-core/src/integrations/sourcepoint.rsrewrite_script_content), head injector (window._sp_property trap), Accept-Encoding scoping, backend selection, and 14 unit testscrates/trusted-server-core/src/integrations/mod.rssourcepointmodule and wire it into the integration registrycrates/js/lib/src/integrations/sourcepoint/index.tscrates/js/lib/src/integrations/sourcepoint/script_guard.ts<script>/<link>elements pointing at Sourcepoint CDN and rewrites them to first-party pathscrates/js/lib/test/integrations/sourcepoint/script_guard.test.tsdocs/guide/integrations/sourcepoint.mddocs/guide/integrations-overview.mdtrusted-server.toml[integrations.sourcepoint]config stanza (disabled by default)Closes
Closes #145
Closes #344
Closes #345
Test plan
cargo test --workspacecargo clippy --workspace --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatcd docs && npm run formatcargo build --package trusted-server-adapter-fastly --release --target wasm32-wasip1fastly compute serveChecklist
unwrap()in production code — useexpect("should ...")logmacros (notprintln!)