Split parking + platforms into dedicated modules #2

Open
sitney wants to merge 1 commit into main from consolidate-parking-platforms

sitney commented Apr 22, 2026

Summary

  • Split parking / for-sale host list out of classifier.py into a new domain_classifier.parking module with 5 categorised subsets (for-sale marketplaces, registrar parking, monetisation networks, traffic arbitrage, content-farm affiliates) plus a URL_PARKING_PATTERNS list for host+query rules
  • Add new domain_classifier.platforms module holding the hosting/SaaS platform data that Domain Intelligence was duplicating in redirect_detector.py (now single source of truth)
  • New URL_PARKING_PATTERNS entry: ("help.com.au", "?d=") — Servers Australia parking lander. Without this rule the redirect masqueraded as an acquisition signal in DI
  • Dedupe homestead.com out of platforms (it was in both lists; parking wins on overlap)
  • Restore the richer parking coverage (105 hosts) that existed before the ml/content/fetch extraction refactor trimmed it to 26
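
To illustrate how a host+query rule such as ("help.com.au", "?d=") could sit alongside plain host matching, here is a minimal sketch. It is not the module's actual code: the function name is_parking_url, the helper _normalise_host, the single combined PARKING_HOSTS set, and the "query string starts with the pattern" matching semantics are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumption: one combined set stands in for the five categorised host sets.
# homestead.com is listed because this PR keeps it in parking only.
PARKING_HOSTS = frozenset({"homestead.com"})

# (host, query prefix) rules; this is the one entry named in the PR.
URL_PARKING_PATTERNS = [("help.com.au", "?d=")]


def _normalise_host(host):
    """Lower-case the host and drop a trailing dot and a leading 'www.'."""
    host = (host or "").lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


def is_parking_url(url):
    """Return True if the URL matches a parking host or a host+query rule.

    Hypothetical helper: the real module's API may differ.
    """
    parts = urlsplit(url)
    host = _normalise_host(parts.hostname)
    if host in PARKING_HOSTS:
        return True
    # Re-attach the '?' so the rule text ("?d=") can be compared directly.
    query = "?" + parts.query if parts.query else ""
    return any(host == h and query.startswith(q) for h, q in URL_PARKING_PATTERNS)
```

Under these assumptions, https://help.com.au/?d=example.com matches the rule while https://help.com.au/contact does not, so ordinary pages on the same host are not treated as parked.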

Why

Domain Intelligence (DI) and this repo had drifted: DI maintained its own PLATFORM_REDIRECT_HOSTS / URL_PLATFORM_PATTERNS set in redirect_detector.py, and some hosts (e.g. homestead.com) appeared in both the platform and parking lists under different categories. DI also had to invent a platform classification for parking hosts such as help.com.au, which the classifier should be catching as parked (grade C).

This PR makes the classifier the single source of truth for all redirect-destination reference data. DI will follow up with a submodule bump that switches to from domain_classifier.platforms import ... and deletes its local lists.
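
As a rough picture of what a downstream caller would get from the shared platforms data, the sketch below shows exact-host and hostname-suffix matching. The names PLATFORM_REDIRECT_HOSTS, PLATFORM_HOSTNAME_SUFFIXES, and is_platform_host come from this PR, but the signature, the matching logic, and every host value here are assumptions; the .example hosts are placeholders, not entries from the real lists.

```python
# Placeholder data: these hosts are illustrative only, not the real lists.
PLATFORM_REDIRECT_HOSTS = frozenset({"sites.example"})
PLATFORM_HOSTNAME_SUFFIXES = (".pages.example",)


def is_platform_host(host):
    """Return True for an exact platform host or a host under a platform suffix.

    Assumed behaviour; the real function may normalise or match differently.
    """
    host = host.lower().rstrip(".")
    if host in PLATFORM_REDIRECT_HOSTS:
        return True
    # str.endswith accepts a tuple, so all suffixes are checked in one call.
    return host.endswith(PLATFORM_HOSTNAME_SUFFIXES)
```

Keeping the leading dot in each suffix means a customer subdomain such as shop.pages.example matches, while the bare pages.example does not unless it is also listed as an exact host.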

Convention

Hosts that are primarily parking landers live in parking.py only; hosts that are primarily platforms but may also see parking traffic live in platforms.py only. Membership is mutually exclusive — parking wins on overlap.
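
A dedupe guard for this convention might look like the sketch below. The helper names overlapping_hosts and resolve_overlap are hypothetical, and the platform host platform.example is a placeholder; only homestead.com's status (previously in both lists, now parking only) comes from this PR.

```python
def overlapping_hosts(parking_hosts, platform_hosts):
    """Return the hosts present in both collections, sorted for stable output."""
    return sorted(set(parking_hosts) & set(platform_hosts))


def resolve_overlap(parking_hosts, platform_hosts):
    """Apply 'parking wins on overlap': drop parking hosts from the platform set."""
    parking = frozenset(parking_hosts)
    return parking, frozenset(platform_hosts) - parking


# State before the dedupe: homestead.com appears in both lists.
parking_before = {"homestead.com"}
platforms_before = {"homestead.com", "platform.example"}  # placeholder host

parking_after, platforms_after = resolve_overlap(parking_before, platforms_before)
```

A test can then assert that overlapping_hosts(parking_after, platforms_after) is empty, which fails as soon as a host is added to both modules.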

Test plan

  • New tests/test_parking.py — 19 assertions across host match, URL pattern match, and non-match / dedupe guards
  • New tests/test_platforms.py — 22 assertions across URL pattern, host suffix, and dedupe guards
  • pytest tests/: 172 passing; the 2 failures are pre-existing and unrelated (test_rank.py top-ranked-override WIP)
  • Smoke test against a sample classify_domain run (CI / reviewer)

🤖 Generated with Claude Code

Extract the inlined _PARKING_REDIRECT_HOSTS / _is_parking_redirect from
classifier.py into a standalone domain_classifier.parking module with
five categorised host sets (for-sale marketplaces, registrar parking,
monetisation networks, traffic arbitrage, content-farm affiliates) plus
a URL_PARKING_PATTERNS list for host+query-string rules.

Add a new domain_classifier.platforms module holding the hosting/SaaS
platform data that Domain Intelligence was maintaining separately
(PLATFORM_REDIRECT_HOSTS, PLATFORM_HOSTNAME_SUFFIXES, URL_PLATFORM_PATTERNS,
is_platform_url, is_platform_host).  One source of truth; downstream
importers replace their local copies with `from domain_classifier.platforms`.

Convention: hosts that are primarily parking landers live in parking.py
only; hosts that are primarily platforms but may also see parking traffic
live in platforms.py only.  Membership is mutually exclusive — parking
wins on overlap (homestead.com was in both; kept in parking only).

New URL_PARKING_PATTERNS entry:
  ("help.com.au", "?d=")   — Servers Australia hosting lander; without
                             this rule the redirect masquerades as an
                             acquisition signal.

Restores the richer parking coverage (105 hosts) that existed before
the ml/content/fetch extraction refactor trimmed it to 26.