diff --git a/.agent-plan.md b/.agent-plan.md index 250b2d0..b04023e 100644 --- a/.agent-plan.md +++ b/.agent-plan.md @@ -100,9 +100,18 @@ public early-pLTV stays calendar-only (Option A); difficulty = distortion tiers now + simulation-level scaling deferred (issue #129). `LTV-Po.2` split into Po.2a (config plumbing) + Po.2b (recipe + e2e). `LTV-Po.2a` (`resolve_config` + `Generator` carry `n_customers` / `early_tenure_weeks` / `observation_date`; -lead-scoring byte-identical) opened as **#131**. Next: `LTV-Po.2b` -(b2b_saas_ltv_v1 recipe YAMLs + difficulty_params resolution + e2e round-trip — -**completes M6**). +lead-scoring byte-identical) opened as **#131**. `LTV-Po.2b` (the three +`b2b_saas_ltv_v1` recipe YAMLs — `scheme: lifecycle`, `default_population: +{n_customers: 1500}`, `narrative.yaml` with 4 industries + 3 geographies, +per-tier `difficulty_profiles.yaml`; registry auto-discovers it; +`LifecycleScheme._resolve_difficulty` resolves `difficulty_params` from the +active profile in `build_world` and carries it on `spec.config` so snapshot +distortions fire per tier; e2e `Generator.from_recipe("b2b_saas_ltv_v1").generate()` +round-trip in both modes; the two tracked-gap difficulty guards flipped) opened +as **#PENDING** — **completes LTV-M6**. `early_tenure_weeks` / +`observation_date` stay override-only (carried from Po.2a); narrative declares +≥2 industries/geographies so public `industry`/`region` keep variance (invariant +#6). **Next milestone: LTV-M7** (`LTV-Pp` — scheme-aware validation). Note: `validate_bundle` is lead-scoring-coupled — scheme-aware validation is `LTV-Pp`. 
diff --git a/docs/ltv/roadmap.md b/docs/ltv/roadmap.md index 314f3b6..8f9cbd2 100644 --- a/docs/ltv/roadmap.md +++ b/docs/ltv/roadmap.md @@ -46,7 +46,7 @@ protocol + registry, with the package physically reorganized into | `LTV-M3` | Customer population + lifecycle world | `LTV-Ph`, `LTV-Pi` | #113 (Ph) | | `LTV-M4` | Lifecycle simulation engine | `LTV-Pj`, `LTV-Pk` | #117 (Pj), #118 (Pk) | | `LTV-M5` | Customer snapshots + pLTV targets (both regimes) | `LTV-Pl`, `LTV-Pm` | #119 (Pl), #120 (Pm) | -| `LTV-M6` | Register LifecycleScheme + recipe + manifest/version | `LTV-Pn.1…4`, `LTV-Po` | #121 (Pn.1), #122 (Pn.2), #124 (Pn.3), #125 (Pn.4a), #126 (Pn.4b), #127 (Pn.4c), #128 (Pn.4d) | +| `LTV-M6` ✅ | Register LifecycleScheme + recipe + manifest/version | `LTV-Pn.1…4`, `LTV-Po` | #121 (Pn.1), #122 (Pn.2), #124 (Pn.3), #125 (Pn.4a), #126 (Pn.4b), #127 (Pn.4c), #128 (Pn.4d), #130 (Po.1), #131 (Po.2a), Po.2b | | `LTV-M7` | Validation + regression-metric calibration | `LTV-Pp` | | | `LTV-M8` | CLI, notebooks, publish | `LTV-Pq`, `LTV-Pr`, `LTV-Ps` | | @@ -399,28 +399,34 @@ methods, then public-safety, then the carried orchestrator cleanup: it). Lead-scoring config resolution is byte-identical (the lifecycle fields default-match; verified via full-bundle SHA-256 vs `main`, both modes). - Labels: `type: refactor`, `layer: api` -- [ ] **`LTV-Po.2b`** — `feat(recipes): b2b_saas_ltv_v1 recipe assets + e2e`. The - three recipe YAMLs (`scheme: lifecycle`; `narrative.yaml` with ≥2 industries + - ≥2 geographies; `difficulty_profiles.yaml`); register in the recipe registry; - resolve `difficulty_params` from the active profile in `build_world` - (mirroring lead-scoring `_resolve_difficulty`) so snapshot distortions fire - per tier; end-to-end `Generator.from_recipe("b2b_saas_ltv_v1").generate()` - round-trip. Public mode stays calendar-only (Option A, locked). 
- **Limitation (flagged in Po.2a review):** `early_tenure_weeks` / - `observation_date` are override-only — the `Recipe` schema has no field for - them, so the recipe.yaml CANNOT declare them; Po.2b uses the - `GenerationConfig` defaults (4 weeks; observation_date derived by the - population builder). If the recipe must declare them, extend the `Recipe` - dataclass + `from_dict` + `resolve_config` recipe-defaults read (don't rely - on override). - **Constraint (flagged in Po.1 review):** the recipe `narrative.yaml` MUST - declare ≥2 `icp_industries` and ≥2 `geographies` — Po.1 makes these drive the - public `industry`/`region` columns, so a single-value vocab yields a - zero-variance firmographic feature (student_public invariant #6 violation). - Add a test asserting both columns have ≥2 distinct values in the public - bundle. - - Tests: recipe loads, full round-trip, determinism, all task splits, - public/instructor split, per-tier distortion. +- [x] **`LTV-Po.2b`** — `feat(recipes): b2b_saas_ltv_v1 recipe assets + e2e`. + Created `leadforge/recipes/b2b_saas_ltv_v1/{recipe,narrative,difficulty_profiles}.yaml` + (`scheme: lifecycle`; `default_population: {n_customers: 1500}`; `narrative.yaml` + with 4 `icp_industries` + 3 `geographies`; per-tier difficulty profiles). The + registry auto-discovers it (no manual registration). `LifecycleScheme.build_world` + now resolves `difficulty_params` from the active profile via a new + `_resolve_difficulty` (mirroring lead-scoring, minus `category_latent_correlations`) + and carries it on the returned `spec.config`, so snapshot distortions fire per + tier. End-to-end `Generator.from_recipe("b2b_saas_ltv_v1").generate()` + + `.save()` round-trip verified in both modes; public stays calendar-only + (Option A, locked). 
**Completes `LTV-M6`.** + - The two existing tracked-gap guards flipped: `test_difficulty_not_yet_differentiating` + → `test_difficulty_resolves_params_but_world_unchanged` (params now differ per + tier; the *world* stays identical — issue #129 still open); the explicit-param + `test_difficulty_params_thread_into_snapshots` → tier-based + `test_difficulty_tiers_produce_different_task_features` (since `_resolve_difficulty` + always overwrites `difficulty_params` from the profile, an explicitly-passed + one would be clobbered). + - **Limitation (carried from Po.2a):** `early_tenure_weeks` / `observation_date` + remain override-only — the recipe.yaml does NOT declare them; the bundle uses + the `GenerationConfig` defaults (4 weeks; observation_date derived by the + population builder). + - **Constraint satisfied (Po.1 review):** `narrative.yaml` declares ≥2 + `icp_industries` and ≥2 `geographies`; `test_public_industry_region_features_have_variance` + asserts both public columns carry ≥2 distinct values (student_public invariant #6). + - Tests: `tests/recipes/test_b2b_saas_ltv_v1.py` (discovery, asset shape, config + resolution, build_world round-trip, narrative-driven firmographics, determinism, + both-mode bundle round-trip, byte-determinism). 
- Labels: `type: feature`, `layer: recipes`, `layer: api` - **Deferred (issue #129):** simulation-level difficulty scaling for the lifecycle engine — making `advanced` a genuinely harder world (not just diff --git a/leadforge/recipes/b2b_saas_ltv_v1/__init__.py b/leadforge/recipes/b2b_saas_ltv_v1/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/leadforge/recipes/b2b_saas_ltv_v1/difficulty_profiles.yaml b/leadforge/recipes/b2b_saas_ltv_v1/difficulty_profiles.yaml new file mode 100644 index 0000000..89ef9c0 --- /dev/null +++ b/leadforge/recipes/b2b_saas_ltv_v1/difficulty_profiles.yaml @@ -0,0 +1,52 @@ +# Difficulty profiles for b2b_saas_ltv_v1 +# --------------------------------------------------------------------------- +# Each profile controls the signal/noise characteristics of the generated +# pLTV dataset. Higher difficulty = more realistic noise, missing data, and +# outliers in the customer snapshot features, making the supervised task harder. +# +# Scope (LTV-Po.2b): difficulty is resolved into DifficultyParams and applied as +# SNAPSHOT DISTORTIONS (noise_scale / missing_rate / outlier_rate perturb the +# feature columns; targets are exempt). Simulation-level scaling — letting +# signal_strength / committee_friction / conversion_rate_range shape the +# underlying world itself — is tracked but NOT yet wired (issue #129). Those +# knobs are carried here so the profiles are complete and forward-compatible. + +intro: + description: > + Clean signal, minimal noise. Suitable for learning pLTV/regression basics + and verifying that a pipeline runs end to end. + # Probability that the true mechanism drives the outcome (vs. noise). + # (Carried for issue #129; not yet consumed.) + signal_strength: 0.90 + # Scale multiplier applied to additive Gaussian noise in continuous features. + noise_scale: 0.10 + # Fraction of feature values set to missing (NaN). + missing_rate: 0.02 + # Fraction of rows perturbed into statistical outliers. 
+ outlier_rate: 0.01 + # Acceptable churn-positive-rate band. (Carried for issue #129; not yet consumed.) + conversion_rate_range: [0.10, 0.20] + # Strength of buying-committee-friction effects. (Carried for issue #129.) + committee_friction: 0.10 + +intermediate: + description: > + Realistic signal-to-noise ratio. Suitable for portfolio projects, + courses, and Kaggle-style pLTV competitions. + signal_strength: 0.70 + noise_scale: 0.30 + missing_rate: 0.08 + outlier_rate: 0.04 + conversion_rate_range: [0.18, 0.30] + committee_friction: 0.30 + +advanced: + description: > + High noise, realistic outliers, and significant missing data. Suitable for + ML research and realistic pLTV benchmark construction. + signal_strength: 0.50 + noise_scale: 0.55 + missing_rate: 0.18 + outlier_rate: 0.08 + conversion_rate_range: [0.25, 0.40] + committee_friction: 0.55 diff --git a/leadforge/recipes/b2b_saas_ltv_v1/narrative.yaml b/leadforge/recipes/b2b_saas_ltv_v1/narrative.yaml new file mode 100644 index 0000000..5b98f75 --- /dev/null +++ b/leadforge/recipes/b2b_saas_ltv_v1/narrative.yaml @@ -0,0 +1,101 @@ +# Narrative defaults for b2b_saas_ltv_v1 +# --------------------------------------------------------------------------- +# Baseline "story facts" for the mid-market B2B SaaS subscription-LTV vertical. +# The lifecycle scheme's population builder consumes market.icp_industries and +# market.geographies to drive each customer's firmographics; the remaining +# sub-specs document the world for the dataset card and future mechanisms. +# +# INVARIANT (snapshot-safety / zero-variance): market.icp_industries and +# market.geographies MUST each declare >= 2 values. The public relational +# export surfaces industry/region columns directly, so a single-value vocabulary +# would yield a zero-variance public feature (student_public invariant #6). 
+ +company: + name: "Northwind Revenue Cloud" + founded_year: 2016 + hq_city: "Denver" + hq_country: "US" + stage: "Series C" + employee_range: [180, 320] + +product: + name: "Northwind Subscriptions" + category: "Subscription & Revenue Lifecycle Management" + deployment: "cloud_saas" + pricing_model: "per_seat_annual" + acv_range_usd: [12000, 90000] + contract_terms_months: [12, 24, 36] + free_trial_available: true + demo_available: true + +market: + icp_employee_range: [150, 2500] + icp_industries: + - saas + - fintech + - healthtech + - ecommerce + geographies: [US, UK, CA] + avg_deal_size_usd: 36000 + avg_sales_cycle_days: 40 + +gtm_motion: + channels: + - inbound_marketing + - sdr_outbound + - partner_referral + inbound_share: 0.50 + outbound_share: 0.30 + partner_share: 0.20 + +personas: + - role: cfo + title_variants: + - "CFO" + - "Chief Financial Officer" + - "VP Finance" + - "Head of Finance" + decision_authority: economic_buyer + typical_involvement: late_stage + + - role: revops_manager + title_variants: + - "RevOps Manager" + - "Revenue Operations Manager" + - "Director of Revenue Operations" + - "Head of RevOps" + decision_authority: champion + typical_involvement: full_cycle + + - role: billing_admin + title_variants: + - "Billing Administrator" + - "Billing Operations Lead" + - "AR Manager" + - "Subscriptions Manager" + decision_authority: end_user + typical_involvement: full_cycle + + - role: customer_success_lead + title_variants: + - "Customer Success Lead" + - "VP Customer Success" + - "Head of Customer Success" + - "Director of Customer Success" + decision_authority: technical_evaluator + typical_involvement: post_sale + +# Post-sale lifecycle stages (the subscription journey this scheme simulates). 
+funnel_stages: + - name: onboarding + label: "Onboarding" + - name: activated + label: "Activated" + - name: adopted + label: "Adopted" + - name: expanded + label: "Expanded" + - name: renewed + label: "Renewed" + - name: churned + label: "Churned" diff --git a/leadforge/recipes/b2b_saas_ltv_v1/recipe.yaml b/leadforge/recipes/b2b_saas_ltv_v1/recipe.yaml new file mode 100644 index 0000000..53ece37 --- /dev/null +++ b/leadforge/recipes/b2b_saas_ltv_v1/recipe.yaml @@ -0,0 +1,31 @@ +id: b2b_saas_ltv_v1 +title: "Mid-market B2B SaaS — Subscription Lifetime Value" +vertical: mid_market_b2b_saas +# Generation scheme this recipe runs (see leadforge.schemes). This recipe runs +# the customer-lifecycle / pLTV scheme rather than lead_scoring. +scheme: lifecycle +description: > + A mid-market B2B SaaS company selling subscription revenue-lifecycle + management software to 150–2,500 employee firms across the US, UK, and + Canada. The simulated world tracks each signed customer through onboarding, + adoption, expansion, payment health, and churn so that predicted lifetime + value (pLTV over 90/365/730-day forward windows) and 180-day churn emerge + from simulated subscription events rather than being sampled directly. +# The canonical pLTV regression target; the bundle also ships the 90d/730d +# windows and a 180-day churn classification task (see the lifecycle scheme). +primary_task: pltv_revenue_365d +supported_modes: + - student_public + - research_instructor +supported_difficulty: + - intro + - intermediate + - advanced +default_population: + # The lifecycle scheme is customer-centric: it samples a customer population + # (each customer owns one account + one subscription) rather than leads. + n_customers: 1500 +# The longest forward pLTV window, in days (the engine simulates through it so +# every target is fully covered). Lifecycle forward windows are locked to the +# scheme's exported constant (90/365/730); see LifecycleScheme.build_world. 
+horizon_days: 730 diff --git a/leadforge/schemes/lifecycle/__init__.py b/leadforge/schemes/lifecycle/__init__.py index f4b9072..52e58a0 100644 --- a/leadforge/schemes/lifecycle/__init__.py +++ b/leadforge/schemes/lifecycle/__init__.py @@ -9,6 +9,7 @@ from __future__ import annotations +import dataclasses import random from typing import TYPE_CHECKING, Any @@ -58,11 +59,14 @@ def build_world( vocabularies (``market.icp_industries`` / ``market.geographies``); a ``None`` narrative falls back to the built-in procurement-ICP defaults. - Difficulty (tracked, not silent): ``config.difficulty`` does not yet - scale the *simulation* — every tier yields the same world — so harder - tiers differ only in snapshot distortions (resolved from the recipe - profile in ``LTV-Po`` and threaded into the snapshot builders). - Simulation-level difficulty scaling is deferred (issue #129). + Difficulty: ``config.difficulty`` resolves (via :meth:`_resolve_difficulty`, + ``LTV-Po.2b``) into :class:`DifficultyParams` read from the recipe's + ``difficulty_profiles.yaml``, which the snapshot builders apply as + feature distortions (noise / missingness / outliers; targets exempt). + The resolved params ride on the returned bundle's ``spec.config`` so + :meth:`write_bundle` picks them up. Difficulty does NOT yet scale the + *simulation* — every tier yields the same underlying world — so + simulation-level scaling remains deferred (issue #129). """ from leadforge.core.exceptions import InvalidConfigError from leadforge.core.models import WorldBundle, WorldSpec @@ -85,6 +89,10 @@ def build_world( "exports the fixed set). Use the default until that wiring lands." ) + # Resolve difficulty → DifficultyParams (snapshot distortions) and carry + # them on config so the returned spec + write_bundle see them. 
+ config = self._resolve_difficulty(config) + motif_rng = RNGRoot(config.seed).child("lifecycle_motif") motif_family = _sample_motif_family(motif_rng) @@ -112,6 +120,67 @@ def build_world( ), ) + @staticmethod + def _resolve_difficulty(config: GenerationConfig) -> GenerationConfig: + """Attach :class:`DifficultyParams` from the active difficulty profile. + + Mirrors :meth:`LeadScoringScheme._resolve_difficulty` (minus the + lead-scoring-only ``category_latent_correlations``): loads the recipe's + ``difficulty_profiles.yaml``, reads the profile for + ``config.difficulty``, and returns ``config`` with the resolved + :class:`DifficultyParams` attached. The snapshot builders consume + ``noise_scale`` / ``missing_rate`` / ``outlier_rate``; the remaining + knobs are carried for forward-compatible simulation-level scaling + (issue #129). + + Returns ``config`` unchanged when the recipe has no difficulty-profiles + file (e.g. an ad-hoc config whose recipe lacks one). + """ + from leadforge.api.recipes import Recipe + from leadforge.core.models import DifficultyParams + from leadforge.recipes.registry import load_recipe + + try: + raw = load_recipe(config.recipe_id) + recipe = Recipe.from_dict(raw) + profiles = recipe.load_difficulty_profiles() + except (FileNotFoundError, KeyError): + return config + if not profiles: + return config + + profile = profiles.get(config.difficulty.value, {}) + + # All keys are required — a missing key indicates a malformed profile + # YAML and should fail loudly rather than silently defaulting. 
+ required_keys = ( + "signal_strength", + "noise_scale", + "missing_rate", + "outlier_rate", + "conversion_rate_range", + "committee_friction", + ) + missing = [k for k in required_keys if k not in profile] + if missing: + from leadforge.core.exceptions import InvalidRecipeError + + raise InvalidRecipeError( + f"Difficulty profile '{config.difficulty.value}' is missing " + f"required keys: {missing}" + ) + cr_range = profile["conversion_rate_range"] + difficulty_params = DifficultyParams( + signal_strength=profile["signal_strength"], + noise_scale=profile["noise_scale"], + missing_rate=profile["missing_rate"], + outlier_rate=profile["outlier_rate"], + conversion_rate_lo=cr_range[0], + conversion_rate_hi=cr_range[1], + committee_friction=profile["committee_friction"], + ) + return dataclasses.replace(config, difficulty_params=difficulty_params) + def write_bundle( self, bundle: WorldBundle, diff --git a/tests/recipes/test_b2b_saas_ltv_v1.py b/tests/recipes/test_b2b_saas_ltv_v1.py new file mode 100644 index 0000000..31b55c5 --- /dev/null +++ b/tests/recipes/test_b2b_saas_ltv_v1.py @@ -0,0 +1,199 @@ +"""End-to-end tests for the b2b_saas_ltv_v1 (lifecycle / pLTV) recipe (LTV-Po.2b). + +These exercise the recipe assets — recipe.yaml, narrative.yaml, +difficulty_profiles.yaml — through the public Generator API: discovery, config +resolution, the build_world round-trip, and a full write_bundle in both +exposure modes. 
+""" + +from __future__ import annotations + +import hashlib +import json +from pathlib import Path + +import pandas as pd + +from leadforge.api.generator import Generator +from leadforge.api.recipes import Recipe +from leadforge.recipes.registry import list_recipes, load_recipe +from leadforge.schemes.lifecycle.artifacts import LifecycleArtifacts + +_RECIPE_ID = "b2b_saas_ltv_v1" +_TS = "2026-01-01T00:00:00+00:00" +_SMALL = 120 + +_PUBLIC_TASKS = { + "pltv_revenue_90d", + "pltv_revenue_365d", + "pltv_revenue_730d", + "churned_within_180d", +} +_INSTRUCTOR_TASKS = _PUBLIC_TASKS | { + "early_pltv_revenue_90d", + "early_pltv_revenue_365d", + "early_pltv_revenue_730d", + "early_churned_within_180d", +} + + +# --------------------------------------------------------------------------- +# Discovery + recipe asset shape +# --------------------------------------------------------------------------- + + +def test_recipe_is_discoverable() -> None: + ids = [r["id"] for r in list_recipes()] + assert _RECIPE_ID in ids + + +def test_recipe_declares_lifecycle_scheme() -> None: + recipe = Recipe.from_dict(load_recipe(_RECIPE_ID)) + assert recipe.scheme == "lifecycle" + assert recipe.primary_task == "pltv_revenue_365d" + assert recipe.default_population == {"n_customers": 1500} + + +def test_recipe_supports_both_modes_and_all_tiers() -> None: + from leadforge.core.enums import DifficultyProfile, ExposureMode + + recipe = Recipe.from_dict(load_recipe(_RECIPE_ID)) + assert set(recipe.supported_modes) == { + ExposureMode.student_public, + ExposureMode.research_instructor, + } + assert set(recipe.supported_difficulty) == set(DifficultyProfile) + + +def test_narrative_declares_multi_value_firmographics() -> None: + """student_public invariant #6: industry/region must not be zero-variance, + so the recipe narrative must declare >= 2 industries and >= 2 geographies.""" + recipe = Recipe.from_dict(load_recipe(_RECIPE_ID)) + market = recipe.load_narrative()["market"] + assert 
len(market["icp_industries"]) >= 2 + assert len(market["geographies"]) >= 2 + + +def test_difficulty_profiles_present_for_every_tier() -> None: + recipe = Recipe.from_dict(load_recipe(_RECIPE_ID)) + profiles = recipe.load_difficulty_profiles() + assert {"intro", "intermediate", "advanced"} <= set(profiles) + for tier in ("intro", "intermediate", "advanced"): + for key in ("signal_strength", "noise_scale", "missing_rate", "outlier_rate"): + assert key in profiles[tier], f"{tier} missing {key}" + + +# --------------------------------------------------------------------------- +# Config resolution +# --------------------------------------------------------------------------- + + +def test_resolve_config_carries_n_customers_from_default_population() -> None: + recipe = Recipe.from_dict(load_recipe(_RECIPE_ID)) + config = recipe.resolve_config(seed=42) + assert config.recipe_id == _RECIPE_ID + assert config.n_customers == 1500 + + +# --------------------------------------------------------------------------- +# build_world round-trip via Generator +# --------------------------------------------------------------------------- + + +def test_generate_returns_lifecycle_artifacts() -> None: + gen = Generator.from_recipe(_RECIPE_ID, seed=42, n_customers=_SMALL) + assert gen.world_spec.scheme == "lifecycle" + bundle = gen.generate() + assert isinstance(bundle.artifacts, LifecycleArtifacts) + assert bundle.spec.scheme == "lifecycle" + assert len(bundle.artifacts.population.customers) == _SMALL + + +def test_generate_resolves_difficulty_params_from_recipe() -> None: + gen = Generator.from_recipe(_RECIPE_ID, seed=42, n_customers=_SMALL, difficulty="advanced") + bundle = gen.generate() + params = bundle.spec.config.difficulty_params + assert params is not None + # The advanced tier's knobs (from difficulty_profiles.yaml) flowed through. 
+ assert params.noise_scale == 0.55 + assert params.missing_rate == 0.18 + + +def test_narrative_drives_population_firmographics() -> None: + gen = Generator.from_recipe(_RECIPE_ID, seed=42, n_customers=_SMALL) + accounts = gen.generate().artifacts.population.accounts + market = Recipe.from_dict(load_recipe(_RECIPE_ID)).load_narrative()["market"] + seen_industries = {a.industry for a in accounts} + seen_regions = {a.region for a in accounts} + assert seen_industries <= set(market["icp_industries"]) + assert seen_regions <= set(market["geographies"]) + # A 120-customer world should surface variety from the multi-value vocab. + assert len(seen_industries) >= 2 + assert len(seen_regions) >= 2 + + +def test_generate_is_deterministic() -> None: + a = Generator.from_recipe(_RECIPE_ID, seed=7, n_customers=_SMALL).generate() + b = Generator.from_recipe(_RECIPE_ID, seed=7, n_customers=_SMALL).generate() + assert a.artifacts.motif_family == b.artifacts.motif_family + assert [s.to_dict() for s in a.artifacts.simulation_result.subscriptions] == [ + s.to_dict() for s in b.artifacts.simulation_result.subscriptions + ] + + +# --------------------------------------------------------------------------- +# Full bundle round-trip on disk (both exposure modes) +# --------------------------------------------------------------------------- + + +def _write(tmp_path: Path, *, mode: str, difficulty: str = "intermediate") -> Path: + gen = Generator.from_recipe( + _RECIPE_ID, seed=42, n_customers=150, exposure_mode=mode, difficulty=difficulty + ) + bundle = gen.generate() + out = tmp_path / mode + bundle.save(str(out), generation_timestamp=_TS) + return out + + +def test_public_bundle_round_trip(tmp_path) -> None: + out = _write(tmp_path, mode="student_public") + assert (out / "manifest.json").is_file() + assert not (out / "metadata").exists() + task_dirs = {p.name for p in (out / "tasks").iterdir() if p.is_dir()} + assert task_dirs == _PUBLIC_TASKS # early-pLTV family omitted from public + 
m = json.loads((out / "manifest.json").read_text()) + assert m["generation_scheme"] == "lifecycle" + assert m["relational_snapshot_safe"] is True + + +def test_instructor_bundle_round_trip(tmp_path) -> None: + out = _write(tmp_path, mode="research_instructor") + assert (out / "metadata").is_dir() + task_dirs = {p.name for p in (out / "tasks").iterdir() if p.is_dir()} + assert task_dirs == _INSTRUCTOR_TASKS + m = json.loads((out / "manifest.json").read_text()) + assert m["relational_snapshot_safe"] is False + + +def test_public_industry_region_features_have_variance(tmp_path) -> None: + """The multi-value narrative vocab must produce >= 2 distinct values in the + public snapshot's firmographic features (student_public invariant #6).""" + out = _write(tmp_path, mode="student_public") + train = pd.read_parquet(out / "tasks" / "pltv_revenue_365d" / "train.parquet") + for col in ("industry", "region"): + if col in train.columns: + assert train[col].nunique(dropna=True) >= 2, f"{col} is zero-variance" + + +def test_full_bundle_byte_deterministic(tmp_path) -> None: + def hashes(root: Path) -> dict[str, str]: + return { + str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest() + for p in sorted(root.rglob("*")) + if p.is_file() + } + + a = _write(tmp_path / "a", mode="research_instructor") + b = _write(tmp_path / "b", mode="research_instructor") + assert hashes(a) == hashes(b) diff --git a/tests/schemes/lifecycle/test_build_world.py b/tests/schemes/lifecycle/test_build_world.py index db0cd06..64e54c9 100644 --- a/tests/schemes/lifecycle/test_build_world.py +++ b/tests/schemes/lifecycle/test_build_world.py @@ -65,18 +65,33 @@ def test_motif_varies_across_seeds() -> None: assert motifs <= set(LIFECYCLE_MOTIF_FAMILIES) -def test_difficulty_not_yet_differentiating() -> None: - """Tracked-gap guard (LTV-Pn.4a): build_world does not yet consume - config.difficulty, so every tier yields the same world. 
When Pn.4b wires - difficulty in, this test must be updated to assert the tiers DIFFER — - flipping it is the reminder that the gap is closed. +def test_difficulty_resolves_params_but_world_unchanged() -> None: + """LTV-Po.2b: build_world now resolves config.difficulty against the + recipe's difficulty_profiles.yaml and attaches the per-tier DifficultyParams + to the returned spec.config (consumed as snapshot distortions downstream). + + The *simulation* itself is still tier-independent — every tier yields the + same world (simulation-level scaling is deferred, issue #129) — so the motif + and subscriptions stay identical across tiers; only difficulty_params differ. """ intro = get_scheme("lifecycle").build_world( - GenerationConfig(seed=5, n_customers=60, difficulty="intro"), narrative=None + GenerationConfig(seed=5, n_customers=60, recipe_id="b2b_saas_ltv_v1", difficulty="intro"), + narrative=None, ) advanced = get_scheme("lifecycle").build_world( - GenerationConfig(seed=5, n_customers=60, difficulty="advanced"), narrative=None + GenerationConfig( + seed=5, n_customers=60, recipe_id="b2b_saas_ltv_v1", difficulty="advanced" + ), + narrative=None, ) + # Difficulty now resolves into distinct params per tier... + intro_params = intro.spec.config.difficulty_params + advanced_params = advanced.spec.config.difficulty_params + assert intro_params is not None + assert advanced_params is not None + assert intro_params != advanced_params + assert intro_params.noise_scale < advanced_params.noise_scale + # ...but the underlying world is unchanged (issue #129 still open). 
assert intro.artifacts.motif_family == advanced.artifacts.motif_family assert [s.to_dict() for s in intro.artifacts.simulation_result.subscriptions] == [ s.to_dict() for s in advanced.artifacts.simulation_result.subscriptions diff --git a/tests/schemes/lifecycle/test_write_bundle.py b/tests/schemes/lifecycle/test_write_bundle.py index 52e2d9f..051492f 100644 --- a/tests/schemes/lifecycle/test_write_bundle.py +++ b/tests/schemes/lifecycle/test_write_bundle.py @@ -9,7 +9,7 @@ import pandas as pd import pytest -from leadforge.core.models import DifficultyParams, GenerationConfig +from leadforge.core.models import GenerationConfig from leadforge.schemes import get_scheme from leadforge.schemes.lifecycle.snapshots import CHURN_WINDOW_DAYS, FORWARD_WINDOWS_DAYS @@ -192,33 +192,26 @@ def hashes(root: Path) -> dict[str, str]: # --------------------------------------------------------------------------- -# Difficulty threading (the LTV-Pn.4a pinned obligation) +# Difficulty resolution (LTV-Po.2b): recipe difficulty → snapshot distortions # --------------------------------------------------------------------------- -def test_difficulty_params_thread_into_snapshots(tmp_path) -> None: - """config.difficulty_params must reach the snapshot builders — with strong - distortion knobs the task features differ from an undistorted bundle. 
- (Recipe-driven resolution of difficulty_params lands in LTV-Po; this proves - the wiring so that resolution will take effect.)""" - params = DifficultyParams( - signal_strength=1.0, - noise_scale=1.0, - missing_rate=0.3, - outlier_rate=0.05, - conversion_rate_lo=0.02, - conversion_rate_hi=0.4, - committee_friction=0.5, - ) - plain = _write(tmp_path / "plain") - distorted = _write(tmp_path / "distorted", config=_config(difficulty_params=params)) - - plain_df = pd.read_parquet(plain / "tasks" / "pltv_revenue_365d" / "train.parquet") - dist_df = pd.read_parquet(distorted / "tasks" / "pltv_revenue_365d" / "train.parquet") - # A numeric feature column should differ once distortions are applied. - assert not plain_df["avg_active_users_l12w"].equals(dist_df["avg_active_users_l12w"]) - # Targets are never distorted (the distortion helper excludes them). - assert plain_df["ltv_revenue_365d"].equals(dist_df["ltv_revenue_365d"]) +def test_difficulty_tiers_produce_different_task_features(tmp_path) -> None: + """build_world resolves config.difficulty against the recipe's + difficulty_profiles.yaml (LTV-Po.2b) and threads the resulting + DifficultyParams into the snapshot builders. Two tiers (intro vs advanced) + therefore yield different feature distortions — while the targets, which the + distortion helper exempts, stay identical.""" + intro = _write(tmp_path / "intro", config=_config(difficulty="intro")) + advanced = _write(tmp_path / "advanced", config=_config(difficulty="advanced")) + + intro_df = pd.read_parquet(intro / "tasks" / "pltv_revenue_365d" / "train.parquet") + adv_df = pd.read_parquet(advanced / "tasks" / "pltv_revenue_365d" / "train.parquet") + # A numeric feature column differs once the per-tier distortions are applied. + assert not intro_df["avg_active_users_l12w"].equals(adv_df["avg_active_users_l12w"]) + # Targets are never distorted (the distortion helper excludes them), so the + # world being identical across tiers (issue #129) leaves them untouched. 
+ assert intro_df["ltv_revenue_365d"].equals(adv_df["ltv_revenue_365d"]) # ---------------------------------------------------------------------------
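Reviewer sketch (not part of the patch): the profile-to-params step that `_resolve_difficulty` performs can be illustrated in isolation. This is a minimal, self-contained sketch under stated assumptions: the `DifficultyParams` dataclass below is a hypothetical stand-in whose field names follow the constructor call in the diff (the field types are assumed), `resolve_profile` and `PROFILES` are illustrative names, and `ValueError` stands in for `InvalidRecipeError`. The profile values are copied from `difficulty_profiles.yaml` above.

```python
from dataclasses import dataclass


# Hypothetical stand-in for leadforge.core.models.DifficultyParams; field
# names follow the constructor call in the diff, the float types are assumed.
@dataclass(frozen=True)
class DifficultyParams:
    signal_strength: float
    noise_scale: float
    missing_rate: float
    outlier_rate: float
    conversion_rate_lo: float
    conversion_rate_hi: float
    committee_friction: float


REQUIRED_KEYS = (
    "signal_strength",
    "noise_scale",
    "missing_rate",
    "outlier_rate",
    "conversion_rate_range",
    "committee_friction",
)

# Two of the three tiers, with values copied from difficulty_profiles.yaml.
PROFILES = {
    "intro": {
        "signal_strength": 0.90,
        "noise_scale": 0.10,
        "missing_rate": 0.02,
        "outlier_rate": 0.01,
        "conversion_rate_range": [0.10, 0.20],
        "committee_friction": 0.10,
    },
    "advanced": {
        "signal_strength": 0.50,
        "noise_scale": 0.55,
        "missing_rate": 0.18,
        "outlier_rate": 0.08,
        "conversion_rate_range": [0.25, 0.40],
        "committee_friction": 0.55,
    },
}


def resolve_profile(profiles: dict, tier: str) -> DifficultyParams:
    """Turn one tier's profile mapping into DifficultyParams, failing loudly."""
    profile = profiles.get(tier, {})
    missing = [k for k in REQUIRED_KEYS if k not in profile]
    if missing:
        # The patch raises InvalidRecipeError here; ValueError is a stand-in.
        raise ValueError(
            f"Difficulty profile '{tier}' is missing required keys: {missing}"
        )
    lo, hi = profile["conversion_rate_range"]
    return DifficultyParams(
        signal_strength=profile["signal_strength"],
        noise_scale=profile["noise_scale"],
        missing_rate=profile["missing_rate"],
        outlier_rate=profile["outlier_rate"],
        conversion_rate_lo=lo,
        conversion_rate_hi=hi,
        committee_friction=profile["committee_friction"],
    )
```

As in the patch, a tier absent from the profiles file falls through to an empty mapping and is reported as missing every required key, rather than silently defaulting.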