From 72822472dd5f5bf3f3688752f22546de0be1ca3f Mon Sep 17 00:00:00 2001 From: Dana Bauer Date: Mon, 11 May 2026 22:05:40 +0000 Subject: [PATCH 1/6] add matching-demos slides for LSIB lesson, MGCP skeleton Fill in the previously-empty slides/matching-demos.md with a presenter deck for the LSIB <-> Overture matching demo (lesson 6), plus a skeleton for the MGCP buildings demo (lesson 7) that mirrors the lesson's TODO structure. Wire the new section into slides/index.html after the GERS extras. Signed-off-by: Dana Bauer --- slides/index.html | 9 +++ slides/matching-demos.md | 156 ++++++++++++++++++++++++++++++++++++++- 2 files changed, 164 insertions(+), 1 deletion(-) diff --git a/slides/index.html b/slides/index.html index dfd1f13..7498234 100644 --- a/slides/index.html +++ b/slides/index.html @@ -207,6 +207,15 @@ data-charset="iso-8859-15" > + +
+ diff --git a/slides/matching-demos.md b/slides/matching-demos.md index 329b619..a80a02e 100644 --- a/slides/matching-demos.md +++ b/slides/matching-demos.md @@ -1 +1,155 @@ -## Matching Demos \ No newline at end of file +## Matching Demos + +Linking external datasets to Overture via _GERS IDs_ + +<<< + +## What is a "match"? + +A row in a _crosswalk_ table that links one feature in dataset A +to its corresponding feature in dataset B. + +`(external_id, overture_id, match_class, metrics…)` + +<<< + +## Why match against Overture? + + + +>>> + +## Demo 1: LSIB ↔ Overture + +Matching **administrative boundaries** between the U.S. State Department's +Large Scale International Boundaries (LSIB) and Overture's `division_boundary`. + +`notebooks/3-lsib_overture.ipynb` + +Notes: +LSIB is the State Department's reference for international boundaries. +The notebook produces a crosswalk linking each LSIB segment to its +matching Overture GERS ID. The crosswalk is the deliverable; the +matching score is just how we got there. + +<<< + +## The join key: `pair_key` + +A canonical, sorted country-pair string. + +```python +pair_key = "|".join(sorted([cc1, cc2])) +# ("AR", "BR") → "AR|BR" +# ("BR", "AR") → "AR|BR" +``` + +Both datasets produce the _same key_ for the same border. + +<<< + +## Two country-code vocabularies + +| | LSIB | Overture | +| -------- | ------ | -------- | +| Standard | GENC | ISO 3166 | +| Kosovo | `KV` | `XK` | + +Translate before joining: + +```python +def genc_to_iso(cc): + return "XK" if cc == "KV" else cc +``` + +Notes: +GENC and ISO agree on most countries. They diverge on contested or +unrecognized entities. For country-level boundaries, Kosovo is the +only meaningful divergence; other dispute codes exist but are out of +scope for the demo. 
+ +<<< + +## How we score a candidate pair + +- **Buffer overlap (250m)** — what fraction of each line falls within 250m of the other +- **Length ratio** — shorter / longer + +_Production adds Hausdorff distance and multi-tolerance sweeps; the demo keeps it lean._ + +<<< + +## Cardinality buckets + +The structural shape of the match, _before_ we look at geometry. + +- **clean** — 1 LSIB feature ↔ 1 Overture feature +- **lsib_fragmented** — LSIB splits a single boundary into parts +- **overture_fragmented** — Overture splits it +- **both_fragmented** — both do, possibly differently + +<<< + +## Match classes + +After scoring, every pair lands in one of: + + + +<<< + +## Three takeaways + +1. The output is a _link table_, not a score +2. Building it is _iterative_ — bucketing, filters, and thresholds shape the result +3. _Disagreements are findings, not failures_ + +>>> + +## Demo 2: MGCP polygons ↔ Overture + +> **Status:** skeleton — full deck will land with the finished lesson + +Matching NGA's MGCP polygon features (buildings + base-theme polygons) against Overture. + +`notebooks/4-buildings-matching.ipynb` + +<<< + +### Schema landscape + +MGCP, TRD, TDS, DGIWG — and where GERS fits. + +_Slide TODO_ + +<<< + +### Methodology + +For each polygon pair we compute: + +- **IoU** (Intersection over Union) +- **Centroid containment** + +A pair matches when IoU ≥ 0.5, **or** IoU ≥ 0.3 with the Overture centroid inside the MGCP polygon. 
+ +<<< + +### Results: seven passes + +| Overture type | Behavior | +| ------------- | -------- | +| `buildings/building` | Clean case — ~99% land in the clean bucket | +| `buildings/building_part` | Schema-design mismatch surfaces in low match count | +| `base/infrastructure`, `base/land_use`, `base/water` | Mostly clean with some aggregation | + +_Slide TODO: expand with notebook screenshots_ From 953a5c4c0e70a6fcb83549f18776af2395192831 Mon Sep 17 00:00:00 2001 From: Dana Bauer Date: Mon, 11 May 2026 22:05:41 +0000 Subject: [PATCH 2/6] add lesson + notebook callouts to slide decks So attendees can find the written lesson and runnable notebook for each section: - GERS overview (lesson 4) - Exploring data (notebook 1-overturemaps-lonboard) - GERS ecosystem extras (lesson 4, replaces trailing >>> blank slide with a Continue-the-Workshop closer) - Matching demos: LSIB (lesson 6, notebook 3-lsib-demo) and MGCP (lesson 7, notebook 4-buildings-matching). Also fixes the notebook filename (was 3-lsib_overture, should be 3-lsib-demo). Signed-off-by: Dana Bauer --- slides/GERS-ecosystem-extra.md | 4 +++- slides/GERS.md | 2 ++ slides/exploring-data.md | 2 ++ slides/matching-demos.md | 6 ++++-- 4 files changed, 11 insertions(+), 3 deletions(-) diff --git a/slides/GERS-ecosystem-extra.md b/slides/GERS-ecosystem-extra.md index fda2b2c..75efc0f 100644 --- a/slides/GERS-ecosystem-extra.md +++ b/slides/GERS-ecosystem-extra.md @@ -90,4 +90,6 @@ Some bridge file examples Third party services that offer a gers id lookup service ->>> +<<< + +Continue the Workshop at [labs.overturemaps.org/workshop](//labs.overturemaps.org/workshop/4-gers.html) diff --git a/slides/GERS.md b/slides/GERS.md index 82be7cf..e61419d 100644 --- a/slides/GERS.md +++ b/slides/GERS.md @@ -4,6 +4,8 @@ - GERS IDs identify real world entities such as road segments - Simplifies integrating & exchanging data layers +Lesson: [4-gers](//labs.overturemaps.org/workshop/4-gers.html) + <<< ## How does GERS Work? 
diff --git a/slides/exploring-data.md b/slides/exploring-data.md index 5198196..4d28b39 100644 --- a/slides/exploring-data.md +++ b/slides/exploring-data.md @@ -16,3 +16,5 @@ <<< Continue the Workshop at [labs.overturemaps.org/workshop](//labs.overturemaps.org/workshop/2-accessing-data.html) + +Notebook: `notebooks/1-overturemaps-lonboard.ipynb` diff --git a/slides/matching-demos.md b/slides/matching-demos.md index a80a02e..46a7938 100644 --- a/slides/matching-demos.md +++ b/slides/matching-demos.md @@ -28,7 +28,8 @@ to its corresponding feature in dataset B. Matching **administrative boundaries** between the U.S. State Department's Large Scale International Boundaries (LSIB) and Overture's `division_boundary`. -`notebooks/3-lsib_overture.ipynb` +Lesson: [6-lsib-demo](//labs.overturemaps.org/workshop/6-lsib-demo.html) +Notebook: `notebooks/3-lsib-demo.ipynb` Notes: LSIB is the State Department's reference for international boundaries. @@ -121,7 +122,8 @@ After scoring, every pair lands in one of: Matching NGA's MGCP polygon features (buildings + base-theme polygons) against Overture. -`notebooks/4-buildings-matching.ipynb` +Lesson: [7-buildings-matching](//labs.overturemaps.org/workshop/7-buildings-matching.html) +Notebook: `notebooks/4-buildings-matching.ipynb` <<< From f74adace0f55f7437e2de322323368cf31d6e5c6 Mon Sep 17 00:00:00 2001 From: Dana Bauer Date: Mon, 11 May 2026 22:35:11 +0000 Subject: [PATCH 3/6] [Docs] fix slide markdown charset to utf-8 The reveal.js markdown plugin was loading each slide section with data-charset="iso-8859-15", which mojibake'd any non-ASCII content (em-dashes, arrows, ellipsis, >=) in the matching-demos deck. The other slide files happen to be pure ASCII so this was a latent bug. 
Signed-off-by: Dana Bauer --- slides/index.html | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/slides/index.html b/slides/index.html index 7498234..437dd13 100644 --- a/slides/index.html +++ b/slides/index.html @@ -159,7 +159,7 @@ data-separator="^>>>" data-separator-vertical="^<<<" data-separator-notes="^Notes:" - data-charset="iso-8859-15" + data-charset="utf-8" > @@ -168,7 +168,7 @@ data-separator="^>>>" data-separator-vertical="^<<<" data-separator-notes="^Notes:" - data-charset="iso-8859-15" + data-charset="utf-8" > @@ -177,7 +177,7 @@ data-separator="^>>>" data-separator-vertical="^<<<" data-separator-notes="^Notes:" - data-charset="iso-8859-15" + data-charset="utf-8" > @@ -186,7 +186,7 @@ data-separator="^>>>" data-separator-vertical="^<<<" data-separator-notes="^Note:" - data-charset="iso-8859-15" + data-charset="utf-8" > --> @@ -195,7 +195,7 @@ data-separator="^>>>" data-separator-vertical="^<<<" data-separator-notes="^Notes:" - data-charset="iso-8859-15" + data-charset="utf-8" > --> @@ -204,7 +204,7 @@ data-separator="^>>>" data-separator-vertical="^<<<" data-separator-notes="^Notes:" - data-charset="iso-8859-15" + data-charset="utf-8" > @@ -213,7 +213,7 @@ data-separator="^>>>" data-separator-vertical="^<<<" data-separator-notes="^Notes:" - data-charset="iso-8859-15" + data-charset="utf-8" > From cc738551770d5f849b75e8b5d141db5de6bd8908 Mon Sep 17 00:00:00 2001 From: Dana Bauer Date: Mon, 11 May 2026 23:02:26 +0000 Subject: [PATCH 4/6] [Docs] wire setup deck back in; move callouts to end-of-deck slides - Uncomment and append setup.md as the final section of the slideshow (DuckDB, OvertureMaps-Py CLI, docs links, Codespaces, JupyterLab). - Move lesson/notebook callouts from inline-on-title-slide to dedicated end-of-deck sub-slides on GERS.md and both demos in matching-demos.md, matching the existing exploring-data.md pattern. 
Signed-off-by: Dana Bauer --- slides/GERS.md | 6 ++++-- slides/index.html | 16 ++++++++-------- slides/matching-demos.md | 18 ++++++++++++------ 3 files changed, 24 insertions(+), 16 deletions(-) diff --git a/slides/GERS.md b/slides/GERS.md index e61419d..e7d1323 100644 --- a/slides/GERS.md +++ b/slides/GERS.md @@ -4,8 +4,6 @@ - GERS IDs identify real world entities such as road segments - Simplifies integrating & exchanging data layers -Lesson: [4-gers](//labs.overturemaps.org/workshop/4-gers.html) - <<< ## How does GERS Work? @@ -34,3 +32,7 @@ GERS is not just a stable ID. The "S" stands for system.
  • Bridge files for easy mappings of source IDs to GERS IDs.
  • Onboarding Services that let anyone easily associate their data with GERS.
  • + +<<< + +Continue the Workshop at [labs.overturemaps.org/workshop](//labs.overturemaps.org/workshop/4-gers.html) diff --git a/slides/index.html b/slides/index.html index 437dd13..0f2d486 100644 --- a/slides/index.html +++ b/slides/index.html @@ -189,27 +189,27 @@ data-charset="utf-8" > --> - - +
    --> + > - +
    - +
    >> ## Demo 2: MGCP polygons ↔ Overture @@ -122,9 +125,6 @@ After scoring, every pair lands in one of: Matching NGA's MGCP polygon features (buildings + base-theme polygons) against Overture. -Lesson: [7-buildings-matching](//labs.overturemaps.org/workshop/7-buildings-matching.html) -Notebook: `notebooks/4-buildings-matching.ipynb` - <<< ### Schema landscape @@ -155,3 +155,9 @@ A pair matches when IoU ≥ 0.5, **or** IoU ≥ 0.3 with the Overture centroid i | `base/infrastructure`, `base/land_use`, `base/water` | Mostly clean with some aggregation | _Slide TODO: expand with notebook screenshots_ + +<<< + +Lesson: [7-buildings-matching](//labs.overturemaps.org/workshop/7-buildings-matching.html) + +Notebook: `notebooks/4-buildings-matching.ipynb` From 07ceadf69344823c7b43959ed46d49552960a04c Mon Sep 17 00:00:00 2001 From: Dana Bauer Date: Tue, 12 May 2026 01:52:33 +0000 Subject: [PATCH 5/6] [Docs] flesh out MGCP demo slides from finished lesson 7 Lesson 7 landed in #29; replace the placeholder MGCP slides with real content drawn from the merged prose. The 4 skeleton sub-slides become 8 substantive ones: demo data, methodology, results arc, cardinality reporting, two-rate diagnostic, GERS adoption matrix, key finding, lesson/notebook callout. LSIB half is unchanged. Signed-off-by: Dana Bauer --- slides/matching-demos.md | 88 +++++++++++++++++++++++++++++++++------- 1 file changed, 74 insertions(+), 14 deletions(-) diff --git a/slides/matching-demos.md b/slides/matching-demos.md index 6f35b06..796b5b9 100644 --- a/slides/matching-demos.md +++ b/slides/matching-demos.md @@ -121,17 +121,19 @@ Notebook: `notebooks/3-lsib-demo.ipynb` ## Demo 2: MGCP polygons ↔ Overture -> **Status:** skeleton — full deck will land with the finished lesson - Matching NGA's MGCP polygon features (buildings + base-theme polygons) against Overture. <<< -### Schema landscape +### The demo data + +MGCP cell **W079N26** — western Bahamas, 1:100K, captured 2015 by UK MOD against TRD 3.0. 
-MGCP, TRD, TDS, DGIWG — and where GERS fits. +- ~1,300 polygon features across 34 fcodes +- Mostly ocean; sparse capture +- Seven Overture types: `buildings/building`, `building_part`, and five `base/*` polygon types -_Slide TODO_ +_The methodology applies unchanged to denser data and other schema versions._ <<< @@ -140,21 +142,79 @@ _Slide TODO_ For each polygon pair we compute: - **IoU** (Intersection over Union) -- **Centroid containment** +- **Centroid containment** (Overture centroid inside MGCP polygon) + +| Tier | Condition | +| ---- | --------- | +| High | `IoU >= 0.5` | +| Low | `IoU >= 0.3` _and_ centroid containment | + +_Reproject both sides to a metric CRS (UTM 17N) before computing._ + +<<< + +### Results: clean → friction -A pair matches when IoU ≥ 0.5, **or** IoU ≥ 0.3 with the Overture centroid inside the MGCP polygon. +Seven passes, ordered from cleanest to messiest: + +| Pass | Result | +| ---- | ------ | +| `buildings/building` | 412 → 350 matched (85%), 99% clean | +| `buildings/building_part` | 11 / 136 — schema design mismatch (parents, not parts) | +| `base/infrastructure`, `land_use`, `water` | 69 / 59 / 16; mostly clean, occasional aggregation | +| `base/land`, `land_cover` | 125 / 247; cardinality diagnostic earns its keep | <<< -### Results: seven passes +### Cardinality reporting + +Each MGCP polygon gets a global label across all seven passes: + +- **clean** — every matched pass is 1:1 +- **aggregated** — one MGCP polygon → many Overture features +- **fragmented** — many MGCP polygons → one Overture feature +- **mixed** — both patterns appear +- **unmatched** — no Overture matches in any pass + +Notes: +Unmatched is NOT an audit result. A polygon may be unmatched because of an Overture coverage gap, a real-world change, an IoU threshold miss, or because MGCP captured a feature that has no Overture polygon counterpart at any scale. 
+ +<<< + +### Two-rate diagnostic + +| Pattern | Example | What it means | +| ------- | ------- | ------------- | +| High match, high clean | AL015 Building (85% / 99%) | Direct GERS ID works | +| Low match, high clean | BH080 (32% / 100%) | Coverage gap, not schema mismatch | +| High match, low clean | BA030 Island (66% / 3%) | Needs a link table | +| Zero match | BA040 Tidal Water | Defer to a different theme | + +<<< + +### GERS adoption — five buckets + +With thresholds `MATCH_RATE_HIGH=80`, `CLEAN_RATE_HIGH=80`, `MIN_SAMPLE=5`: + +
+  • Direct GERS ID attachment — high match + high clean
+  • Link table — high match + low clean
+  • Deferred — zero match (different geometry/theme needed)
+  • Review — partial / mixed
+  • Insufficient sample — below threshold
    + +<<< + +### Finding: the link-table bucket is nearly empty + +The skeleton expected it to be substantial. The data disagrees. -| Overture type | Behavior | -| ------------- | -------- | -| `buildings/building` | Clean case — ~99% land in the clean bucket | -| `buildings/building_part` | Schema-design mismatch surfaces in low match count | -| `base/infrastructure`, `base/land_use`, `base/water` | Mostly clean with some aggregation | +At 1:100K capture, the cardinality problem in this cell is binary: +- Codes with high match rates **don't fragment** +- Codes that fragment (BA030 Island, EC030 Trees) have **moderate match rates** → they land in `review` -_Slide TODO: expand with notebook screenshots_ +_A finding about the data, not a flaw in the methodology._ <<< From a0d90b0d5f33df24a71defb22372da0b4612f832 Mon Sep 17 00:00:00 2001 From: Dana Bauer Date: Tue, 12 May 2026 02:08:42 +0000 Subject: [PATCH 6/6] [Docs] mark slides as work in progress; remove links from lessons - Drop "View as Slideshow" entries from the README home page and the lesson 1 top nav row, so the slideshow isn't promoted to attendees while the deck is incomplete. - Add a subtle italic "work in progress" note under the title on the slideshow's intro slide. Signed-off-by: Dana Bauer --- 1-what-is-overture.md | 2 +- README.md | 2 -- slides/intro.md | 2 ++ 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/1-what-is-overture.md b/1-what-is-overture.md index ab09b82..19d832d 100644 --- a/1-what-is-overture.md +++ b/1-what-is-overture.md @@ -1,6 +1,6 @@ # 1. What is Overture Maps? -| [View as Slideshow](https://labs.overturemaps.org/workshop/slides/index.html#/0/1) | [Home](README.md) | [2. Data Access >>](2-accessing-data.md) | +| [Home](README.md) | [2. Data Access >>](2-accessing-data.md) | ![Overture Maps Homepage](img/homepage.png) diff --git a/README.md b/README.md index b81e4c2..696b96a 100644 --- a/README.md +++ b/README.md @@ -9,8 +9,6 @@ ## Workshop Lessons -[View as Slideshow](https://labs.overturemaps.org/workshop/slides/index.html) - 1. [What is Overture Maps?](1-what-is-overture.md) 2. [Exploring Overture Maps Data](2-accessing-data.md) 3. [Accessing Overture Maps GeoParquet with DuckDB](3-geoparquet-duckdb.md) diff --git a/slides/intro.md b/slides/intro.md index 6fef0aa..43ecc0b 100644 --- a/slides/intro.md +++ b/slides/intro.md @@ -2,6 +2,8 @@ # Overture Maps Data Workshop +_Slides — work in progress_ + [github.com/overturemaps/workshop](//github.com/overturemaps/workshop) [labs.overturemaps.org/workshop/](//labs.overturemaps.org/workshop/)