Charismatic samples audit: swap or annotate front-page features #142

@rdhyee

Summary

Raymond observed that most of the four "charismatic" featured samples in the ## Showcase: Real Samples from the Collection section of index.qmd are illustrative, not actual records in the iSamples Zenodo export. This audit verifies each against the Jan 2026 wide parquet (zenodo_wide_2026-01-09.parquet, 20M entities).

TL;DR: 1 of 4 is in-collection. None have thumbnails yet (sidecar not merged). The section header ("Real Samples from the Collection") overstates what the page shows.

Audit results

| # | Image (local asset) | Visible label on page | Linked identifier | Classification | In Zenodo wide? | Thumbnail? | Recommendation |
|---|---|---|---|---|---|---|---|
| 1 | assets/IGSN_10.58052_DIA0000YL.png | Diamond, Brazil, 2019-06-11 | doi.org/10.58052/DIA0000YL, IGSN:DIA0000YL | not-in-collection | No (no DIA0000YL, range is DIA000004-DIA000009+) | N/A | Swap |
| 2 | assets/IGSN_10.58052_IEGIL000C.png | Fossil coral, Cayman Islands, 10000 BCE | doi.org/10.58052/IEGIL000C, IGSN:IEGIL000C | in-collection-no-image | Yes (SESAR, real label 18LCI-5, 19.67N 80.10W) | None (SESAR 0% thumbnail coverage) | Keep PID, fix caption + needs SESAR sidecar |
| 3 | assets/ark_65665_337856f1a655e4ad78b1ef10a16dfb6e3.png | Paracirrhites arcatus, French Polynesia, 2006-03-10 | ark:65665/337856f1... (Smithsonian NAAN) | not-in-collection | No (hash 337856f1... not in 1.18M Smithsonian records) | N/A | Swap |
| 4 | assets/ark_28722_r2p24_vdm_19600211.png | Red-figure askoi, late-4C-early-3C BCE, Murlo, Italy | ark:28722/r2p24/vdm_19600211 (OpenContext VdM) | not-in-collection | No (VdM range in export is vdm_20060001-vdm_20175070, 1755 records; 19600211 is from the pre-2006 catalog) | N/A | Swap |

Net: 1 real (coral), 3 illustrative. Even the one real sample has a descriptive caption ("Fossil coral") that doesn't match its actual iSamples label (18LCI-5).
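
The classification column follows a simple rule that can be written down explicitly. The sketch below is one reading of that rule, not the script used for the audit: `classify` and the `audit` dictionary are hypothetical names, and the `in-collection-with-image` label is the target state mentioned under the sidecar options.

```python
from typing import Optional


def classify(in_wide: bool, thumbnail_url: Optional[str]) -> str:
    """Map the audit findings for one featured sample to a classification label."""
    if not in_wide:
        return "not-in-collection"
    if thumbnail_url:
        return "in-collection-with-image"
    return "in-collection-no-image"


# Findings from the audit table, keyed by row number:
# (found in the Zenodo wide parquet?, thumbnail_url if any)
audit = {
    1: (False, None),  # diamond
    2: (True, None),   # fossil coral
    3: (False, None),  # fish
    4: (False, None),  # askoi
}

labels = {row: classify(*found) for row, found in audit.items()}
in_collection = [row for row, label in labels.items() if label.startswith("in-collection")]
```

With the current findings, `in_collection` contains only row 2, which matches the "1 real, 3 illustrative" tally.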

Thumbnail coverage (system-wide)

| Source | MaterialSampleRecord count | With thumbnail_url | % |
|---|---|---|---|
| SESAR | 4,688,386 | 0 | 0.00% |
| OPENCONTEXT | 1,064,831 | 0 | 0.00% |
| SMITHSONIAN | 322,161 | 0 | 0.00% |
| GEOME | 605,554 | 0 | 0.00% |
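
A coverage table like this can be recomputed with a single aggregate query. The sketch below is a possible way to do it, not the exact query behind the numbers above. It assumes the wide parquet has an `otype` column holding the value `MaterialSampleRecord` (an assumption), alongside the `n` and `thumbnail_url` columns used in the verification query. `thumbnail_coverage` is a hypothetical helper that accepts any connection exposing `execute(sql).fetchall()`, such as `duckdb.connect()`.

```python
def thumbnail_coverage(con, relation):
    """Return (source, records, with_thumbnail, pct) rows, largest source first.

    con:      connection exposing execute(sql).fetchall(), e.g. duckdb.connect()
    relation: trusted SQL text naming the data, e.g. "read_parquet('wide.parquet')"
    """
    sql = f"""
        SELECT n AS source,
               count(*) AS records,
               count(thumbnail_url) AS with_thumbnail,  -- counts non-NULL values only
               round(100.0 * count(thumbnail_url) / count(*), 2) AS pct
        FROM {relation}
        WHERE otype = 'MaterialSampleRecord'  -- assumed column name and value
        GROUP BY n
        ORDER BY records DESC
    """
    return con.execute(sql).fetchall()


# Example with DuckDB (not run here):
#   import duckdb
#   rows = thumbnail_coverage(
#       duckdb.connect(), "read_parquet('zenodo_wide_2026-01-09.parquet')"
#   )
```

Note that `count(thumbnail_url)` treats empty strings as present; if the export uses empty strings instead of NULLs, the query would need an extra filter.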

The enrichment sidecar pattern (Raymond's endorsed approach, 2026-04-17) has not yet been merged into the published wide parquet, so even the in-collection coral (#2) cannot be rendered with its own real thumbnail. Until sidecars ship, any showcase must either (a) use images pulled live from source repositories (opencontext.org, geosamples.org, collections.nmnh.si.edu) or (b) remain illustrative with a disclosure.
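
For orientation, the sidecar pattern (per-source parquet sidecars keyed by pid, LEFT-JOINed into wide at build time) could look roughly like the sketch below. This is an illustration under assumptions, not the build code: `join_sidecar` is a hypothetical helper, and the sidecar is assumed to have exactly two columns, `pid` and `thumbnail_url`, with at most one row per pid.

```python
def join_sidecar(con, wide, sidecar):
    """LEFT JOIN a thumbnail sidecar onto the wide table by pid.

    Every wide row is kept. Rows with no sidecar entry keep whatever
    thumbnail_url the wide table already had (NULL today).

    con:           connection exposing execute(sql).fetchall(), e.g. duckdb.connect()
    wide, sidecar: trusted SQL text (table names or read_parquet(...) calls)
    """
    sql = f"""
        SELECT w.pid,
               w.label,
               coalesce(s.thumbnail_url, w.thumbnail_url) AS thumbnail_url
        FROM {wide} AS w
        LEFT JOIN {sidecar} AS s ON s.pid = w.pid  -- assumes one sidecar row per pid
        ORDER BY w.pid
    """
    return con.execute(sql).fetchall()


# Example with DuckDB (not run here; the sidecar file name is hypothetical):
#   import duckdb
#   rows = join_sidecar(
#       duckdb.connect(),
#       "read_parquet('zenodo_wide_2026-01-09.parquet')",
#       "read_parquet('sesar_thumbnails.parquet')",
#   )
```

A LEFT JOIN keeps the record count of the wide table unchanged as long as the sidecar has no duplicate pids, which is why the one-row-per-pid assumption matters.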

Recommended swaps (real, in-collection candidates)

For #1 Diamond → real SESAR IGSN:DIA* diamond

  • IGSN:DIA000004, label: Russia_Mirny_03072014_52768, Mirny diamond mine, Russia (66.4N, 129.2E). Iconic locality.
  • IGSN:DIA000009, label: DRC_Mbuji-Mayi (Miba)_03072014_55864, DRC (6.2S, 23.0E). Geographic diversity.

For #3 Fish (Paracirrhites arcatus) → same species or sibling, real PID

  • GEOME ark:/21547/CXs2MParis0001: Paracirrhites arcatus MParis0001, French Polynesia (-17.48, -149.88). Same species, same region.
  • Smithsonian ark:/65665/3f9768717-39ce-4422-a86e-932be0555df7: Centropyge loriculus AG0IV08 (Flame angelfish), French Polynesia (-17.47, -149.93). Real Smithsonian ARK, charismatic reef fish.

For #4 Red-figure askoi → real OpenContext Murlo vessel

  • ark:/28722/r2p24/vdm_20155002 — label: Pottery VdM20155002, Murlo (Poggio Civitate) excavation. Same site, real PID, in VdM 2015 catalog.
  • ark:/28722/r2p24/vdm_20155120 — label: Object VdM 20155120, same site.
  • Search did not find "Red-figure" or "askos" explicitly in VdM labels — if visual fidelity to a red-figure vessel matters more than site-provenance, we may need to pull from a different OC project (e.g., Etruscan collections).

Sidecar rollout implications

Only one of the four current features (coral) is in-collection-no-image. So the sidecar story here is narrower than I feared: even if we swap #1/#3/#4 to real PIDs, none of them would render from the published parquet today — every one would be in-collection-no-image.

Options:

  1. Short term (ship this week): Swap the 3 illustrative PIDs to real ones per above; keep local /assets/*.png renderings but update captions + alt text to name the real PID. Add a footnote: "Images are curator-selected illustrations; data records resolve via the linked identifiers."
  2. Medium term (sidecar): Prioritize SESAR + GEOME + Smithsonian sidecars so these 4 samples can render from upstream (SESAR has IGSN landing pages with images; Smithsonian NMNH has collection pages; GEOME/OpenContext are partners we can request from). This promotes all four from in-collection-no-image to in-collection-with-image.
  3. Long term: After sidecar merge, the Showcase becomes queryable — pick top-N by photographic quality across all sources automatically.

Verification method

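import duckdb

# Published wide parquet; a local copy works the same way.
WIDE = 'https://data.isamples.org/current/wide.parquet'
con = duckdb.connect()
# Look up one PID; an empty result means the record is not in the export.
con.sql(f"SELECT pid, label, n AS source, thumbnail_url FROM read_parquet('{WIDE}') WHERE pid = 'IGSN:IEGIL000C'").df()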

Also cross-checked via altids / alternate_identifiers arrays and place_name / label ILIKE fuzzy searches. Source: ~/Data/iSample/pqg_refining/zenodo_wide_2026-01-09.parquet (20M rows, 6.7M MaterialSampleRecord).

Source file to edit

index.qmd lines 22-32 (the layout-ncol=4 block with 4 image/link pairs).

Related

  • Sidecar pattern discussion: per Raymond's 2026-04-17 endorsement (per-source parquet sidecars keyed by pid, LEFT-JOINed into wide at build time).
  • Landscape draft on Project 7: PVTI_lADOBCDimM4BS_FQzgq58y0.

Filed as part of the iSamples Grant Closeout (May 2026) site cleanup. Report generated by audit run 2026-04-24.
