diff --git a/2026-usrse-con-talk.html b/2026-usrse-con-talk.html
new file mode 100644
index 0000000..8e9a848
--- /dev/null
+++ b/2026-usrse-con-talk.html
@@ -0,0 +1,1099 @@
+ + + + + + + + + [WiP] Reuse, Compose, Extend, Standardize, Automate: Two Decades of RSEing Open (Neuro)Science at CON + + + + + + + + + + + + + + + +
+
+ + + + + +
+
+ +

+ WiP + Reuse, Compose, Extend, Standardize, Automate +

+

+ Two Decades of RSEing Open (Neuro)Science at CON +

+ +
+ + + + + + + + + +
+ Yaroslav O. Halchenko
+ + Cody Baker, Austin Macdonald, Isaac To, Vadim Melnik — the CON team
+ @yarikoptic + @yarikoptic@fosstodon.org +
+
Center for Open Neuroscience +
Department of Psychological and Brain Sciences +
Center for Cognitive Neuroscience
+ Dartmouth College
+ New Hampshire, USA
+ + +
+
+
+ + US-RSE’26 — Research Software Engineers Conference, October 2026
+ Live slides/Sources: + https://datasets.datalad.org/centerforopenneuroscience/talks/2026-usrse-con-talk.html +
+ + + + + + + + + + + + +
+
+
+ + + +
+
+ +
+
+ + + +
+
+

Two decades, five verbs

+ + + + + + + + + + + + + + + + + + + + + +
Reuse: join an upstream instead of writing one
Compose: small modules over silo'd monoliths
Extend: stay on as maintainer of what you depend on
Standardize: a common language — for humans and agents
Automate: if you don’t, it doesn’t happen at scale
+

+ Same verbs, before and during the “Age of AI”.
+ AI just makes the fifth one — Automate — cheap enough to apply to the harness itself. +

+ +
+ + +
+

When it began for us

+
    +
  • 2005 — Reuse: pkg-exppsy in Debian (FSL, PyEPL) → NeuroDebian
  • +
  • 2007 — Reuse + Automate: PyMVPA with tests + buildbot CI before it was common
  • +
  • 2009 — Extend: NeuroDebian born (paper: Front. Neuroinform., 2012)
  • +
  • 2013 — Compose: first commit of DataLad on top of git + git-annex
  • +
  • 2014 — Standardize: BIDS co-authored / Open Brain Consent born
  • +
  • 2016 — Automate: ReproIn + HeuDiConv — DICOM → BIDS at the scanner, no human in the loop
  • +
  • 2018 — YODA principles posted at OHBM
  • +
  • 2019 — DANDI opens for neurophysiology archival; Automated dandiset ↔ DataLad mirroring + con/tinuous CI-log archival
  • +
  • 2024+ — registry.datalad.org, con/duct, AnnexTube, con/serve, AI-aware tooling (con/skills, con/yolo)
  • +
+
+
+ + + +
+ +
+

Reuse — join, don't fork

+

The cheapest reproducible thing is the one you didn't have to build.

+
+ + +
+ + + +

Born in 2009 by joining Debian, not starting a new distro + +
+ + Y. O. Halchenko and M. Hanke. Open is not enough. Let's take the next step: + An integrated, community-driven computing platform for neuroscience. + Frontiers in Neuroinformatics, 6(00022), 2012. doi: 10.3389/fninf.2012.00022 + +
+ +
+

NeuroDebian: a textbook bridge

+ + +

Eventually most of it dissolved upstream into Debian Med, + Debian Science, and conda-forge. A successful bridge dissolves into the commons.

+
+ + +
+ +
+ Born in 2007 — Python ML library for neuroimaging,
+ full test suite + buildbot CI before either was common in scientific Python.
+ + + M. Hanke, Y. O. Halchenko, P. B. Sederberg, et al. PyMVPA: A Python toolbox for + multivariate pattern analysis of fMRI data. Neuroinformatics, 7:37–53, 2009. + doi: 10.1007/s12021-008-9041-y +
+ +
+

Today we'd contribute upstream — and we do

+ +

+ Take home: before your next project, do one upstream-search pass. + Add pytest + a one-file CI workflow to one repo this week. +

+
+ +
+ + + +
+ +
+

Compose — small modules, common substrate

+

Sandwich, don't silo.

+
+ + +
+

The DataLad sandwich

+

Layered tech that everyone already knows

+
+graph TB
+    user --> datalad[datalad get file] --> git-annex[git-annex get file] -.-> git-annex-remote-archives --> git-annex2[git-annex get --key XXX.tar.gz] -.-> git-annex-remote-datalad
+
+    datalad --> git1[git]
+    git-annex --> git2[git]
+    git-annex2 --> git3[git]
+
+    classDef green fill:#40bf4c,stroke:#333,stroke-width:1px;
+    classDef orange fill:#ffa200,stroke:#333,stroke-width:1px;
+    classDef red fill:#f44d27,stroke:#333,stroke-width:1px;
+    class datalad orange
+    class git-annex-remote-archives orange
+    class git-annex-remote-datalad orange
+
+    class git-annex green
+    class git-annex2 green
+
+    class git red
+    class git1 red
+    class git2 red
+    class git3 red
+	
+

DataLad: distributed system for joint management of code, data, and their relationship. Halchenko et al., JOSS 2021.

+
+ + +
+

DataLad extensions: domain-agnostic core, à la carte add-ons

+
+graph LR
+ datalad --> datalad-crawler
+ datalad --> datalad-container
+ datalad --> datalad-neuroimaging
+ datalad --> datalad-osf
+ datalad --> datalad-fuse
+ datalad --> datalad-xnat
+ datalad --> datalad-ukbiobank
+ datalad --> datalad-next
+ datalad --> ...
+ classDef orange fill:#ffa200,stroke:#333,stroke-width:1px;
+ classDef orangy fill:#ffc200,stroke:#333,stroke-width:1px;
+ class datalad orange
+ class datalad-crawler orangy
+ class datalad-container orangy
+ class datalad-neuroimaging orangy
+ class datalad-osf orangy
+ class datalad-fuse orangy
+ class datalad-xnat orangy
+ class datalad-ukbiobank orangy
+ class datalad-next orangy
+
+

Extensions = separate Python packages, ride on + datalad-extension-template + — your domain plugs in, neuroimaging is just one of many.

+
+ + +
+

Federate, don’t recentralize

+ +

registry.datalad.org indexes + DataLad datasets across institutions and clouds — discovery over + petabytes, no platform required.

+
+ + +
+

Small acquisition & compute units

+ + + + + +
+
    +
  • ReproIn — sequence naming convention at the scanner
  • +
  • HeuDiConv — DICOM → BIDS, heuristic-driven
  • +
  • ReproStim — record audio/video stimuli, sliced into BIDS
  • +
  • NeuroConv — convert physiology data → NWB
  • +
  • con/nwb2bids — bridge the two flagship standards (NWB → BIDS layout)
  • +
  • ReproNim/containers — curated singularity containers
  • +
  • con/duct — minimal process monitor (built on brainlife's smon)
  • +
+
+ +

Each does one thing. Glue is YODA.

+
+

+ Take home: wrap your next pipeline run in duct, + or pull a recipe from repronim/containers. + Convert one shared-data folder to a DataLad dataset. +

+
+ +
+ + + +
+ +
+

Extend — stay on as maintainer

+

Ship pragmatic now, formalize upstream later.

+
+ +
+

Where we stayed

+
    +
  • Debian / Debian Med — pkg-exppsy → NeuroDebian → + Debian Med & Debian Science (and on into conda-forge). + The bridge dissolves into the commons.
  • +
  • citeproc-py + — we built duecredit on it; when its maintainer stepped back, we stepped in.
  • +
  • BIDS Steering Group + and BIDS-DICOM WG-16 — co-evolving the standard, not just consuming it.
  • +
  • BEP028 + — DataLad's ad-hoc RUNCMD provenance is being lifted into BIDS provenance + a BIDS prov exporter.
  • +
  • NWB + Kitware + — sustained academic-RSE × non-academic-engineering collaboration on browse/analyze/visualize tooling.
  • +
+
+ + +
+

From ad-hoc to standard, in one slide

+
+flowchart LR
+  RUNCMD["DataLad RUNCMD<br/>(2015, ad-hoc JSON in commit msgs)"] --> BEP["BEP028<br/>BIDS provenance"]
+  RUNCMD --> EXP["bids-prov-exporter<br/>(WiP)"]
+  BEP --> COMMUNITY["BIDS standard<br/>(community-owned)"]
+  EXP --> COMMUNITY
+  classDef orange fill:#ffa200,stroke:#333,stroke-width:1px;
+  classDef green fill:#40bf4c,stroke:#333,stroke-width:1px;
+  class RUNCMD orange
+  class BEP green
+  class COMMUNITY green
+
+

+ Take home: the next time you reach for your own JSON-in-a-commit-message format, + ask which upstream standard it should fold back into. +
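
A minimal sketch of what "JSON in a commit message" looks like to a consumer, assuming the run record is a JSON object enclosed between the two marker lines `=== Do not change lines below ===` and `^^^ Do not change lines above ^^^`. The function name `extract_run_record` and the sample message are hypothetical; only the standard library is used.

```python
import json
import re

# Marker lines assumed to delimit the JSON run record in the commit message.
_START = "=== Do not change lines below ==="
_END = "^^^ Do not change lines above ^^^"


def extract_run_record(commit_message):
    """Return the embedded run record as a dict, or None if absent."""
    pattern = re.escape(_START) + r"\n(.*?)\n" + re.escape(_END)
    match = re.search(pattern, commit_message, flags=re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical commit message, for illustration only.
SAMPLE_MESSAGE = """[DATALAD RUNCMD] convert sub-01

=== Do not change lines below ===
{
 "cmd": "python code/convert.py sub-01",
 "exit": 0,
 "inputs": ["sourcedata/sub-01"],
 "outputs": ["sub-01"]
}
^^^ Do not change lines above ^^^
"""

record = extract_run_record(SAMPLE_MESSAGE)
```

Because the record is plain JSON, mapping its fields onto a community provenance standard is a translation step rather than a reverse-engineering one.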

+
+ +
+ + + +
+ +
+

Standardize — a language for HI and AI

+

All standards are “bad”*, but some are used.

+

* — D. Clunie, MICCAI 2017.

+
+ + +
+

Data: BIDS & NWB

+ +
+ + + Gorgolewski, K. J., Auer, T., …, Halchenko, Y. O., …, Poldrack, R. A. (2016). + The brain imaging data structure, a format for organizing and describing outputs of + neuroimaging experiments. Scientific Data, 3:160044. + +

You've seen one BIDS dataset, you've seen them all. That's the point.

+
+ + +
+

BIDS as a meta-standard

+ +

Not a wire format — a contract. Validators, BIDS-Apps, BEPs all hang off it.

+
+ + +
+

Metadata: schemas as first-class citizens

+
    +
  • Data standards (BIDS, NWB) make data exchangeable.
  • +
  • Many schema languages, one idea: describe the model, then derive everything else. +
      +
    • LinkML — one source → JSON Schema, OWL/SHACL, pydantic, docs — used by the DANDI & NWB schemas, concepts.datalad.org.
    • +
    • pydantic — runtime validation in Python, used widely in DANDI/NWB tooling.
    • +
    • JSON Schema — the lingua franca; what BIDS extensions and the DANDI meditor read.
    • +
    • SHACL — RDF-side shape constraints (the substrate shacl-vue consumes).
    • +
    +
  • +
  • The Model becomes a generator: validators, APIs, editor UIs, and even + research-group websites all derive from it (we'll see how on the MVC slide).
  • +
  • Standardize the consent too: Open Brain Consent (OBC) + (born 2014; HBM 2021) gives labs and IRBs a community-vetted form + instead of bespoke wording, with translations into 8+ languages.
  • +
+

+ Take home: validate one of your datasets against a community standard this month; + if you maintain a schema, publish it as a real artifact (LinkML / pydantic / JSON Schema), not as “what the code happens to accept”. +
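
The "describe the model, then derive everything else" idea can be sketched with nothing but the standard library. This is not LinkML or pydantic: `Subject`, `to_json_schema`, and `validate` are hypothetical names, and the derived schema covers only flat `str` / `int` / `float` / `bool` fields.

```python
from dataclasses import dataclass, fields, MISSING

# Mapping from Python annotations to JSON Schema type names (flat types only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Subject:
    """Hypothetical metadata model: the single source of truth."""
    subject_id: str
    age: int
    species: str = "Homo sapiens"


def _is_required(f):
    """A field is required when it declares no default of any kind."""
    return f.default is MISSING and f.default_factory is MISSING


def to_json_schema(model):
    """Derive a JSON Schema (as a dict) from the dataclass definition."""
    props = {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(model)}
    required = [f.name for f in fields(model) if _is_required(f)]
    return {"title": model.__name__, "type": "object",
            "properties": props, "required": required}


def validate(model, record):
    """Derive a validator from the same model; return a list of problems."""
    problems = []
    for f in fields(model):
        if f.name not in record:
            if _is_required(f):
                problems.append(f"missing required field: {f.name}")
        elif not isinstance(record[f.name], f.type):
            problems.append(f"{f.name}: expected {f.type.__name__}")
    return problems


schema = to_json_schema(Subject)
errors = validate(Subject, {"subject_id": "sub-01", "age": "twelve"})
```

Both the schema and the validator change automatically when a field is added to `Subject`, which is the property the schema languages listed above provide at much larger scale.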

+
+ +
+ + + +
+ +
+

Federated archives, built on the standards

+ + + + + + +
+
+ DANDI — neurophysiology +
+ EMBER
+ microscopy / behavior +
+
+ OpenNeuro — MRI/EEG/MEG +
+

Different domains, the same shared substrate — + every layer is a community project, not a vendor lock-in:

+ + + + + + + + + + + + + + + + + + + + + +
Layer | What we work on / contribute to
Common standards | BIDS, NWB, OME-NGFF, HED, DICOM — co-developed with the BIDS Steering Group, NWB, and DICOM WG-16
Infra (object storage) | AWS S3 + MinIO for self-hosted/test deployments — same API, swap-in compatible; backed by institutional Dropbox/S3 Glacier mirrors
Data logistics | git + git-annex + DataLad — content-addressed, federated, no service required to read the bytes
Processing | Containers (ReproNim/containers, Apptainer/Singularity, OCI), BIDS-Apps (mriqc, fmriprep, …); container-aware datalad containers-run
+

Federation is what lets a small RSE center reach population scale — + because every layer is interchangeable and the Model lives in standards, not services.
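
A minimal sketch of why content-addressed storage needs no service to read the bytes: the key is derived from the content itself, so a copy fetched from any remote can be verified locally. The layout imitates the `SHA256E-s<size>--<digest><ext>` shape of git-annex keys in simplified form (real git-annex applies extra rules to extensions). `content_key` and `verify` are hypothetical helpers, not part of git-annex or DataLad.

```python
import hashlib
import os


def content_key(data, filename):
    """Build a simplified SHA256E-style key from bytes and a file name."""
    digest = hashlib.sha256(data).hexdigest()
    ext = os.path.splitext(filename)[1]  # simplified: last extension only
    return f"SHA256E-s{len(data)}--{digest}{ext}"


def verify(data, key):
    """Check that bytes obtained from any remote match the expected key."""
    size_part, rest = key.split("--", 1)
    expected_size = int(size_part.split("-s", 1)[1])
    expected_digest = rest.split(".", 1)[0]
    return (len(data) == expected_size
            and hashlib.sha256(data).hexdigest() == expected_digest)


# Hypothetical file content, for illustration only.
payload = b"participant_id\tage\nsub-01\t34\n"
key = content_key(payload, "participants.tsv")
```

Since the key alone is enough to check a copy, the storage layer (S3, MinIO, a USB drive) can be swapped without touching anything that refers to the file.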

+
+ +
+

What's in DANDI

+ +
+ +
+

DANDI ecosystem — integrate, don't re-implement

+ +
+ +
+ + + +
+ +
+

Reuse, in reverse

+

Your data is yours, even when someone else hosts it.

+
+ +
+

Pull content back into git + git-annex

+ + + + + + +
+ AnnexTube
+ YouTube videos + transcripts + comments → DataLad/git-annex. + Demo: ReproTube. +
+ mykrok
+ Personal knowledge bag — pulls Google & cloud silos back into a versioned local archive. +
+ con/serve
+ “The Vault”: a configuration-driven archival of Slack, Zoom, GitHub, AI sessions, … into DataLad subdatasets. +
+

+ WiP — Wayback archive + (under con/serve): a local archive of the Internet Archive’s + archive of your project’s archive — in case the IA itself goes away. + Reuse-in-reverse, with a safety net. +

+

+ Take home: pick one commercial silo your group depends on. + Set up a one-shot pull-back into a git+git-annex archive. + You'll thank yourself in two years. +

+
+ +
+ + + +
+ +
+

Automate — or it doesn’t happen at scale

+

If a human has to remember it, it’s already broken.

+
+ +
+

Where we automate

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
What gets automated | How — concretely
Building & testing the stack | PR-level CI on every DataLad change; datalad-extensions tested daily against core; git-annex built & tested daily — status emailed to a human only if red
CI logs + artifacts — before they expire | con/tinuous archives logs & artifacts from GitHub Actions / Travis / AppVeyor into git-annex/DataLad — a clean “pull commercial ephemera back” example, very modular by design
Container building & distribution | ReproNim/containers auto-rebuilds + auto-snapshots; shub mirror of singularity-hub.org
Acquisition → standardized data | ReproIn + HeuDiConv at the scanner; ReproStim recording all stimuli
Archive mirroring & smoke-testing | Dandisets auto-mirrored into DataLad on GitHub; webshots of every dandiset for timing & smoke tests; trivial-IO sweep across all dandisets
Validation harmonization | con/validation + dandi-cli — every release runs the full multi-validator gauntlet
Releases & docs | scriv + datalad/release-action; auto-deployed handbook + per-archive docs
+

+ None of the above scales without automation. None. + “If a human has to remember to do it” = “it won’t happen for the next maintainer”. +

+
+ +
+

Cost: harnesses, harnesses, harnesses

+
    +
  • Each automation is itself code — CI YAML, hooks, + nightly cron, deploy scripts, glue. Someone maintains that.
  • +
  • By 2024 the harness was bigger than several of the projects under it. +
    This is the real reason a small RSE center was + dragged toward AI assistance — not for the science, but for + the maintenance of the meta-layer.
  • +
  • The next bullet is what the rest of this talk has been + quietly setting up …
  • +
+

+ WiP — the meta-automation move: + using Claude Code + + con/skills + + con/yolo to automate the + maintenance of our automations — CI triage, PR review, + dependency updates, log analysis, doc updates. The cost of writing + a new automation drops — so does the cost of keeping it + working when its 17 transitive dependencies break next Tuesday. +

+

+ Take home: automate one repetitive chore + this week (CI matrix, daily smoke test, release script). Then write + down who will maintain that automation — if the answer + is “me, alone, forever”, plan how AI agents take half + the load. +

+
+ + +
+

The five verbs climb the SciOps ladder

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
SciOps level (Johnson et al., 2024) | Our verb(s) | Concrete practice
L1 Initial | | Ad-hoc, custom, manual. Where most labs start.
L2 Managed | Compose (lab-local) | Lab-wide standardized processes — YODA layout, internal pipelines.
L3 Defined | Reuse · Compose · Extend · Standardize | Open-source ecosystems + FAIR data + FAIR workflows. The paper’s own L3 exemplars: BIDS, NWB, DataLad/git-annex, DANDI, brainlife.io.
L4 Scalable | Automate | “SciOps pipelines” — semi-automated continuous workflows across experimental design, collection, processing, analysis, dissemination. Our CI + con/tinuous + auto-mirrored dandisets + ReproIn/HeuDiConv at the scanner are exactly this.
L5 Optimizing | Automate × AI | Closing the discovery loop — AI partnered with humans, automating experiments and the harness. Where con/skills + con/yolo + AI-aware tooling are heading.
+

+ Johnson et al. note that most neuroscience teams sit at L1–L2; + the 5 verbs are the practical climb to L3→L4, and AI is what makes L5 plausible + for a small RSE center instead of just a large consortium. +

+
+ +
+ + + +
+ +
+

For HI and AI

+

Same self-contained artifacts. Two readers.

+
+ +
+

Why every layer matters now

+
    +
  • Self-contained, well-described, openly shared → legible to HI + (human investigators) and AI agents.
  • +
  • STAMPED principles + (Self-containment, Tracking, + Actionability, Modularity, + Portability, Ephemerality, + Distributability) — + Tracking extends naturally to AI sessions and AI↔human attribution.
  • +
  • Operational-maturity counterpart: SciOps + (Johnson et al., 2024). The just-shown 5-verb ↔ 5-level mapping + is not coincidence: the SciOps paper itself names BIDS, NWB, DataLad/git-annex, + DANDI, brainlife.io as canonical Level 3 (Defined) exemplars, and reserves + Level 5 (Optimizing) for “closing the discovery loop with the assistance + of artificial intelligence”.
  • +
  • The AI-coding maturity ladder (companion talk) + works only on top of versioned, modular, standardized artifacts — i.e. + you need at least SciOps L3 before agents can do anything useful. + Without it, agents thrash.
  • +
+

+ AI doesn’t change the verbs — it raises the cost of skipping them, + drops the cost of the fifth verb (Automate) enough to apply it to the harness itself, + and turns the gap between SciOps L4 and L5 from “requires a consortium” + into “feasible for a small RSE center”. +

+
+ + +
+

HI ↔ AI — every project picks its own policy

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Stance | OSS exemplars | CON / adjacent example | SciOps fit / what it relies on
1. Reject any AI-generated content | Zig, Krita, Clojure, QEMU; in practice git-annex (Joey Hess’s “policy” is famously a satirical Feb-30 joke) | git-annex upstream contributions — pure-HI commits | L3 floor; AI use, if any, stays ephemeral — never enters the tracked artifact
2. Accept with disclosure (AI-assisted ok; must be marked) | NumPy, Kubernetes, Linux kernel, Django | DataLad, DANDI — HI commits; AI may have helped; Co-Authored-By: Claude… trailers, @pytest.mark.ai_generated | L3 → L4; leans hardest on STAMPED T (Tracking) — provenance is non-negotiable
3. Spec-driven AI-generated (HI specs, AI writes, HI reviews) | Increasingly common; rare as declared upstream policy | AnnexTube, mykrok, con/citations-collector, parts of dandi-cli (LAD specs + AI-generated tests) | L4; leans on T + Actionability + Modularity — the spec is the contract
4. Autonomous agents in the loop | Active research everywhere; rarely a declared production policy | con/skills + con/yolo workflows for triage, PR review, dep updates | L5 (Optimizing); all of STAMPED + the harness itself becomes the type-checker
+

+ Common ground (all four stances): + AI cannot be an author (ICMJE Jan 2026); + humans retain full responsibility; AI use must be disclosed in the artifact itself + (commit trailer, methods section, acknowledgments). The STAMPED Tracking principle is + what makes that mechanical instead of aspirational. +

+

+ Survey of declared OSS stances: + melissawm/open-source-ai-contribution-policies + — community-maintained catalog of project policies, three+one buckets: + Accept / Restrict / Reject / Ongoing. +

+

+ Take home: declare your project’s stance, in writing, + before someone files a PR with undisclosed AI-generated code. + “We haven’t decided” is itself a policy — just a bad one. +
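
One way to make "accept with disclosure" mechanical is a check on commit-message trailers. The sketch below assumes disclosure is expressed as a `Co-Authored-By:` trailer whose value names an AI tool. The tool list `AI_MARKERS` and both function names are hypothetical, and the parsing is deliberately simpler than what `git interpret-trailers` does.

```python
# Hypothetical list of substrings that identify an AI tool in a trailer value.
AI_MARKERS = ("claude", "copilot", "chatgpt")


def parse_trailers(message):
    """Return (key, value) pairs from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # a subject line alone carries no trailers
    trailers = []
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(":")
        if sep and key and " " not in key:
            trailers.append((key.strip(), value.strip()))
    return trailers


def ai_disclosures(message):
    """List the co-author trailer values that name an AI tool."""
    return [
        value for key, value in parse_trailers(message)
        if key.lower() == "co-authored-by"
        and any(marker in value.lower() for marker in AI_MARKERS)
    ]


# Hypothetical commit message, for illustration only.
MESSAGE = """Fix off-by-one in session slicing

Adds a regression test.

Co-Authored-By: Claude <noreply@example.com>
Signed-off-by: A. Developer <dev@example.com>
"""

found = ai_disclosures(MESSAGE)
```

A check like this can only confirm that a disclosure is present; it cannot detect undisclosed AI use, which is why the written policy still has to come first.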

+
+ +
+ + + +
+ +
+

Why it composes — MVC at the stack scale

+

Models, Controllers, Views — just bigger than usual.

+

+ The five verbs (Reuse / Compose / Extend / Standardize / Automate) are practices.
+ This is the architecture they keep producing. +

+
+ + +
+

Models — multiple, layered, standard

+ + + + + + + + + + + + + + + + + + + + + +
Layer | Examples
Dataset layout — how files live together | BIDS; YODA; BIDS-like DuckDB hive in mykrok & AnnexTube
Per-file — how one file is encoded | NWB; NIfTI; TSV / Parquet / HDF5 / Zarr; DICOM
Metadata schema | LinkML, pydantic, JSON Schema, SHACL; used in concepts.datalad.org, the DANDI / NWB / BIDS schemas, BIDS extensions
Storage — bytes-on-disk, distributed | git + git-annex + DataLad as a content-addressed model (S3, IPFS, NFS, … appear as remotes)
+

+ Multiple models, but each is standardized and inspectable. + Pick the one your problem needs; the others stay out of the way. +

+
+ + +
+

Views — humans, agents, machines

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Audience | View(s)
Browse-the-archive humans | datasets.datalad.org, DANDI, OpenNeuro, EMBER; mykrok / ReproTube web UIs
Tabular / ad-hoc analysis | Datasette on the web; VisiData locally; pandas / polars / DuckDB
Programmatic (humans + agents) | PyBIDS, bids-matlab, pynwb; datalad.api; archive REST APIs; fsspec; FUSE
External services that just plug in | brainlife.io, CONP, CBRAIN, Kitware NWB tooling, MetaCell NWB Explorer
Schema-driven UIs (the View generated from the Model) | vjsf: DANDI meditor (JSON Schema→form); shacl-vue from psychoinformatics-de — SHACL→forms and entire research-group websites
Long-form narrative | handbook.datalad.org; per-archive docs; the standards’ specs themselves
+

+ Same Model materializes many Views — archive UIs, editor forms, websites, the spec itself.
+ See M. Hanke et al., + LinkML metadata-driven workflow + (ReproTube; YouTube). +

+
+ + +
+

Controllers — small, single-purpose

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Job | Example controller
Acquisition → standardized layout | ReproIn, HeuDiConv (DICOM→BIDS), NeuroConv (raw→NWB), con/nwb2bids (NWB→BIDS), ReproStim
Reproducible execution | datalad run / containers-run; con/duct; ReproMan
Logistics — move/get/store data | git-annex special remotes (rclone, S3, datalad-archives, …)
Data → derived data | BIDS-Apps (mriqc, fmriprep, …); DataLad crawler
Validation (and harmonization) | bids-validator (+ HED), pynwb, zarr, nwbinspector, OME-Zarr, LinkML validators — harmonized via con/validation (deployed in dandi-cli) into one validation-result Model
+

+ Each does one thing. They couple to the Model, not to each other. +

+

+ The punchline: pick any cell — Model, View, or Controller — + and swap it out. The rest still works.
+ That is why the stack composes, why our code stays small, + and why an HI or an AI agent can pick it up cold. +

+

+ Contrast: a typical academic “service-tied UI” + (LIMS / ELN / bespoke Flask portal) welds View to Controller to a + running backend — lose the server, lose the data. + Cf. Cockburn’s + Hexagonal Architecture / Ports & Adapters (2005) + and Evans’s Smart UI anti-pattern. +

+
+ +
+ + + +
+
+

Your Monday checklist

+

Pick one. Today is fine.

+
+ +
+

6 things to copy on Monday

+
    +
  1. Reuse: do one upstream-search pass before your next project, and add pytest + a one-file CI workflow to one repo this week.
  2. Reuse → Extend: file an ITP at Debian Med or open a conda-forge feedstock for one tool you currently distribute by URL.
  3. Compose: convert one shared-data folder into a DataLad dataset; wrap one pipeline run in duct.
  4. Standardize: validate one of your datasets against a community standard.
  5. Automate: automate one repetitive chore (CI matrix, daily smoke test, release script, or archival pull-back via con/tinuous) — then write down who maintains that.
  6. Reuse-in-reverse: pull content back from one commercial silo your group depends on into a git+git-annex archive.
+

+ None requires neuroscience. All work today. +

+
+
+ + + +
+

Acknowledgements

+ + + + + +
+
+
Software & communities
+
+
    +
  • Joey Hess (git-annex)
  • +
  • The DataLad team & contributors
  • +
  • Debian / Debian Med / Debian Science / conda-forge
  • +
  • BIDS & NWB communities
  • +
  • distribits community
  • +
+
+
Some slides origin:
+
+ +
+
+
+
Funders
+ + + + +
+ + + + +
+
Collaborators
+
+ + + + +
+
+ + + + +
+
+
+ + + +
+

Thank you!

+

Reuse. Compose. Extend. Standardize. Automate.

+

+ + + +

+

Slides: datasets.datalad.org/…/2026-usrse-con-talk.html — CC-BY-SA

+
+ + + + + + + +
+
+ + + + + + + + +
diff --git a/2026-usrse/projects-audit.md b/2026-usrse/projects-audit.md
new file mode 100644
index 0000000..adc231b
--- /dev/null
+++ b/2026-usrse/projects-audit.md
@@ -0,0 +1,171 @@
+# CON projects audit — what the US-RSE talk does *not* mention
+
+Source comparison: `gh api orgs/con/repos` (active, non-archived) **vs.**
+projects named in `2026-usrse-con-talk.html` and `talk-proposal-draft.md`.
+Also covers projects that *were* in older CON talks but dropped from the
+current US-RSE deck, plus a few key external/CON-external projects worth
+re-considering.
+
+- First sweep: 2026-05-10.
+- **Refreshed 2026-05-11** after applying the original recommendations
+  1–5 and adding the Automate / SciOps mapping.
+
+
+## Currently named in the US-RSE deck / proposal
+
+### CON repos and CON-led projects
+**con/duct**, **con/serve**, **con/validation**, **con/tinuous**,
+**con/skills**, **con/yolo**, **con/citations-collector**,
+**con/nwb2bids**, **con/annextube** (AnnexTube),
+**registry.datalad.org**, **concepts.datalad.org**,
+**open-brain-consent (OBC)** — referenced in the Standardize section
+and Acknowledgements; **shub** mirror — visible by URL in the Automate
+section but not named as the repo `con/shub`.
+
+### DataLad family + extensions
+DataLad core; `git`+`git-annex`; datalad-container, -crawler, -fuse,
+-neuroimaging, -next, -osf, -ukbiobank, -xnat, -extension-template;
+`datalad run` / `containers-run`; DataLad Handbook.
+
+### Standards & community projects (CON-adjacent)
+BIDS, NWB, DANDI, OpenNeuro, EMBER, BIDS-Apps (mriqc, fmriprep),
+LinkML, pydantic, JSON Schema, SHACL, HED, vjsf, shacl-vue,
+ReproIn, HeuDiConv, NeuroConv, ReproStim, ReproNim/containers,
+ReproMan, NeuroDebian, PyMVPA, duecredit, citeproc-py, nwbinspector,
+pynwb, zarr, OME-Zarr.
+
+### Frameworks / principles (now load-bearing)
+**STAMPED** (per-artifact, Macdonald *et al.* 2026) and **SciOps**
+(team-operations CMM, Johnson *et al.* 2024) — mapped to the 5-verb
+spine: Reuse / Compose / Extend / Standardize ≈ L3, Automate ≈ L4,
+AI-in-the-loop ≈ L5. Plus **YODA**, **FAIR**, and the **AI-coding
+ladder** (companion talk).
+
+### External tools / collaborators
+brainlife.io, CBRAIN, CONP, Kitware (NWB tooling), MetaCell NWB
+Explorer, neurobagel, Datasette, VisiData, Hexagonal Architecture
+(Cockburn 2005), GoF Design Patterns, Fowler's PoEAA,
+Eric Evans (DDD Smart UI anti-pattern), distribits, ReproTube.
+
+
+## Status of original recommendations
+
+| # | Project | Status | Where it now appears |
+|---|---------|--------|----------------------|
+| 1 | `con/tinuous` | **Applied** | Automate § "Where we automate" + timeline (2019 milestone) + Monday checklist |
+| 2 | `con/yolo` + `con/skills` | **Applied** | Timeline 2024+ row; Automate § meta-automation slide; HI+AI cross-ref |
+| 3 | `con/citations-collector` | **Applied** | Reuse § (continuation of duecredit; feeds dandi-bib) |
+| 4 | `con/nwb2bids` | **Applied** | Compose § small-units table + MVC Controllers row |
+| 5 | `OBC` (open-brain-consent) | **Applied** | Standardize § "schemas as first-class citizens" + retained in logo strip |
+| 6 | `fail2ban` | **Deferred** | Mentioned in `SOUL.md §1` Extend bullet, not in the deck — recommended for the BoF, not the 15-min talk |
+
+
+## Active CON repos *still* not named (gap list, refreshed)
+
+### Worth a one-liner if space opens up
+
+- **`con/external-services`** — *Registry of external services to use
+  for YOUR hosted files*. Direct match for the "Federate, don't
+  recentralize" thread; could be a sentence next to `registry.datalad.org`
+  in the Compose section.
+- **`con/shub`** — *GitHub mirror of `datasets.datalad.org/?dir=/shub`*.
+  Currently only visible via a `?dir=/shub` link in the Automate-section
+  container-building row; the repo itself isn't named. Quintessential
+  Reuse-in-reverse + archive story — could be promoted to a named line.
+- **`con/noisseur`** — *Automated verification of entered/displayed
+  information*. Concept-stage; cited in `2022-nih-compcore.html` as
+  "Beyond ReproIn". A one-line nod under Controllers / acquisition
+  would round out the ReproIn / HeuDiConv / ReproStim family.
+- **`con/upptime`** — *Uptime of CON websites & services*. Tiny but a
+  cheap "View on the operations Model" example for the MVC Views slide.
+- **`con/flux`** — *Map of `git`/`git-annex` clones of a repository*.
+  Maps to the federation/discovery story; pair with `registry.datalad.org`.
+- **`datalad-installer`** — was in `2024-distribits-datalad.html` § "To
+  provide peace to developers and users for deployment". Concrete
+  Reuse / distribution / multi-platform evidence; not load-bearing
+  for the US-RSE story but a natural Acknowledgements name.
+
+### Long-running maintenance work, not in the talk
+
+- **`fail2ban`** — Halchenko is co-maintainer; canonical "we stayed
+  on as upstream maintainer" example. *In `SOUL.md` but not the deck.*
+  Recommended for a BoF / longer talk, not the 15-min US-RSE slot.
+- **`psychtoolbox-3-debian`** — long-running Debian-Med packaging;
+  same audience-fit call as fail2ban.
+
+### Mentioned in older talks, dropped from this one (deliberate)
+
+- **PyMVPA-on-phone** punchline (`2022-nih-compcore.html`) — the
+  "12.5 hours to happy time" deployment-pain story that motivated
+  NeuroDebian. Cut for length; appropriate to drop.
+- **Phantom QA / Nuisance study** (Cheng & Halchenko, F1000 2020) —
+  excellent Trust/Variance content for an NIH audience, but off-topic
+  for US-RSE. Keep in references only.
+- **Decentralized RDM** (Hanke *et al.*, Neuroforum 2021) — cited in
+  SOUL.md §5 canonical citations, not in the deck.
+
+### CON-internal / niche — probably keep skipping
+
+`CONveyor`, `catenate`, `cierge`, `communitator`, `con-intro`,
+`demos`, `docflow`, `duct-gallery`, `ference`, `fscacher`,
+`git-annex-log-stats`, `journals`, `jsdownloader`, `job`,
+`liab-deployments`, `mind_2018`, `opfvta-reexecution`, `quest`,
+`scripts`, `serve-liab`, `serve-actions`,
+`serve-wayback-archive-demo`, `shell-chronicle`, `solidation`,
+`sparkle-tools`, `taxonomy-site-sandbox`, `tents`, `tinuous-inception`,
+`tinuous-template`, `tinuum`, `try-aind-1`, `tube`, `tributors`,
+`vandermeerlab-to-bids`, `versations`, `visidata-demos`,
+`work-history-data`. Some are namedrop-worthy if a future slide is
+already on the topic (e.g. `visidata-demos` if a VisiData live demo
+gets added; `vandermeerlab-to-bids` if BEP-032 comes up).
+
+
+## What's gained since the first sweep (2026-05-10 → 2026-05-11)
+
+Beyond the audit-driven additions (1)–(5), the deck has grown three
+substantial pieces that didn't exist in the first sweep and that
+*were* the chief reason the audit asked for con/tinuous etc.:
+
+- **5-verb spine with Automate** — Reuse / Compose / Extend /
+  Standardize / **Automate** now appears in the title, spine slide,
+  timeline, every section, the Monday checklist, and the sign-off.
+- **A full Automate section** — 4 slides: opener + "Where we
+  automate" + "Cost: harnesses → meta-automation via AI" + **"The
+  five verbs climb the SciOps ladder"** (the new SciOps L1–L5
+  mapping slide, with the L3 exemplar list — BIDS / NWB / DataLad /
+  DANDI / brainlife.io — that Johnson *et al.* themselves call out).
+- **SciOps + STAMPED as load-bearing references** — STAMPED is
+  cited canonically (Macdonald, Baker, To & Halchenko, 2026, with
+  the full Self-containment / Tracking / Actionability / Modularity /
+  Portability / Ephemerality / Distributability expansion); SciOps
+  is cited canonically (Johnson *et al.*, 2024) and framed as the
+  *team-level* maturity ladder complementing STAMPED's *per-artifact*
+  properties. The proposal abstract weaves the 5-verb ↔ SciOps L1–L5
+  mapping into the framing paragraph.
+
+These together — not the individual project namedrops — are the
+substantive shift the audit needed to acknowledge.
+
+
+## Updated recommendation summary
+
+The list of project namedrops to *still* add is now short and
+optional. In priority order (each is a single line):
+
+1. **`con/shub`** — promote from a `?dir=/shub` URL to a named repo
+   line in the Automate section's container-building row. One word
+   change, very on-spine (Reuse-in-reverse + archive). *Recommend
+   doing this.*
+2. **`con/external-services`** — one cell in the Compose section
+   right after `registry.datalad.org`, framing it as a "federation
+   registry, plain text, no service required". *Recommend.*
+3. **`datalad-installer`** — one line in Acknowledgements (or in the
+   Reuse → Extend distribution bullet). *Marginal.*
+4. **`con/noisseur`** — drop in only if the Compose § acquisition
+   row needs more concreteness. *Skip for 15-min US-RSE.*
+5. **`con/upptime`** / **`con/flux`** — fine to skip; cheap mentions
+   if the MVC Views slide opens up. *Skip.*
+6. **`fail2ban`** / **`psychtoolbox-3-debian`** — save for the BoF.
+
+The audit no longer recommends anything that materially changes the
+talk's shape; (1)–(2) are the only worthwhile micro-edits.
diff --git a/2026-usrse/talk-proposal-draft.md b/2026-usrse/talk-proposal-draft.md
new file mode 100644
index 0000000..666f995
--- /dev/null
+++ b/2026-usrse/talk-proposal-draft.md
@@ -0,0 +1,79 @@
+# Reuse, Compose, Extend, Standardize, Automate: Two Decades of RSEing Open (Neuro)Science at CON
+
+## Authors
+
+- Yaroslav O. Halchenko, Department of Psychological and Brain Sciences, Dartmouth College / Center for Open Neuroscience, ORCID 0000-0003-3456-2493
+- Cody Baker, Department of Psychological and Brain Sciences, Dartmouth College / Center for Open Neuroscience, ORCID 0000-0002-0829-4790
+- Austin Macdonald, Department of Psychological and Brain Sciences, Dartmouth College / Center for Open Neuroscience, ORCID 0000-0002-8124-807X
+- Isaac To, Department of Psychological and Brain Sciences, Dartmouth College / Center for Open Neuroscience, ORCID 0000-0002-4740-0824
+- Vadim Melnik, Department of Psychological and Brain Sciences, Dartmouth College / Center for Open Neuroscience, ORCID 0009-0007-3981-0798
+
+## Keywords
+
+reuse, modularity, community standards, federation, reproducibility, distributed data management, neuroinformatics, FAIR, open source, research software engineering
+
+## Abstract
+
+For two decades the team of the **Center for Open Neuroscience (CON)** has been building an open, largely domain-agnostic research-software stack — first for neuroimaging, then for neuroscience broadly, and now used well beyond. We did it by repeating five actions: **Reuse** what already exists; **Compose** small modules into environments and systems instead of shipping silo-ed monoliths; **Extend** the upstream projects we depend on; **Standardize** the "languages" of data we exchange; and **Automate** everything we can — because nothing scales without it, and the harness itself is now another thing AI helps us maintain.
These five verbs are not just our private taxonomy: they map directly onto the operational-maturity climb described by the **SciOps Capability Maturity Model** (Johnson *et al.*, [arXiv:2401.00077](https://arxiv.org/abs/2401.00077), 2024) — Reuse/Compose/Extend/Standardize are the practices of *Level 3 (Defined)*, where Johnson *et al.* themselves cite BIDS, NWB, DataLad/git-annex, DANDI, and brainlife.io as canonical exemplars; Automate is *Level 4 (Scalable)* with its "SciOps pipelines"; and AI-in-the-loop is *Level 5 (Optimizing)*, the pinnacle Johnson *et al.* reserve for "closing the discovery loop with the assistance of artificial intelligence". Architecturally the result is a familiar pattern lifted to the *stack* scale — **Model–View–Controller**: layered, standardized **models** (BIDS / NWB layouts; [LinkML](https://linkml.io/) / [pydantic](https://docs.pydantic.dev/) / [JSON Schema](https://json-schema.org/) / [SHACL](https://www.w3.org/TR/shacl/) for metadata; `git`+`git-annex`+DataLad as a content-addressed storage model); a plurality of interchangeable **views** — including ones the *Model itself materializes*: archive UIs (datasets.datalad.org, DANDI, OpenNeuro), schema-driven editors and websites ([vjsf](https://vjsf.koumoul.com/) for the DANDI meditor; [shacl-vue](https://github.com/psychoinformatics-de/shacl-vue) from M. Hanke's group rendering forms *and entire research-group websites* from SHACL), tabular surfaces ([Datasette](https://datasette.io/), [VisiData](https://www.visidata.org/)), programmatic APIs, and the DataLad Handbook; and small, single-purpose **controllers** (HeuDiConv, NeuroConv, ReproStim, `datalad run`, `con/duct`, BIDS-Apps, `con/validation`). The "Age of AI" doesn't change any of that — it expedites the actions, and in turn benefits from a modular, transparent stack that lets humans and agents "divide and conquer" without duplicating effort. 
+ +In 15 minutes (talk + Q&A) we walk the layers of that stack — drawing on slides distilled from a decade of CON talks at distribits, ReproNim, NIH, OHBM, BIDS/DICOM and NWB/DANDI venues — and at each layer name one concrete thing the audience can start using **right away**: + +- **Reuse — joining instead of writing.** Yaroslav's first move (~2005) was *not* writing code: it was joining Debian, packaging FSL and PyEPL with Michael Hanke under the **pkg-exppsy** project that became [NeuroDebian](https://neuro.debian.net/). [PyMVPA](http://www.pymvpa.org/) (2007) followed — an early reproducible-analysis library shipped with a full test suite and buildbot CI before either was common in scientific Python; today we contribute upstream to scikit-learn / nilearn instead of maintaining a parallel toolbox. The same instinct lives on in [duecredit](https://github.com/duecredit/duecredit) (we built on [citeproc-py](https://github.com/citeproc-py/citeproc-py) and stayed on as co-maintainers when it needed care) and in [con/duct](https://github.com/con/duct), built on brainlife's `smon` after we learned [ReproMan](https://github.com/ReproNim/reproman) was too heavy for everyday use. *The cheapest reproducible thing is the one you didn't have to build.* **Take home:** before your next project, do one upstream-search pass; add `pytest` + a one-file CI workflow to one repo this week. +- **Reuse → Extend — distributing software.** Most of what NeuroDebian pioneered now flows through Debian Med, Debian Science, and conda-forge; a successful bridge dissolves into the commons. **Take home:** file an ITP at Debian Med, or open a conda-forge feedstock, for a tool you currently distribute by URL. 
+- **Compose — data management on a common substrate.** [DataLad](https://www.datalad.org/) layers reproducible versioning and distribution on top of `git` and `git-annex` — tech everyone already knows — and extends modularly via the [DataLad extensions](https://github.com/datalad/datalad-extension-template) mechanism. The same substrate scales out through [registry.datalad.org](https://registry.datalad.org/), which federates DataLad datasets across institutions and clouds and provides discovery over **petabytes** of data with no recentralizing platform. **Take home:** convert one shared-data folder into a DataLad dataset. +- **Compose — small acquisition & compute units.** [ReproStim](https://github.com/ReproNim/reprostim), [HeuDiConv](https://github.com/nipy/heudiconv), [NeuroConv](https://neuroconv.readthedocs.io/), and the [ReproNim/containers](https://github.com/ReproNim/containers) collection each tackle one slice of acquisition-to-pipeline reproducibility, glued together by [YODA](https://github.com/myyoda/poster) ("look up you must not"). *Resist monoliths, even your own.* **Take home:** wrap your next pipeline run in `duct` or in a `repronim/containers` recipe. +- **Extend — staying upstream.** When the commons we depend on need care, we stay: as Debian/Debian-Med maintainers; as citeproc-py co-maintainers; as BIDS Steering Group members. We generalize ad-hoc work upstream too: DataLad's `RUNCMD` provenance format is being lifted into [BEP028](https://github.com/bids-standard/bids-specification) (BIDS provenance) plus a BIDS prov exporter. 
*Ship pragmatic now, formalize upstream later.* +- **Standardize — data and metadata.** [BIDS](https://bids.neuroimaging.io/) and [NWB](https://www.nwb.org/) make data exchangeable across labs and vendors; [LinkML](https://linkml.io/), [pydantic](https://docs.pydantic.dev/), [JSON Schema](https://json-schema.org/), and [SHACL](https://www.w3.org/TR/shacl/) extend the same idea to *metadata* — and turn the schema into a *generator*: the DANDI [meditor](https://gui.dandiarchive.org/) builds its editor UI from JSON Schema via [vjsf](https://vjsf.koumoul.com/); [`shacl-vue`](https://github.com/psychoinformatics-de/shacl-vue) from M. Hanke's group renders forms *and entire research-group websites* from SHACL shapes (see Hanke et al., *LinkML metadata-driven workflow*, [ReproTube](https://datasets.datalad.org/repronim/ReproTube/DataLad/web/#/video/oF98hdaph1k?tab=local&wide=1&t=644&q=model&filter=1)). **Take home:** publish your model as a real artifact (LinkML / pydantic / JSON Schema), not as "what the code happens to accept"; validate one of your datasets against a community standard. +- **Standardize at scale — federated archives.** [DANDI](https://dandiarchive.org/), [EMBER](https://emberarchive.org/), and [OpenNeuro](https://openneuro.org/) put real data online at population scale, all built on the standards above and discoverable through the same federation pattern. *Federation is what lets a small RSE center reach population scale.* +- **Reuse, in reverse — pulling content back from platforms.** [AnnexTube](https://github.com/con/annextube) and [mykrok](https://github.com/mykrok/mykrok) pull research outputs back from commercial platforms (YouTube, Google) into the same `git` + `git-annex` substrate. 
*Your data is yours, even when someone else hosts it.* +- **Automate — without it, nothing of the above scales.** Every layer in this talk works only because we automate the boring part: PR-level CI on every DataLad change; daily-tested `git-annex` against DataLad; [con/tinuous](https://github.com/con/tinuous) archiving CI logs and artifacts before they expire; auto-rebuilt [ReproNim/containers](https://github.com/ReproNim/containers); ReproIn / HeuDiConv driving DICOM → BIDS at the scanner; ReproStim auto-recording all stimuli; dandisets auto-mirrored into DataLad on GitHub with webshots and trivial-IO sweeps; con/validation + dandi-cli running the full multi-validator gauntlet on every release. In SciOps terms (Johnson *et al.*, 2024), this is *Level 4 (Scalable)* — what the paper calls "SciOps pipelines": semi-automated continuous workflows across experimental design, collection, processing, analysis, and dissemination. The cost is real — *those harnesses are themselves code that someone maintains* — and that is the moment AI becomes the most viable way forward: as the **meta-automation** of harness maintenance ([con/skills](https://github.com/con/skills), [con/yolo](https://github.com/con/yolo)), pushing a small RSE center from L4 toward *Level 5 (Optimizing)*. **Take home:** automate one repetitive chore (CI matrix, daily smoke test, release script) this week — *and* write down who maintains the automation. +- **For HI and AI.** All of the above stays self-contained, well-described, and openly shared — equally legible to **HI** (human investigators) and **AI** agents. Our [STAMPED](https://stamped-principles.org/) principles (Self-containment, **T**racking, Actionability, Modularity, Portability, Ephemerality, Distributability — Macdonald, Baker, To & Halchenko, 2026) name exactly the operational properties this requires *per research object*; **SciOps** (Johnson *et al.*, 2024) names the matching team-level maturity ladder. 
The two are complementary: STAMPED describes the *artifact*, SciOps the *operations* around it. The AI-coding maturity ladder (companion talk) works only on top of versioned, modular, standardized artifacts — i.e. at least SciOps L3 — and AI is what turns the L4→L5 gap from "requires a consortium" into "feasible for a small RSE center", making AI-era reproducibility tractable rather than aspirational. **Different projects need different AI-acceptance policies**, and we show a four-stance spectrum — *Reject* (e.g. `git-annex` upstream contributions — pure HI), *Accept-with-disclosure* (DataLad / DANDI — HI commits with `Co-Authored-By` trailers and `@pytest.mark.ai_generated`), *Spec-driven AI-generated* (AnnexTube, mykrok, con/citations-collector, parts of dandi-cli), *Autonomous* (con/skills + con/yolo for triage/PR-review). Common ground across all four (per [ICMJE 2026](https://www.icmje.org/recommendations/browse/artificial-intelligence/ai-use-by-authors.html)): AI cannot be an author; humans retain full responsibility; AI use must be disclosed in the artifact itself. STAMPED *Tracking* is what makes that mechanical rather than aspirational. Survey of declared OSS stances: [`melissawm/open-source-ai-contribution-policies`](https://github.com/melissawm/open-source-ai-contribution-policies). +- **Why it composes — MVC at the stack scale.** The five verbs above keep producing the same shape: layered standardized **models**, many interchangeable **views**, and small single-purpose **controllers**. Pick any cell — Model, View, or Controller — and swap it out; the rest still works. *That* is what lets a small RSE center reach population scale, what keeps each piece small enough to maintain, and what lets agents pick up the stack cold. **Take home:** name the M / V / C of your next project before you start writing it; if any column has only one entry, you have a silo. 
+ +We close with a one-slide **Monday checklist** of five concrete actions distilled from above — none requires neuroscience, all work today. + +## References + +1. *DataLad: distributed system for joint management of code, data, and their relationship.* Halchenko et al. JOSS 2021. +2. *The Brain Imaging Data Structure (BIDS).* Gorgolewski et al. Sci. Data 2016. +3. *Neurodata Without Borders (NWB).* +4. *DANDI Archive.* +5. *OpenNeuro: An open resource for sharing of neuroimaging data.* Markiewicz et al. eLife 2021. +6. *EMBER Archive.* +7. *ReproNim: A center for reproducible neuroimaging computation.* (NIH NIBIB P41 EB019936) +8. *PyMVPA.* +9. *NeuroDebian.* Halchenko & Hanke. Front. Neuroinform. 2012. +10. *duecredit — automated scholarly credit tracking.* ; *citeproc-py.* +11. *con/duct — small process-execution monitor.* +12. *DataLad Registry.* +13. *BEP028 — BIDS provenance.* +14. *LinkML.* ; *pydantic.* ; *JSON Schema.* ; *SHACL.* ; *DataLad Concepts.* +14a. *Schema-driven UIs.* *vjsf — Vue JSON Schema Form*, (powering the DANDI metadata editor at ); *shacl-vue*, M. Hanke's group at psychoinformatics-de, ; M. Hanke et al., *LinkML metadata-driven workflow*, ReproTube +15. *AnnexTube.* (demo: ); *mykrok.* (demo: ) +16. *STAMPED principles for reproducible research objects.* A. Macdonald, C. C. Baker, I. To, Y. O. Halchenko. 2026. ; sources: . STAMPED = **S**elf-containment, **T**racking, **A**ctionability, **M**odularity, **P**ortability, **E**phemerality, **D**istributability. +16a. *SciOps: Achieving Productivity and Reliability in Data-Intensive Research.* E. C. Johnson, T. T. Nguyen, B. K. Dichter, F. Zappulla, M. Kosma, K. Gunalan, **Y. O. Halchenko**, S. Q. Neufeld, K. Ratan, N. J. Edwards, S. Ressl, S. R. Heilbronner, M. Schirner, P. Ritter, B. Wester, S. Ghosh, M. E. Martone, F. Pestilli, D. Yatsenko. 2024. — a five-level Capability Maturity Model for rigorous scientific operations; the operational-maturity companion to STAMPED. +16b. 
*AI use by authors and peer reviewers.* International Committee of Medical Journal Editors (ICMJE) Recommendations, January 2026 update. — AI cannot be listed as an author; authors retain full responsibility; AI use must be disclosed. +16c. *Open-source AI contribution policies.* M. Weber Mendonça (curator), community-maintained catalog. — surveys declared OSS-project policies under four buckets: Accept / Restrict / Reject / Ongoing. Concrete exemplars (NumPy, Kubernetes, Linux, Django, Zig, Krita, Clojure, QEMU). +17. *con/serve — Digital Research Artifact Archive.* +18. *Distribits — Technologies for distributed data management (conference & community).* +19. *Center for Open Neuroscience.* + +## Connection to Mission, Goals, & Interests of US-RSE Community + +CON is, by construction, a prototypical US-RSE organization: a small team of full-time research software engineers (five "centroids") whose job is to design, ship, and steward open infrastructure used by domain scientists they do not directly report to. Almost everything we build or co-build — pkg-exppsy/NeuroDebian, PyMVPA, DataLad, duecredit, HeuDiConv, NeuroConv, ReproMan, con/duct, registry.datalad.org, BIDS/NWB/LinkML extensions, the STAMPED principles, and now `con/serve` — was scoped from the outset to be **domain-agnostic**, even when first motivated by neuroimaging. RSEs in genomics, geosciences, HPC, and digital humanities already use them; the talk makes those entry points explicit. + +The talk's title verbs are also its takeaways for the US-RSE audience: + +- **Reuse.** Whenever an upstream existed or could be grown, we joined it: pkg-exppsy/NeuroDebian → Debian, PyMVPA's intent → scikit-learn / nilearn, duecredit → citeproc-py, DataLad → `git`+`git-annex`, con/duct → brainlife's `smon`. The cheapest reproducible thing is the one you didn't have to build. 
+- **Compose.** `con/duct` is small on purpose; ReproStim / HeuDiConv / NeuroConv / ReproNim-containers each do one job; `registry.datalad.org` federates rather than recentralizes; BIDS, NWB, and LinkML are independent building blocks that can be picked up à la carte. +- **Extend.** When the commons we depend on needed care, we stayed on as maintainers (citeproc-py), generalized our ad-hoc work upstream (RUNCMD → BEP028 + BIDS prov exporter), and pushed packages into Debian Med / Debian Science / conda-forge so others could re-use them in turn. +- **Standardize.** Common tech (`git`), common data standards (BIDS, NWB), and common metadata standards (LinkML, concepts.datalad.org) so RSEs across fields can read each other's work — and so AI agents can too. +- **Automate.** Nothing in this stack scales without it: CI on every PR; daily-tested `git-annex` and DataLad extensions; con/tinuous CI-log archival; auto-rebuilt ReproNim containers; ReproIn/HeuDiConv at the scanner; auto-mirrored dandisets; con/validation + dandi-cli on every release. This is SciOps Level 4 ("SciOps pipelines") in practice. The harness is itself code we maintain — and the meta-automation step (using AI to maintain *it*) is the most viable way for a small RSE center to approach SciOps Level 5 (Optimizing). + +We embody the bridging role US-RSE foregrounds: + +- **RSE ↔ engineering industry.** Sustained collaboration with **Kitware** on NWB browse/analyze/visualize tooling — a textbook example of an academic RSE center partnering with a non-academic engineering shop. +- **RSE ↔ domain scientists.** Direct co-development with neuroscience labs at Dartmouth, Stanford (OpenNeuro), Allen Institute (NWB/DANDI), and FZ Jülich. +- **RSE ↔ global RSE community.** We help organize [distribits.live](https://www.distribits.live/), bringing DataLad-adjacent practitioners across continents into one room — a model other RSE sub-communities can copy. 
+ +For the US-RSE audience the talk offers (a) a concrete, layer-by-layer tour of reusable infrastructure, (b) a working example of a multi-institutional RSE center sustained for over a decade through NIH P41 and collaborator funding, and (c) a Monday checklist that does not assume anything neuroscience-specific. diff --git a/202x-mvc-stack.html b/202x-mvc-stack.html new file mode 100644 index 0000000..29fb346 --- /dev/null +++ b/202x-mvc-stack.html @@ -0,0 +1,595 @@ + + + + + + + + + [WiP] MVC at the stack scale: what makes the open-(neuro)science stack compose + + + + + + + + + + + + + + + +
+
+ + + + + + +
+
+ +

+ WiP + MVC at the stack scale +

+

+ What makes the open-(neuro)science stack compose +

+ + + + STUB — not yet scheduled (rename the file when assigned to a venue)
+ Live slides/Sources: + https://datasets.datalad.org/centerforopenneuroscience/talks/202x-mvc-stack.html +
+ + + + + + + + + + + + +
+
+
+ + + +
+
+

One thesis, two parts

+
    +
  1. Most reusable open-science infrastructure ends up shaped like MVC — not at the level of one app, but at the level of the whole stack.
  2. If your project doesn’t already separate Model, Controller, and View, that’s where you should put scarce engineering time first.
+

+ TODO: open with a 30-second “you already know MVC” refresher cartoon. + Borrow nothing — assume the audience is RSE / scientific-software. +

+
+ + +
+

Why “at the stack scale” is the new word

+
    +
  • Classical MVC: one app, one process, one team.
  • Stack-scale MVC: one community, many tools, many institutions, many languages — the M, the C, and the V each become whole sub-ecosystems.
  • The decoupling that MVC buys at app scale (testability, swap-ability, reuse) is the same thing federation buys at archive scale.
+ +
+
+ + + +
+ +
+

Models — multiple, layered, standard

+

If only one team can read it, it’s not a FAIR model.

+
+ + + +
+

Dataset layout: BIDS & YODA

+

TODO: reuse BIDS-minder + a YODA-hierarchy figure. + Make the case that “BIDS” is a data model in the OO sense, not a folder convention.

+
+ +
+

Per-file: NWB / NIfTI / TSV / Parquet / HDF5 / Zarr / DICOM

+

TODO: point at NWB Inspector, BIDS-validator as runtime witnesses + that the model is real and self-describing.

+
+ +
+

Metadata: many languages, one idea

+
    +
  • LinkML — one source → JSON Schema, OWL/SHACL, pydantic classes, docs; used by DANDI, NWB, concepts.datalad.org.
  • pydantic — runtime validation in Python; the workhorse for DANDI/NWB tooling.
  • JSON Schema — the lingua franca; what BIDS extensions and the DANDI meditor read.
  • SHACL — RDF-side shape constraints; the substrate for shacl-vue.
  • Cross-domain analogues: HED tags; DUO (Data Use Ontology); PROV.
+

+ TODO: show one schema slice in LinkML — same source rendered as JSON Schema, pydantic, SHACL. +

+
+ +
+

Storage: git + git-annex + DataLad

+

TODO: reuse the “DataLad sandwich” mermaid from + 2024-distribits-datalad.html. + Frame it as the storage model; remotes are pluggable.

+
+ +
+ + + +
+ +
+

Views — humans, agents, machines

+

A single Model deserves many Views.

+
+ + + +
+

Browse-the-archive humans

+

TODO: screenshots of datasets.datalad.org, DANDI, OpenNeuro, + EMBER, mykrok, ReproTube. The same dataset visible through any of them.

+
+ +
+

Tabular / ad-hoc analysis

+

TODO: live or screenshot demo of Datasette + over a TSV-from-BIDS-or-mykrok; VisiData over the same file.

+
+ +
+

Programmatic (humans + agents)

+

TODO: PyBIDS / pynwb / datalad.api / fsspec / FUSE. + Argue that this is the view that AI agents see.

+
+ +
+

External services that just plug in

+

TODO: brainlife, CONP, CBRAIN, Kitware NWB tooling, MetaCell NWB Explorer, + neurobagel. Diagram: same archive Model, four service Views.

+
+ +
+

Schema-driven UIs: the View that the Model generates

+
    +
  • vjsf → DANDI meditor: JSON Schema in → full metadata-editor form out, no hand-written widgets.
  • shacl-vue (M. Hanke's group at psychoinformatics-de): SHACL shapes in → not just forms, but entire research-group websites.
  • Same Model, many materializations: editor, validator, API, documentation, website — all derived from one schema.
+

+ Recommended viewing: + M. Hanke et al., LinkML metadata-driven workflow + (ReproTube; + YouTube) — + walks through the model→UI→website pipeline in detail. +

+

+ TODO: embed a screenshot of DANDI meditor next to one of a shacl-vue site. +

+
+ +
+

Long-form narrative is a view too

+

TODO: handbook.datalad.org, per-archive docs, the standards' specs themselves. + “A spec is a View on the Model that humans read first.”

+
+ +
+ + + +
+ +
+

Controllers — small, single-purpose

+

If it does two things, it’s already two controllers.

+
+ + + +
+

Acquisition → standardized layout

+

TODO: ReproIn / HeuDiConv / NeuroConv / ReproStim with concrete + before/after — reuse slides from 2022-nih-compcore.html § ReproIn, + + new slide for NeuroConv.

+
+ +
+

Reproducible execution: datalad run, con/duct, ReproMan

+

TODO: reuse the datalad run/rerun code blocks + from 2025-distribits-YODA.html; add a duct trace example.

+
+ +
+

Logistics: git-annex special remotes

+

TODO: visual of the special-remote zoo + (rclone / S3 / web / datalad-archives / external). + Argue that special remotes are how a controller stays small while + the view of the storage stays plural.

+
+ +
+

Data → derivative: BIDS-Apps, DataLad crawler

+

TODO: walk the OpenNeuroDerivatives example + (already in 2025-distribits-YODA.html); call out that + “controller composition” is what derivative archives are.

+
+ +
+

Validation — and a recursive Standardize moment

+
    +
  • Each format brings its own validator: bids-validator (now with HED validation inside), nwbinspector, pynwb, zarr, OME-Zarr, LinkML validators, codespell, REUSE…
  • Each emits its own shape of report — that’s a Model collision waiting to bite.
  • con/validation harmonizes them: one schema for “a validation result”, regardless of source.
  • Concrete deployment: dandi-cli now collects bids-validator (+ HED) and pynwb and zarr and nwbinspector (and ome-zarr, …) into the one con/validation schema.
  • Once the results are one Model, Views are easy: dashboards, dataset-health rollups, and yes — VisiData on the validation TSV is great for triage.
+

+ Pattern: when Controllers all emit slightly-different result shapes, + the right move is not a bigger Controller — it’s a new Model for results, + with a thin adapter from each upstream. Validation = the type-checker for your data Model; + con/validation = the type-checker for your validators. +
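
A minimal sketch of that pattern in Python, assuming two made-up validators whose report shapes differ. The `ValidationResult` fields and both adapter functions are hypothetical and invented for illustration; they are not the actual con/validation schema, nor the output format of any real validator.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ValidationResult:
    """One shared shape for a finding, whatever tool produced it (hypothetical fields)."""
    source: str    # which validator emitted the finding
    path: str      # file the finding refers to
    severity: str  # normalized to lowercase, e.g. "error" or "warning"
    message: str


def from_tool_a(report: dict) -> list[ValidationResult]:
    """Thin adapter for a made-up tool that nests findings under 'issues'."""
    return [
        ValidationResult("tool-a", issue["file"], issue["level"].lower(), issue["text"])
        for issue in report["issues"]
    ]


def from_tool_b(lines: list[str]) -> list[ValidationResult]:
    """Thin adapter for a made-up tool that prints 'SEVERITY:path:message' lines."""
    results = []
    for line in lines:
        # Split on the first two colons only, so the message may contain colons.
        severity, path, message = line.split(":", 2)
        results.append(ValidationResult("tool-b", path, severity.lower(), message))
    return results


def collect() -> list[ValidationResult]:
    """Feed sample reports through both adapters and merge into one list."""
    report_a = {
        "issues": [
            {"file": "sub-01/anat.nii.gz", "level": "ERROR", "text": "missing sidecar"}
        ]
    }
    report_b = ["WARNING:sub-01/ses-1.nwb:no subject age"]
    return from_tool_a(report_a) + from_tool_b(report_b)


if __name__ == "__main__":
    for result in collect():
        print(asdict(result))
```

Once every finding has the same shape, a dashboard or a TSV export only has to understand `ValidationResult`, and adding a validator means writing one more adapter rather than touching any View.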

+

+ TODO: screenshot of dandi-cli’s harmonized output + a VisiData snapshot. +

+
+ +
+ + + +
+ +
+

Contrast: the “service-tied UI” pattern

+

When the View only works because the backend is running.

+
+ +
+

What this looks like in academia

+
    +
  • A bespoke LIMS / ELN with all data behind a vendor login.
  • A lab-spun-up Flask / Django / Rails / FastAPI app that reads from a private DB, ships server-rendered HTML or a single-purpose REST API, and is the only way in.
  • A WordPress/Drupal “data portal” whose pages are the dataset.
  • A Jupyter app on someone’s VM that everyone shares (until they retire / lose funding / change institution).
  • A “dashboard” that is the data — no export, no schema, no machine API.
+

+ These are often picked because they’re quick, not because the + problem demands server-side state. The Model is implicit (whatever the + ORM happens to expose); the View is welded to the Controller; the + Controller is welded to a specific running process. +

+

+ Cost: low portability and low reusability. + The View depends not just on a Model, but on a service + that has to keep running. Lose the server — lose the data, + the workflow, and (usually) the provenance. +

+
+ +
+

Names from the design-pattern literature

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Pattern (or anti-pattern) | What it names |
|---|---|
| Smart UI (Evans, DDD, 2003) | Anti-pattern. Domain logic lives in the view; one team can ship fast but the system can’t be reused, tested, or evolved. |
| Anemic Domain Model (Fowler) | Anti-pattern. Inverse cousin: data classes with no behaviour; all logic in a service layer welded to the UI. |
| Façade (GoF, 1994) | A unified interface to a subsystem — useful, but custom backends often become the only Façade, and the Model behind it disappears. |
| Hexagonal Architecture / Ports & Adapters (Cockburn, 2005) | The cure: domain Model in the centre; UI, DB, external services are interchangeable Adapters around it. Same idea as MVC at stack scale. |
| Adapter & Strategy (GoF) | The micro-mechanisms that make Hexagonal work: thin per-source shims (Adapter) and pluggable behaviours (Strategy) into the same Model. |
| Service Layer (Fowler, PoEAA) | A boundary of operations per Model, not per UI. Avoids re-encoding the same logic once for each interface. |
+

+ Translation to our stack: BIDS / NWB / LinkML schemas + are the Model — the centre of the hexagon. Validators, ETL tools, + and archive UIs are Adapters around it. git+git-annex+DataLad + is what lets the Model travel without a service. +

+
+ +
+

What CON does instead

+
    +
  • Static-first archive: datasets.datalad.org is just nginx over a directory tree. No daemon, no DB, no special API — any HTTP client is enough.
  • Data is in the standard, not in the service: BIDS layouts, NWB files, LinkML-described metadata. Every consumer can read it with no help from us.
  • Multiple access paths to the same bytes: web, git-annex, S3, FUSE, fsspec, REST — because the Model is a file tree, not a service.
  • Schema-driven UIs (vjsf, shacl-vue): the View is derived from the Model, so the same Model gets a hundred Views for free instead of one bespoke one.
  • Static-site generators & JAMstack for documentation: ship HTML at build-time, serve from any CDN. Every deck in this repo is a single HTML file in this spirit.
  • Run a service only when you must: DANDI does have a backend (publication API, auth); but the data sits on S3 + git-annex and is reachable without it.
+

+ Heuristic: if your project would die when the + server goes down for a week, the View is welded to the Controller. + Move the contract into a real Model artifact (a schema, a layout, + a versioned dataset) — then the service is just one + Adapter among many. +

+

+ TODO: diagram — on the left, a monolithic LIMS-style + blob (View=Controller=DB); on the right, a hexagon with Model in + the centre and Adapters all around it. +

+
+ +
+ + + +
+ +
+

One stack, many lenses

+

MVC isn’t the only way to talk about this stack —
+ it’s one of several overlapping dimensions.

+
+ +
+

Lenses we keep using

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Lens | What it asks | Axes / examples |
|---|---|---|
| Architectural | Where does this piece live? | MVC: Model / View / Controller — this whole talk |
| Procedural | How is it produced & preserved over time? | STAMPED — Self-containment, Tracking, Actionability, Modularity, Portability, Ephemerality, Distributability |
| Sharing | Who can use it, and at what cost? | FAIR — Findable, Accessible, Interoperable, Reusable |
| Compositional | Does it compose without looking up? | YODA — modular, look-up-you-must-not, version-control everything |
| Purpose | What is the project for? | acquisition / curation / archive / analysis / governance / sharing / training |
| Maturity | How operationally rigorous & agentic-ready is it? | SciOps — Johnson et al., 2024 — 5-level CMM: L1 Initial → L2 Managed → L3 Defined (FAIR data + FAIR workflows; the paper’s own L3 exemplars include BIDS, NWB, DataLad, DANDI, brainlife.io) → L4 Scalable (“SciOps pipelines”: semi-automated, continuous) → L5 Optimizing (closing the loop with AI). Agentic-readiness sub-axis: the AI-coding ladder L1–L5 — but it presupposes at least SciOps L3 below it. |
+

+ Each lens picks out a different property of the same stack. + When two lenses disagree about whether something is “good”, that’s usually a + real tension — not noise — worth a design conversation. +
e.g. a single big monolith can score great on FAIR + (you can find & download it) but terrible on MVC and YODA. A schema-driven + archive scores well on MVC and FAIR and AI-readiness at once — + that’s why we keep building them. +

+
+ +
+ + + +
+
+

So what?

+

Three RSE-grade actions.

+
+ +
+

Three things to do this week

+
    +
  1. Name your M / V / C. For your current project, write down the model(s), view(s), and controller(s). If a column has only one entry — or worse, zero — you have a silo.
  2. Pick the swap. Identify one cell you would replace within five years. Make the seam explicit now.
  3. Publish the model. If your model is implicit (only your code knows it), make it a real artifact: LinkML, JSON Schema, BIDS BEP, CSV-with-a-spec. Future-you, future-collaborators, and future-AI will thank you.
+
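
The "Publish the model" action can be sketched in a few lines of Python. This assumes a hypothetical `Recording` record; the field names and the hand-rolled check are illustrative only. A real project would more likely generate the schema from LinkML or pydantic and validate it with a full JSON Schema validator.

```python
import json

# Hypothetical model, written down as a JSON Schema document instead of
# living only in the code that happens to read the data.
RECORDING_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Recording",
    "type": "object",
    "required": ["subject_id", "modality"],
    "properties": {
        "subject_id": {"type": "string"},
        "modality": {"type": "string", "enum": ["mri", "ephys"]},
        "duration_s": {"type": "number"},
    },
}


def missing_required(record: dict, schema: dict) -> list[str]:
    """Toy check: list the required keys that are absent from the record.

    This covers only the 'required' keyword; it is not a JSON Schema validator.
    """
    return [key for key in schema["required"] if key not in record]


def publish(schema: dict) -> str:
    """Serialize the schema so it can be committed next to the data."""
    return json.dumps(schema, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(publish(RECORDING_SCHEMA))
    print(missing_required({"subject_id": "sub-01"}, RECORDING_SCHEMA))
```

The point of the sketch is the first half: once the schema is a file in the repository, any validator, editor, or agent can read the contract without importing the project's code.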
+
+ + + + + + +
+

Thank you!

+

Models, Views, Controllers.

+

Compose them, your stack will.

+

+ + + +

+

Slides: datasets.datalad.org/…/202x-mvc-stack.html — CC-BY-SA

+
+ + + + +
+
+ + + + + + + + + diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..dbcac59 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,215 @@ +# CLAUDE.md — guidance for authoring & curating CON talks + +This file is loaded automatically when you (Claude / Claude Code) work in +this repository. It complements: + +- **`SOUL.md`** — *what* CON talks are about, voice, visual style, fixed + metadata, citation conventions, and the recurring story arcs. +- **`INDEX.md`** — *which slides exist already* and how to find them per + topic and per talk. + +Read both before authoring or editing a deck. Treat them as the source of +truth for cross-talk decisions. + + +## Repository layout (relevant subset) + +``` +README.md — repo overview, build/PDF instructions +SOUL.md — mission, voice, visual identity, references +INDEX.md — talk-by-talk catalog + topic lookup +CLAUDE.md — this file: how to author & curate +LICENSE — CC-BY-SA +gulpfile.js, package.json — npm/gulp setup for `npm start` live reload +css/custom.css — small site-wide CSS overrides +reveal.js/ — vendored reveal.js (do not modify) +reveal.js-mermaid-plugin/ — vendored mermaid plugin +pics/ — all images (canonical assets are listed in SOUL §4) +pics/borrowed/ — third-party images +3rd-party/ — third-party PDFs cited in slides +embed/ — small HTML iframes referenced by some decks +posters/ — poster sources (separate flow) +2026-usrse/ — venue-specific drafts: proposal, BoF/poster templates, + lineage diagrams +2026-ohbm-ossig/ — OHBM Open Science Room submission +2026-repronim-YODA-BIDS-webinar/ — companion notes/planning to that deck +2026-ca-origami-retreat-aicoding/ — planning + per-slide screenshots for that deck +.html — top-level reveal.js decks +``` + +`` follows `--` (or with a trailing +`-name`/`-aicoding` for sub-decks at the same venue). The published mirror +is `https://datasets.datalad.org/centerforopenneuroscience/talks/.html`. 
+ +The repo is a [DataLad](https://datalad.org/) dataset (note `.datalad/`, +`.gitattributes`); commit binary assets via DataLad if introducing new +ones. For text-only edits to existing HTML, plain `git` is fine. + + +## Workflow for creating a new deck + +1. **Decide the arc** (`SOUL.md` §7) — Origin/Stack/Today, Challenges/ + Solutions/Take-home, YODA-principle-a-day, Nirvana, AI-ladder, or + Reuse/Compose/Extend/Standardize. Copy the spine, then customize. +2. **Pick the parent deck** for that arc from `INDEX.md` (e.g. for a + data-archives talk, start from `2023-brain-dandi.html` or + `2023-lbl-building-dandi.html`). +3. **Create `.html`** at the repository root. Use the parent + deck as a starting point — `cp` it and edit, or assemble fresh from + the title-slide template in `SOUL.md` §3. +4. **Set the standard reveal.js header** with theme `beige`, the four + plugins (Markdown / Highlight / Notes / Mermaid), `1400×1050` canvas, + and the `Reveal.initialize` block matching `SOUL.md`. +5. **Title slide**: CON letterhead → title → social handles → CON/PBS/ + CCN/Dartmouth affiliations → QR code (see "QR codes" below) → venue + + date + live-slides URL → logo strip with the Ukraine ribbon last. +6. **Pull reusable slides** by copying full `
` elements from + the talk file referenced in `INDEX.md`. Adjust hyperlinks but keep + `data-src="pics/..."` paths verbatim — all decks share the `pics/` + tree. +7. **End with**: + - one or more "Take-home" / "Monday checklist" slides; + - an Acknowledgements slide (the canonical layout is in + `2024-distribits-datalad.html`); + - a final "Thank you!" slide. Yoda-SVG sign-off is appropriate when + the deck has YODA flavoring. +8. **Test** the deck by opening it in a browser. For a development + loop, `npm install` then `npm start` per `README.md`. +9. **Update `INDEX.md`** with the new deck (per-talk entry + any + topic-lookup additions). +10. **Commit** the new deck *and* the QR code *and* the `INDEX.md` + update together. Use the repo's git workflow (DataLad-managed). + +### QR codes + +The live-slides URL is **fully determined by the deck filename**: + +``` +https://datasets.datalad.org/centerforopenneuroscience/talks/.html +``` + +so the QR code can (and should) be generated **at deck-creation time** +— do not wait for "after publishing". The same URL is already hard-coded +into the title-slide template, and every older deck in the corpus +follows this convention (e.g. +`pics/2024-distribits-datalad-qrcode.png`, +`pics/2025-distribits-YODA-qrcode.png`, +`pics/2026-repronim-YODA-BIDS-webinar-qrcode.png` — open any of them +side-by-side with the title slide of the matching `*.html` deck to +confirm). + +Save as `pics/-qrcode.png` and reference it from the title +slide ``. The repo's already-generated QR PNGs were produced with +the `qrcode` Python package (which also installs a `qr` CLI). 
Install
+once with `uv` (do **not** forget `--with pillow` — without it `qr`
+fails at import-time with `ModuleNotFoundError: No module named 'PIL'`):
+
+```bash
+uv tool install qrcode --with pillow
+```
+
+Then for each new deck:
+
+```bash
+TALK_ID=2026-usrse-con-talk
+URL=https://datasets.datalad.org/centerforopenneuroscience/talks/${TALK_ID}.html
+qr "$URL" > pics/${TALK_ID}-qrcode.png
+```
+
+The default output is a ~490×490 1-bit PNG, which is what the older
+title slides reference. `qrencode` from your distro works too if it's
+installed.
+
+The PNG itself is binary; commit it via DataLad
+(`datalad save -m '+ qrcode for ' pics/-qrcode.png`).
+
+### Subdirectory conventions
+
+- Companion notes (planning, per-slide screenshots, companion
+  TODO files) live in a directory matching the deck name without the
+  `.html` extension, e.g. `2026-ca-origami-retreat-aicoding/`.
+- Long-running drafts of submissions, BoFs, posters, etc. live under
+  the venue directory (e.g. `2026-usrse/`).
+- Do **not** introduce a new top-level subdirectory just for one talk.
+
+## Workflow for curating / editing existing decks
+
+- **Small fix** (typo, link rot): edit in place, commit a focused diff.
+- **Adding a new slide**: prefer copying a similarly-structured slide
+  from a sibling deck (consult `INDEX.md`) over inventing layout from
+  scratch. Reveal.js's section-vertical structure is touchy; cargo-cult
+  carefully.
+- **Updating a citation**: keep the small/`` blockstyle from
+  `SOUL.md` §5; bold the speaker's name; link the DOI directly.
+- **Refreshing a screenshot**: drop into `pics/` with a date suffix
+  (e.g. `datasets.datalad.org-20251021.png`). Don't delete the older
+  one — older decks still reference it.
+- **Yanking a slide for reuse**: copy the *entire* `
...
` + block, then walk through any `class="fragment"`, `data-fragment-index`, + `data-transition` attributes — they are usually fine but occasionally + need to be adjusted to fit the new local context. + +## Authoring conventions to match the existing corpus + +- **Use `data-src` for ``**, not `src`. Reveal.js lazy-loads on + approach. +- **Reveal markdown sub-decks** — use them for content-heavy sections: + ```html +
+ +
+ ``` +- **Mermaid diagrams** — wrap in `
` + and stay within widely-supported flavors (`flowchart`, `gitGraph`, + `graph TB|LR`). Newer decks set theme variables per-diagram. +- **Speaker notes** — ``. Fine to be + long; press `s` to view in reveal.js. +- **Section dividers** — soft radial gradient + (`
`) + for "Challenge:" intros. +- **No new CSS files** — augment `css/custom.css` if you must, but + inline-style most layout tweaks (matching the existing decks). +- **No emojis** in slide text unless the deck already uses them + (the corpus does not, except in occasional small flourishes). + +## What goes where in this folder + +| If you're producing… | Put it in… | +| --- | --- | +| New slide deck | `.html` at repo root | +| QR code image for the deck | `pics/-qrcode.png` | +| Per-deck planning notes / screenshots | `/` directory | +| Venue-specific submission docs (PRD, BoF, poster, abstract) | `/` (e.g. `2026-usrse/`) | +| New reusable image | `pics/` (or `pics/borrowed/` if third-party) | +| New citation PDF | `3rd-party/` | + +## Don't do these + +- Don't switch the reveal.js theme. **`beige`** is the visual identity. +- Don't introduce a build step in a deck (each `.html` must be + openable as a single file). +- Don't move or rename existing `.html` files — published live + links depend on the URL. +- Don't add a new top-level subdirectory for slides; keep decks at the + root and notes in a sibling directory named after the deck. +- Don't `git rm` historical assets from `pics/` — older talks reference + them. +- Don't write new English-language documentation files at the root + unprompted. The three docs (`SOUL.md`, `INDEX.md`, `CLAUDE.md`) plus + the existing `README.md` are intentional. Use `/PLAN.md`, + `TODO.md`, etc., per existing companion-notes convention. + +## When the user says "draft a talk for X" + +A reasonable flow: + +1. Skim `SOUL.md` to anchor on style. +2. Pick the arc (`SOUL.md` §7) and parent deck (`INDEX.md`). +3. Search `INDEX.md`'s topic-lookup for matching reusable slides. +4. Produce `.html` (and the companion `/PLAN.md` if + the deck warrants one). +5. Update `INDEX.md` with the new talk's entry. +6. Surface a list of slides borrowed (with source `:
` + pointers) so the user can sanity-check sourcing. diff --git a/INDEX.md b/INDEX.md new file mode 100644 index 0000000..9500703 --- /dev/null +++ b/INDEX.md @@ -0,0 +1,514 @@ +# INDEX — talks and reusable slides + +A catalog of every talk in this repository plus a topic-wise lookup of +*where to copy slides from* when authoring a new deck. Pair this file +with `SOUL.md` (mission/style) and `CLAUDE.md` (authoring workflow). + +Slide IDs use the form `#section-[/sub-]` where `n` counts +**top-level `
`** elements in the source file (1-based). When a +top-level section has vertical sub-slides, `sub-m` counts those (1-based). +This matches reveal.js's URL-fragment numbering: `#//`. + + +## Per-talk inventory + +### `202x-mvc-stack.html` — *[WiP] MVC at the stack scale: what makes the open-(neuro)science stack compose* *(stub)* +- **Status**: WiP / outline-only stub. Title carries a `[WiP]` marker + in the browser tab and on the title slide. Not scheduled for a venue. + Rename to `--mvc-stack.html` when assigned, drop the + `[WiP]` prefix, and regenerate the QR code (the URL changes with the + filename). +- **Spine**: re-reads the four CON verbs as Model–View–Controller at + the *stack* scale. +- **Authoring seed**: the 4-slide MVC mini-section inside + `2026-usrse-con-talk.html` is the seed; promote each row into its own + slide here, plus borrow visuals from the per-topic lookup below. +- **Section openers in place**: + - "One thesis, two parts" — explicit MVC-at-stack-scale claim. + - "Why 'at the stack scale' is the new word" — classical-vs-stack-MVC contrast. + - **Models**: Dataset layout / Per-file / Metadata / Storage (4 stub slides). + - **Views**: Browse-the-archive humans / Tabular ad-hoc / Programmatic / External services / **Schema-driven UIs** (vjsf → DANDI meditor; shacl-vue → forms + research-group websites) / Long-form narrative. + - **Controllers**: Acquisition→layout / Reproducible execution / Logistics / Data→derivative / **Validation — and a recursive Standardize moment** (the latter is now the most fleshed-out slide: con/validation harmonizes bids-validator + HED + pynwb + zarr + nwbinspector + OME-Zarr; deployed in dandi-cli; VisiData for triage). 
+ - **Contrast: the "service-tied UI" pattern** section (NEW): four slides walking through the academic anti-pattern (LIMS / ELN / "just a Flask app"), naming the design-pattern literature (Smart UI [Evans], Anemic Domain Model [Fowler], Hexagonal / Ports & Adapters [Cockburn 2005], Adapter / Strategy / Façade [GoF], Service Layer [PoEAA]), and showing CON's static-first contrast (datasets.datalad.org as plain `nginx`, schema-driven UIs, JAMstack-style decks). Includes a "if your project dies when the server is down for a week..." heuristic. + - **One stack, many lenses** section: a table of six lenses (Architectural / Procedural / Sharing / Compositional / Purpose / Maturity) mapping to MVC / STAMPED / FAIR / YODA / project-purpose / AI-ladder. Captured as a recurring framing in `SOUL.md` §1. + - "Three things to do this week" closer. +- TODO markers in-file flag what to fill in. + +### `2026-usrse-con-talk.html` — *[WiP] Reuse, Compose, Extend, Standardize, Automate: Two Decades of RSEing Open (Neuro)Science at CON* *(draft)* +- **Venue / date**: US-RSE'26 (proposal-stage draft; title carries a `[WiP]` marker in the tab and on the title slide). +- **Spine**: the five-verb spine (Reuse / Compose / Extend / Standardize / **Automate**) plus a "Reuse, in reverse" coda, an Automate section (with the *meta-automation* handoff), and an HI+AI close. +- **Reusable highlights**: + - Title slide (per `SOUL.md` §3). + - "Two decades, five verbs" intro slide (NEW). + - "When it began for us" verb-tagged timeline (NEW; extension of `2024-distribits-datalad.html` § timeline; Automate milestones include 2007 PyMVPA-with-CI, 2016 ReproIn/HeuDiConv, 2019 dandiset auto-mirroring + con/tinuous). + - Reuse: NeuroDebian + PyMVPA blocks borrowed from `2022-nih-compcore.html`; **`con/citations-collector`** added as the modern continuation of duecredit. 
+ - Compose: DataLad sandwich mermaid + extensions diagram from `2024-distribits-datalad.html`; registry stats from `2025-distribits-YODA.html`; small-units table referencing ReproIn / HeuDiConv / NeuroConv / **`con/nwb2bids`** / ReproStim / ReproNim-containers / con/duct. + - Extend: a NEW "From RUNCMD to BEP028" mermaid summarizing the upstream-lift pattern. + - Standardize: BIDS slide + BIDS-minder image (from `2023-bids-dicom.html`) + LinkML/concepts.datalad.org bullets. + - Federated archives slide (DANDI + EMBER + OpenNeuro) with DANDI deep-dive borrowed from `2023-brain-dandi.html`. + - "Reuse, in reverse" 3-up table (AnnexTube / mykrok / con/serve). + - **Automate** section (NEW): **4 slides** — opener + "Where we automate" table (CI / con/tinuous / ReproNim-containers / acquisition / archive mirroring / validation / releases) + "Cost: harnesses, harnesses, harnesses → meta-automation via AI" (with `con/skills` / `con/yolo`) + **"The five verbs climb the SciOps ladder"** mapping (5-verbs ↔ SciOps L1–L5; the SciOps paper itself names BIDS / NWB / DataLad / DANDI / brainlife.io as Level-3 exemplars, and reserves Level 5 — Optimizing — for AI-in-the-loop). Hands off into HI+AI. + - HI+AI section: opener + "Why every layer matters now" (STAMPED + SciOps) + **"HI ↔ AI — every project picks its own policy"** (NEW): 4-stance spectrum table (Reject / Accept-with-disclosure / Spec-driven AI-generated / Autonomous) with OSS exemplars + CON projects + SciOps level + STAMPED principle, citing ICMJE Jan 2026 and melissawm/open-source-ai-contribution-policies. Pointing to `2026-ca-origami-retreat-aicoding.html` as the deeper-dive companion. + - **MVC mini-section** (NEW): 4 slides — opener + Models + Views + Controllers — placed between HI+AI and the Monday checklist. 
Models row covers BIDS / NWB / DuckDB-hive layouts + LinkML / pydantic / JSON Schema / SHACL metadata schemas + DataLad storage; Views row includes a dedicated *Schema-driven UIs* row (vjsf → DANDI meditor; shacl-vue → forms and research-group websites; Hanke et al. LinkML workflow ReproTube reference); Controllers row carries the punchline (*pick any cell — Model, View, or Controller — and swap it; the rest still works*). This block is the **seed of the standalone `202x-mvc-stack.html` stub**.
+  - Standardize section's "Metadata: schemas as first-class citizens" slide (NEW): expanded from a LinkML+concepts.datalad.org one-liner into a multi-language schema overview (LinkML / pydantic / JSON Schema / SHACL); also names **OBC (Open Brain Consent)** as the *consent* layer of standardization.
+  - Monday checklist (NEW): 6 entries (was 5); the added entry is an Automate take-home pointing at `con/tinuous`.
+  - Acknowledgements + Yoda SVG sign-off.
+- Notes: drafted in support of `2026-usrse/talk-proposal-draft.md`.
+  Companion files in `2026-usrse/`. QR code TBD; uncomment the
+  `data-src` line in the title slide once the live URL is published.
+
+### `2026-ca-origami-retreat-aicoding.html` — *A few words of intro into AI assisted coding*
+- **Venue / date**: CA Origami Retreat 2026.
+- **Spine**: AI-coding ladder + spec-driven workflow + CON tools.
+- **Reusable highlights**:
+  - Title slide with Avogadro Corp book reference (intro hook).
+  - "Reality Check" disclaimer slide (idiocracy GIF).
+  - YODA-Beyond-Code-and-Data table (traditional vs. expanded YODA scope).
+  - `con/serve` "The Vault" mermaid diagram (inbound / hub / outbound).
+  - **AI Coding Maturity Ladder** Levels 1–5 (Chat → Mid-loop → In-the-loop →
+    On-the-loop → Multi-agent).
+  - 5-Stage Development Loop mermaid.
+  - Mapping table: Vibe coding vs. Spec-driven vs. Compound engineering.
+  - Spec-driven tools ecosystem table (spec-kit / OpenSpec / Compound /
+    LAD).
+ - AI-assisted projects table (mykrok, AnnexTube, con/serve, + citations-collector, dandi-cli). + - Reusable-skills table (con/skills repo). +- Notes: planning notes in `2026-ca-origami-retreat-aicoding/PLAN.md`, + rendered checkpoint screenshots in the same directory. + +### `2026-repronim-YODA-BIDS-webinar.html` — *ReproFlow & YODA: Structure your studies* +- **Venue / date**: ReproNim Webinar, 2026-02-06. +- **Spine**: YODA principle-by-principle deep dive with BIDS framing. +- **Reusable highlights**: + - The full YODA principles canon (`yoda-principles-reordered.png`). + - Principle 1: Version control everything — `Why version control?` + table with PhD Comics 1531; VCS-as-experiment slides; `datalad run` + walk-through; `datalad rerun`; "datalad runs in the wild" registry + statistics; `git-annex addcomputed`; `con/duct`. + - Principle 2: Portable compute environments — software-container + families; `datalad-container`; `ReproNim/containers`; + `datalad containers-run`; clean-record CEREBRA/MRIQC. + - Principle 3: Modular composition — modules-and-layouts ladder; + BIDS as layout; OpenNeuroDerivatives walkthrough. + - "Look up you must not!" corollary slide + (`pics/yoda-do-not-look-up.png`, + `pics/depends-on-untracked-file.png`). + - Reality Check / disclaimer slide pattern. +- Notes: companion materials in + `2026-repronim-YODA-BIDS-webinar/{notes,planning}/`. + +### `2025-distribits-YODA.html` — *Pragmatic YODA: principles and their wild life encounters* +- **Venue / date**: distribits 2025, recorded + . +- **Spine**: same YODA spine as the 2026 ReproNim webinar — older but + shorter; many slides identical and reused there. +- **Reusable highlights**: see ReproNim entry above; this is the *parent* + of that deck. Use either as a source for YODA section material. + +### `2025-ca-origami-retreat.html` — *A challenge on the way to Neuroscience Nirvana: WORKAROUNDS!* +- **Venue / date**: CA Origami Retreat 2025. 
+- **Spine**: Nirvana / archives / make-re-use-convenient framing. +- **Reusable highlights**: + - WordNet "nirvana" definition pre block. + - "Where data go to die / how data are reincarnated" Q-and-A slides + (Buddha background). + - "What makes data re-use INconvenient?" two-slide pair (data bugs; + ad-hoc data access; opinionated software) with bug / feed-me cartoons. + - "What allows to make data re-use convenient?" — Standards (BIDS) and + Validation (BIDS). + - "What if standard does not (yet) fill the bill?" + - Closing "talk in BIDS" Nirvana slide. + +### `2024-distribits-datalad.html` — *"What's in the DataLad sandwich" AKA the DataLad ecosystem* +- **Venue / date**: distribits 2024. +- **Spine**: DataLad origin → sandwich layering → ecosystem → CI / health. +- **Reusable highlights**: + - "When it began for us" timeline (git → PyMVPA → GitHub → + git-annex → DataLad first commits). + - First use case: arjlover crawler → website-crawler-born mermaid. + - "From an email to a proposal" Joey-email screenshot timeline. + - "More layers to the sandwich" mermaid (datalad → git-annex → + git-annex-remote-archives → git-annex → git-annex-remote-datalad). + - DataLad crawler pipeline gitGraph (incoming → processed → master). + - DataLad realizations & shortcomings checklist. + - DataLad **extensions** mechanism + extension template + initial + extension graph. + - DataLad core "what it is" definition slide with JOSS citation. + - DataLad Extensions & Their Health (`pics/datalad-extensions.png`). + - DataLad Handbook overview (3-row table). + - "DataLad fulfilled original promise of a Data Distribution" + (`datasets.datalad.org` snapshot). + - Examples-of-use: OpenNeuro, brainlife, CONP infrastructure use; + YODA + ReproNim/containers; DANDI alternative view + Dropbox. + - "DataLad ecosystem" `DataLad-minder.svg` figure. 
+ - **CI / testing / monitoring stack**: DataLad-all-changes-are-tested, + extensions tested daily, git-annex daily, daily-status-email, + `con/tinuous` archives, datalad-installer. + - Acknowledgements slide with funders + collaborators. + +### `2024-distribits-datalad-name.html` — *"What's in the DataLad name" AKA How come DataLad?* +- **Venue / date**: distribits 2024 (lightning). +- **Spine**: just the naming history (datagit → ftf → datalad). +- **Reusable highlights**: name-history single-slide timeline (good + warm-up / origin-story slide). + +### `2023-brain-dandi.html` — *DANDI: distributed archives for neurophysiology data integration* +- **Venue / date**: BRAIN Initiative talk, 2023. +- **Spine**: archive challenge → DANDI ingredients → standards → testing. +- **Reusable highlights**: + - "Challenge: Develop a BRAIN Initiative Archive" (radial-gradient + section divider). + - "Where data go to die" → DANDI born. + - "What data is in DANDI" `dandi-slide-modalities.svg`. + - "Data chronology and demographics" + `20230622-NWB-and-DANDI-tutorial-updates.svg`. + - "Ingredients needed to build an archive" — People / Standards / + Technologies / FOSS / Automations. + - DANDI users by role (submitter / researcher / developer SVGs). + - DANDI integrates standards (`20210421-INCF-dandischema.svg`). + - DANDI ecosystem (`DANDI-ecosystem.svg`). + - **Testing the entire archive**: docker-compose; DataLad-mirroring + of dandisets; `con/tinuous`; webshots; trivial IO across all + dandisets (`dandisets-healthstatus.png`). + - "DANDI ..." final summary bullets (modular FOSS, integrates, + novel-tech adoption, automated QA). + +### `2023-lbl-building-dandi.html` — *Building an Archive for Large-scale Neuroscience Data* +- **Venue / date**: LBL talk, 2023. +- **Spine**: same as `2023-brain-dandi.html` but longer; with a Brief + Bio section for general-audience framing. +- **Reusable highlights**: see DANDI entries above; this is the larger + parent deck. 
Includes: + - "Brief Bio" slide (Born in Siberia → Ukraine → US trajectory). + - "Standard for neurophysiology data: NWB" slide. + - "Standards make DANDI FAIR for People" slide. + - DANDI schema deeper-dive (`dandi-slide-schema.svg`). + +### `2023-bids-dicom.html` — *BIDS 4 DICOM WG-16* +- **Venue / date**: DICOM WG-16 meeting, 2023. +- **Spine**: BIDS as a meta-standard, and where BIDS ↔ DICOM + collaboration could go. +- **Reusable highlights**: + - "Brief Bio" (variant). + - BIDS-Steering iframe slide. + - "Standard for neural datasets: BIDS" with the 2016 BIDS Sci Data + citation (canonical citation slide). + - "BIDS ..." features bullets, including "you've seen one BIDS dataset + you've seen them all". + - BIDS-minder upstream-images slide + (`bids-standard.github.io/.../BIDS-minder.svg`). + - DICOM ↔ BIDS chronology (1982 DICOM → 2014 BIDS). + - Clunie MICCAI 2017 5-image fragments (data is in `pics/2017-Clunie-*.png`). + - **"All standards are 'Bad', but some are used"** — recurring + rhetorical slide. + - DICOMs in BIDS workflow (sourcedata, .json sidecars, BEP019, PR#1450). + +### `2023-brain-dandi-imgdatasrc.html` — short DANDI talk +- Tiny deck (86 lines), title-only template; safe to ignore as source. + +### `2022-nih-compcore.html` — *An Integrated and Trusted Scientific and Statistical Computing Core* +- **Venue / date**: NIH SSCR pitch, 2022. +- **Spine**: trust → noise → human IO → standards → FOSS distribution + → data management → archive → all the projects in one walk. +- **Reusable highlights**: + - Yarik-goal cartoon (`pics/yarik-goal.svg`) — the "north star". + - "Brief Bio" slide (canonical version; reused in 2023 talks). + - CON principles (`con-principles.png`) — used in title slides. + - **"Integration & Trust Tiers"** ladder (Social / Data acquisition / + Methods/Analytics / Software systems / Data management / Services). + - "Trust is largely a social aspect" framing slides. + - "How can we minimize unexplained variance?" 
— minimize-human-IO + bullets, simulations, assertions, peer-review, provenance, re-use. + - **3rd-party / "God-is-at-the-computer"** slide (recursive trust). + - Phantom QA / Nuisance study figure (F1000 2020 citation). + - **ReproNim 5 steps** (`pics/repronim-5steps.png`, + ). + - OBC (Open Brain Consent) born-in-2014 slide; outcomes; OBC tools. + - "Challenge: minimize human IO to understand data" → BIDS → BIDS-Apps. + - **ReproIn / HeuDiConv** sequence-naming → automated BIDS slides. + - **Beyond ReproIn**: ReproStim / ReproEvents / con/noisseur. + - "Challenge 2007: no ML framework" → **PyMVPA** features / + classification / searchlight / hyperalignment / TRANSFusion; + PyMVPA-on-phone deployment punchline. + - **NeuroDebian** born-2009 slide; integration figure; + user-perspective figure; `nd_overview.svg` developer view; benefits + bullets (Conda-Forge / Fedora / Gentoo handoff, + "Containerization comes for free"). + - **DataLad-in-one-figure** (`pics/datalad_process_tuned/00base_preview.png`). + - Provenance capture: 3-step `datalad run` / `datalad containers-run` + / `datalad rerun` code blocks. + - **Extend DataLad** extensions overview. + - DataLad CI / health (testing-extensions / git-annex daily / + `con/tinuous` archive). + - "In DataLad We Trust" + decentralized RDM citation. + - **DANDI**-section duplicate (modalities / services / standards / + schema / "Webshots of all dandisets"). + - Closing "Integrated and Trusted" bullet manifesto. +- This deck is the **richest single source** of reusable CON-history + material — borrow heavily for any retrospective talk. + +### `2022-tx-big-neuroscience.html` — *Towards the Big Data Neuroscience Nirvana* +- **Venue / date**: ACNN Workshop 2022 (Texas). +- **Spine**: Nirvana / archives / making re-use convenient + DataLad CI + health quick tour. +- **Reusable highlights**: prototype of the Nirvana arc later refined in + `2025-ca-origami-retreat.html`. 
Includes "Big Data" section, the + largest-Git-repo / `datasets.datalad.org` snapshot, the for-users / + for-developers slide pair, and DataLad / extensions / git-annex daily + validation triplet. + +### `0000-zoom-background.html` — Zoom background slide template +- Not a talk; layout source for sharing CON banner during Zoom. + +## Topic-wise lookup + +Use these as a fast "where do I steal a slide for X?" cheat sheet. + +### Reuse / upstream contribution / NeuroDebian +- `2022-nih-compcore.html` § "NeuroDebian from user perspective", § "Under-the-hood for a NeuroDebian developer", § Overall benefits. +- `2024-distribits-datalad.html` § "git-annex is built and tested daily", § datalad-installer, § acknowledgements. +- *Asset*: `pics/neurodebian*.{png,svg}`, `pics/nd_overview.svg`, + `pics/neuropy_history.svg`. + +### Compose / DataLad ecosystem / sandwich layering +- `2024-distribits-datalad.html` § Sandwich mermaid, § Extensions + template, § DataLad ecosystem (`DataLad-minder.svg`), § "DataLad for + developers". +- `2022-nih-compcore.html` § Provenance + extensions list. +- *Asset*: `pics/DataLad-minder.svg`, `pics/datalad-extensions.png`, + `pics/tall-burger.png`, `pics/datalad_process_tuned/`. + +### Compose / small acquisition+compute units (HeuDiConv / ReproStim / +ReproNim-containers / con/duct / ReproMan) +- `2022-nih-compcore.html` § ReproIn / HeuDiConv / Beyond-ReproIn. +- `2025-distribits-YODA.html` § "datalad-container", § "ReproNim/ + containers" walkthrough, § ReproMan reference. +- `2026-repronim-YODA-BIDS-webinar.html` § same Principle 2 section. +- `2025-distribits-YODA.html` § con/duct; § "datalad runs in the wild"; + § `git-annex addcomputed`. +- *Asset*: `pics/webshot-repronim-containers.png`, + `pics/repronim-containers-{workflow,show,yoda-lower}.png`, + `pics/webshot-con-duct.png`, `pics/screenshot-duct-*.png`, + `pics/duct-mriqc-cerebra.png`, `pics/borrowed/reproin-logo.jpg`. 
+ +### Extend / standards work / BIDS BEPs +- `2023-bids-dicom.html` § BIDS features, § BIDS-minder, § DICOMs-in-BIDS + workflow, § "All standards are bad, but some are used". +- `2022-nih-compcore.html` § Microscopy-BIDS citation. +- *Asset*: `pics/BIDS-minder.svg`, `pics/bids-logo-wide.png`, + `pics/bids-yoda.png`, `pics/bep028-example1.png`. + +### Standardize / data archives / DANDI / OpenNeuro / federation +- `2023-brain-dandi.html` (whole deck) — best DANDI walkthrough. +- `2023-lbl-building-dandi.html` — extended version. +- `2024-distribits-datalad.html` § DANDI alternative-view slide; + § OpenNeuro infrastructure use. +- *Asset*: `pics/dandi-slide-{modalities,services,standards,schema}.svg`, + `pics/DANDI-{ecosystem,FAIR,users-*}.svg`, + `pics/dandiarchive-webshots.png`, `pics/dandisets-healthstatus.png`. + +### YODA principles + "Look up you must not" +- `2025-distribits-YODA.html` and `2026-repronim-YODA-BIDS-webinar.html` + — full YODA spine. +- *Asset*: `pics/yoda*.{png,svg}`, `pics/principle-{vcs,computeenv, + structure}.png`, `pics/depends-on-untracked-file.png`, + `pics/yoda-hierarchy-with-containers.png`, + `pics/yoda-do-not-look-up.png`, `pics/yoda-all-the-way-down.png`. + +### Provenance / `datalad run` / `datalad rerun` / RUNCMD → BEP028 +- `2025-distribits-YODA.html` and `2026-repronim-YODA-BIDS-webinar.html` + § Principle 1. +- `2022-nih-compcore.html` § Provenance capture (3 code-block slides). +- `2024-distribits-datalad.html` § DataLad crawler gitGraph. + +### CI / con/tinuous / daily-tested git-annex / health dashboards +- `2024-distribits-datalad.html` § three-image stack of + PR-test screenshots; daily-status email iframe; `con/tinuous`. +- `2023-brain-dandi.html` § identical CI slides re-used for DANDI. +- `2022-nih-compcore.html` § "PART of an answer: AUTOMATION" + 3 + testing slides. 
+- *Asset*: `pics/con-tinuous-{github,term,term-dandi-cli}.png`, + `pics/datalad-extensions.png`, `pics/datalad-git-annex.png`, + `pics/webshot-datalad-installer.png`, + `pics/datalad-daily-status-email-subject.png`, + `embed/datalad_git-annex_daily.html`. + +### Trust / accountability / variance / phantom QA / OBC +- `2022-nih-compcore.html` § "Trust is largely social", § Nuisance + study, § ReproNim 5 steps, § OBC born/outcomes/tools. +- *Asset*: `pics/god-is-at-the-computer.jpg`, + `pics/MRI-scanner.png`, `pics/f1000-webshot-20200930*.png`, + `pics/repronim-5steps.png`, `pics/OBC_LogoCheck.svg`, + `pics/obc-{main,ultimate,tools}.png`. + +### PyMVPA / "we ported the intent upstream" +- `2022-nih-compcore.html` § PyMVPA Features → searchlight → + hyperalignment → TRANSFusion → "phone deployment". +- *Asset*: `pics/pymvpa*.png/svg`, + `pics/pymvpa_logo_fromfusionposter.svg`, + `pics/pymvpa_on_phone.jpg`, + `pics/uniform_analysis.svg`. + +### "Make re-use convenient" / Nirvana framing +- `2025-ca-origami-retreat.html` — full deck. +- `2022-tx-big-neuroscience.html` — original. +- `2023-brain-dandi.html` § "Where data go to die". + +### AI angle / HI+AI / AI-coding ladder / con/serve +- `2026-ca-origami-retreat-aicoding.html` — full deck. +- `2026-repronim-YODA-BIDS-webinar.html` § hand-off to AI talk + (Appendix-style slide referenced from the AI talk's "Previously on…"). +- *Asset*: `pics/borrowed/ai-ladder-skills.png`, + `pics/borrowed/2026-ai-intensifies.png`, + `pics/surface-depth-v2.jpg`, `pics/borrowed/idiocracy-fixed.gif`. + +### MVC framing / "why the stack composes" +- `2026-usrse-con-talk.html` § "Why it composes — MVC at the stack scale" (4-slide mini-section: opener + Models + Views + Controllers table). +- `202x-mvc-stack.html` *(stub)* — the standalone deck spun out of that mini-section. +- *Asset*: re-uses existing tables; no new images required for the seed. 
+  When deepening the standalone talk, borrow `pics/DataLad-minder.svg`
+  (storage model), `pics/BIDS-minder.svg` (dataset-layout model), and
+  the controllers screenshots from
+  `2025-distribits-YODA.html` / `2026-repronim-YODA-BIDS-webinar.html`.
+
+### Validation harmonization (con/validation)
+- `2026-usrse-con-talk.html` § MVC mini-section's Controllers row
+  ("Validation *(and harmonization)*") names `con/validation` as the
+  harmonizer for bids-validator (+ HED) / pynwb / zarr / nwbinspector /
+  OME-Zarr / LinkML validators, deployed in dandi-cli.
+- `202x-mvc-stack.html` § Controllers/"Validation — and a recursive
+  Standardize moment" — the long-form treatment: *con/validation is the
+  type-checker for your validators*; once results are one Model, Views
+  are easy (dashboards, VisiData on the validation TSV).
+- *External*: ;
+  deployed in .
+
+### Automate / harness / meta-automation
+- `2026-usrse-con-talk.html` § Automate (4 slides: opener, "Where we
+  automate" table, "Cost: harnesses, harnesses, harnesses →
+  meta-automation via AI", "The five verbs climb the SciOps ladder").
+  Cites con/tinuous as the canonical CI-archival
+  example (highly modular, fits con/serve's archival mission).
+- *Older talks with reusable CI/automation slides*:
+  - `2024-distribits-datalad.html` § "DataLad: all changes are tested",
+    "extensions tested daily", "git-annex daily", `con/tinuous` archive,
+    `datalad-installer`.
+  - `2022-nih-compcore.html` § "PART of an answer: AUTOMATION" + the
+    same three testing slides.
+  - `2023-brain-dandi.html` § "Dandisets converted into DataLad and
+    pushed to GitHub", "Webshots of all dandisets", "Testing trivial
+    IO across all dandisets".
+- *Asset*: `pics/con-tinuous-{github,term,term-dandi-cli}.png`,
+  `pics/datalad-extensions.png`, `pics/datalad-git-annex.png`,
+  `pics/datalad-daily-status-email-subject.png`,
+  `embed/datalad_git-annex_daily.html`,
+  `pics/dandiarchive-webshots.png`,
+  `pics/dandisets-healthstatus.png`.
+- *Meta-automation framing*: harness maintenance is the place where AI + assistance becomes most viable for a small RSE center; pair with + `con/skills` and `con/yolo` (per `2026-ca-origami-retreat-aicoding.html`). + +### HI ⇄ AI policy spectrum (per-project AI-acceptance) +- `2026-usrse-con-talk.html` § "HI ↔ AI — every project picks its own policy" (in the HI+AI section): a 4-row spectrum table mapping policy stance ↔ OSS exemplars ↔ CON project ↔ SciOps level ↔ STAMPED property: + - **Reject** — Zig, Krita, Clojure, QEMU; `git-annex` (Joey Hess's "Feb 30" satirical policy). + - **Accept with disclosure** — NumPy, Kubernetes, Linux, Django; **DataLad / DANDI** with `Co-Authored-By` trailers, `@pytest.mark.ai_generated`. + - **Spec-driven AI-generated** — **AnnexTube**, **mykrok**, **con/citations-collector**, parts of **dandi-cli**. + - **Autonomous** — **con/skills** + **con/yolo** workflows. + Common ground per [ICMJE Jan 2026](https://www.icmje.org/recommendations/browse/artificial-intelligence/ai-use-by-authors.html): AI cannot be author, humans retain responsibility, disclosure mandatory. +- *External survey*: [`melissawm/open-source-ai-contribution-policies`](https://github.com/melissawm/open-source-ai-contribution-policies) — community-maintained catalog of declared OSS policies; the three+one bucket framework (Accept / Restrict / Reject / Ongoing) is what the slide's 4-stance spectrum is built on. +- *SOUL.md §1* names this as a recurring framing — use it for any deck that touches AI contributions. + +### Backend-coupled "service-tied UI" anti-pattern / Hexagonal contrast +- `202x-mvc-stack.html` § "Contrast: the 'service-tied UI' pattern" + (4 slides): the anti-pattern in academia, design-pattern names + (Smart UI / Anemic Domain Model / Hexagonal-Ports-Adapters / Adapter / + Strategy / Façade / Service Layer), CON's static-first counter-pattern. 
+- *Citations*: Cockburn, *Hexagonal Architecture*, 2005 + ; Evans, + *Domain-Driven Design*, 2003 (Smart UI anti-pattern); Fowler, + *Patterns of Enterprise Application Architecture* + (; Anemic + Domain Model: ); + Gamma, Helm, Johnson, Vlissides (GoF), *Design Patterns*, 1994 + (Adapter / Strategy / Façade). +- Use this slide block when an audience is more enterprise-software- + literate than data-archive-literate, or when a critic asks + "why don't you just build a portal?". + +### Five verbs ↔ SciOps maturity levels +- *Mapping* (Johnson *et al.*, 2024; ): + - **L1 Initial** — ad-hoc; no CON verb yet. + - **L2 Managed** — Compose (lab-local; YODA layout). + - **L3 Defined** — Reuse / Compose / Extend / Standardize; FAIR data + FAIR workflows. *Paper-cited L3 exemplars*: BIDS, NWB, DataLad / git-annex, DANDI, brainlife.io. + - **L4 Scalable** — Automate; "SciOps pipelines" (semi-automated continuous workflows). + - **L5 Optimizing** — Automate × AI; closing the discovery loop. +- *Slide anchors*: `2026-usrse-con-talk.html` § Automate's 4th slide ("The five verbs climb the SciOps ladder"); `202x-mvc-stack.html` § Many-lenses Maturity row (now spells out all 5 levels); `2026-usrse-con-talk.html` § HI+AI's updated SciOps bullet (STAMPED = per-artifact, SciOps = team-operations). +- *SOUL.md §7* explicitly lists the verb→level mapping; cite it from any future deck that wants to talk about maturity. +- *Practical implication*: when you name one of the 5 verbs in a slide, mention which SciOps level it corresponds to — CMM language travels well to RSE audiences. 
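The verb-to-level mapping above can be kept as a small lookup when drafting slides. What follows is a minimal sketch in Python, assuming hypothetical names (`SCIOPS_LEVELS`, `levels_for`, `slide_tag`) that are not part of any CON tool. The level contents are transcribed from the list above, with L5 recorded as Automate because the list gives it as "Automate × AI".

```python
# Illustrative only: SCIOPS_LEVELS, levels_for() and slide_tag() are
# hypothetical names, not an existing CON utility. The contents are
# transcribed from the verb <-> SciOps level mapping in this index.
SCIOPS_LEVELS = {
    1: ("Initial", []),               # ad-hoc; no CON verb yet
    2: ("Managed", ["Compose"]),      # lab-local; YODA layout
    3: ("Defined", ["Reuse", "Compose", "Extend", "Standardize"]),
    4: ("Scalable", ["Automate"]),    # "SciOps pipelines"
    5: ("Optimizing", ["Automate"]),  # Automate x AI
}


def levels_for(verb):
    """Return the sorted SciOps level numbers at which a verb appears."""
    return sorted(
        level for level, (_name, verbs) in SCIOPS_LEVELS.items() if verb in verbs
    )


def slide_tag(verb):
    """Format a short label such as 'Compose (SciOps L2 Managed, L3 Defined)'."""
    labels = [f"L{n} {SCIOPS_LEVELS[n][0]}" for n in levels_for(verb)]
    if not labels:
        return verb  # unknown verb: leave it untagged rather than guess
    return f"{verb} (SciOps {', '.join(labels)})"
```

Calling `slide_tag("Compose")` gives a label that can be pasted next to the verb on a slide. A verb outside the five comes back unchanged, so nothing gets tagged with a level the mapping does not support.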
+ +### Multi-dimensional framings ("many lenses, one stack") +- `202x-mvc-stack.html` § "One stack, many lenses" — table of six + lenses (Architectural / Procedural / Sharing / Compositional / Purpose + / Maturity) with example axes: + - Architectural → MVC (this talk's spine) + - Procedural → **STAMPED** — Self-containment, Tracking, Actionability, Modularity, Portability, Ephemerality, Distributability (Macdonald et al. 2026, [stamped-paper](https://github.com/stamped-principles/stamped-paper)) + - Sharing → FAIR + - Compositional → YODA + - Purpose → acquisition / curation / archive / analysis / governance / sharing / training + - Maturity → **SciOps** five-level CMM (Johnson et al. 2024, [arXiv:2401.00077](https://arxiv.org/abs/2401.00077)) + AI-coding ladder L1–L5 for the agentic-readiness sub-axis. + Use this slide when a talk needs to triangulate several patterns at + once instead of pitching one. +- *SOUL.md §1* names this as a recurring framing — read it before + writing a talk that compares projects/properties along multiple axes. +- Both STAMPED and SciOps now have entries in `SOUL.md` §5 (canonical + citations), so any future deck can cite them with one-line consistency. + +### Schema-driven UIs / "the Model materializes the View" +- `2026-usrse-con-talk.html` § Standardize → "Metadata: schemas as first-class citizens" (LinkML / pydantic / JSON Schema / SHACL bullets) and § MVC mini-section's Views slide (vjsf → DANDI meditor; shacl-vue → forms + research-group websites). +- `202x-mvc-stack.html` *(stub)* § Models → "Metadata: many languages, one idea" and § Views → "Schema-driven UIs: the View that the Model generates". +- *External reference*: M. Hanke et al., **LinkML metadata-driven workflow** — + on ReproTube: + (YouTube id `oF98hdaph1k`, timestamp 644s); cite when borrowing or expanding the schema-driven-UI argument. +- *Suggested screenshots to add later*: DANDI meditor at ; a `shacl-vue`-rendered website (per psychoinformatics-de's deployments). 
+ +### CON identity / acknowledgements / funders +- `2024-distribits-datalad.html` final acknowledgements slide is the + canonical layout: software → some-slides-origin → Funders → Collaborators. +- `2022-nih-compcore.html` § "Trust ladder" CON banner. +- *Asset*: `pics/con-{principles,ack-*,webshot-*,logo_*}.{png,svg}`, + `pics/con-ccn-dartmouth-letterhead.svg`, + `pics/borrowed/{nih,nsf*,bmbf_2020,binc,erdf,cbbs_logo,LSA-Logo,fzj_logo, + hbp_logo,conp_logo,vbc_logo,repronim_logo,openneuro_logo,cbrain_logo, + brainlife_logo,dandi_logo,bannerthanks}.{png,svg,jpg}`. + +### Speaker bio / "Brief Bio" slides +- `2022-nih-compcore.html`, `2023-bids-dicom.html`, + `2023-lbl-building-dandi.html` — three near-identical versions of + Yarik's bio + CON-principles `r-stack`. + +### Title / opening hook archetypes +- `2025-ca-origami-retreat.html` opens with "WORKAROUNDS!" reveal. +- `2026-repronim-YODA-BIDS-webinar.html` opens with Reality-Check GIF. +- `2024-distribits-datalad.html` opens with QR code + logo strip + (canonical template). + +### Closing slides +- "Save your questions for the panel discussion" / "Let me know what to + fix": + `2024-distribits-datalad.html`, `2025-distribits-YODA.html`, + `2026-ca-origami-retreat-aicoding.html` ("Let the AI agents be with + you", with the Yoda SVG as a sign-off). + +## How to use this index + +1. **Pick a story arc** from `SOUL.md` §7. +2. Walk the per-talk list above for the arc's parent deck — its slide + anchors are your "free" content. +3. For each section of the new deck, consult the *topic-wise lookup* to + pull supporting slides from sibling decks rather than re-creating + them. +4. When you copy a slide, update relative `data-src` paths only if the + new deck lives in a subdirectory (it shouldn't — keep new decks at + the repo root). +5. Add the new deck to this index when committed. 
diff --git a/SOUL.md b/SOUL.md new file mode 100644 index 0000000..6a75661 --- /dev/null +++ b/SOUL.md @@ -0,0 +1,525 @@ +# SOUL — what these talks are *for* and how they look + +This file is the long-lived "soul" of the [Center for Open Neuroscience +(CON)](http://centerforopenneuroscience.org/) talks repository. It captures +the recurring **mission**, **voice**, **visual style**, **resources**, and +**citation conventions** that any CON talk should inherit. New talks should +read this first; reusable slide content is indexed in `INDEX.md`. + +The technical "how to author / build" guide lives in `CLAUDE.md`. + + +## 1. Mission + +CON talks exist to advocate, by way of working examples, for a single +recurring thesis: + +> **Build science as reusable, composable, extensible, standardized +> infrastructure — and, as much as pragmatic, *bridge* upstream +> rather than re-implementing.** + +Concretely, almost every talk in the corpus revisits some subset of: + +- **Reuse** — "the cheapest reproducible thing is the one you didn't have to + build". `pkg-exppsy → NeuroDebian`, `PyMVPA ←→ scikit-learn → nilearn`, + `duecredit → citeproc-py`, `DataLad → git + ←→ git-annex`, `con/duct ← + brainlife smon`. We *join* upstreams instead of forking. +- **Compose** — small units (HeuDiConv, ReproStim, ReproNim/containers, + con/duct, DataLad subdatasets, DataLad extensions, BIDS dataset modules, ...) + over silo'd monoliths. +- **Extend** — when the commons we depend on need care, we stay on as + (co)maintainers (fail2ban, citeproc-py) as long as needed, or even take over + the development (heudiconv); we generalize and standardize ad-hoc work + upstream (DataLad RUNCMD → BEP028 BIDS provenance); we push packages into + Debian / Debian Med / Debian Science / conda-forge so others reuse them in + turn but also extend our "workforce". 
+- **Standardize** — common tech and underlying data models (`git`, + `git-annex`), common data standards (BIDS, NWB, HED, DICOM), common metadata + (LinkML, concepts.datalad.org), common organizational layouts (BIDS, YODA, + STAMPED). Standards are the language across labs and across HI ↔ AI. +- **Automate** — none of the above scales without it. Unit/integration CI on + every PR; daily-tested `git-annex` against DataLad; `con/tinuous` archiving + CI logs and artifacts before they expire; auto-rebuilt `ReproNim/containers`; + `ReproIn` + `HeuDiConv` driving DICOM→BIDS at the scanner; `ReproStim` + capturing all stimuli; auto-mirrored dandisets; `con/validation` + `dandi-cli` + on every release; auto-deployed handbook + per-archive docs. + In **SciOps** (Johnson *et al.*, 2024) terms this is *Level 4 (Scalable)* — + what the paper calls "SciOps pipelines". **Cost:** the harness is itself + code we maintain — and that is exactly where AI assistance becomes the most + viable way forward as **meta-automation**: using Claude Code + `con/skills` + + `con/yolo` to maintain the *automations* themselves, taking a small RSE + center from SciOps L4 toward *L5 (Optimizing)*. +- **Federate, don't recentralize** — `registry.datalad.org`, DANDI/EMBER/ + OpenNeuro, datasets.datalad.org, neurobagel, ... all distribute + discovery and/or storage rather than becoming a single platform. + We join forces where we can contribute (neurobagel, babs) to instill our principles + and make the pieces interchangeable, and thus "federatable", instead of taking over. +- **Make things Convenient** -- whatever is buggy, unreliable, hard to use, or requires + manual action when it could be automated is INCONVENIENT. + +Recurring framings the speaker leans on: + +- "Where data go to die / how data are reincarnated" → archives + reuse + (`2022-tx-big-neuroscience.html`, `2025-ca-origami-retreat.html`, + `2023-brain-dandi.html`). 
+- "Sandwich" / "burger" / "minder" — DataLad as **layered tech** over + `git-annex` over `git`, with extensions / remote helpers as additional + layers. +- "All standards are bad, but some are used" (D. Clunie, MICCAI 2017) — + used to justify pragmatic standardization (`2023-bids-dicom.html`). +- "Make re-use convenient" — every time data integration is mentioned, the + conclusion arrives back at this. +- **YODA** as a vocabulary of reproducibility (Version control everything; + Look up you must not; Modular composition). +- **HI + AI** — the recent (2026) framing: the same self-contained, well- + described, version-controlled artifacts serve both human investigators + and AI agents. +- **HI ⇄ AI policy spectrum** — different projects need different + AI-acceptance policies; CON's portfolio spans the full spectrum and we + show that in talks rather than picking one. Four canonical stances: + - **Reject** any AI-generated content (e.g. `git-annex` upstream + contributions in practice — Joey Hess's "policy" page is a famous + Feb-30 satire; analogues in OSS: Zig, Krita, Clojure, QEMU). + - **Accept with disclosure** (DataLad / DANDI; `Co-Authored-By: Claude…` + trailers; `@pytest.mark.ai_generated`; analogues: NumPy, Kubernetes, + Linux kernel, Django). + - **Spec-driven AI-generated** (HI specifies, AI writes, HI reviews & + commits) — AnnexTube, mykrok, con/citations-collector, parts of + dandi-cli (LAD specs + AI-generated tests). + - **Autonomous** agents in the loop — con/skills + con/yolo for triage, + PR review, dependency updates. + + *Common ground (all four):* AI cannot be an author (ICMJE Jan 2026 + update); humans retain full responsibility; AI use must be disclosed + *inside the artifact* (commit trailer, methods section, acknowledgments). + STAMPED *Tracking* is what makes that mechanical. SciOps positions + these as a climb from L3-floor up to L5 (Optimizing). 
+ External catalog of declared OSS stances: + [`melissawm/open-source-ai-contribution-policies`](https://github.com/melissawm/open-source-ai-contribution-policies). +- **Static-first vs. service-tied UI** — a recurring contrast: most + academic data infrastructure is built as a "service-tied UI" + (LIMS / ELN / bespoke Flask app / Drupal portal) where the View is + welded to a running backend, so portability and reuse collapse the + moment the server stops. CON's pattern is the inverse: the Model + (BIDS / NWB / LinkML / DataLad-tracked file tree) is the artifact; + Views are derived; services are *one Adapter among many*. Cite + Cockburn's *Hexagonal Architecture / Ports & Adapters* (2005), + Evans's *Smart UI* anti-pattern (DDD 2003), Fowler's *Anemic Domain + Model* + *Service Layer*, and GoF *Adapter / Strategy / Façade* when + the audience is enterprise-software-literate. +- **Many lenses, one stack** — the same project / artifact can be + legitimately described along several *overlapping* dimensions, each + picking out a different property: + - **Architectural** — Model / View / Controller (the spine of + `202x-mvc-stack.html`; mini-section in `2026-usrse-con-talk.html`). + - **Procedural** — STAMPED (Self-containment, Tracking, + Actionability, Modularity, Portability, Ephemerality, + Distributability — see Macdonald, Baker, To & Halchenko, 2026, + [`stamped-principles/stamped-paper`](https://github.com/stamped-principles/stamped-paper)). + - **Sharing** — FAIR (Findable, Accessible, Interoperable, Reusable). + - **Compositional** — YODA (modular, "look up you must not", + version-control everything). + - **Purpose** — acquisition / curation / archive / analysis / + governance / sharing / training. 
+ - **Maturity** — operational maturity per SciOps + (Johnson *et al.*, 2024; arXiv:2401.00077; five-level Capability + Maturity Model for rigorous scientific operations — cited from the + STAMPED paper); agentic-readiness axis via the AI-coding ladder L1–L5 + (`2026-ca-origami-retreat-aicoding.html`). + + Use lenses *together*, not as competitors. When two lenses disagree + about whether something is "good", that's usually a real design + tension worth surfacing on the slide, not noise to smooth over. + +## 2. Audience and tone + +- Audiences range from neuro-domain (NWB/DANDI, BIDS WGs, ReproNim + webinars) to general RSE / HPC / data-management (US-RSE, distribits, CA + Origami retreat, NIH compcore). Talks usually pick a domain hook and + then quickly broaden to the domain-agnostic stack. +- Voice: first-person plural ("we"), occasionally first-person ("Yarik's + first move was..."), liberal Yoda phrasing on YODA-flavored decks + ("Track you must!", "Look up you must not"), and dry humor in Reality + Check / disclaimer slides ("idiocracy-fixed.gif"). +- Concrete "Take home" / "Monday checklist" lists at section ends are a + staple — the audience should leave with one thing they can do this week. +- Respect for collaborators is loud: every deck ends with an Acknowledgements + slide naming **funders** (NIH, NSF, BMBF, ERDF, BInC), **collaborators** + (HBP, CONP, VBC, ReproNim, OpenNeuro, CBRAIN, brainlife, DANDI), and + upstream people (Joey Hess, Michael Hanke, the DataLad team). +- Solidarity tag: a small Ukrainian flag ribbon + (`pics/Ukrainian_Blue-Yellow_ribbon.svg`) appears in the title slide + logo strip — keep it. + +## 3. Visual style and reveal.js conventions + +All decks are [reveal.js](https://revealjs.com/) HTML, **theme `beige`**, +with the highlight plugin's `monokai` syntax theme. The repository's vendored +`reveal.js/`, `reveal.js-mermaid-plugin/`, and `css/custom.css` are the +canonical assets. Do not introduce a new theme without reason. 
+ +### Standard header + +```html + + + + + + + Talk Title + + + + + + + + + +
+ +
+ + + + + + +``` + +- **Canvas size**: `1400 x 1050` is the current default + (`2025-distribits-YODA`, `2026-repronim-YODA-BIDS-webinar`, + `2026-ca-origami-retreat-aicoding`); older decks used `1920 x 1080`. + Stick to **1400×1050** for new talks unless the venue insists on 16:9. +- **Plugins**: always include `RevealMarkdown`, `RevealHighlight`, + `RevealNotes`, `RevealMermaid`. Recent decks add `RevealSearch` + (Ctrl+Shift+F) — fine to include. +- **`data-src`** for images so reveal.js lazy-loads them. + +### Title slide template + +```html +
+ + + +

Title goes here

+

Optional subtitle / yoda one-liner

+ + + + + VENUE — DATE
+ Live slides/Sources: + …/.html +
+ + + + + + + + + +
+``` + +- **QR code** points to the live-slides URL on + `datasets.datalad.org/centerforopenneuroscience/talks/.html`. + Generate one per talk and save as + `pics/-qrcode.png`. +- **Logo strip** at the bottom: pick the relevant subset from the canonical + set — DataLad, NeuroDebian, ReproNim, OBC, DANDI, BIDS, YODA, + Ukraine ribbon. Keep order roughly stable. +- **Author block**: keep the four-line affiliation (CON / PBS / CCN / + Dartmouth) verbatim across talks for consistency. +- The first `
<section>` is wrapped in another `<section>
` (reveal.js + vertical stack convention) so a vertical sub-deck can sit under the + title without restructuring. + +### Slide-construction patterns to reuse + +These conventions show up across multiple talks; reuse them rather than +inventing new ones. + +- **Section divider** with a "Challenge:" headline on a soft gradient: + `
`. + Used heavily in `2022-nih-compcore.html`, `2023-brain-dandi.html`, + `2023-lbl-building-dandi.html`. +- **Buddha / Yoda backgrounds** for "nirvana" sections: + `data-background="pics/digits-budda.svg" data-background-opacity="0.9"`. +- **Markdown sub-decks**: long content uses + `
+
` — + this is the dominant style in the 2025/2026 talks. +- **Layered fragment images** (showing a dataset directory tree progressively + reveal): `r-stack` or absolutely-positioned `` + layered with descending top/left offsets. See + `2024-distribits-datalad.html` "2017: DataLad crawler pipeline". +- **Mermaid diagrams** for pipeline / sandwich layering / "the Vault" + inbound/outbound. The mermaid plugin is loaded in every deck. +- **Aside notes** (`