
Promote CellProfiler parity and accelerated backends#60

Draft
trissim wants to merge 196 commits into main from benchmark-platform

Conversation

@trissim
Collaborator

@trissim trissim commented Dec 19, 2025

Summary

This PR promotes the CellProfiler compatibility path from benchmark-only smoke execution into a broader OpenHCS runtime compatibility layer with explicit runtime semantics, output equivalence, accelerated backends, parity tracking, and expanded generated-pipeline coverage.

Audit

Runtime and equivalence semantics

  • Adds CellProfiler measurement dialect normalization and equivalence policy wiring.
  • Expands runtime output equivalence for images, tables, measurements, reference artifacts, and cache-aware OpenHCS adapter comparisons.
  • Promotes runtime artifact query/export/value semantics so measurements, object labels, relationships, and image outputs are compared through typed runtime records instead of ad hoc file checks.
  • Adds source-schema workspace and source matching improvements so CellProfiler setup-module semantics map into OpenHCS components without filename smell heuristics.
  • Extends execution validation and runtime cache behavior to separate OpenHCS execution from external equivalence/materialization work.

CellProfiler module compatibility

  • Expands generated .cppipe conversion and binding coverage for alignment, color conversion, image math, mask objects, smooth, enhance edges, expand/shrink, filter objects, classify objects, illumination, object measurements, object relationships, and tracking.
  • Adds typed settings modules for newly covered converter surfaces.
  • Improves module execution handling for object-measurement rows, source-image provenance, object-label domains, stack/slice alignment, and relationship payloads.
  • Fixes per-object measurement semantics where aggregate rows should not be padded as object rows, while preserving dense object-domain completion where required.

Explicit accelerated backends

  • Adds openhcs.processing.backends.cellprofiler modules for alignment, classification, colocalization, illumination, image quality, intensity, intensity distribution, morphology, neighbors, outlines, region properties, relationships, secondary objects, shape, texture, thresholding, tracking, watershed, and Zernike calculations.
  • Adds shared analysis-region-properties backend support.
  • Makes numba a required dependency and declares CellProfiler compatibility extras separately.
  • Keeps backend selection explicit through OpenHCS registry patterns; no silent backend fallback is introduced.
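The explicit-registry, no-silent-fallback rule above can be sketched as a minimal fail-loud lookup. The names here (`BackendRegistry`, `register`, `resolve`, `MORPHOLOGY`) are illustrative, not the actual OpenHCS registry API:

```python
# Hypothetical sketch of an explicit, fail-loud backend registry.
# Names are illustrative, not the actual OpenHCS registry API.

class BackendRegistry:
    def __init__(self):
        self._backends = {}

    def register(self, name):
        def decorator(fn):
            if name in self._backends:
                raise ValueError(f"backend {name!r} already registered")
            self._backends[name] = fn
            return fn
        return decorator

    def resolve(self, name):
        # Fail loud: an unknown backend is an error, never a silent
        # fallback to whatever else happens to be registered.
        try:
            return self._backends[name]
        except KeyError:
            raise LookupError(
                f"no backend registered for {name!r}; "
                f"available: {sorted(self._backends)}"
            ) from None

MORPHOLOGY = BackendRegistry()

@MORPHOLOGY.register("numba")
def erode_numba(image):
    return image  # placeholder kernel body
```

The key property is that `resolve` raises on a miss instead of degrading to another implementation.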

Benchmark tracking and tests

  • Adds a parity runner script for CellProfiler .cppipe suites.
  • Adds tracker docs for full-pass status, parity evidence, and backend optimization work.
  • Expands unit coverage for CP adapters, generated pipeline execution, library loading, module execution, runtime adapter behavior, source schema, symbol table/settings binding, runtime equivalence, artifact queries, runtime values, materialization, morphology, track objects, expand/shrink, measure granularity, intensity distribution, and backend strategy registries.

Verification

Ran with the sibling OpenHCS virtualenv:

../openhcs/.venv/bin/python -m pytest \
  tests/unit/test_cellprofiler_adapter.py \
  tests/unit/test_cellprofiler_generated_pipeline_execution.py \
  tests/unit/test_cellprofiler_library_loading.py \
  tests/unit/test_cellprofiler_module_execution.py \
  tests/unit/test_cellprofiler_runtime_adapter.py \
  tests/unit/test_cellprofiler_source_schema.py \
  tests/unit/test_cellprofiler_symbol_table.py \
  tests/unit/test_cppipe_execution_validation.py \
  tests/unit/test_cppipe_parser.py \
  tests/unit/test_image_file_serialization.py \
  tests/unit/test_materialization_core.py \
  tests/unit/test_runner_cellprofiler_compatibility.py \
  tests/unit/test_runtime_artifact_queries.py \
  tests/unit/test_runtime_equivalence.py \
  tests/unit/test_runtime_execution_validation.py \
  tests/unit/test_runtime_exports.py \
  tests/unit/test_runtime_values.py \
  tests/unit/test_settings_binder.py \
  tests/unit/test_source_matching.py \
  tests/unit/test_source_schema_workspace.py \
  tests/unit/test_cellprofiler_morphology.py \
  tests/unit/test_cellprofiler_processing_backend.py \
  tests/unit/test_cellprofiler_strategy_registries.py \
  tests/unit/test_cellprofiler_trackobjects.py \
  tests/unit/test_expandorshrinkobjects.py \
  tests/unit/test_measuregranularity.py \
  tests/unit/test_measureobjectintensitydistribution.py \
  tests/unit/test_runtime_semantics.py \
  -q

Result:

645 passed, 1 warning in 21.73s

Follow-up Work

  • Extract the .cppipe dialect compiler out of benchmark ownership into a product-facing OpenHCS interop/dialect package so benchmark, CLI, and PyQt consume the same canonical conversion service.
  • Integrate PyQt Code mode with .cppipe loading alongside .py: compile .cppipe into a normal OpenHCS Pipeline, show generated OpenHCS code/form state, and expose source-schema/provenance mapping instead of a black-box CellProfiler runner.
  • Add phase timing records for startup, setup, compile, execute, validation, equivalence, and cache/materialization phases, with graphable long-table output for OpenHCS vs native CellProfiler comparisons.
  • Continue replacing centrosome/scikit-image hot paths with explicit numba/CuPy-capable backend strategies while preserving CellProfiler semantics and avoiding silent fallbacks.

- plan_01: Benchmark infrastructure with orchestration, storage, comparison
- plan_02: Dataset acquisition with fail-loud validation and caching
- plan_03: Tool adapters for OpenHCS, CellProfiler, ImageJ, Python
- plan_04: Metric collectors (Time, Memory, GPU, Correctness)
- plan_05: Pipeline equivalence system for fair comparison

All plans include:
- UML class diagrams
- Flow diagrams
- Sequence diagrams
- Complete implementation code
- Integration examples

Ready for implementation following smell-loop approval.
@continue


trissim and others added 28 commits December 19, 2025 19:22
Research findings from publications using BBBC datasets:
- Complete BBBC021/022/038 dataset specifications with real URLs, sizes, formats
- Real CellProfiler pipeline parameters from actual analysis.cppipe files
- Evaluation metrics from NuSeT (2020), Cimini et al. (2023), and other benchmarking papers
- Illumination correction parameters from Singh et al. (2014)
- Ground truth availability and usage strategies
- Preprocessing pipelines and subsetting approaches

Files added:
- plan_02_ADDENDUM_real_dataset_specs.md: Complete BBBC dataset specs, download strategies, validation without checksums
- plan_03_ADDENDUM_real_pipelines.md: Real CellProfiler pipeline from BBBC021 analysis.cppipe with all 27 modules
- plan_04_ADDENDUM_correctness_metrics.md: Pixel-level and object-level evaluation metrics from publications
- RESEARCH_SUMMARY.md: Complete investigation report with all sources cited

All findings sourced from publications, GitHub repos, and BBBC downloads. No handwaving.

Remaining gaps (require downloads to fill):
- BBBC022 filename pattern (need to download 1 plate to reverse-engineer)
- Dataset checksums (not provided by Broad, will compute or skip)
- File manifests (impractical to list 39,600 files, will use count validation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement proper ABC-compliant handlers for BBBC datasets:

BBBC021Handler (ImageXpress format):
- Pattern: {Well}_{Site}_{Channel}{UUID}.tif (e.g., G10_s1_w1BEDC2073...tif)
- Channels: w1=DAPI, w2=Tubulin, w4=Actin
- FilenameParser with regex for Well/Site/Channel extraction
- MetadataHandler for CSV metadata (BBBC021_v1_image.csv)
- No virtual mapping needed (already flat structure)

BBBC038Handler (Kaggle nuclei, PNG format):
- Folder-based organization: stage1_train/{ImageId}/images/{ImageId}.png
- No structured filename pattern (uses ImageId as identifier)
- FilenameParser accepts .png files, extracts ImageId from path
- MetadataHandler for metadata.xlsx and CSV labels
- Handles segmentation masks in separate masks/ folders

Both handlers:
- Implement all abstract methods from MicroscopeHandler ABC
- Define compatible_backends (DISK only)
- Auto-register via _microscope_type class attribute
- Support FileManager abstraction throughout

No handwaving - ready for benchmark platform integration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement ABC-compliant handlers with PASSING TESTS for BBBC datasets:

BBBC021Handler (ImageXpress-like with UUID):
- Parses: G10_s1_w1{UUID}.tif (original files in Week#/Week#_##### subdirectories)
- Constructs: G10_s1_w1_z001_t001.tif (virtual workspace with all components)
- Pattern handles BOTH original (with UUID) and virtual (with z/t) filenames
- Flattens Week#/Week#_##### folder structure to plate root
- Adds default z_index=1, timepoint=1 for pattern discovery consistency
- Channels: w1=DAPI, w2=Tubulin, w4=Actin (w3 not used)

BBBC038Handler (Kaggle nuclei, PNG):
- Parses: {hex_id}.png from stage1_train/{ImageId}/images/ subdirectories
- ImageId treated as unique "well" identifier
- Single channel, single site, no Z or timepoint
- Flattens folder structure to stage1_train/ directory

Both handlers:
- Follow virtual workspace architecture: ALL components in constructed filenames
- Implement all MicroscopeHandler ABC methods
- Auto-register via _microscope_type
- Compatible backends: [DISK]
- Ready for benchmark platform integration

Tests included:
- BBBC021: 6 real filenames from BBBC021_v1_image.csv (ALL PASS)
- BBBC038: 3 hex ID filenames (ALL PASS)
- Roundtrip: parse → construct → parse (ALL PASS)

No handwaving - tested with actual BBBC filenames.
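The dual-form pattern described above (original UUID filenames and virtual z/t filenames) can be sketched with a single regex. This is an illustrative parser, not the handler's actual `FilenameParser`:

```python
import re

# Illustrative BBBC021 filename parser handling BOTH forms described
# above; the real FilenameParser in BBBC021Handler may differ.
BBBC021_RE = re.compile(
    r"^(?P<well>[A-P]\d{2})_s(?P<site>\d+)_w(?P<channel>\d)"
    r"(?:(?P<uuid>[0-9A-Fa-f-]+)|_z(?P<z>\d{3})_t(?P<t>\d{3}))?\.tif$"
)

def parse_bbbc021(name):
    m = BBBC021_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized BBBC021 filename: {name!r}")
    return {
        "well": m.group("well"),
        "site": int(m.group("site")),
        "channel": int(m.group("channel")),
        # Original UUID filenames get default z_index=1, timepoint=1,
        # matching the virtual-workspace convention above.
        "z_index": int(m.group("z") or 1),
        "timepoint": int(m.group("t") or 1),
    }
```

Roundtrip behavior follows: both `G10_s1_w1BEDC2073.tif` and `G10_s1_w1_z001_t001.tif` parse to the same component dict.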

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
MICROSCOPE DETECTION & REGISTRATION:
- Add MetadataDetectMixin: reusable detect() implementation delegating to metadata handler
- Add TiffPixelSizeMixin: extract pixel size and channel names from TIFF tags
- BBBC021Handler: implement detect() via filename pattern matching
- BBBC038Handler: implement detect() via stage1_train folder detection
- ImageXpressHandler, OperaPhenixHandler, OpenHCSMicroscopeHandler: use MetadataDetectMixin
- Remove hardcoded handler registration at end of bbbc.py (now automatic via metaclass)

METADATA CACHING:
- Simplify MetadataCache: remove per-file mtime tracking and validation checks
- Cache is now explicit-clear-only (no automatic invalidation)
- Reduces complexity while maintaining correctness for single-plate workflows

REGISTRY DISCOVERY:
- LazyDiscoveryDict: skip cache when secondary registries present
- After discovery, populate secondary registries via _register_secondary hook
- Prevents stale cache from blocking secondary registry population

SIGNAL BATCHING (ImageBrowser performance):
- ColumnFilterWidget.select_all/select_none: always block signals during batch updates
- Emit single filter_changed signal at end instead of N signals
- Fixes signal storm when clicking 'None' button on 96-well filter (96 -> 1 signal)
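The batching fix can be shown framework-free; in the real `ColumnFilterWidget` the flag would be Qt's `blockSignals()` around the loop. This class and its names are a sketch, not the widget's API:

```python
# Framework-free sketch of the signal-batching pattern above.
# In the real ColumnFilterWidget this would be Qt blockSignals().

class ColumnFilter:
    def __init__(self, wells):
        self.checked = {w: True for w in wells}
        self.emitted = []            # stand-in for the filter_changed signal
        self._signals_blocked = False

    def _emit(self):
        if not self._signals_blocked:
            self.emitted.append(dict(self.checked))

    def set_checked(self, well, value):
        self.checked[well] = value
        self._emit()                 # per-item path: one signal per change

    def select_none(self):
        self._signals_blocked = True     # block during the batch update
        try:
            for well in self.checked:
                self.set_checked(well, False)
        finally:
            self._signals_blocked = False
        self._emit()                 # single signal for the whole batch
```

Clicking 'None' on a 96-well filter then produces one emission instead of 96.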

DEPENDENCIES:
- Add tqdm>=4.66.5 for progress indication

This refactor improves:
- Microscope detection: deterministic, side-effect-free, testable
- Code reuse: mixins eliminate duplication across handlers
- Performance: signal batching prevents UI thrashing
- Maintainability: explicit registration removed, automatic via metaclass
ARCHITECTURE:
- Contracts: ToolAdapter, MetricCollector, DatasetSpec (immutable specs)
- Datasets: Registry of BBBC021, BBBC022, BBBC038 with download/extract/validate
- Pipelines: Registry of benchmark pipelines (nuclei_segmentation)
- Metrics: TimeMetric (perf_counter), MemoryMetric (RSS sampling)
- Adapters: OpenHCSAdapter implementing ToolAdapter contract
- Runner: Orchestrates tool validation, dataset acquisition, execution

DATASET ACQUISITION:
- Download with progress bars (tqdm)
- Extract zip archives atomically
- Validate by image count (±5% tolerance) or manifest
- Cache to ~/.cache/openhcs/benchmark_datasets/{id}/
- Fast path: skip re-download if cached and valid
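The count-based validation with ±5% tolerance can be sketched as a fail-loud check. The function name is illustrative:

```python
# Illustrative count-based dataset validation with the ±5% tolerance
# described above; the real validator may differ.

def validate_image_count(actual: int, expected: int,
                         tolerance: float = 0.05) -> None:
    """Fail loud when a cached dataset deviates from the expected count."""
    low = expected * (1 - tolerance)
    high = expected * (1 + tolerance)
    if not (low <= actual <= high):
        raise ValueError(
            f"dataset has {actual} images, expected {expected} "
            f"±{tolerance:.0%}"
        )
```

For BBBC021_SINGLE_PLATE (720 images), anything in 684–756 passes; outside that range the cache is treated as invalid.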

OPENHCS ADAPTER:
- Validates OpenHCS installation
- Creates FileManager and microscope handler
- Runs minimal segmentation pipeline: blur → threshold → label
- Supports parameter validation (threshold_method, declump_method, diameter_range)
- Collects metrics via context managers
- Returns normalized BenchmarkResult with provenance

METRICS:
- TimeMetric: wall-clock execution time (perf_counter)
- MemoryMetric: peak RSS memory in background thread (psutil)
- Both implement MetricCollector ABC (context manager pattern)
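The context-manager collector pattern can be sketched for the time case (the memory collector adds a psutil sampling thread on the same shape). This is a sketch of the pattern, not the benchmark package's actual `TimeMetric` class:

```python
import time
from contextlib import AbstractContextManager

# Sketch of the MetricCollector context-manager pattern described above;
# the real TimeMetric/MemoryMetric classes may differ.

class TimeMetric(AbstractContextManager):
    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        # Record wall-clock duration; never swallow exceptions.
        self.elapsed = time.perf_counter() - self._start
        return False
```

Usage mirrors the adapter's pattern: `with TimeMetric() as t: run_pipeline()`, then read `t.elapsed`.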

PIPELINES:
- NUCLEI_SEGMENTATION: Otsu threshold + morphological operations
- Parameters: opening_radius, diameter_range, fill_holes
- Extensible: easy to add CELL_PAINTING, etc.

DATASETS:
- BBBC021_SINGLE_PLATE: 720 images, 839 MB
- BBBC022_SINGLE_PLATE_DNA: 3,456 images, 7.8 GB
- BBBC038_FULL: 33,215 images, 382 MB
- All with validation rules and microscope type

This enables:
- Reproducible benchmarking across tools
- Standardized metrics collection
- Dataset caching and validation
- Easy tool adapter implementation
- Extensible pipeline registry
…tion

SUMMARY
=======
Add complete CellProfiler conversion infrastructure for benchmarking OpenHCS
against CellProfiler. Uses a two-phase approach: one-time library absorption
(LLM converts entire CellProfiler library), then instant .cppipe conversion
(registry lookup, no LLM needed at conversion time).

CONVERTER INFRASTRUCTURE (benchmark/converter/)
===============================================
- absorb.py: CLI for one-time library absorption
  python -m benchmark.converter.absorb --model google/gemini-3-flash-preview

- library_absorber.py: Core absorption logic
  - Scans cellprofiler_source/library/modules/_*.py
  - LLM converts each to OpenHCS format
  - Validates: syntax, @numpy decorator, 'image' first param, no relative imports
  - Writes to cellprofiler_library/functions/

- llm_converter.py: Dual-backend LLM converter
  - Ollama (local): model names like 'qwen2.5-coder:7b'
  - OpenRouter (cloud): model names like 'google/gemini-3-flash-preview'
  - Auto-detects backend from model name format (org/model = OpenRouter)
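The detection rule as stated (org/model = OpenRouter, otherwise local Ollama) reduces to a one-line check. A minimal sketch, assuming the `/` rule is the only discriminator:

```python
# Assumed routing rule from the commit message: 'org/model' names go to
# OpenRouter, bare names (optionally with an Ollama ':tag') stay local.

def detect_backend(model: str) -> str:
    return "openrouter" if "/" in model else "ollama"
```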

- system_prompt.py: Comprehensive first-principles OpenHCS explanation (~470 lines)
  - Dimensional dataflow architecture
  - ProcessingContract semantics (PURE_2D, PURE_3D, FLEXIBLE, VOLUMETRIC_TO_SLICE)
  - Multi-input operations (stack along dim 0, unstack inside function)
  - special_outputs/special_inputs for labels and measurements
  - Conversion rules and template

- contract_inference.py: Runtime contract inference
- source_locator.py: CellProfiler source code locator
- parser.py: .cppipe file parser
- pipeline_generator.py: Generate OpenHCS pipelines
- settings_binder.py: Bind .cppipe settings to function kwargs
- convert.py: CLI for .cppipe conversion

ABSORBED LIBRARY (benchmark/cellprofiler_library/)
=================================================
26 CellProfiler modules converted to OpenHCS functions:
closing, colortogray, combineobjects, convertimagetoobjects,
convertobjectstoimage, correctilluminationapply, crop, dilateimage,
enhanceedges, enhanceorsuppressfeatures, erodeimage, erodeobjects,
expandorshrinkobjects, fillobjects, gaussianfilter, measureimageoverlap,
measureobjectsizeshape, medialaxis, medianfilter, morphologicalskeleton,
opening, overlayobjects, reducenoise, savecroppedobjects, threshold, watershed

CELLPROFILER SOURCE (benchmark/cellprofiler_source/)
====================================================
Extracted CellProfiler source code for LLM reference:
- modules/: 90 module class files
- library/modules/: 27 pure algorithm implementations
- library/functions/: Core utility functions
- library/opts/: Enums and options

EXAMPLE PIPELINES
=================
- benchmark/cellprofiler_pipelines/: Original .cppipe files + converted
- benchmark/pipelines/: OpenHCS benchmark pipelines (numpy, cupy, gpu variants)
EXPERIMENTAL - may be reverted.

- flash_config.py: Remove max_fps cap (None instead of 60)
- geometry_tracking.py: New orthogonal geometry tracking
  - WidgetSizeMonitor: Detects size changes in watched widgets
  - AutoGeometryTracker: Discovers geometry-affecting widgets
  - FlashGeometryTracker: Queues flashes during layout changes
  - Eliminates timing race conditions by keying on state transitions rather than arbitrary delays
CHANGES:
- system_prompt.py: Request structured JSON output with contract, category, confidence, reasoning
- llm_converter.py: Parse JSON response, populate ConversionResult with LLM-inferred metadata
- library_absorber.py: Use LLM-inferred values instead of hardcoded pure_2d/0.5 defaults
- pipeline_generator.py: Map category → variable_components (z_projection→Z_INDEX, channel_operation→CHANNEL)
- Removed LLM fallback mode - purely deterministic conversion from absorbed library
- Deleted broken ExampleHuman_openhcs.py (garbage from early LLM run)

CONTRACTS.JSON NOW INCLUDES:
- contract: PURE_2D | PURE_3D | FLEXIBLE | VOLUMETRIC_TO_SLICE
- category: image_operation | z_projection | channel_operation
- confidence: 0.0-1.0 (LLM's confidence in inference)
- reasoning: Why this contract/category was chosen

PIPELINE GENERATION:
- Fail-loud if modules missing from absorbed library (no fallback)
- variable_components derived from LLM-inferred category
…bsorbed modules

Implemented LLM-powered converter system that transpiles CellProfiler pipelines (.cppipe) into native OpenHCS pipelines. Successfully absorbed all 88 CellProfiler modules using Claude Opus 4.5 and converted both benchmark pipelines (ExampleHuman and ExampleFly) to runnable OpenHCS code.

Three-phase system: (1) Absorption - LLM extracts pure algorithms from CellProfiler source, infers contracts and categories; (2) Parsing - deterministic .cppipe parsing; (3) Generation - maps modules to OpenHCS functions with proper variable_components.

Key features: ROI+CSV materialization for segmentation, infrastructure module handling (LoadData/ExportToSpreadsheet), retry logic, registry system with contracts.json.

Results: 88 absorbed modules (segmentation, measurements, image processing, morphology, projections, transformations), 2 converted pipelines (ExampleHuman 4 modules, ExampleFly 9 modules).

Technical highlights: CamelCase registry fix, dual-axis resolution integration, special I/O handling, fail-loud error handling.
- Fixed parameter name normalization to exactly match SettingsBinder logic
  - Remove parenthetical content before normalization (e.g., '(Min,Max)')
  - This fixes mapping of tuple parameters like 'Typical diameter (Min,Max)' -> [min_diameter, max_diameter]

- Fixed FunctionStep API usage to use tuple pattern: func=(function, {kwargs})
  - Previously was incorrectly passing kwargs directly to FunctionStep
  - Now correctly passes kwargs dict as second element of tuple

- Backfilled parameter mappings for 83/88 absorbed CellProfiler functions
  - Used Gemini Flash 3.0 to generate mappings from original source + absorbed function
  - Mappings stored in function docstrings as single source of truth
  - Added backfill_parameter_mappings.py script

- Generated pipelines now have proper kwargs instead of comments
  - ExampleFly: min_diameter=10, max_diameter=40 correctly mapped from tuple
  - ExampleHuman: min_diameter=8, max_diameter=80 correctly mapped from tuple
  - All other parameters properly translated using docstring mappings
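The normalization step described above (strip parenthetical content, then snake_case) can be sketched as follows. This is an approximation of the behavior described, not the exact SettingsBinder code; the subsequent tuple expansion to `[min_diameter, max_diameter]` is handled by the mapping layer, not shown here:

```python
import re

# Illustrative setting-label normalization matching the behavior
# described above; the real SettingsBinder logic may differ in detail.

def normalize_setting_name(label: str) -> str:
    label = re.sub(r"\([^)]*\)", "", label)        # drop e.g. '(Min,Max)'
    words = re.findall(r"[A-Za-z0-9]+", label.lower())
    return "_".join(words)
```

So `'Typical diameter (Min,Max)'` normalizes to `typical_diameter`, which the mapping layer can then fan out to the two kwargs.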
…semantics

Used LLM (Gemini 3.0 Flash Preview) to analyze all 88 absorbed functions and determine
correct categories based on input shape expectations and iteration semantics.

Changes:
- Created recategorize_functions.py script for LLM-based recategorization
- Updated contracts.json with 7 category changes (81 unchanged)

Category changes:
  z_projection (3 functions):
    - MakeProjection: Processes z-stacks (D, H, W) → (H, W) projections
    - Morphologicalskeleton: Has volumetric parameter for 3D processing
    - TrackObjects: Processes temporal sequences (frames over time)

  channel_operation (4 functions):
    - CorrectIlluminationCalculate: Per-channel illumination correction
    - IdentifyPrimaryObjects: Segment same marker across all sites per channel
    - RescaleIntensity: Per-channel intensity normalization
    - Tile: Assembles sites into montage per channel

Impact:
- IdentifyPrimaryObjects now uses VariableComponents.CHANNEL instead of SITE
- MakeProjection now uses VariableComponents.Z_INDEX instead of SITE
- Generated pipelines have semantically correct iteration order
- Functions receive correct input shapes based on their processing semantics

All changes verified against OpenHCS PURE_2D contract behavior:
- PURE_2D unstacks dim 0 and calls function on each (H, W) slice
- variable_components controls what dim 0 represents (sites, channels, or z-slices)
- Total function calls remain the same, only iteration order changes
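The PURE_2D dispatch described above can be sketched in a few lines. Names are illustrative, not the OpenHCS executor API; the point is that `variable_components` only changes what dim 0 holds, never the call pattern:

```python
import numpy as np

# Minimal sketch of PURE_2D dispatch: dim 0 is unstacked and the
# function runs per (H, W) slice. Whether dim 0 holds sites, channels,
# or z-slices is decided upstream by variable_components.

def execute_pure_2d(func, stack: np.ndarray) -> np.ndarray:
    return np.stack([func(plane) for plane in stack], axis=0)
```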
…ents semantics

Updated LLM recategorization prompt with correct dimensional dataflow semantics:
- image_operation (SITE): Single-channel operations across all sites (default)
- z_projection (Z_INDEX): Functions that NEED z-stacks (projections, 3D ops)
- channel_operation (CHANNEL): Functions that NEED multiple channels simultaneously

Results:
- channel_operation (4): ColorToGray, GrayToColorRgb, MeasureColocalization, UnmixColors
- z_projection (2): MakeProjection, Morphologicalskeleton
- image_operation (82): Everything else (single-channel operations)

Fixed incorrect categorizations from previous run:
- IdentifyPrimaryObjects: channel_operation → image_operation ✓
- CorrectIlluminationCalculate: channel_operation → image_operation ✓
- RescaleIntensity: channel_operation → image_operation ✓
- Tile: channel_operation → image_operation ✓
- TrackObjects: z_projection → image_operation ✓ (time-lapse uses sequential_components)

Added correct categorizations:
- ColorToGray: image_operation → channel_operation ✓
- MeasureColocalization: image_operation → channel_operation ✓
- UnmixColors: image_operation → channel_operation ✓
- GrayToColorRgb: image_operation → channel_operation ✓ (manual fix)

Regenerated pipelines with correct variable_components.
UnmixColors has PURE_2D contract, which means it receives (H, W) and processes
each site independently. PURE_2D with channel_operation would unstack dimension 0
and process each channel independently, which defeats the purpose.

Dimensional dataflow rule:
- PURE_2D contract → ALWAYS image_operation (processes each site independently)
- FLEXIBLE/PURE_3D contract → can be channel_operation or z_projection (processes dim 0 together)

Final categorizations:
- channel_operation (3): ColorToGray, GrayToColorRgb, MeasureColocalization
  All have FLEXIBLE contract and process multiple channels together
- z_projection (2): MakeProjection, Morphologicalskeleton
  Process z-stacks (volumetric data)
- image_operation (83): Everything else, including all PURE_2D functions
Key changes:
1. measure_colocalization: Added channel_1/channel_2 params for arbitrary N-channel input
2. gray_to_color_rgb: Added red/green/blue_channel params for arbitrary N-channel input
3. gray_to_color_cmyk: Added channel selection params for arbitrary N-channel input
4. Fixed @numpy decorator: Removed invalid contract=ProcessingContract.X usage
5. Removed unused ProcessingContract imports from all 88 functions
6. Rewrote __init__.py with dynamic function loading from contracts.json
7. Regenerated pipelines with correct variable_components

The dimensional dataflow compiler perspective:
- Dimension 0 can be of ARBITRARY size (1, 2, 3, 4, 5, ... N)
- Functions should parameterize channel selection, not hardcode indices
- ProcessingContract is orthogonal to variable_components
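The channel-parameterization rule above can be sketched with a `gray_to_color_rgb`-style signature. The exact signature is illustrative, not the absorbed function's API:

```python
import numpy as np

# Sketch of the rule above: dim 0 has arbitrary size N, so channels are
# selected by keyword rather than hardcoded indices. Signature is
# illustrative, not the absorbed function's exact API.

def gray_to_color_rgb(image: np.ndarray,
                      red_channel: int = 0,
                      green_channel: int = 1,
                      blue_channel: int = 2) -> np.ndarray:
    # image: (N, H, W) stack with arbitrary N; output: (H, W, 3) RGB.
    return np.stack(
        [image[red_channel], image[green_channel], image[blue_channel]],
        axis=-1,
    )
```

A 4-channel stack works the same as a 3-channel one; the caller picks which planes map to R, G, and B.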
1. Removed unused ProcessingContract import from header template
2. Removed duplicate imports in header template
3. Changed to dynamic function loading with get_function()
4. Fixed measurecolocalization parameter mapping:
   - 'Select images to measure' -> (pipeline-handled) (requires pipeline context)
   - 'Run all metrics?' -> (pipeline-handled) (multi-param not auto-mappable)
5. Regenerated ExampleFly and ExampleHuman pipelines with clean parameters
1. Removed duplicate parameter mapping from _outline helper function
2. Added correct mapping to identify_tertiary_objects docstring
3. Object selection settings ('Select the larger/smaller identified objects')
   are now (pipeline-handled) since they're @special_inputs
4. Only shrink_primary is an actual function parameter

In OpenHCS, @special_inputs are wired at compile time by name matching,
not passed as string parameters. CellProfiler's object naming convention
doesn't map directly to function kwargs.
- Categorized all 88 absorbed functions into FLEXIBLE vs PURE_3D contracts
- Identified critical architectural issues:
  * Contract mismatch (PURE_2D vs PURE_3D)
  * Tuple handling bug in _execute_pure_2d
  * Inconsistent special outputs format
- Created phased refactoring plan with timeline and risk mitigation
- Documented 14 FLEXIBLE functions (support true 3D + slice-by-slice)
- Documented 74 PURE_3D functions (always internal slicing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Comprehensive design document covering:
- Architecture comparison (CellProfiler vs OpenHCS)
- Identified abstraction leaks (A1-A3, B1-B4, C1-C4)
- What we're certain about (contract system, aggregation orthogonality)
- Design proposal: AggregationSpec and compile-time symbol resolution
- Implementation phases
- Open questions for further discussion
Detailed mapping of:
- Core concept mapping (pipeline, data containers, object model)
- Semantic gaps requiring new concepts (ObjectRegistry, etc.)
- Adapter layer design for CellProfiler modules
- ProcessingContract mapping
- Measurement naming conventions
- Settings system mapping
- Abstraction leak analysis
…sign doc

Includes:
- Essential files to read (OpenHCS core + CellProfiler integration)
- Detailed execution flow diagram
- ProcessingContract implementation with code snippets
- Special outputs system explanation
- CellProfiler workspace structure
- Absorbed function patterns (current buggy vs required)
- Key terms glossary
- Quick reference: what to read when
Introduce typed artifact contracts and invocation plans, split FunctionStep execution into focused runtime/output/materialization modules, and move artifact declaration extraction out of path planning.

Rename in-tree special input/output decorators to artifact inputs/outputs, clean stale contract naming, update external submodule pointers, and keep the ZMQ integration test harness isolated from conftest startup.
Replace the runtime plan facade over compiled dicts with typed input-conversion and materialized-output sections, so FunctionStep execution no longer depends on plan.raw access.

Fix artifact materialization filenames to use the typed pipeline position and add unit coverage for the snapshot boundary.
Update materialized analysis result paths and stored materialization config when path collision resolution rewrites a step subdirectory.

Rename the stale artifact bridge variable and add unit coverage for collision path rewrites.
Move disabled-function stripping and kwargs/artifact-input injection out of path planning and into the function-pattern module that owns callable pattern semantics.

This removes path planner responsibility for mutating callable shapes without adding a wrapper around compiled plan dictionaries.
trissim added 30 commits May 4, 2026 01:06
Document the target CellProfiler interop architecture now that parity work has crossed into product semantics.

The plan locks the ownership boundary between OpenHCS interop, benchmark orchestration, PyQt import, runtime equivalence, and accelerated backend strategy work.

It records the non-negotiable invariants: no semantic heuristics, no silent fallback, no stringly backend selectors, normal OpenHCS pipeline output from .cppipe import, typed runtime equivalence, and semantics-preserving acceleration.
Move stable CellProfiler measurement dialect and measurement target-scope semantics under openhcs.interop.cellprofiler as the product-facing owner.

Keep benchmark.cellprofiler_compat compatibility shims so existing benchmark imports continue to work while active imports move to the OpenHCS interop namespace.

Verification: ../openhcs/.venv/bin/python -m pytest tests/unit/test_cellprofiler_interop_namespace.py tests/unit/test_runtime_equivalence.py tests/unit/test_cellprofiler_module_execution.py tests/unit/test_settings_binder.py -q

Result: 230 passed in 7.09s.
Move stable CellProfiler converter contracts into the product interop namespace while keeping benchmark compatibility shims at the old converter paths. Add product import/provenance records, compiler registry, explicit VFS-aware import requests, and PyQt .cppipe loading through the registered compiler provider instead of direct benchmark imports.

Introduce typed runtime invocation records, CellProfiler-specific invocation records, structural VFS protocol typing, and benchmark phase timing traces. Add VFS-aware CPPipe parsing and generated-pipeline saves without silent backend fallback or stringly backend selection.

Split runtime equivalence into package-owned report, policy, key, cell, image, and table snapshot modules. Use Annotated non-negative policy markers for derived validation instead of parallel field lists, and keep NumPy materialization confined to explicit snapshot boundaries.

Verification: ../openhcs/.venv/bin/python -m pytest tests/unit -q -> 805 passed, 1 warning in 28.90s.
Move RuntimeOutputSnapshot construction, output path discovery, table/image comparison helpers, and numeric RuntimeCellSignature counter equivalence into the equivalence package. Keep runtime_equivalence.py as the compatibility facade and measurement-projection owner.

Verification: ../openhcs/.venv/bin/python -m pytest tests/unit -q -> 808 passed, 1 warning in 26.63s.
Extract canonical array payload handling behind the OpenHCS memory/ArrayBridge surface so table and image equivalence do not own backend-specific hashing details.

Move measurement table dedupe payload semantics into equivalence.tables and replace runtime-local helpers with package-owned authorities.

Collapse repeated field-only frozen runtime records through a local frozen-slots record factory while preserving explicit generic and behavioral records.

Verification: ../openhcs/.venv/bin/python -m pytest tests/unit -q (809 passed, 1 warning).
Centralize runtime row projection construction without weakening the generic projection record.

Move measurement row identity and wide-table classification into equivalence.tables so runtime_equivalence delegates table semantics to package-owned helpers.

Verification: ../openhcs/.venv/bin/python -m pytest tests/unit -q (809 passed, 1 warning).
Move measurement row identity and qualifier rendering into equivalence.measurement_rows so runtime_equivalence delegates row dialect semantics to a package-owned authority.

Move generic measurement fact recording and spatial-grid fact projection into equivalence.measurement_facts, keeping runtime_equivalence focused on artifact orchestration and higher-order projection.

Verification: focused runtime equivalence tests passed with PYTEST_DISABLE_PLUGIN_AUTOLOAD=1. Full unit collection is still blocked by pyqt_reactive attempting to write /home/ts/.local/share/pyqt_reactive/logs/performance.log in the sandbox.
Set deterministic PYTHONHASHSEED for native CellProfiler reference runs so CP3 pipeline upgrades produce reproducible measurement orientation. Record the seed in adapter provenance.

Add a configurable speedup target to the CellProfiler-vs-OpenHCS comparison summary and CLI so benchmark runs can report whether each cppipe meets the 5x goal.

Verification: tests/unit/test_cellprofiler_adapter.py, tests/unit/test_cellprofiler_comparison.py, focused colocalization unit tests, and ExampleFly CP3 comparison with difference_count=0.
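Pinning the hash seed for a native reference subprocess can be sketched as below; `run_reference_with_seed` and the seed value are illustrative, not the adapter's real API.

```python
import os
import subprocess
import sys


def run_reference_with_seed(args, seed=0):
    """Run a native reference process with a pinned PYTHONHASHSEED.

    Hash-ordering-dependent behavior (set/dict iteration during pipeline
    upgrades) becomes reproducible across runs; the seed is returned so
    the caller can record it in provenance metadata.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(args, env=env, capture_output=True, text=True)
    return result, seed


# Example: a child interpreter observes the pinned seed.
result, seed = run_reference_with_seed(
    [sys.executable, "-c", "import os; print(os.environ['PYTHONHASHSEED'])"]
)
```

Setting the variable on the child environment (rather than the parent) keeps the benchmark harness itself unaffected.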
- add explicit Numba-backed CellProfiler measurement and segmentation backend paths
- tighten runtime semantics, materialization planning, and payload integration
- cache CellProfiler function contracts and reduce generic row/slice rewrite overhead
- add benchmark instrumentation for execution/phase timing and speedup reporting
- update PolyStore submodule to reduce VFS debug log noise

Verification: PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 ../openhcs/.venv/bin/python -m pytest tests/unit/test_cellprofiler_processing_backend.py tests/unit/test_cellprofiler_module_execution.py tests/unit/test_cellprofiler_generated_pipeline_execution.py tests/unit/test_materialization_core.py tests/unit/test_materialization_flag_planner.py tests/unit/test_runtime_equivalence.py -q (255 passed).

Add OpenHCS execution watchdogs and discard-output controls for benchmark runs.

Cache CellProfiler invocation metadata, backend resolution, source candidates, and pipeline-start source payloads to cut repeated runtime plumbing overhead.

Track unit-interval intensity provenance through runtime payload metadata and use it for threshold and colocalization diagnostics without heuristic dtype guessing.

Accelerate CellProfiler-compatible threshold diagnostics, colocalization Costes/RWC, fit-polynomial illumination, alignment, Zernike, morphology, and object modules with explicit backend seams and Numba paths.

Add fixture-capture tooling for profiling backend kernels and focused tests for runtime metadata propagation.

Verified ExampleColocalization reaches 4.036x execution speedup with 5/5 parity against native CellProfiler reference.
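An explicit backend seam with an optional Numba path can be sketched as below. This is a minimal per-object mean-intensity example under assumed names (`per_object_mean` and its helpers are hypothetical); the real modules cover thresholding, colocalization, illumination, and morphology kernels.

```python
import numpy as np

try:
    from numba import njit  # accelerated backend is optional
except ImportError:  # pragma: no cover
    njit = None


def _per_object_mean_numpy(labels, intensity, n_objects):
    # Reference backend: np.bincount accumulates per-label sums and counts.
    sums = np.bincount(labels.ravel(), weights=intensity.ravel(),
                       minlength=n_objects + 1)
    counts = np.bincount(labels.ravel(), minlength=n_objects + 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return means[1:]  # drop background label 0


if njit is not None:
    @njit
    def _per_object_mean_numba(labels, intensity, n_objects):
        # Single-pass accumulation compiled to a tight native loop.
        sums = np.zeros(n_objects + 1)
        counts = np.zeros(n_objects + 1)
        flat_l = labels.ravel()
        flat_i = intensity.ravel()
        for idx in range(flat_l.size):
            sums[flat_l[idx]] += flat_i[idx]
            counts[flat_l[idx]] += 1.0
        return sums[1:] / counts[1:]


def per_object_mean(labels, intensity, n_objects):
    """Per-object mean intensity, dispatched through the backend seam."""
    if njit is not None:
        return _per_object_mean_numba(labels, intensity, n_objects)
    return _per_object_mean_numpy(labels, intensity, n_objects)
```

Both backends return identical values, so parity checks can run against either path and the seam stays a pure performance decision.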
Preserve runtime image execution mode metadata through FunctionReference so generated CellProfiler wrappers can carry compiler-visible stack execution semantics.

Add value-only SaveImages pruning and validation suppression for benchmark runs that compare only runtime artifacts, avoiding unnecessary image export materialization.

Cache runtime artifact queries, adapter lookups, callable signature metadata, CellProfiler strategy selectors, and source-binding selectors to reduce repeated plumbing in parity execution.

Extend Distance-B secondary object propagation with explicit max-distance semantics and keep threshold diagnostics aligned with CellProfiler parity behavior.

Tests: PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 ... pytest tests/unit/test_callable_contract.py tests/unit/test_cellprofiler_generated_pipeline_execution.py tests/unit/test_runtime_artifact_queries.py -q -k 'callable_contract or saveimages or value_only or runtime_measurement or generated'
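The callable-signature caching mentioned above follows a standard memoization pattern; this sketch uses `functools.lru_cache` with a hypothetical helper name, not the project's API.

```python
import inspect
from functools import lru_cache


@lru_cache(maxsize=None)
def callable_signature_metadata(func):
    """Cache per-callable signature introspection.

    Signature inspection is pure per callable, so memoizing it removes
    repeated inspect.signature work from hot parity-execution paths.
    """
    sig = inspect.signature(func)
    return tuple(
        (name, p.kind.name, p.default is not inspect.Parameter.empty)
        for name, p in sig.parameters.items()
    )
```

The same pattern applies to adapter lookups and strategy selectors, provided the cached value is derived only from the hashable key.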
Collapse redundant artifact-driven pattern anchors in the FunctionStep executor so runtime-artifact modules execute once per semantic variable-component group instead of once per incidental channel anchor.

Add batched Numba-backed CellProfiler compatibility paths for tertiary object and intensity measurement work, and keep dense object-domain measurement semantics aligned with runtime adapters.

Verification: Vitra parity green; execution 7.98s CP vs 2.43s OpenHCS (3.29x). Focused measure-object-intensity tests pass.
Pass typed object-label payloads into CalculateMath operand resolution so runtime semantics keep declared dense object domains and avoid redundant label scans.

Verification: Vitra parity green; repeat-2 median execution 7.98s CP vs 1.97s OpenHCS (4.10x).
Add explicit runtime artifact materialization control so benchmark execution can validate typed measurement records without timing persistent CSV/table export writes. Keep default OpenHCS behavior materializing artifacts, while the benchmark adapter disables persistent artifact export for raw execution timing.

Speed up the CP compatibility path with batch/pure-call metadata, OpenCV smoothing backend selection, faster runtime measurement queries, cached source/runtime payload lookups, and shared measurement helpers across CellProfiler modules.

Preserve semantic correctness by making measurement-table ownership and source qualification explicit: object-owned rows are not incorrectly filtered by image-source qualifiers, and heterogeneous RelateObjects tables are scanned for row-level object ownership instead of relying on first-row shape.

Validation evidence: python -m py_compile over all modified Python files; focused ExampleColocalization run at parity=1.0 with 6.48x speedup; a full-suite run before the Colocalization query fix showed parity=1.0 and above-4x execution speedup for all other completed official CP3 cases, with ExampleVitra at 4.42x.
Introduce typed AnalysisTableSource and AnalysisWellResolver boundaries so consolidation consumes semantic table records instead of embedding CSV scanning, well discovery, summarization, column ordering, and writing in one orchestration function.

Route well resolution through OpenHCS filename parser semantics by default, with an explicit legacy configured-well resolver only when callers pass well_ids. Remove A01-style regex extraction from the PyQt consolidation path.

Validation: py_compile for consolidation and PyQt main; parser-backed smoke with ImageXpressFilenameParser; auto-detect parser smoke through FilenameParser registry; nominal-refactor-advisor reports no findings for the consolidation module.
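The typed boundaries named above can be sketched as protocols plus a record; the names `AnalysisTableSource`, `AnalysisTableRecord`, and `AnalysisWellResolver` come from the commit, but the field shapes and `consolidate` body here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Protocol


@dataclass(frozen=True)
class AnalysisTableRecord:
    """One semantic table row: a well plus its named measurement columns."""
    well_id: str
    values: Mapping[str, float]


class AnalysisTableSource(Protocol):
    """Boundary that yields typed records instead of raw CSV rows."""
    def records(self) -> Iterable[AnalysisTableRecord]: ...


class AnalysisWellResolver(Protocol):
    """Boundary that resolves well identity via filename parser semantics."""
    def resolve(self, path: str) -> str: ...


def consolidate(source: AnalysisTableSource) -> dict[str, Mapping[str, float]]:
    # Consolidation consumes semantic records; CSV scanning, well
    # discovery, and writing live behind their own boundaries.
    return {rec.well_id: rec.values for rec in source.records()}
```

Any in-memory or file-backed source satisfying the protocol plugs in unchanged, which is what makes the parser-backed smoke tests straightforward.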
Add a converted-cppipe throughput benchmark that runs independent OpenHCS jobs across sample-count and worker-count sweeps, records per-job and per-batch CSVs, and preserves optional native-reference equivalence checks.

Reuse the CP/OH grouped-bar plotting pipeline for throughput figures so lab-meeting charts share one visual grammar. Throughput speedup is computed against native CellProfiler when a CP/OH summary CSV is provided: replicas * native_cp_seconds / OpenHCS batch wall seconds.

Also record process-tree RSS in comparison summaries and route the existing CP/OH plotting CLI through the reusable report module.
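The throughput-speedup definition above reduces to a one-line computation; the function name and the example numbers are illustrative.

```python
def throughput_speedup(replicas, native_cp_seconds, openhcs_batch_seconds):
    """Throughput speedup versus native CellProfiler.

    Native CellProfiler runs each replica serially, so its effective batch
    time is replicas * native_cp_seconds; the OpenHCS figure is the
    measured batch wall-clock for all replicas together.
    """
    return replicas * native_cp_seconds / openhcs_batch_seconds


# Illustrative: 8 replicas, 10 s per native run, 16 s OpenHCS batch.
speedup = throughput_speedup(8, 10.0, 16.0)  # → 5.0
```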
Add a plot-only CLI for cppipe throughput results so figures can be regenerated from throughput_batches.csv without rerunning benchmarks.

Emit separate speedup charts versus native CellProfiler and versus OpenHCS 1-job execution to avoid mixing publication comparisons with within-platform scaling.
Add plot-only throughput filters for readable lab figures and snapshot the current CP-vs-OpenHCS and partial throughput CSVs used for lab-meeting discussion.

Document that the current throughput snapshot measures independent adapter invocations, not native well-level OpenHCS multiprocessing.
Copy the v7 CP-vs-OpenHCS figure package into benchmark results so the lab-meeting artifacts are versioned with the summary CSVs.
Add source-schema well expansion through OpenHCS virtual workspace metadata so repeated sample/well benchmarks can reuse source images without copying files. Preserve virtual-path metadata identity when multiple virtual wells map to the same real source file, avoiding real-path metadata collisions.

Materialize generated cppipe modules under their deterministic import names so multiprocessing workers can import and register generated functions generically.

Add a native OpenHCS well-level throughput benchmark CLI and preserve the ExampleColocalization 2-vs-16 well result checkpoint.
Add shared headless CPU benchmark runtime setup with native thread caps and fail-loud log-level validation.

Add a typed OpenHCS multiprocessing start-method config so CPU-only benchmark runs can use fork while the CUDA-safe spawn default is preserved. Avoid eager worker registry/GPU initialization in CPU-only workers; function references resolve lazily through the registry boundary when needed.

Capture well-throughput progress, worker-lane, and step timing sidecars, and checkpoint the fresh 24-well ExampleColocalization 1/2/3-worker results, which show 3-worker execution speedup improving to 2.20x.