feat: add threshold statistics preprocessing step#9
vojtech-cifka wants to merge 38 commits into master
…coverage Joins per-tile ROI coverages from the tiling, tissue_stats, and qc_stats runs on (slide_id, x, y), then logs scalar stats (mean/std/min/max + quantiles) and survival curves stratified by dominant annotation class to MLflow, so coverage thresholds for downstream filtering can be picked from the train distribution.
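A minimal sketch of the join and the stratified survival curves described in this commit, written in plain Python so it runs without the project's dependencies. The column names (`tissue_coverage`, `dominant_class`), the in-memory dict layout, and the helper names are assumptions for illustration, not the code in `threshold_stats.py`. Here "survival" is taken to mean the fraction of tiles whose coverage is at least a given threshold.

```python
from collections import defaultdict

TileKey = tuple[str, int, int]  # (slide_id, x, y)


def join_on_tile_key(*tables: dict[TileKey, dict[str, object]]) -> dict[TileKey, dict[str, object]]:
    """Inner-join per-tile rows that share the same (slide_id, x, y) key."""
    if not tables:
        return {}
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    joined: dict[TileKey, dict[str, object]] = {}
    for key in sorted(shared):
        row: dict[str, object] = {}
        for table in tables:
            row.update(table[key])
        joined[key] = row
    return joined


def survival_curve(values: list[float], thresholds: list[float]) -> list[float]:
    """Fraction of tiles that survive each threshold (coverage >= threshold)."""
    if not values:
        return []
    return [sum(v >= t for v in values) / len(values) for t in thresholds]


def survival_by_class(
    rows: dict[TileKey, dict[str, object]],
    value_col: str,
    class_col: str,
    thresholds: list[float],
) -> dict[str, list[float]]:
    """One survival curve per dominant annotation class."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for row in rows.values():
        grouped[str(row[class_col])].append(float(row[value_col]))
    return {cls: survival_curve(vals, thresholds) for cls, vals in grouped.items()}
```

Reading a candidate threshold off such a curve shows directly how many tiles of each class would be kept by a downstream filter.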
…splits Loops over train/test, joins each split's tiling/tissue/qc artifacts via templated paths, and namespaces both metrics and artifact directories by split so train and test distributions can be compared in MLflow.
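The per-split loop could look roughly like the sketch below. The path templates, the `{split}` placeholder, and the helper names are hypothetical; the actual templates live in the PR's config files and are not shown here. The sketch only builds the resolved paths and the namespaced metric names; logging them to MLflow is left out.

```python
SPLITS = ("train", "test")

# Hypothetical templates; the real ones come from the preprocessing configs.
PATH_TEMPLATES = {
    "tiling": "artifacts/{split}/tiling/tiles.parquet",
    "tissue_stats": "artifacts/{split}/tissue_stats/tiles.parquet",
    "qc_stats": "artifacts/{split}/qc_stats/tiles.parquet",
}


def resolve_paths(templates: dict[str, str], split: str) -> dict[str, str]:
    """Fill the split name into every artifact path template."""
    return {name: template.format(split=split) for name, template in templates.items()}


def namespace_metrics(split: str, stats: dict[str, float]) -> dict[str, float]:
    """Prefix metric names with the split so train and test stay separate."""
    return {f"{split}/{name}": value for name, value in stats.items()}


def artifact_dir(split: str, subdir: str) -> str:
    """Directory under which a split's plots would be stored."""
    return f"{split}/{subdir}"


def plan_runs(stats_by_split: dict[str, dict[str, float]]) -> dict[str, dict[str, object]]:
    """Collect, per split, the inputs to read and the metrics to log."""
    plan: dict[str, dict[str, object]] = {}
    for split in SPLITS:
        plan[split] = {
            "inputs": resolve_paths(PATH_TEMPLATES, split),
            "metrics": namespace_metrics(split, stats_by_split.get(split, {})),
            "plots": artifact_dir(split, "survival_curves"),
        }
    return plan
```

Prefixing with the split keeps `train/...` and `test/...` metrics side by side in the tracking UI, which is what makes the two distributions easy to compare.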
…iling + tissue Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…inear plots Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review
This pull request introduces a new preprocessing script, threshold_stats.py, designed to analyze and visualize tile coverage statistics for dataset splits. It includes new configuration files, a job submission script, and updates to project dependencies. Review feedback highlights the need for a guard against empty arrays in statistical computations, an adjustment to the baseline for log-scale histograms to prevent Matplotlib rendering issues, and the removal of the unused duckdb dependency.
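One possible shape for the two code fixes the review asks for, sketched in plain Python. The function names and the `0.5` baseline are assumptions, not what the PR ends up doing. The first helper returns no metrics for an empty input instead of raising; the second computes bar bottoms and heights that stay strictly positive, which is what a log-scaled count axis needs (the pairs could then be passed to Matplotlib's `Axes.bar` through its `height` and `bottom` arguments).

```python
from statistics import mean, pstdev


def safe_scalar_stats(values: list[float]) -> dict[str, float]:
    """Scalar summary that tolerates an empty split or class."""
    if not values:
        # Nothing to summarise: skip logging rather than fail on min()/mean().
        return {}
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def log_safe_bars(counts: list[int], baseline: float = 0.5) -> list[tuple[float, float]]:
    """Return (bottom, height) pairs whose bottom edge is positive.

    A bar drawn from 0 has no finite position on a log axis, so each bar
    starts at `baseline` instead. Empty bins collapse to zero height.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive for a log-scaled axis")
    bars = []
    for count in counts:
        top = max(float(count), baseline)
        bars.append((baseline, top - baseline))
    return bars
```

A baseline below 1 keeps bins that contain a single tile visible, since their bar then spans from the baseline up to 1.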
Summary
- preprocessing/threshold_stats.py: a QC analysis script that runs on filtered tiles and produces per-split (train/test) coverage statistics and plots logged to MLflow
- configs/preprocessing/threshold_stats.yaml, configs/experiment/preprocessing/threshold_stats_standard.yaml, and scripts/submit_threshold_stats.py
- configs/data/dataset.yaml with filter_tiles_run_id

Notes