Restrict closed/open training workloads to {unet3d, retinanet} (depends on #432) #439

Draft
FileSystemGuy wants to merge 2 commits into FileSystemGuy-rules-validator from FileSystemGuy-training-to-new-models

Summary

Update the official training-workload set for closed/open submission from {unet3d, resnet50, cosmoflow} to {unet3d, retinanet} (Rules.md 2.1.11). Propagate the new set through the runtime validator, submission validator, dataset constants, error suggestions, result directory examples, and unit tests.

The whatif exploration mode is unchanged and still accepts the full superset: unet3d, retinanet, cosmoflow, resnet50, dlrm, flux.
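The split between the two sets can be sketched roughly as below. This is a minimal illustration rather than the repository's code: `MODELS` and `MODELS_CLOSED` are names used in this PR, but the exact definition of `MODELS_CLOSED` shown here and the helper `allowed_models` are assumptions made for the example.

```python
# Illustrative sketch only; the real constants live in mlpstorage_py and may differ.
MODELS = ["cosmoflow", "resnet50", "unet3d", "dlrm", "retinanet", "flux"]  # whatif superset
MODELS_CLOSED = ["unet3d", "retinanet"]  # closed/open set per Rules.md 2.1.11 (assumed definition)


def allowed_models(category: str) -> list[str]:
    """Return the training workloads accepted for a category (hypothetical helper)."""
    if category in ("closed", "open"):
        return MODELS_CLOSED
    if category == "whatif":
        return MODELS
    raise ValueError(f"unknown category: {category!r}")
```

With this shape, `allowed_models("closed")` yields only the two official workloads, while `allowed_models("whatif")` still yields all six.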

Dependency

This PR depends on #432 and is stacked on top of FileSystemGuy-rules-validator. The base will be retargeted to main once #432 merges. If you review before then, please review only the diff specific to this PR (the GitHub UI shows this automatically while #432 is open).

Marked draft until #432 is merged.

Changes

  • Rules.md — 2.1.11 text rule + Closed/Open result tree examples reduced to the two-workload set.
  • mlpstorage_py/submission_checker/
    • constants.py: NUM_DATASET_{TRAIN,EVAL}_{FILES,FOLDERS} keyed by {unet3d, retinanet}.
    • checks/submission_structure_checks.py: _VALID_TRAINING_WORKLOADS, STRUCT-11 docstring, violation message.
    • checks/training_checks.py (3.3.1): updated comments / skip-diagnostic message.
    • configuration/configuration.py: updated comment example.
  • mlpstorage_py/rules/submission_checkers/training.py: supported_models = MODELS_CLOSED (was MODELS, the whatif superset).
  • mlpstorage_py/validation_helpers.py: --model suggestion now mentions (unet3d or retinanet).
  • mlpstorage_py/reporting/directory_validator.py: example workload list.
  • mlpstorage_py/rules_legacy.py: docstring example.
  • tests/unit/test_rules_checkers.py: updated TrainingSubmissionRulesChecker.supported_models test to assert the new closed/open set and explicitly reject resnet50/cosmoflow.
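The updated unit test might take roughly the following shape. The stand-in class is hypothetical and mirrors only the `supported_models` attribute named above; the real `TrainingSubmissionRulesChecker` has its own constructor and dependencies, and the value of `MODELS_CLOSED` is assumed here.

```python
# Sketch under stated assumptions; not the actual test in tests/unit/test_rules_checkers.py.
MODELS_CLOSED = ["unet3d", "retinanet"]  # assumed value of the closed/open set


class FakeTrainingChecker:
    """Hypothetical stand-in exposing the same attribute as the real checker."""

    supported_models = MODELS_CLOSED


def test_supported_models_is_closed_open_set():
    checker = FakeTrainingChecker()
    # The closed/open set must match exactly, independent of ordering.
    assert set(checker.supported_models) == {"unet3d", "retinanet"}
    # Workloads dropped from the official set must be rejected explicitly.
    for removed in ("resnet50", "cosmoflow"):
        assert removed not in checker.supported_models
```

Asserting the rejected workloads by name, in addition to the positive set check, guards against the checker being pointed back at the whatif superset by accident.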

Out of scope (intentionally unchanged)

  • MODELS = [cosmoflow, resnet50, unet3d, dlrm, retinanet, flux] constant — still the full whatif superset.
  • whatif CLI choices and the whatif test parameterizations.
  • training/README.md and the top-level README.md — already reflect the new set.

Test plan

  • pytest tests/unit against changed-area suites: 257 passed (rules_checkers, config, validation_helpers, cli_parser, parser_modes, help_behavior).
  • Full unit-test pass (skipping modules that require pyarrow/numpy, which are not installed locally): 1172 passed, 2 skipped, 1 unrelated pre-existing failure (test_version_lookup_uses_correct_distribution_name, which needs pip install -e .).
  • CI to re-run on full env.
  • Manual: run mlpstorage closed training --help → shows unet3d | retinanet; mlpstorage whatif training --help → still shows all six.
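The manual check above relies on the CLI offering different `--model` choices per category. A minimal argparse sketch of that behavior follows; the parser layout, the `mlpstorage-sketch` program name, and `build_parser` are assumptions for illustration and do not reproduce the actual mlpstorage CLI.

```python
import argparse

MODELS = ["cosmoflow", "resnet50", "unet3d", "dlrm", "retinanet", "flux"]  # whatif superset
MODELS_CLOSED = ["unet3d", "retinanet"]  # assumed closed/open set


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser of the form `<category> training --model <name>` (hypothetical)."""
    parser = argparse.ArgumentParser(prog="mlpstorage-sketch")
    categories = parser.add_subparsers(dest="category", required=True)
    per_category = (("closed", MODELS_CLOSED), ("open", MODELS_CLOSED), ("whatif", MODELS))
    for category, models in per_category:
        cat_parser = categories.add_parser(category)
        programs = cat_parser.add_subparsers(dest="program", required=True)
        training = programs.add_parser("training")
        # argparse lists `choices` in --help and rejects any other value.
        training.add_argument("--model", choices=models, required=True)
    return parser
```

In this sketch, `closed training --model resnet50` exits with a usage error, while `whatif training --model resnet50` parses normally.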

Rules.md 2.1.11 now enumerates only "unet3d" and "retinanet" as the
official training workloads for closed/open submission. Propagate the
new set through the runtime validator, submission validator, dataset
constants, error suggestions, and result directory examples. The
whatif mode keeps the full six-model exploration set unchanged
(unet3d, retinanet, cosmoflow, resnet50, dlrm, flux).

- Rules.md: text rule + Closed/Open result tree examples
- submission_checker:
  - constants: NUM_DATASET_{TRAIN,EVAL}_{FILES,FOLDERS} keyed by
    {unet3d, retinanet}
  - submission_structure_checks STRUCT-11: valid workload set,
    docstring, violation message
  - training_checks 3.3.1: comments / skip-diagnostic message
  - configuration: comment example
- rules/submission_checkers/training: supported_models now
  MODELS_CLOSED instead of full MODELS superset
- validation_helpers: --model suggestion text
- reporting/directory_validator: example list
- rules_legacy: docstring example
- tests: update TrainingSubmissionRulesChecker supported_models test
  to assert the new closed/open set

github-actions Bot commented Jun 13, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

FileSystemGuy changed the title from "Restrict closed/open training workloads to {unet3d, retinanet}" to "Restrict closed/open training workloads to {unet3d, retinanet} (depends on #432)" on Jun 13, 2026

The illustrative cross-workload example used the pair "ResNet-50 training
task" → "3D-Unet training task". ResNet-50 is no longer in the
closed/open training workload set (Rules.md 2.1.11, {unet3d,
retinanet}). Replace the pair with RetinaNet → 3D-Unet so the example
uses current workloads.