Restrict closed/open training workloads to {unet3d, retinanet} (depends on #432) #439
Draft
FileSystemGuy wants to merge 2 commits into
Rules.md 2.1.11 now enumerates only "unet3d" and "retinanet" as the
official training workloads for closed/open submission. Propagate the
new set through the runtime validator, submission validator, dataset
constants, error suggestions, and result directory examples. The
whatif mode keeps the full six-model exploration set unchanged
(unet3d, retinanet, cosmoflow, resnet50, dlrm, flux).
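To make the closed/open versus whatif split concrete, here is a minimal Python sketch of a mode-dependent model check. It assumes `MODELS_CLOSED` holds `["unet3d", "retinanet"]` in that order. The helpers `allowed_models` and `validate_model` are hypothetical illustrations, not the actual runtime validator or `validation_helpers` API.

```python
# Hypothetical sketch: MODELS / MODELS_CLOSED mirror the names used in this PR,
# but allowed_models() and validate_model() are illustrative, not mlpstorage code.
MODELS = ["cosmoflow", "resnet50", "unet3d", "dlrm", "retinanet", "flux"]  # whatif superset
MODELS_CLOSED = ["unet3d", "retinanet"]  # closed/open set per Rules.md 2.1.11 (order assumed)


def allowed_models(mode: str) -> list:
    """Return the training workloads accepted in the given run mode."""
    if mode in ("closed", "open"):
        return MODELS_CLOSED
    if mode == "whatif":
        return MODELS
    raise ValueError(f"unknown mode: {mode!r}")


def validate_model(mode: str, model: str) -> None:
    """Raise ValueError with a --model suggestion if the model is not allowed in this mode."""
    allowed = allowed_models(mode)
    if model not in allowed:
        # For closed/open this renders as "(unet3d or retinanet)".
        raise ValueError(
            f"model {model!r} is not valid in {mode} mode; "
            f"use --model ({' or '.join(allowed)})"
        )
```

With this split, `validate_model("whatif", "resnet50")` passes while `validate_model("closed", "resnet50")` raises, because only the closed/open lookup was narrowed and the whatif superset is untouched.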
- Rules.md: text rule + Closed/Open result tree examples
- submission_checker:
  - constants: NUM_DATASET_{TRAIN,EVAL}_{FILES,FOLDERS} keyed by
    {unet3d, retinanet}
  - submission_structure_checks STRUCT-11: valid workload set,
    docstring, violation message
  - training_checks 3.3.1: comments / skip-diagnostic message
  - configuration: comment example
- rules/submission_checkers/training: supported_models now
  MODELS_CLOSED instead of the full MODELS superset
- validation_helpers: --model suggestion text
- reporting/directory_validator: example list
- rules_legacy: docstring example
- tests: update TrainingSubmissionRulesChecker supported_models test
  to assert the new closed/open set
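A STRUCT-11-style check can be sketched as follows. This is a minimal illustration assuming, hypothetically, that each training workload appears as a directory directly under a training results directory; the function name `check_training_workloads`, the directory layout, and the message wording are assumptions, not the submission checker's actual code.

```python
# Hypothetical sketch of a STRUCT-11-style check; the real checker's
# directory layout, function names and message wording may differ.
from pathlib import Path

_VALID_TRAINING_WORKLOADS = frozenset({"unet3d", "retinanet"})


def check_training_workloads(training_dir: Path) -> list:
    """Return one violation message per workload directory outside the closed/open set."""
    violations = []
    # Sort so violations are reported in a stable, alphabetical order.
    for entry in sorted(training_dir.iterdir()):
        if entry.is_dir() and entry.name not in _VALID_TRAINING_WORKLOADS:
            violations.append(
                f"STRUCT-11: unexpected training workload '{entry.name}' in "
                f"{training_dir}; expected one of {sorted(_VALID_TRAINING_WORKLOADS)}"
            )
    return violations
```

Pointing `check_training_workloads` at a directory that contains `unet3d/`, `retinanet/` and `resnet50/` yields a single violation, for `resnet50`.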
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
The illustrative cross-workload example referred to "ResNet-50 training
task" → "3D-Unet training task". ResNet-50 is no longer in the
closed/open training workload set (Rules.md 2.1.11, {unet3d,
retinanet}). Replace with RetinaNet → 3D-Unet so the example uses
current workloads.
Summary
Update the official training-workload set for closed/open submission from `{unet3d, resnet50, cosmoflow}` to `{unet3d, retinanet}` (Rules.md 2.1.11). Propagate the new set through the runtime validator, submission validator, dataset constants, error suggestions, result directory examples, and unit tests. The `whatif` exploration mode is unchanged and still accepts the full superset: `unet3d, retinanet, cosmoflow, resnet50, dlrm, flux`.
Dependency
This PR depends on #432 and is stacked on top of `FileSystemGuy-rules-validator`. The base will be retargeted to `main` once #432 merges. If you review now, please review only the diff specific to this PR (the GitHub UI shows this automatically while #432 is open).
Marked draft until #432 is merged.
Changes
- `Rules.md`: 2.1.11 text rule + Closed/Open result tree examples reduced to the two-workload set.
- `mlpstorage_py/submission_checker/constants.py`: `NUM_DATASET_{TRAIN,EVAL}_{FILES,FOLDERS}` keyed by `{unet3d, retinanet}`.
- `checks/submission_structure_checks.py`: `_VALID_TRAINING_WORKLOADS`, STRUCT-11 docstring, violation message.
- `checks/training_checks.py` (3.3.1): updated comments / skip-diagnostic message.
- `configuration/configuration.py`: updated comment example.
- `mlpstorage_py/rules/submission_checkers/training.py`: `supported_models = MODELS_CLOSED` (was `MODELS`, the whatif superset).
- `mlpstorage_py/validation_helpers.py`: `--model` suggestion now mentions `(unet3d or retinanet)`.
- `mlpstorage_py/reporting/directory_validator.py`: example workload list.
- `mlpstorage_py/rules_legacy.py`: docstring example.
- `tests/unit/test_rules_checkers.py`: updated `TrainingSubmissionRulesChecker.supported_models` test to assert the new closed/open set and explicitly reject `resnet50`/`cosmoflow`.
Out of scope (intentionally unchanged)
- `MODELS = [cosmoflow, resnet50, unet3d, dlrm, retinanet, flux]` constant: still the full whatif superset.
- `whatif` CLI choices and the whatif test parameterizations.
- `training/README.md` and the top-level `README.md`: already reflect the new set.
Test plan
- `pytest tests/unit` against changed-area suites: 257 passed (rules_checkers, config, validation_helpers, cli_parser, parser_modes, help_behavior).
- … `pyarrow`/`numpy` not installed locally): 1172 passed, 2 skipped, 1 unrelated pre-existing failure (`test_version_lookup_uses_correct_distribution_name`, which needs `pip install -e .`).
- `mlpstorage closed training --help` → shows `unet3d | retinanet`; `mlpstorage whatif training --help` → still shows all six.
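The two `--help` checks above follow from giving each mode its own `choices` list. Below is a minimal `argparse` sketch of that idea, assuming a hypothetical `mode → benchmark → --model` layout; it is not mlpstorage's real parser. Note that argparse renders choices by default as `{unet3d,retinanet}`, while the test plan reports `unet3d | retinanet` for the real CLI.

```python
# Hypothetical sketch: per-mode `choices` so that `closed training --help`
# lists only the closed/open set while `whatif training --help` lists all six.
# This is an illustration, not mlpstorage's actual CLI definition.
import argparse

MODELS = ["cosmoflow", "resnet50", "unet3d", "dlrm", "retinanet", "flux"]
MODELS_CLOSED = ["unet3d", "retinanet"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mlpstorage")
    modes = parser.add_subparsers(dest="mode", required=True)
    for mode, models in (
        ("closed", MODELS_CLOSED),
        ("open", MODELS_CLOSED),
        ("whatif", MODELS),
    ):
        mode_parser = modes.add_parser(mode)
        benchmarks = mode_parser.add_subparsers(dest="benchmark", required=True)
        training = benchmarks.add_parser("training")
        # argparse rejects any value outside `choices` and exits with status 2.
        training.add_argument("--model", required=True, choices=models,
                              help="training workload")
    return parser
```

Because the restriction lives in the per-mode `choices`, `closed training --model resnet50` fails at parse time, while the same argument under `whatif` is accepted.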