[ENH] Reduce complexity of run_flow_on_task func #1596
Omswastik-11 wants to merge 22 commits into openml:main
Codecov Report❌ Patch coverage is
Additional details and impacted files

Coverage Diff
| | main | #1596 | +/- |
|---|---|---|---|
| Coverage | 54.67% | 54.53% | -0.15% |
| Files | 63 | 63 | |
| Lines | 5108 | 5129 | +21 |
| Hits | 2793 | 2797 | +4 |
| Misses | 2315 | 2332 | +17 |
geetu040
left a comment
Looks really nice, I have left a few comments with only minor changes requested.
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
geetu040
left a comment
Nicely refactored, LGTM.
CC: @fkiraly, @SimonBlanke for review/merge.
SimonBlanke
left a comment
@Omswastik-11 Do you see a way to increase the test coverage here? This is not a hard requirement.
geetu040
left a comment
Actually, there is no unit test for the function openml.runs.run_flow_on_task; could you please add one? There are some tests that use openml.runs.run_flow_on_task internally, but it would be nice to have an independent test that only checks this functionality. You can add this test in tests/test_runs.
Also, if the helper functions in openml.runs.run_flow_on_task can be unit-tested (suggested in #1596 (review)), that would be nice, but again, it's not a hard requirement.
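For illustration, an isolated test of a helper could look roughly like the sketch below. The helper `sync_flow` and its behaviour are hypothetical stand-ins, not the actual openml code; the point is only that a mocked flow object lets the helper be checked without contacting a server.

```python
from unittest import mock


def sync_flow(flow, *, upload_flow):
    """Hypothetical helper: publish the flow only if requested and not yet published."""
    if upload_flow and flow.flow_id is None:
        flow.publish()
    return flow.flow_id


def test_sync_flow_publishes_unpublished_flow():
    # A Mock stands in for the flow, so no server call is made.
    flow = mock.Mock(flow_id=None)
    sync_flow(flow, upload_flow=True)
    flow.publish.assert_called_once_with()


def test_sync_flow_skips_upload_when_disabled():
    flow = mock.Mock(flow_id=None)
    assert sync_flow(flow, upload_flow=False) is None
    flow.publish.assert_not_called()
```

Tests written this way can be collected by pytest and stay independent of the tests that exercise run_flow_on_task end to end.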
fkiraly
left a comment
Nice! I left some recommendations on how to further simplify the code flow.
geetu040
left a comment
Please resolve conflicts with main.
Pull request overview
This PR refactors the run_flow_on_task function to reduce complexity and improve maintainability. The function, which had grown to ~160 lines with high cyclomatic complexity, is now a clear orchestrator that delegates to well-defined helper functions.
Changes:
- Extracted four helper functions with single responsibilities: input validation, server synchronization, environment preparation, and run creation
- Improved type safety by replacing assert statements with explicit ValueError/TypeError exceptions
- Made boolean parameters keyword-only in helper functions for better API clarity
- Added unit tests for the new helper functions
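As a rough illustration of the two API points in the list above (explicit exceptions instead of assert, and keyword-only booleans), consider the sketch below. The function `describe_run` and its checks are hypothetical and do not reproduce the openml implementation. One reason to prefer explicit exceptions is that assert statements are skipped when Python runs with the -O flag.

```python
def describe_run(task_id, *, upload_flow=False, avoid_duplicate_runs=True):
    """Hypothetical helper showing explicit validation and keyword-only flags."""
    # Explicit exceptions still fire under `python -O`, unlike assert.
    if isinstance(task_id, bool) or not isinstance(task_id, int):
        raise TypeError(f"task_id must be an int, got {type(task_id).__name__}")
    if task_id <= 0:
        raise ValueError(f"task_id must be positive, got {task_id}")
    return {
        "task_id": task_id,
        "upload_flow": upload_flow,
        "avoid_duplicate_runs": avoid_duplicate_runs,
    }


# The bare `*` forces callers to name the booleans at the call site.
settings = describe_run(3, upload_flow=True)
```

A call such as `describe_run(3, True)` raises a TypeError, so a reader of the call site never has to guess which flag a bare `True` refers to.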
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| openml/runs/functions.py | Refactored run_flow_on_task by extracting helper functions: _validate_flow_and_task_inputs, _sync_flow_with_server, _prepare_run_environment, and _create_run_from_results |
| tests/test_runs/test_run_functions.py | Added unit tests for _sync_flow_with_server and _create_run_from_results helper functions, and imported OrderedDict for test data |
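The orchestrator pattern described in this overview can be sketched as follows. All function names, signatures, and the dict-based data are simplified, hypothetical stand-ins chosen for the example; they are not the helpers listed in the table.

```python
from __future__ import annotations

from typing import Any


def validate_inputs(flow: dict[str, Any], task: dict[str, Any]) -> None:
    """Step 1: fail early on malformed inputs."""
    if "model" not in flow:
        raise ValueError("flow has no model")
    if "task_id" not in task:
        raise ValueError("task has no task_id")


def sync_with_server(flow: dict[str, Any], *, upload_flow: bool) -> int | None:
    """Step 2 (stand-in): pretend that uploading assigns flow id 1."""
    if upload_flow and flow.get("flow_id") is None:
        flow["flow_id"] = 1
    return flow.get("flow_id")


def prepare_environment(flow: dict[str, Any]) -> list[str]:
    """Step 3 (stand-in): collect tags describing the run."""
    return ["example-tag"]


def create_run(flow_id, task, tags, results) -> dict[str, Any]:
    """Step 4: assemble the final run record."""
    return {"flow_id": flow_id, "task_id": task["task_id"], "tags": tags, "results": results}


def run_on_task(flow, task, *, upload_flow=False):
    """Orchestrator: each line is one clearly named step."""
    validate_inputs(flow, task)
    flow_id = sync_with_server(flow, upload_flow=upload_flow)
    tags = prepare_environment(flow)
    results = flow["model"](task)  # the actual execution step
    return create_run(flow_id, task, tags, results)


run = run_on_task({"model": lambda t: [t["task_id"] * 2]}, {"task_id": 5}, upload_flow=True)
```

Because each step is a separate function, each one can be tested on its own, and the orchestrator body reads as a list of steps.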
    data_content, trace, fold_evaluations, sample_evaluations = _run_task_get_arffcontent(
        model=flow.model,
        task=task,
        extension=flow.extension,
        add_local_measures=add_local_measures,
        n_jobs=n_jobs,
    )
The PR description mentions introducing a _RunResults NamedTuple to bundle execution outputs and reduce long parameter lists, but this NamedTuple is not present in the actual implementation. The function _run_task_get_arffcontent still returns a tuple that is unpacked directly in line 486. If the NamedTuple was intended but not implemented, consider either updating the PR description to match the implementation or implementing the NamedTuple as described.
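For reference, a NamedTuple of the kind the description mentions might look like the sketch below. The class name `RunResults`, its field types, and the stub `execute_stub` are assumptions made for the example; nothing like this exists in the PR as reviewed.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class RunResults(NamedTuple):
    """Hypothetical bundle of the four values returned by task execution."""

    data_content: List[List[Any]]
    trace: Optional[Any]
    fold_evaluations: Dict[str, Any]
    sample_evaluations: Dict[str, Any]


def execute_stub() -> RunResults:
    # Stand-in for the real execution step; the values are made up.
    return RunResults(
        data_content=[[0, 0, 1]],
        trace=None,
        fold_evaluations={"acc": 0.9},
        sample_evaluations={},
    )


results = execute_stub()

# A NamedTuple is still a tuple, so existing unpacking code keeps working.
data_content, trace, fold_evaluations, sample_evaluations = results
```

Since tuple unpacking still works, such a change would be backward compatible for callers that unpack the return value directly, while new code could pass a single `results` object instead of four parameters.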
@Omswastik-11 update the PR description to remove this part.
geetu040
left a comment
@fkiraly, can we merge this now? Comments from your last review (#1596 (review)) are resolved.

Summary
This PR refactors run_flow_on_task, which had grown to ~160 lines with high cyclomatic complexity, by extracting small helper functions with clear, single responsibilities. The main function is now a readable orchestrator with clearly defined steps.

Changes

Extracted helper functions
- _validate_flow_and_task_inputs: Handles input validation and backward-compatible argument handling
- _sync_flow_with_server: Synchronizes the flow with the server and checks for duplicate runs
- _prepare_run_environment: Prepares environment information and run tags
- _create_run_from_results: Builds the OpenMLRun object from execution results

Internal structure improvements
- Introduced _RunResults NamedTuple to bundle execution outputs (data_content, trace, evaluations) and reduce long parameter lists

Type Safety Improvements
- Replaced assert statements with explicit ValueError/TypeError exceptions

Fixes #1580