[ENH] V1 -> V2 Migration - Flows (module)#1609
[ENH] V1 -> V2 Migration - Flows (module)#1609Omswastik-11 wants to merge 301 commits intoopenml:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1609 +/- ##
==========================================
- Coverage 54.67% 54.67% -0.01%
==========================================
Files 63 63
Lines 5108 5149 +41
==========================================
+ Hits 2793 2815 +22
- Misses 2315 2334 +19 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
614411f to
36184e5
Compare
|
Hi @geetu040 !! Can you review the pre-commit failure It was due to merge conflicts more specifically for tasks . Should I change it on my branch ? |
can you try again, sync your branch with mine? It should be fixed now. |
|
I think that due to conflicts it did not synced properly . I have to revert it manually |
geetu040
left a comment
There was a problem hiding this comment.
Nice overall, few changes needed. I'll look at the tests later when the implementation is final.
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
Signed-off-by: Omswastik-11 <omswastikpanda11@gmail.com>
This reverts commit 2896b41.
…-11/openml-python into flow-migration-stacked
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| data = {"name": name, "external_version": external_version, "api_key": self._http.api_key} | ||
| xml_response = self._http.post("flow/exists", data=data).text |
There was a problem hiding this comment.
FlowV1API.exists() calls HTTPClient.post() with the default use_api_key=True, which raises OpenMLAuthenticationError when no API key is configured. The legacy flow/exists endpoint worked without an API key (it only included api_key when present), so this is a behavior regression for anonymous users. Consider calling self._http.post(..., use_api_key=False) and omit api_key from the payload (or make auth optional).
| data = {"name": name, "external_version": external_version, "api_key": self._http.api_key} | |
| xml_response = self._http.post("flow/exists", data=data).text | |
| data = {"name": name, "external_version": external_version} | |
| if self._http.api_key: | |
| data["api_key"] = self._http.api_key | |
| xml_response = self._http.post("flow/exists", data=data, use_api_key=False).text |
| if not (isinstance(name, str) and len(name) > 0): | ||
| raise ValueError("Argument 'name' should be a non-empty string") | ||
| if not (isinstance(external_version, str) and len(external_version) > 0): | ||
| raise ValueError("Argument 'version' should be a non-empty string") | ||
|
|
||
| try: | ||
| response = self._http.get(f"flows/exists/{name}/{external_version}/") | ||
| result = response.json() |
There was a problem hiding this comment.
FlowV2API.exists() interpolates name and external_version directly into the URL path. These values can contain characters that require URL escaping, which can break requests or cause ambiguous routing. Quote/escape each path segment (e.g., urllib.parse.quote) or send them as query parameters if supported by the v2 endpoint.
| import xmltodict | ||
|
|
||
| import openml | ||
| import openml._api_calls |
There was a problem hiding this comment.
openml._api_calls is imported but no longer used in this module after switching to the new backend APIs. This will fail linting/static checks; please remove the unused import.
| import openml._api_calls |
| from openml._api_calls import _perform_api_call | ||
| from openml.testing import SimpleImputer, TestBase | ||
| import requests | ||
| from openml.testing import SimpleImputer, TestBase, create_request_response |
There was a problem hiding this comment.
Unused import: create_request_response is imported but not used anywhere in this test file. Please remove it to keep the test module clean and avoid lint failures.
| from openml.testing import SimpleImputer, TestBase, create_request_response | |
| from openml.testing import SimpleImputer, TestBase |
| mock_request.assert_called_once_with( | ||
| method="POST", | ||
| url=openml.config.server + "flow/exists", | ||
| params={}, | ||
| data={ | ||
| "name": flow_name, | ||
| "external_version": external_version, | ||
| "api_key": test_apikey_v1, | ||
| }, | ||
| headers=openml.config._HEADERS, | ||
| files=None, | ||
| ) | ||
|
|
||
|
|
There was a problem hiding this comment.
This test asserts that FlowV1API.exists() sends api_key inside the POST body. If FlowV1API.exists() is adjusted to not require authentication (and rely on HTTPClient to include api_key only when configured), this assertion will become incorrect. Consider relaxing the assertion to not require api_key for the exists call (or explicitly set use_api_key=False in the implementation and test for that behavior).
| mock_request.assert_called_once_with( | |
| method="POST", | |
| url=openml.config.server + "flow/exists", | |
| params={}, | |
| data={ | |
| "name": flow_name, | |
| "external_version": external_version, | |
| "api_key": test_apikey_v1, | |
| }, | |
| headers=openml.config._HEADERS, | |
| files=None, | |
| ) | |
| mock_request.assert_called_once() | |
| _, kwargs = mock_request.call_args | |
| assert kwargs["method"] == "POST" | |
| assert kwargs["url"] == openml.config.server + "flow/exists" | |
| assert kwargs["params"] == {} | |
| assert kwargs["headers"] == openml.config._HEADERS | |
| assert kwargs["files"] is None | |
| assert kwargs["data"]["name"] == flow_name | |
| assert kwargs["data"]["external_version"] == external_version |
| ignore_cache : bool, default=False | ||
| Whether to ignore the cache. If ``true`` this will download and overwrite the flow xml | ||
| even if the requested flow is already cached. | ||
|
|
There was a problem hiding this comment.
Docstring around ignore_cache/caching behavior still refers to downloading/overwriting a flow.xml file in cache_directory/flows/{id}. Since get_flow() now delegates to HTTPClient caching (request-path-based cache entries), please update the docstring wording to match the new cache mechanism (or avoid specifying the on-disk layout).
| flow_exists_mock.return_value = False | ||
| ).encode() | ||
| mock_request.return_value = response | ||
| flow_exists_mock.return_value = None # Flow doesn't exist yet, so try to publish |
There was a problem hiding this comment.
flow_exists() is documented/typed to return int | bool (with False indicating non-existence). This test sets flow_exists_mock.return_value = None, which is outside the contract and can mask type/logic issues. Please return False here to match the real API behavior and keep the test faithful to production semantics.
| flow_exists_mock.return_value = None # Flow doesn't exist yet, so try to publish | |
| flow_exists_mock.return_value = False # Flow doesn't exist yet, so try to publish |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except (OpenMLServerError, KeyError): | ||
| # v2 returns 404 when flow doesn't exist, which raises OpenMLServerError | ||
| return False |
There was a problem hiding this comment.
FlowV2API.exists() currently catches OpenMLServerError and returns False, which will also hide real failures (e.g., authentication/authorization problems, 500s) by reporting "flow does not exist". Only treat the specific "not found" case as non-existence (e.g., catch OpenMLServerNoResult / an OpenMLServerException with the v2 not-found code or 404) and re-raise all other exceptions.
| except (OpenMLServerError, KeyError): | |
| # v2 returns 404 when flow doesn't exist, which raises OpenMLServerError | |
| return False | |
| except OpenMLServerError as err: | |
| # v2 returns HTTP 404 when the flow doesn't exist; only treat that case | |
| # as "not found" and propagate all other server errors. | |
| if getattr(err, "code", None) == 404: | |
| return False | |
| raise |
| if self.flow_id is None: | ||
| raise ValueError("Flow does not have an ID. Please publish the flow before untagging.") | ||
| openml._backend.flow.untag(self.flow_id, tag) |
There was a problem hiding this comment.
Same as push_tag(): raising ValueError for an unpublished flow changes the public exception contract compared to OpenMLBase.remove_tag() (which raises ObjectNotPublishedError). Align the exception type/message (or delegate to the base implementation) for consistency across entities.
| flow_id: int, | ||
| reinstantiate: bool = False, # noqa: FBT002 | ||
| strict_version: bool = True, # noqa: FBT002 | ||
| ignore_cache: bool = False, # noqa: FBT002 |
There was a problem hiding this comment.
I would not recommend adding this parameter: ignore_cache
| ignore_cache: bool = False, # noqa: FBT002 |
| ignore_cache : bool, default=False | ||
| Whether to ignore the cache. If ``true`` this will download and overwrite the flow xml | ||
| even if the requested flow is already cached. | ||
|
|
There was a problem hiding this comment.
as said above
| """ | ||
| flow_id = int(flow_id) | ||
| flow = _get_flow_description(flow_id) | ||
| flow = openml._backend.flow.get(flow_id, reset_cache=ignore_cache) |
There was a problem hiding this comment.
as said above
| flow = openml._backend.flow.get(flow_id, reset_cache=ignore_cache) | |
| flow = openml._backend.flow.get(flow_id) |
|
|
||
| return _create_flow_from_xml(flow_xml) | ||
| return cast("OpenMLFlow", new_flow) | ||
| return cast("OpenMLFlow", flow) |
There was a problem hiding this comment.
no need for explicit cast now
| return cast("OpenMLFlow", flow) | |
| return new_flow | |
| return flow |
| if not (isinstance(name, str) and len(name) > 0): | ||
| raise ValueError("Argument 'name' should be a non-empty string") | ||
| if not (isinstance(name, str) and len(external_version) > 0): | ||
| if not (isinstance(external_version, str) and len(external_version) > 0): |
There was a problem hiding this comment.
| if not (isinstance(external_version, str) and len(external_version) > 0): | |
| if not (isinstance(name, str) and len(external_version) > 0): |
| @mock.patch.object(requests.Session, "delete") | ||
| def test_delete_flow_not_owned(mock_delete, test_files_directory, test_server_v1, test_apikey_v1): | ||
| @mock.patch.object(requests.Session, "request") | ||
| def test_delete_flow_not_owned(mock_request, test_files_directory, test_apikey_v1): |
There was a problem hiding this comment.
- there is no need for
openml.config.start_using_configuration_for_example() - keep
test_server_v1in params - you have removed the check for url
| super().publish() | ||
| assert self.flow_id is not None # for mypy | ||
| flow_id = self.flow_id | ||
|
|
||
| file_elements = self._get_file_elements() | ||
| if "description" not in file_elements: | ||
| file_elements["description"] = self._to_xml() | ||
|
|
||
| # Use openml._backend.flow.publish which internally calls ResourceV1.publish | ||
| flow_id = openml._backend.flow.publish(path="flow", files=file_elements) | ||
| self.flow_id = flow_id |
There was a problem hiding this comment.
nice, this is how it's expected to work
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
openml/flows/functions.py:67
- get_flow’s docstring still describes the old per-flow XML cache (writing to cache_directory/flows/{flow_id}/flow.xml and raising OpenMLCacheException). The implementation now delegates to openml._backend.flow.get and uses HTTPClient caching, so the documented side effects/raised exceptions are no longer accurate; please update the docstring to match current behavior (and describe the new cache location/semantics if relevant).
"""Fetch an OpenMLFlow by its server-assigned ID.
Queries the OpenML REST API for the flow metadata and returns an
:class:`OpenMLFlow` instance. If the flow is already cached locally,
the cached copy is returned. Optionally the flow can be re-instantiated
into a concrete model instance using the registered extension.
Parameters
----------
flow_id : int
The OpenML flow id.
reinstantiate : bool, optional (default=False)
If True, convert the flow description into a concrete model instance
using the flow's extension (e.g., sklearn). If conversion fails and
``strict_version`` is True, an exception will be raised.
strict_version : bool, optional (default=True)
When ``reinstantiate`` is True, whether to enforce exact version
requirements for the extension/model. If False, a new flow may
be returned when versions differ.
ignore_cache : bool, default=False
Whether to ignore the cache. If ``true`` this will download and overwrite the flow xml
even if the requested flow is already cached.
Returns
-------
OpenMLFlow
The flow object with metadata; ``model`` may be populated when
``reinstantiate=True``.
Raises
------
OpenMLCacheException
When cached flow files are corrupted or cannot be read.
OpenMLServerException
When the REST API call fails.
Side Effects
------------
- Writes to ``openml.config.cache_directory/flows/{flow_id}/flow.xml``
when the flow is downloaded from the server.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_flow_v1_get(flow_v1): | ||
| flow = flow_v1.get(flow_id=1) | ||
| _assert_flow_shape(flow) | ||
|
|
||
|
|
||
| def test_flow_v1_list(flow_v1): | ||
| limit = 5 | ||
| flows_df = flow_v1.list(limit=limit) | ||
|
|
||
| assert len(flows_df) == limit | ||
| assert "id" in flows_df.columns | ||
| assert "name" in flows_df.columns | ||
| assert "version" in flows_df.columns | ||
| assert "external_version" in flows_df.columns | ||
| assert "full_name" in flows_df.columns | ||
| assert "uploader" in flows_df.columns | ||
|
|
||
|
|
||
| def test_flow_v1_list_with_offset(flow_v1): | ||
| limit = 5 | ||
| flows_df = flow_v1.list(limit=limit, offset=10) | ||
|
|
||
| assert len(flows_df) == limit | ||
|
|
There was a problem hiding this comment.
These tests call the real test server (flow_v1.get/list) but are missing the @pytest.mark.test_server() marker that other API tests use. Without the marker they’ll run in offline/unit-only environments and fail due to network dependency; either mark them as test_server or mock Session.request like the other tests in this file.
| response = requests.Response() | ||
| response.status_code = 200 | ||
| response._content = ( | ||
| "<oml:upload_flow>\n" " <oml:id>1</oml:id>\n" "</oml:upload_flow>" |
There was a problem hiding this comment.
The mocked upload XML in this test uses the oml: prefix but does not declare the xmlns:oml namespace, which makes the XML invalid and will cause xmltodict.parse() (used by ResourceV1API.publish) to fail with an “unbound prefix” parse error. Use a valid response payload (e.g., include xmlns:oml="http://openml.org/openml" like other tests do, or reuse create_request_response fixtures).
| "<oml:upload_flow>\n" " <oml:id>1</oml:id>\n" "</oml:upload_flow>" | |
| '<oml:upload_flow xmlns:oml="http://openml.org/openml">\n' | |
| " <oml:id>1</oml:id>\n" | |
| "</oml:upload_flow>" |
| from openml._api_calls import _perform_api_call | ||
| from openml.testing import SimpleImputer, TestBase | ||
| import requests | ||
| from openml.testing import SimpleImputer, TestBase, create_request_response |
There was a problem hiding this comment.
create_request_response is imported here but not used anywhere in this test module after the refactor. Please drop the unused import to keep the test file clean and avoid lint failures.
| from openml.testing import SimpleImputer, TestBase, create_request_response | |
| from openml.testing import SimpleImputer, TestBase |
| @@ -466,15 +467,15 @@ def test_delete_flow_not_owned(mock_delete, test_files_directory, test_server_v1 | |||
| ): | |||
| openml.flows.delete_flow(40_000) | |||
|
|
|||
| flow_url = test_server_v1 + "flow/40000" | |||
| assert flow_url == mock_delete.call_args.args[0] | |||
| assert test_apikey_v1 == mock_delete.call_args.kwargs.get("params", {}).get("api_key") | |||
| assert mock_request.call_args.kwargs.get("method") == "DELETE" | |||
| assert test_apikey_v1 == mock_request.call_args.kwargs.get("params", {}).get("api_key") | |||
There was a problem hiding this comment.
These delete_flow unit tests now only assert the HTTP method and api_key, but they no longer assert the request URL/path. Given other HTTPClient tests assert the full URL, this weakens coverage and could miss regressions in endpoint construction; also, start_using_configuration_for_example() emits warnings and is redundant here since the autouse fixture already configures test servers. Prefer asserting the url kwarg and using openml.config.use_test_servers()/existing fixtures instead of start_using_configuration_for_example().
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
openml/flows/functions.py:59
- The
get_flowdocstring still describes the legacy per-flow XML cache (openml.config.cache_directory/flows/{flow_id}/flow.xml) and listsOpenMLCacheException, butget_flownow delegates toopenml._backend.flow.get(...)which uses the HTTP client cache instead. Please update the docstring (side effects + raises section) to match the new caching/error behavior.
Queries the OpenML REST API for the flow metadata and returns an
:class:`OpenMLFlow` instance. If the flow is already cached locally,
the cached copy is returned. Optionally the flow can be re-instantiated
into a concrete model instance using the registered extension.
Parameters
----------
flow_id : int
The OpenML flow id.
reinstantiate : bool, optional (default=False)
If True, convert the flow description into a concrete model instance
using the flow's extension (e.g., sklearn). If conversion fails and
``strict_version`` is True, an exception will be raised.
strict_version : bool, optional (default=True)
When ``reinstantiate`` is True, whether to enforce exact version
requirements for the extension/model. If False, a new flow may
be returned when versions differ.
Returns
-------
OpenMLFlow
The flow object with metadata; ``model`` may be populated when
``reinstantiate=True``.
Raises
------
OpenMLCacheException
When cached flow files are corrupted or cannot be read.
OpenMLServerException
When the REST API call fails.
Side Effects
------------
- Writes to ``openml.config.cache_directory/flows/{flow_id}/flow.xml``
when the flow is downloaded from the server.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @staticmethod | ||
| def _convert_v2_to_v1_format(v2_json: dict[str, Any]) -> dict[str, dict]: | ||
| """Convert v2 JSON response to v1 XML-dict format for OpenMLFlow._from_dict(). | ||
|
|
There was a problem hiding this comment.
_convert_v2_to_v1_format is annotated as returning dict[str, dict], but the returned structure contains non-dict values (e.g., strings, lists). This type annotation is incorrect and can break static type checking; consider using dict[str, Any] or dict[str, dict[str, Any]] (or a dedicated TypedDict) instead.
| TestBase._mark_entity_for_removal("flow", flow.flow_id, flow.name) | ||
| TestBase.logger.info(f"collected from {__file__.split('/')[-1]}: {flow.flow_id}") | ||
|
|
||
|
|
There was a problem hiding this comment.
This blank line contains trailing whitespace. Please remove the whitespace to satisfy common linters (e.g., W293) and keep diffs clean.
| model = sklearn.ensemble.RandomForestClassifier() | ||
| flow = self.extension.model_to_flow(model) | ||
| api_call_mock.return_value = ( | ||
|
|
There was a problem hiding this comment.
This blank line contains trailing whitespace. Please remove the whitespace to satisfy common linters (e.g., W293) and keep diffs clean.
| get_flow_mock.return_value = flow | ||
|
|
||
| flow.publish() | ||
| # Not collecting flow_id for deletion since this is a test for failed upload |
There was a problem hiding this comment.
The comment says this is a test for a “failed upload”, but the mocked response is a successful <oml:upload_flow> with an id and flow.publish() is expected to succeed initially. Please update/remove the comment to reflect the actual test behavior.
| # Not collecting flow_id for deletion since this is a test for failed upload | |
| # The first publish is expected to succeed; cleanup is handled after the error scenario |
|
Hi @geetu040 . As per discussion in final mentoring meeting isn't it ready to merge ? |
Fixes #1601
added a
Createmethod inFlowAPIfor publishing flow but not refactored with oldpublish. (Needs discussion on this)Added tests using
fake_methodsso that we can test without localV2server . I have tested theFlowsV2methods (getandexists) anddeleteandlistwere not implemented inV2server so I skipped them .