Skip to content

fix(workflows): auto-wrap bare workflow definitions in {"specification": ...}#460

Open
imbgar-roboflow wants to merge 2 commits intoroboflow:mainfrom
imbgar-roboflow:fix/workflow-specification-autowrap
Open

fix(workflows): auto-wrap bare workflow definitions in {"specification": ...}#460
imbgar-roboflow wants to merge 2 commits intoroboflow:mainfrom
imbgar-roboflow:fix/workflow-specification-autowrap

Conversation

@imbgar-roboflow
Copy link
Copy Markdown

@imbgar-roboflow imbgar-roboflow commented Apr 21, 2026

Why

roboflow workflow create --definition <file> and Workspace.create_workflow(name, definition) dump the user's JSON verbatim into the backend's config field. That breaks execution via the Roboflow Inference server, which parses config and then does:

# inference.core.roboflow_api:749
workflow_config = json.loads(response["workflow"]["config"])
specification = workflow_config["specification"]
# -> KeyError -> MalformedWorkflowResponseError -> HTTP 502

User-facing Workflows JSON — as published at the Workflows docs, in inference/development/workflows_examples/*, and in the in-repo integration tests (e.g. LEGACY_DETECTION_PLUS_CLASSIFICATION_WORKFLOW) — is the flat shape {"version", "inputs", "steps", "outputs"}. The web app silently wraps it in {"specification": ...} before POST so the inference server can unwrap it. Nothing in the SDK/CLI did the same, so every CLI user who pasted the documented shape and then invoked /infer/workflows/... hit the 502 on first execution.

Reproduced on latest roboflow==1.3.3 against a fresh Roboflow hosted deployment (EU staging). CLI-created workflow returned:

HTTP 502 {"message": "Internal error. Request to Roboflow API failed."}

…and the inference server's own logs showed:

MalformedWorkflowResponseError: Workflow specification not found in Roboflow API response
  at /app/inference/core/roboflow_api.py:749

Re-saving the same definition wrapped in {"specification": ...} made execution work end-to-end (15/15 test images, p50 ~1.2s).

What

  • New helper _normalize_workflow_config(config) on rfapi that:
    • Wraps a bare workflow spec (dict or JSON string) in {"specification": ...}.
    • Passes dicts already containing a top-level specification key through unchanged.
    • Leaves dicts without any of version / inputs / steps / outputs at the top level alone, so callers sending custom payloads aren't second-guessed.
    • Passes non-JSON strings through verbatim.
    • Preserves None -> "{}" (the existing "create empty workflow" default).
  • create_workflow and update_workflow in roboflow.adapters.rfapi now call the helper on config.

Back-compat: no existing public signature changed. All 469 existing tests still pass. New unit tests cover bare-dict wrap, bare-JSON-string wrap, already-wrapped no-op, non-workflow-dict passthrough, and the empty / None / non-JSON edge cases on _normalize_workflow_config directly.

Not included (intentional)

This PR only fixes the write path. Workflows stored with the bare shape before this fix will still 502 until re-saved. The read-side symmetry (inference.core.roboflow_api.get_workflow_specification accepting either shape) is orthogonal and belongs in roboflow/inference. Skipped here because the set of already-broken flat workflows in the wild is small and a one-line workflow update per affected workflow is sufficient.

We should also have schema validation for any objects we're allowing to pass through our CLI to the database but that can come as a later refinement.

Test plan

  • pytest tests/ -q --ignore=tests/manual — 469 passed, 1 skipped.
  • End-to-end manual validation: roboflow workflow create --definition <bare spec> → workflow stored with {"specification": ...} wrapper → POST /infer/workflows/<ws>/<name> returns 200 with predictions.

…n": ...}

`roboflow workflow create --definition <file>` and
`Workspace.create_workflow(name, definition)` previously dumped the user's
JSON verbatim into the backend's `config` field. That breaks execution via
the Roboflow Inference server, which parses `config` and then does:

    workflow_config = json.loads(response["workflow"]["config"])
    specification = workflow_config["specification"]
    # -> KeyError -> MalformedWorkflowResponseError -> HTTP 502

User-facing Workflows JSON — as published at
https://inference.roboflow.com/workflows/create_and_run/, in
`inference/development/workflows_examples/*`, and in the in-repo
integration tests (e.g. LEGACY_DETECTION_PLUS_CLASSIFICATION_WORKFLOW) —
is the flat shape {"version", "inputs", "steps", "outputs"}. The web app
silently wraps it in {"specification": ...} before POST so that the
inference server can unwrap it. Nothing in the SDK/CLI did the same,
so every CLI user who pasted the documented shape and then invoked
`/infer/workflows/...` hit the 502 on first execution.

Reproduced in EU staging (roboflow-eu-staging) against a freshly trained
yolov11n model: CLI-created workflow returned
`{"message": "Internal error. Request to Roboflow API failed."}` at the
inference endpoint; inference-server logs showed
`MalformedWorkflowResponseError: Workflow specification not found in
Roboflow API response` at roboflow_api.py:749. Re-saving the same
definition wrapped in {"specification": ...} made execution work end-to-
end (15/15 test-set images, p50 ~1.2s).

Fix is limited to the SDK write path and is backward compatible:
- New helper `_normalize_workflow_config(config)` on `rfapi` wraps a
  bare workflow spec (dict or JSON string) in {"specification": ...}.
  Dicts already containing a top-level `specification` key are passed
  through unchanged; dicts without any of version/inputs/steps/outputs
  at the top level are left alone so callers sending custom payloads
  aren't second-guessed; non-JSON strings pass through as-is; None
  continues to map to "{}" (preserving the "create empty workflow"
  default behaviour exercised by test_defaults_config_and_template).
- `create_workflow` and `update_workflow` in `roboflow.adapters.rfapi`
  now call the helper on `config`.

New tests cover bare-dict wrap, bare-JSON-string wrap, already-wrapped
no-op, non-workflow-dict passthrough, and the empty/None/non-JSON
edge cases in `_normalize_workflow_config` directly. Existing tests
pass unchanged — including the byte-for-byte string passthrough for
non-workflow JSON.
Addresses reviewer feedback on the workflow-spec auto-wrap:

1. Strip a leading UTF-8 BOM (``\ufeff``) before ``json.loads`` so
   configs saved from Windows editors are recognized as bare workflow
   specs and get wrapped. Previously they fell through the JSON-parse
   ``except`` branch and shipped to the backend unchanged, reproducing
   the original 502.

2. Serialize wrapped output with compact separators (``(",", ":")``)
   to match the shape the web app writes via ``JSON.stringify``.
   Downstream audit/diff tooling sees SDK- and UI-written rows as
   byte-identical when the logical content matches.

3. Additional unit tests cover the boundaries reviewers flagged:
   already-wrapped JSON string passthrough, partial workflow dicts
   (single workflow-shaped key), JSON array / scalar input
   (guards the ``isinstance(parsed, dict)`` check), UTF-8 BOM
   auto-wrap, and the compact-separators invariant.

4. Rename ``test_empty_dict_preserved`` to
   ``test_empty_dict_serialized_to_empty_json`` with a comment noting
   the behavior is ``json.dumps({}) == "{}"`` rather than an explicit
   preservation branch.

5. Hoist ``_normalize_workflow_config`` and ``json`` imports to
   module level in the test file; individual tests no longer repeat
   the import.

475 passed, 1 skipped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant