Virtual Context bug on parent id and FAIL checkpoint emitting#364
Open
Virtual Context bug on parent id and FAIL checkpoint emitting#364
Conversation
Virtual child contexts (FLAT-mode map/parallel branches) no longer write FAIL checkpoints when the user function raises. The branch is a logical scope only; it does not appear in the execution history regardless of outcome, aligning with the JS reference SDK. Also fixes an incoherent state when a user set ChildConfig.is_virtual=True via run_in_child_context: lifecycle checkpoints were suppressed, but the child context's _parent_id was the child's own operation id (never announced in the checkpoint stream), so inner operations stamped a parent_id pointing to a dangling reference. Nesting produced a chain of such references. The two decisions (lifecycle suppression, parent-id propagation) are now coupled through ChildConfig.is_virtual. Refactor of the supporting mechanism: - Single source of truth for the virtual-vs-real decision. create_child_context computes two fields (_parent_id, _step_id_prefix) at construction; no per-operation-method knowledge is required. New operations just read self._parent_id and work correctly under both modes. - Field names match their roles. _parent_id is "the id my inner operations stamp as their parent"; _step_id_prefix is "how I prefix step ids". Each field has one job. - is_virtual is encapsulated in the context as a cached property. Callers opt in with create_child_context(..., is_virtual=True); the property makes the state inspectable. - Nested virtual-in-virtual matches the JS reference. A virtual child of a virtual parent inherits its parent's reporting ancestor, so chained FLAT layers collapse to the outermost non-virtual context without dangling parent-id references. - ChildConfig.is_virtual drives lifecycle-checkpoint suppression in ChildOperationExecutor (START, SUCCEED, FAIL) and, via run_in_child_context, the child context's own virtual-ness. The field remains public, matching the JS SDK's ChildConfig.virtualContext. - Fewer parameters threaded through. operation_identifier is gone from ConcurrentExecutor, MapExecutor, and ParallelExecutor constructors and from_items/from_callables; the concurrency layer no longer needs an OperationIdentifier to figure out the reporting parent. - Tests exercise the invariants directly with a real DurableContext and assert wire-format decisions, including nested-virtual-in- virtual coverage. Improves observability (no phantom FAILED CONTEXT entries for virtual branches), cost (no billable operation per failed virtual branch), and cross-SDK wire parity. fixes #362, fixes #363
Kiro and VS Code mangle the hatch interpreter path when it contains spaces, breaking "Select Interpreter". Document the `.venv` symlink workaround and split the existing VS Code section into Interpreter and Linting subsections.
zhongkechen
reviewed
Apr 30, 2026
| When both 'serdes' and 'item_serdes' are provided: | ||
| - item_serdes: Used for individual item results in child contexts | ||
| - serdes: Used for the entire BatchResult at handler level | ||
|
|
Contributor
There was a problem hiding this comment.
Let's remove is_virtual from ChildConfig. It's a mistake to expose this field to users. This is supposed to be used only by concurrency operations internally
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description of changes:
Fixes two functional bugs in virtual child contexts and refactors the supporting mechanism.
#363 —
run_in_child_context(is_virtual=True)sends invalidparent_idto the backend. Onmain, inner operations inside a virtualrun_in_child_contextstamp the branch's own operation id as theirparent_id. The branch itself writes no START/SUCCEED underis_virtual=True, so that id does not exist in the execution history. The backend correctly rejects the checkpoint withInvalidParameterValueException(CHECKPOINT_INVALID_PARENT_OPERATION_ID), and the user's execution fails at the first inner checkpoint. Nesting virtual scopes (and FLATmap/parallelinside a virtual scope) compounds the failure. After this change, inner operations stamp the enclosing non-virtual ancestor, which is a real checkpointed parent.#362 — virtual branches wrote a FAIL checkpoint on exceptions. On
main, an exception inside a virtual branch causedChildOperationExecutor.executeto unconditionally write a FAIL checkpoint for the branch — producing a phantom FAILED CONTEXT entry in the execution history with no matching START. On the FLATmap/parallelpath this was cosmetic (the FAIL's parent id was the valid map/parallel op, the backend accepts a CONTEXT FAIL without a prior START); on therun_in_child_context(is_virtual=True)path it compounded #363 (the FAIL's parent id was the same orphan branch id, which the backend rejects). Either way it's incorrect: virtual branches should emit no lifecycle entries at all. After this change, virtual branches emit no lifecycle checkpoints; failures still propagate to the concurrency executor'sBatchResult(or to the caller ofrun_in_child_context), and completion-tolerance logic still applies.Mechanism changes (applied alongside the fixes):
create_child_contextcomputes two fields (_parent_id,_step_id_prefix) at construction; no per-operation-method knowledge is required. New operations just readself._parent_idand work correctly under both modes._parent_idis "the id my inner operations stamp as their parent";_step_id_prefixis "how I prefix step ids". Each field has one job.is_virtualis encapsulated in the context as a cached property. Callers opt in withcreate_child_context(..., is_virtual=True); the property makes the state inspectable.ChildConfig.is_virtualdrives lifecycle-checkpoint suppression inChildOperationExecutor(START, SUCCEED, FAIL) and, viarun_in_child_context, the child context's own virtual-ness. The field remains public, matching the JS SDK'sChildConfig.virtualContext.operation_identifieris gone fromConcurrentExecutor,MapExecutor, andParallelExecutorconstructors andfrom_items/from_callables; the concurrency layer no longer needs anOperationIdentifierto figure out the reporting parent.DurableContextand assert wire-format decisions, including nested-virtual-in-virtual coverage.Impact:
ctx.run_in_child_context(config=ChildConfig(is_virtual=True))without their execution being rejected at checkpoint time.map/parallelno longer emits phantom FAILED CONTEXT entries on branch failure.Also includes minor CONTRIBUTING.md improvement (hatch venv symlink tip for editors that mangle paths with spaces).
Issue #, if available:
Fixes #362, Fixes #363
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.