Merged
2 changes: 1 addition & 1 deletion README.md
@@ -27,7 +27,7 @@ It works with native iOS and Android apps, plus apps built with Expo, Flutter, a

## Capabilities

- **Inspect** real app UI through compact accessibility snapshots, interactive refs like `@e3`, selectors, and React Native component trees.
- **Inspect** real app UI through structured accessibility snapshots, interactive refs like `@e3`, selectors, and React Native component trees.
- **Interact** by opening apps, tapping, typing, scrolling, performing gestures, waiting, asserting state, handling alerts, and closing sessions.
- **Capture evidence** with screenshots, videos, logs, traces, network traffic, performance samples, crash context, and React profiles.
- **Replay workflows** by recording `.ad` scripts for local runs, CI, repeatable e2e checks, and strict Maestro YAML export when a flow needs to run in Maestro.
62 changes: 27 additions & 35 deletions docs/adr/0004-ios-snapshot-backend-strategy.md
@@ -2,22 +2,23 @@

## Status

Accepted — implemented by the snapshot capture plan runner (RunnerTests+SnapshotCapturePlan.swift):
each strategy declares its backend chain, and a structured snapshot quality verdict makes
degraded or recovered output observable end to end.
Accepted. Amended after iOS snapshot capture was simplified to two public modes:
regular interactive snapshots and raw diagnostic snapshots.

The current implementation is owned by `RunnerTests+SnapshotCapturePlan.swift`. Capture plans
declare their XCTest backend chain, and structured snapshot quality verdicts make degraded or
recovered output observable end to end.

## Context

Agent Device exposes iOS UI state through snapshots produced by the long-lived XCTest runner. The
runner has three different snapshot needs:
runner has two durable snapshot needs:

- agent-facing regular context, where the important contract is the effective user-visible UI,
fixed controls such as tab bars, and scroll-hidden hints for content outside visible scroll
containers;
- rich diagnostics and selector disambiguation, where a raw recursive XCTest snapshot is useful
because it preserves hierarchy, static text, wrappers, scroll containers, and ancestry;
- agent-facing compact interactive context, where the important contract is fast, bounded discovery
of visible controls and stable refs for the next action.
because it preserves hierarchy, static text, wrappers, scroll containers, and ancestry.

These needs should not share one capture strategy blindly. Recursive `XCUIElement.snapshot()` is
rich, but some real simulator app trees can make XCTest fail with `kAXErrorIllegalArgument` while
@@ -36,60 +37,51 @@ predictable.
Keep XCTest as the default iOS automation runner and split iOS snapshot capture into explicit
strategies:

- **Regular visible strategy**: use recursive XCTest snapshots, but emit only the effective
user-visible tree plus visible ancestors and scroll-hidden hints. A node inside a scroll
container is user-visible only when it intersects both the app viewport and the nearest visible
scroll container. Offscreen descendants should be visited to set `hiddenContentAbove` /
`hiddenContentBelow`, not emitted as normal visible nodes. This strategy must not use an
arbitrary node-count cutoff: fixed controls that are later in traversal order, such as bottom tab
bars after long lists, are part of the visible UI contract.
- **Regular visible strategy**: use recursive XCTest snapshots, emit the effective user-visible
tree plus visible ancestors and scroll-hidden hints, and fall back through the capture plan when
XCTest returns sparse output. A node inside a scroll container is user-visible only when it
intersects both the app viewport and the nearest visible scroll container. Offscreen descendants
should be visited to set `hiddenContentAbove` / `hiddenContentBelow`, not emitted as normal
visible nodes. This strategy must not use an arbitrary node-count cutoff: fixed controls that are
later in traversal order, such as bottom tab bars after long lists, are part of the visible UI
contract.
- **Raw diagnostic strategy**: use recursive XCTest snapshots for raw snapshots, diagnostics, and
cases that need hierarchy. Raw output is allowed to be noisy and large; if the transport cannot
carry the response, fail explicitly instead of silently truncating the tree at a hard node count.
If XCTest reports a real AX serialization failure, preserve that error instead of pretending the
UI is empty.
- **Compact interactive strategy**: for `snapshot -i -c`, use a bounded flat XCTest query strategy
that avoids recursive root snapshots and app/window property reads. It should prefer fast,
one-screen actionability over hierarchy fidelity and should return a sparse root quickly when
XCTest cannot enumerate controls. Its bound is time-based, not a hidden fixed node budget.
- **Future simulator AX-service strategy**: treat Bluesky-class failures as evidence that XCTest is
not a complete semantic snapshot backend. A robust semantic fix should add a host-side simulator
accessibility backend, similar in role to `idb` accessibility commands or Argent's `ax-service`,
and normalize its output into the same `SnapshotNode` model. That backend can be simulator-only;
physical devices can continue using XCTest unless a supported lower-level API exists.
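The visibility rule for the regular visible strategy can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the runner's implementation: `Rect`, `ScrollHints`, `isUserVisible`, and `partition` are hypothetical names, and the sketch assumes a vertical scroll container with axis-aligned frames.

```swift
// Hypothetical types for illustration only; the real runner works on XCTest snapshots.
struct Rect {
    var x: Double
    var y: Double
    var width: Double
    var height: Double

    var maxX: Double { x + width }
    var maxY: Double { y + height }

    /// True when the two rectangles share a non-empty area.
    func intersects(_ other: Rect) -> Bool {
        return x < other.maxX && other.x < maxX && y < other.maxY && other.y < maxY
    }
}

// Hypothetical stand-in for the `hiddenContentAbove` / `hiddenContentBelow` hints.
struct ScrollHints {
    var hiddenContentAbove = false
    var hiddenContentBelow = false
}

/// A node is user-visible only if it intersects the app viewport and, when it
/// sits inside a scroll container, that container's frame as well.
func isUserVisible(_ frame: Rect, viewport: Rect, scrollContainer: Rect?) -> Bool {
    guard frame.intersects(viewport) else { return false }
    guard let container = scrollContainer else { return true }
    return frame.intersects(container)
}

/// Visits every child of a vertical scroll container. Visible children are
/// emitted; offscreen children only set the hints. There is no node-count cutoff.
func partition(children: [Rect], viewport: Rect, container: Rect)
    -> (visible: [Rect], hints: ScrollHints)
{
    var visible: [Rect] = []
    var hints = ScrollHints()
    for child in children {
        if isUserVisible(child, viewport: viewport, scrollContainer: container) {
            visible.append(child)
        } else if child.maxY <= container.y {
            hints.hiddenContentAbove = true
        } else if child.y >= container.maxY {
            hints.hiddenContentBelow = true
        }
    }
    return (visible, hints)
}
```

Because every child is visited rather than cut off at a fixed count, a fixed control that comes late in traversal order (for example a tab bar with no enclosing scroll container) is still evaluated by `isUserVisible` and emitted.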

The daemon should make degraded compact output observable. If an iOS compact interactive snapshot
contains only the synthetic application root, surface a warning so agents know the snapshot is
bounded fallback output rather than proof that the screen has no controls.
The daemon should make degraded output observable. If an iOS interactive snapshot contains only the
application root or another sparse shape, surface a structured quality verdict and warning so
agents know the snapshot is degraded output rather than proof that the screen has no controls.
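One way to picture the structured quality verdict is the sketch below. It is an assumption-laden illustration: `SnapshotQuality`, `NodeSummary`, and `qualityVerdict` are hypothetical names that do not come from the runner, and the sketch only covers the empty and root-only shapes, leaving other sparse shapes undefined.

```swift
// Hypothetical verdict model; names are illustrative, not the runner's actual types.
enum SnapshotQuality: Equatable {
    case healthy
    case degraded(reason: String)
}

// Hypothetical minimal view of a captured node.
struct NodeSummary {
    let isApplicationRoot: Bool
}

/// Classifies an interactive snapshot so that a root-only result is reported
/// as degraded output instead of being read as "this screen has no controls".
func qualityVerdict(for nodes: [NodeSummary]) -> SnapshotQuality {
    if nodes.isEmpty {
        return .degraded(reason: "snapshot returned no nodes")
    }
    if nodes.allSatisfy({ $0.isApplicationRoot }) {
        return .degraded(reason: "only the application root was captured")
    }
    return .healthy
}
```

A caller could attach the `reason` string to the warning it surfaces, so an agent can tell degraded output apart from a genuinely empty screen.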

## Regression Notes

PR #639 made XCTest AX serialization failures explicit instead of swallowing them as empty
snapshots. That was the correct diagnostic change, but it exposed apps whose accessibility trees
XCTest cannot serialize.

The first compact fallback then still paid several XCTest reads (`app.label`, `app.identifier`,
`app.frame`, window frame lookup) before enumerating flat controls. On broken trees those reads can
hit the same AX failure path, which made `snapshot -i -c` much slower than the plain snapshot in
some apps. PR #700 changed compact interactive snapshots to enter the flat strategy immediately and
avoid those app/window reads.
Later work moved recovery into the regular visible capture plan so healthy apps keep the fast
recursive tree path while degraded simulator app classes can still return bounded, honest output
when fallback query tiers are the only available source of visible controls.
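The recovery flow described above might look something like the following sketch. It assumes a plan is an ordered list of backends, that the first tier is the recursive tree path, and that sparseness can be judged from a node count; `Backend`, `CaptureResult`, and `runCapturePlan` are hypothetical names, not the actual capture plan runner API.

```swift
// Hypothetical capture plan sketch; not the API of RunnerTests+SnapshotCapturePlan.swift.
struct Backend {
    let name: String
    let capture: () throws -> Int  // returns the number of nodes captured
}

struct CaptureResult {
    let backend: String
    let nodeCount: Int
    let recovered: Bool  // true when a fallback tier produced the output
    let degraded: Bool   // true when even the best output is still sparse
}

/// Tries each backend in order. A healthy first tier returns immediately, so
/// healthy apps stay on the fast path. Sparse output falls through to the next
/// tier. Thrown errors propagate so real failures stay loud.
func runCapturePlan(_ chain: [Backend], isSparse: (Int) -> Bool) throws -> CaptureResult? {
    var best: CaptureResult? = nil
    for (index, backend) in chain.enumerated() {
        let count = try backend.capture()
        if !isSparse(count) {
            return CaptureResult(
                backend: backend.name, nodeCount: count, recovered: index > 0, degraded: false)
        }
        // Remember the richest sparse output in case no tier does better.
        if best.map({ count > $0.nodeCount }) ?? true {
            best = CaptureResult(
                backend: backend.name, nodeCount: count, recovered: index > 0, degraded: true)
        }
    }
    return best
}
```

Returning the richest sparse result with `degraded: true` keeps the output bounded and honest: the caller still gets something to show, along with a flag saying it should not be trusted as a complete view.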

## Consequences

Compact interactive snapshots are allowed to be less complete than regular or raw snapshots, but
they must be bounded and honest. They should never block for the full daemon snapshot timeout
because one app has a pathological AX tree.

Regular snapshots remain the right tool for agents and Maestro compatibility because they describe
what a user can currently perceive and interact with. Raw snapshots remain the right tool when
hierarchy matters. Both may still fail loudly on XCTest-broken trees; that failure is useful because
retrying the same recursive capture is unlikely to reveal a different tree.
hierarchy matters. Both may still fail loudly on XCTest-broken trees; that failure is useful
because retrying the same recursive capture is unlikely to reveal a different tree.

A future AX-service backend is the correct place to regain Bluesky-class semantic coverage. It
should be added as a platform backend with its own lifecycle, protocol, normalization, timing
metrics, and fallback rules, not as another special case inside the XCTest runner.

When adding new iOS snapshot behavior, maintainers should first decide which strategy owns it. If a
change tries to make compact snapshots rich by reintroducing recursive snapshots, tries to make
regular snapshots fast by dropping visible controls behind a node budget, or tries to make raw
snapshots safe by silently truncating, it is probably crossing strategy boundaries.
change tries to make regular snapshots fast by dropping visible controls behind a node budget, or
tries to make raw snapshots safe by silently truncating, it is probably crossing strategy
boundaries.
Original file line number Diff line number Diff line change
@@ -107,15 +107,14 @@ extension RunnerTests {
let typeName = elementTypeName(rawElementType: rawType)
let enabled = privateAXBool(rawNode["enabled"]) ?? true
let visible = isVisibleInViewport(rect, viewport)
let compactCandidate = privateAXFlatCompactCandidate(rawElementType: rawType)
let interactiveCandidate = privateAXInteractiveCandidate(rawElementType: rawType)
let filterDecision = flatSnapshotFilterDecision(
FlatSnapshotFilterNode(
isRoot: parentIndex == nil,
label: label,
identifier: identifier,
valueText: value.isEmpty ? nil : value,
visible: visible,
compactCandidate: compactCandidate
visible: visible
),
options: options,
insideMatchedScope: insideMatchedScope
@@ -137,7 +136,7 @@ extension RunnerTests {
enabled: enabled,
focused: privateAXBool(rawNode["focused"]) == true ? true : nil,
selected: privateAXBool(rawNode["selected"]) == true ? true : nil,
hittable: visible && enabled && compactCandidate,
hittable: visible && enabled && interactiveCandidate,
depth: depth,
parentIndex: parentIndex,
hiddenContentAbove: nil,
@@ -230,8 +229,7 @@ extension RunnerTests {
appendPrivateAXNode(
tree,
to: &nodes,
options: SnapshotOptions(
interactiveOnly: false, compact: false, depth: nil, scope: "homeScreen", raw: false),
options: SnapshotOptions(interactiveOnly: false, depth: nil, scope: "homeScreen", raw: false),
viewport: .infinite,
depth: 0,
parentIndex: nil,
@@ -245,7 +243,7 @@ extension RunnerTests {
XCTAssertFalse(labels.contains("unrelated sibling"))
}

func testPrivateAXCompactInteractiveFiltersLoginLikeHiddenDrawer() {
func testPrivateAXInteractiveFiltersLoginLikeHiddenDrawer() {
let tree: [String: Any] = [
"type": Int(XCUIElement.ElementType.application.rawValue),
"label": "Blue Sky",
@@ -313,8 +311,7 @@ extension RunnerTests {
appendPrivateAXNode(
tree,
to: &nodes,
options: SnapshotOptions(
interactiveOnly: true, compact: true, depth: nil, scope: nil, raw: false),
options: SnapshotOptions(interactiveOnly: true, depth: nil, scope: nil, raw: false),
viewport: CGRect(x: 0, y: 0, width: 390, height: 844),
depth: 0,
parentIndex: nil,
Original file line number Diff line number Diff line change
@@ -691,7 +691,6 @@ extension RunnerTests {
case .snapshot:
let options = SnapshotOptions(
interactiveOnly: command.interactiveOnly ?? false,
compact: command.compact ?? false,
depth: command.depth,
scope: command.scope,
raw: command.raw ?? false
Original file line number Diff line number Diff line change
@@ -6,7 +6,6 @@ struct FlatSnapshotFilterNode {
let identifier: String
let valueText: String?
let visible: Bool
let compactCandidate: Bool

var hasContent: Bool {
return !label.isEmpty || !identifier.isEmpty || valueText != nil
@@ -46,23 +45,14 @@ extension RunnerTests {
include = false
} else if options.interactiveOnly && !node.visible {
include = false
} else if options.compact {
include = node.hasContent || node.compactCandidate
} else {
include = true
}

return FlatSnapshotFilterDecision(include: include, insideMatchedScope: nowInsideScope)
}

func querySweepFlatCompactCandidate(
elementType: XCUIElement.ElementType,
hittable: Bool
) -> Bool {
return hittable || interactiveTypes.contains(elementType)
}

func privateAXFlatCompactCandidate(rawElementType: Int) -> Bool {
func privateAXInteractiveCandidate(rawElementType: Int) -> Bool {
guard let type = flatSnapshotElementType(rawElementType: rawElementType) else {
return false
}
@@ -84,55 +74,48 @@ extension RunnerTests {
label: "Welcome back",
identifier: "",
valueText: nil,
visible: true,
compactCandidate: false
visible: true
)
let hiddenInteractive = FlatSnapshotFilterNode(
isRoot: false,
label: "Hidden menu",
identifier: "",
valueText: nil,
visible: false,
compactCandidate: true
visible: false
)
let decorative = FlatSnapshotFilterNode(
isRoot: false,
label: "",
identifier: "",
valueText: nil,
visible: true,
compactCandidate: false
visible: true
)

XCTAssertTrue(
flatSnapshotFilterDecision(
visibleContent,
options: SnapshotOptions(
interactiveOnly: false, compact: true, depth: nil, scope: nil, raw: false),
options: SnapshotOptions(interactiveOnly: false, depth: nil, scope: nil, raw: false),
insideMatchedScope: false
).include
)
XCTAssertFalse(
flatSnapshotFilterDecision(
hiddenInteractive,
options: SnapshotOptions(
interactiveOnly: true, compact: true, depth: nil, scope: nil, raw: false),
options: SnapshotOptions(interactiveOnly: true, depth: nil, scope: nil, raw: false),
insideMatchedScope: false
).include
)
XCTAssertFalse(
XCTAssertTrue(
flatSnapshotFilterDecision(
decorative,
options: SnapshotOptions(
interactiveOnly: false, compact: true, depth: nil, scope: nil, raw: false),
options: SnapshotOptions(interactiveOnly: false, depth: nil, scope: nil, raw: false),
insideMatchedScope: false
).include
)
XCTAssertTrue(
flatSnapshotFilterDecision(
decorative,
options: SnapshotOptions(
interactiveOnly: false, compact: false, depth: nil, scope: nil, raw: false),
options: SnapshotOptions(interactiveOnly: false, depth: nil, scope: nil, raw: false),
insideMatchedScope: false
).include
)
@@ -144,19 +127,16 @@ extension RunnerTests {
label: "",
identifier: "homeScreen",
valueText: nil,
visible: true,
compactCandidate: false
visible: true
)
let unmatchedDescendant = FlatSnapshotFilterNode(
isRoot: false,
label: "Post body without the scope text",
identifier: "",
valueText: nil,
visible: true,
compactCandidate: false
visible: true
)
let options = SnapshotOptions(
interactiveOnly: false, compact: false, depth: nil, scope: "homeScreen", raw: false)
let options = SnapshotOptions(interactiveOnly: false, depth: nil, scope: "homeScreen", raw: false)

let rootDecision = flatSnapshotFilterDecision(
scopeRoot,
@@ -182,14 +162,10 @@ extension RunnerTests {
)
}

func testFlatSnapshotCompactCandidatesPreserveBackendInputs() {
XCTAssertFalse(
querySweepFlatCompactCandidate(elementType: .scrollView, hittable: false),
"query sweep should not newly admit contentless scroll containers"
)
func testPrivateAXInteractiveCandidatesPreserveBackendInputs() {
XCTAssertTrue(
privateAXFlatCompactCandidate(rawElementType: Int(XCUIElement.ElementType.scrollView.rawValue)),
"private AX keeps its existing scroll-container compact candidate behavior"
privateAXInteractiveCandidate(rawElementType: Int(XCUIElement.ElementType.scrollView.rawValue)),
"private AX marks scroll containers as interactive candidates"
)
}
}
Original file line number Diff line number Diff line change
@@ -138,7 +138,6 @@ struct Command: Codable {
let fps: Int?
let quality: Int?
let interactiveOnly: Bool?
let compact: Bool?
let depth: Int?
let scope: String?
let raw: Bool?
@@ -351,7 +350,6 @@ struct SnapshotNode: Codable {

struct SnapshotOptions {
let interactiveOnly: Bool
let compact: Bool
let depth: Int?
let scope: String?
let raw: Bool