Skip to content

fix(ci): improve iOS Metro bundler stability and fix devbox-update workflow#45

Merged
abueide merged 2 commits intomainfrom
fix/ios-metro-stability-and-devbox-update
Apr 29, 2026
Merged

fix(ci): improve iOS Metro bundler stability and fix devbox-update workflow#45
abueide merged 2 commits intomainfrom
fix/ios-metro-stability-and-devbox-update

Conversation

@abueide
Copy link
Copy Markdown
Contributor

@abueide abueide commented Apr 29, 2026

Summary

This PR addresses intermittent iOS E2E test failures caused by Metro bundler not reaching a healthy state within the readiness probe timeout, and fixes the broken devbox-update workflow.

Problem Analysis

iOS E2E Test Failures

Symptoms:

  • deploy and verify-app steps fail with "Step never executed (process may have been skipped)"
  • Metro bundler logs only 5 messages vs 30+ on successful Android runs
  • Never reaches "Dev server ready" state

Root Cause:

  • Metro bundler sometimes takes longer than 60s to start on iOS runners
  • When health checks fail, processes continue running without failing fast
  • Dependent steps are skipped due to unmet process_healthy dependency

Evidence:

  • ✅ Android E2E: Always passes with Metro reaching healthy state
  • ❌ iOS E2E (run #24909229523): Metro fails to start, deploy skipped
  • ✅ iOS E2E (run #24971536378 ios-min): Metro starts successfully
  • Conclusion: This is an intermittent failure, not a systematic bug

Devbox Update Workflow

Problem: Uses non-existent GitHub Action xiaolutech/devbox-update-action@v1

Error: Unable to resolve action xiaolutech/devbox-update-action@v1, unable to find version v1

Changes

1. Increase iOS Metro Startup Timeouts

File: examples/react-native/tests/test-suite-ios-e2e.yaml

readiness_probe:
  initial_delay_seconds: 10  # Was 5 - give Metro more time before first check
  timeout_seconds: 10        # Was 5 - allow longer per-check timeout  
  failure_threshold: 24      # Was 12 - total timeout now 130s (was 60s)
shutdown:
  signal: 15                 # NEW - graceful termination on failure
  timeout_seconds: 5

Impact: Gives Metro 130s total to reach healthy state (10s initial + 24 checks × 5s)

2. Add Metro Startup Timeout Wrapper

File: plugins/react-native/virtenv/scripts/user/metro.sh

exec timeout 90s npx react-native start --port "$metro_port" --reset-cache

Impact:

  • Metro must start within 90s or exit with error code
  • Prevents infinite hanging if Metro gets stuck during startup
  • Process-compose can detect failure and propagate to dependent steps

3. Improve Health Check Diagnostics

File: plugins/react-native/virtenv/scripts/user/metro.sh

Added detailed error logging to metro_health():

  • "environment file not found"
  • "METRO_PORT not set"
  • "/status endpoint not responding on port X"
  • "/index.bundle endpoint not ready for platform X"

Impact: Makes it easier to diagnose why Metro health checks fail

4. Add Fail-Fast Health Check in deploy-ios

File: examples/react-native/tests/test-suite-ios-e2e.yaml

# Verify Metro is healthy before attempting deploy
if ! metro.sh health ios ios; then
  echo "ERROR: Metro bundler is not healthy, cannot deploy" >&2
  exit 1
fi

Impact:

  • Fails immediately with clear error if Metro is not ready
  • Prevents confusing "Step never executed" messages
  • Provides actionable error for debugging

5. Fix Devbox Update Workflow

File: .github/workflows/devbox-update.yml

Replaced broken xiaolutech/devbox-update-action@v1 with:

  • Manual devbox update command
  • Automatic PR creation using peter-evans/create-pull-request@v7

Impact: Weekly dependency updates will work again

Testing

Manual Testing

  • ✅ Verified metro.sh health check error messages locally
  • ✅ Confirmed timeout wrapper syntax is correct
  • ✅ Validated YAML syntax for test-suite changes

CI Testing

  • This PR will run through full E2E test suite
  • Watch for iOS E2E tests to complete successfully
  • Monitor Metro bundler logs for "Dev server ready" message

Risk Assessment

Low Risk:

  • All changes are additive (longer timeouts, better logging)
  • No behavior changes to successful paths
  • Falls back to existing behavior if new checks pass
  • Timeout wrapper uses standard timeout command (available on all runners)

Related Issues

Fixes intermittent iOS E2E test failures observed in:

  • Run #24909229523 (PR Fast Checks - Apr 24)
  • Run #24971536378 (Full E2E Tests - Apr 27)

Fixes devbox-update workflow failure:

  • Run #24990430113 (Devbox Update - Apr 27)

🤖 Generated with Claude Code

abueide and others added 2 commits April 29, 2026 14:06
…rkflow

This commit addresses intermittent iOS E2E test failures caused by Metro
bundler not reaching a healthy state within the readiness probe timeout.

Changes:

1. **Increase iOS Metro startup timeouts** (test-suite-ios-e2e.yaml):
   - initial_delay_seconds: 5 → 10 (give Metro more time before first check)
   - timeout_seconds: 5 → 10 (allow longer per-check timeout)
   - failure_threshold: 12 → 24 (total timeout: 10s + 24×5s = 130s)
   - Add shutdown configuration for graceful termination

2. **Add Metro startup timeout wrapper** (metro.sh):
   - Wrap npx react-native start with 90s timeout
   - Ensures Metro either starts or exits with error (no hanging)

3. **Improve health check diagnostics** (metro.sh):
   - Add detailed error logging to metro_health() function
   - Log specific failure reasons (env file missing, port not set, endpoints not ready)
   - Helps diagnose why health checks fail

4. **Add fail-fast health check in deploy-ios** (test-suite-ios-e2e.yaml):
   - Verify Metro is healthy before attempting deployment
   - Fail immediately with clear error if Metro is not ready
   - Prevents waiting for dependencies that will never be satisfied

5. **Fix devbox-update workflow** (devbox-update.yml):
   - Remove non-existent xiaolutech/devbox-update-action@v1
   - Replace with manual devbox update command
   - Add automatic PR creation with peter-evans/create-pull-request@v7

Root Cause Analysis:
- iOS Metro bundler sometimes takes longer to start than the 60s timeout
- When health checks fail, processes continue running instead of failing fast
- Dependent steps (deploy, verify-app) are skipped with unhelpful error messages

Impact:
- Reduces flakiness in iOS E2E tests by giving Metro more time to start
- Provides better diagnostics when Metro fails to start
- Fixes broken weekly devbox dependency update workflow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The timeout 90s wrapper was causing test failures:
- Metro was being killed after 90s, but needs to run for entire test (5+ minutes)
- Android test failed: "App cannot connect to Metro bundler"
- iOS test failed: rsync timeout during pod install (likely resource contention)

The process-compose readiness probe with failure_threshold: 24 provides
sufficient startup timeout handling (130s). If Metro doesn't become healthy,
dependent processes won't start and the test will fail appropriately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abueide abueide merged commit 46358a6 into main Apr 29, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant