Give your AI agent hands. Remote Android device control for hermes-agent.
Phone (home WiFi) ──WebSocket──> Hermes Server (cloud) <──HTTP── AI Agent
relay on port 8766
The phone connects out to your Hermes server — works behind any NAT, no port forwarding, no VPN, no USB. Just a 6-character pairing code.
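The flow in the diagram can be pictured as a small request/response matcher: the agent-facing side submits a command, the relay forwards it over the phone's WebSocket, and the reply is matched back by ID. The sketch below is a toy in-memory model of that idea, not the relay's real code. The class name `RelaySketch`, the JSON fields (`id`, `action`, `params`, `result`) and the `fake_phone` stand-in are all assumptions made for illustration.

```python
import asyncio
import itertools
import json


class RelaySketch:
    """Toy model of a relay: pairs each agent request with the phone's reply."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}             # request id -> Future awaiting the phone's reply
        self.outbox = asyncio.Queue()  # frames the phone would receive over its WebSocket

    async def send_command(self, action, params, timeout=5.0):
        """Agent-facing side: queue a command and wait for the matching reply."""
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        # Hypothetical wire format; the real protocol may differ.
        frame = {"id": request_id, "action": action, "params": params}
        await self.outbox.put(json.dumps(frame))
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(request_id, None)

    def on_phone_message(self, raw):
        """Phone-facing side: resolve the pending request named in the reply."""
        message = json.loads(raw)
        future = self._pending.get(message["id"])
        if future is not None and not future.done():
            future.set_result(message["result"])


async def fake_phone(relay):
    """Stand-in for the Android app: answers exactly one command."""
    frame = json.loads(await relay.outbox.get())
    reply = {"id": frame["id"], "result": {"ok": True, "action": frame["action"]}}
    relay.on_phone_message(json.dumps(reply))


async def demo():
    relay = RelaySketch()
    phone = asyncio.create_task(fake_phone(relay))
    result = await relay.send_command("tap_text", {"text": "Login"})
    await phone
    return result
```

Because the phone opens the connection, the relay only ever writes to a socket that already exists, which is why no inbound port on the home network is needed.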
This repo contains two components:
| Component | Path | Language | Purpose |
|---|---|---|---|
| Android bridge app | hermes-android-bridge/ | Kotlin | Runs on the phone. Connects to server via WebSocket, executes commands via AccessibilityService |
| Python toolset | tools/, tests/ | Python | Runs on the server. 36 android_* tools + WebSocket relay. Also lives in hermes-agent as the production copy |
Note: The Python code exists here for standalone development and testing (`pip install -e .`, `pytest`). The production copy is in the hermes-agent repo. The Android app does not use or depend on the Python files.
curl -sSL https://raw.githubusercontent.com/raulvidis/hermes-android/main/install.sh | bash

Or manually:

mkdir -p ~/.hermes/plugins
cp -r hermes-android-plugin ~/.hermes/plugins/hermes-android

Restart hermes, then run /plugins to verify. It should show: ✓ hermes-android v0.3.0 (38 tools)
Option A — Download the prebuilt APK (easiest). Every push to main automatically
publishes a fresh debug APK to the Latest Build
release. Download the hermes-android-<version>.apk asset from there and install it — either
by opening the file on the device (enable "Install unknown apps" for your browser/file
manager when prompted) or via adb install hermes-android-*.apk.
Option B — Build from source:
cd hermes-android-bridge
./gradlew assembleDebug
adb install app/build/outputs/apk/debug/hermes-android-*.apk

The APK is an unsigned debug build, which is why Android/Play Protect may warn on install. It is not yet published on the Play Store or F-Droid.
- Open Hermes Bridge app
- Tap Enable Accessibility Service → find Hermes Bridge → toggle ON
- Tap Enable Status Overlay → grant permission
- Tap Grant Screen Recording → approve the system dialog (needed for `android_screen_record`)
- Grant additional runtime permissions in Settings > Apps > Hermes Bridge > Permissions:
  - Location (for `android_location`)
  - Contacts (for `android_search_contacts`)
  - SMS (for `android_send_sms`)
  - Phone (for `android_call`, direct dialing)
- Enable Notification Listener in Settings > Apps > Special app access > Notification access → enable Hermes Bridge (for `android_notifications` / `android_events`)
Tell hermes (via Telegram, Discord, etc):
Connect to my phone, code is <CODE>
Where <CODE> is the 6-character pairing code shown in the app.
Hermes will reply with the server address. Enter it in the app and tap Connect.
The agent can now control your phone. Try: "open Instagram", "take a screenshot", "what apps do I have?"
The bridge app can run on Android Automotive OS (AAOS) car head units. Phone-specific features (SMS, calls, contacts) gracefully return errors when the hardware is unavailable.
- Get the APK: download `hermes-android-<version>.apk` from the Latest Build release, or build it with `cd hermes-android-bridge && ./gradlew assembleDebug`
- Sideload via ADB: `adb install hermes-android-*.apk`
  - USB: connect directly to the head unit's USB port
  - WiFi: `adb connect <head-unit-ip>:5555`, then install
- Grant Accessibility Service: Settings > Accessibility > Hermes Bridge > Enable
- Grant overlay permission: Settings > Apps > Special access > Draw over apps > Hermes Bridge
- Skip phone-specific permissions (SMS, calls, contacts) — not applicable
The car head unit needs network access to reach the relay server:
- USB tethering (recommended): `adb forward tcp:8766 tcp:8766`, then enter `http://localhost:8766` in the app
- WiFi: enter the relay server's `http://<ip>:8766` in the app (both devices on the same network)
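Before typing an address into the app, it can help to confirm that the relay port is reachable from another machine on the same network. The following is a minimal sketch using only the Python standard library. `relay_reachable` is a hypothetical helper, not part of the toolset, and it only checks that a TCP connection opens; it does not speak the relay protocol.

```python
import socket


def relay_reachable(host, port=8766, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False
```

For example, `relay_reachable("192.168.1.50")` (an illustrative address) returning `False` would point to a firewall or network problem rather than an app problem.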
| Tool | Status on AAOS |
|---|---|
| `android_send_sms` | Not available (returns error) |
| `android_call` | Not available (returns error) |
| `android_search_contacts` | Not available (returns error) |
| `android_location` | Depends on car unit GPS configuration |
| `android_screen_record` | May behave differently (restricted MediaProjection on some OEMs) |
All other tools (tap, swipe, type, screenshot, read screen, open apps, etc.) work normally.
| Tool | Description |
|---|---|
| `android_setup` | Start relay and configure pairing code |
| `android_ping` | Check if phone is connected |
| `android_read_screen` | Get accessibility tree of current screen (System UI excluded by default; `include_system_ui=true` to include) |
| `android_screenshot` | Capture screenshot and send to user |
| `android_tap` | Tap by coordinates or node ID |
| `android_tap_text` | Tap element by visible text |
| `android_type` | Type into focused input field |
| `android_swipe` | Swipe up/down/left/right |
| `android_scroll` | Scroll screen or element |
| `android_open_app` | Launch app by package name |
| `android_press_key` | Press back, home, recents, etc. |
| `android_wait` | Wait for element to appear |
| `android_get_apps` | List installed apps |
| `android_current_app` | Get foreground app info |
| `android_long_press` | Long press by coordinates or node ID |
| `android_drag` | Drag from one point to another |
| `android_pinch` | Pinch zoom in/out at coordinates |
| `android_find_nodes` | Search accessibility nodes by text/class/clickable |
| `android_describe_node` | Get detailed info about a specific node |
| `android_screen_hash` | Get hash of current screen for change detection |
| `android_diff_screen` | Compare current screen against a previous hash |
| `android_location` | Get phone GPS location |
| `android_search_contacts` | Search contacts by name |
| `android_send_sms` | Send SMS to a phone number |
| `android_call` | Make a phone call or open dialer |
| `android_media` | Control media playback (play, pause, next, previous) |
| `android_send_intent` | Send an Android intent |
| `android_broadcast` | Send a broadcast intent |
| `android_clipboard_read` | Read clipboard contents |
| `android_clipboard_write` | Write text to clipboard |
| `android_notifications` | Read current notifications |
| `android_events` | Read recent accessibility events |
| `android_event_stream` | Stream accessibility events in real-time |
| `android_screen_record` | Record screen video for a duration |
| `android_read_widgets` | Read home screen widgets |
| `android_speak` | Text-to-speech output |
| `android_speak_stop` | Stop text-to-speech |
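The `android_screen_hash` / `android_diff_screen` pair supports a cheap "did anything change?" check without re-reading the whole tree. How the bridge actually computes its hash is not documented here. The sketch below only illustrates the general technique, assuming the accessibility tree is available as nested dicts; the node fields (`text`, `class`, `children`) and both function names are hypothetical.

```python
import hashlib
import json


def screen_hash(tree):
    """Hash a nested-dict UI tree; structurally equal trees give equal hashes."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(tree, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def screen_changed(previous_hash, tree):
    """Return True if the tree no longer matches a previously recorded hash."""
    return screen_hash(tree) != previous_hash


# Illustrative data only; real trees would come from the phone.
before = {"class": "Button", "text": "Login", "children": []}
after = {"class": "Button", "text": "Logout", "children": []}
```

An agent loop could record `screen_hash(before)` after a tap and poll until `screen_changed` reports a difference, instead of sleeping for a fixed time.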
| Permission | How to Grant | Required For |
|---|---|---|
| Accessibility Service | App button → Settings > Accessibility | All tools (core requirement) |
| System Alert Window (Overlay) | App button → Settings > Draw over apps | Status overlay display |
| Screen Recording (MediaProjection) | App button → approve system dialog | android_screen_record |
| Location | Settings > Apps > Permissions > Location | android_location |
| Read Contacts | Settings > Apps > Permissions > Contacts | android_search_contacts |
| Send SMS | Settings > Apps > Permissions > SMS | android_send_sms |
| Call Phone | Settings > Apps > Permissions > Phone | android_call (auto-dial) |
| Notification Listener | Settings > Special app access > Notification access | android_notifications, android_events |
Android app (Kotlin):
- AccessibilityService reads the UI tree and performs taps/types/swipes
- WebSocket client (OkHttp) connects out to the Hermes server
- Ktor HTTP server for local/USB development
- Pairing code authentication
- Screenshot capture via AccessibilityService API
- Terminal-themed UI
Server (Python):
- WebSocket + HTTP relay (aiohttp) on port 8766
- Tools register into hermes-agent's tool registry
- Rate-limited authentication (5 attempts / 60s, then 5min block)
- Auto-detects server public IP for setup instructions
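The rate-limiting rule above (5 attempts per 60 s, then a 5-minute block) can be sketched as a small sliding-window limiter. This is an illustration under stated assumptions, not the relay's actual code: the class name `PairingRateLimiter`, keying by client address, and rejecting even a correct code while the block is active are all choices made for the example.

```python
import hmac
import time


class PairingRateLimiter:
    """Sliding-window limiter: too many failed codes triggers a temporary block."""

    def __init__(self, max_attempts=5, window=60.0, block=300.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window = window
        self.block = block
        self.clock = clock            # injectable so tests can control time
        self._failures = {}           # client -> timestamps of recent failed attempts
        self._blocked_until = {}      # client -> time at which the block expires

    def check(self, client, supplied_code, expected_code):
        """Return True only if the client is not blocked and the code matches."""
        now = self.clock()
        if now < self._blocked_until.get(client, 0.0):
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(supplied_code.encode(), expected_code.encode()):
            self._failures.pop(client, None)
            return True
        # Keep only failures inside the window, then record this one.
        recent = [t for t in self._failures.get(client, []) if now - t < self.window]
        recent.append(now)
        self._failures[client] = recent
        if len(recent) >= self.max_attempts:
            self._blocked_until[client] = now + self.block
            self._failures.pop(client, None)
        return False
```

With a 6-character code, a limit like this keeps brute-force guessing impractically slow, since each address gets only a handful of tries per block period.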
See SECURITY.md for details. Key points:
- Pairing code authentication with rate limiting
- Phone connects out (never directly exposed)
- Currently unencrypted (`ws://`); use a TLS proxy for production
- Full device access once paired, so only connect to trusted servers
# Python tests
pip install -e ".[dev]"
python -m pytest tests/
# Android build
cd hermes-android-bridge
./gradlew assembleDebug

This is a working prototype. The vision: give Hermes its own phone, a fully autonomous mobile presence.
- TLS/WSS support for encrypted phone-server communication
- Persistent relay service (systemd unit, auto-start with gateway)
- Server-side call counter to prevent tool call loops
- Better error reporting (screenshot + annotated explanation on failure)
- Auto-reconnect relay on gateway restart
- Notification listener — agent reads incoming notifications in real-time
- Clipboard bridge — copy/paste between server and phone
- File transfer — send files/photos between phone and server
- Direct SMS/calls — send texts and make calls without navigating the UI
- Location sharing — agent knows where the phone is for contextual tasks
- Multiple phones — connect more than one device to the same relay
- Scheduled automations — "every morning, check my commute price on Bolt"
- Event triggers — "when a notification arrives from this app, do X"
- Macro recording — watch a workflow once, replay it on demand
- Phone call capability — agent can answer and speak in phone calls using TTS/STT
- Voice assistant mode — always-listening on the phone, responds via speaker
- Call handling — "answer my phone, take a message, tell them I'll call back"
- Local model execution — run small models (Qwen 0.5B, Gemma 2B) directly on the phone
- Offline fallback — basic commands work without server connection using on-device model
- Hybrid routing — simple tasks run locally, complex tasks go to the server
- On-device app adapters — fast structured parsing without round-tripping to server
- iOS support via Shortcuts/accessibility bridge
- Web dashboard for monitoring phone activity
- Cross-app workflows ("find a restaurant on Maps, share on WhatsApp, book an Uber there")
- Dedicated "Hermes Phone" — a phone that boots straight into agent mode
- hermes-agent: github.com/NousResearch/hermes-agent