Juaravibecoding is a local AI gateway and proxy application for Android. It turns a phone into a local and cloud-orchestrated AI server: it translates standard OpenAI-compatible API requests into proxy calls to Google AI Studio (Gemini) cloud endpoints, and it also runs real-time on-device inference through LiteRT (TensorFlow Lite / MediaPipe), with automated simulation fallbacks.
Through this application, developers can route completion requests from development tools (such as AI code editors, agent frameworks, or Python automation scripts) to LLMs that either run locally on the Android phone or are bridged through it to the cloud.
- Product Requirement Document (PRD)
- Architectural Blueprint
- Step-by-Step User Guide
- Pushing to GitHub & Developer Notes
Juaravibecoding aims to bridge heavy, cloud-dependent developer environments with local, mobile-centric intelligence. Its primary goal is to expose an OpenAI-compatible API server running directly on an Android device, bound either to the loopback interface (localhost:8080) or to the local area network (LAN wildcard 0.0.0.0), and orchestrating both cloud-bound and local completion pipelines. Samsung Galaxy S-series and other modern Android devices are particularly well suited to local model execution.
- Robust TCP Socket Engine: Spawns an embedded HTTP server directly within Android background threads, handling standard CORS preflights, Keep-Alive socket pooling, authentication token validation, and multi-tenant connection dispatching.
- On-Device LiteRT-LM Core: Full local integration with the modern LiteRT (MediaPipe GenAI) Android API, executing raw `.bin`/`.litertlm` local models directly on physical CPU, GPU, or hardware NPU backends.
- System Memory Profiler: Automatically profiles device RAM on startup, warns about memory limits, and suggests models matching the hardware (e.g. Gemma 3 1B for lower-spec devices with under 3 GB of RAM; Gemma 4 E2B for heavier loads).
- Synchronized Cloud Fallback & Simulations: Includes automatic, seamless failover logic. If local model weights are missing or incomplete, the application utilizes optimized contextual simulations to let developers preview pipeline latency on mobile threads.
- Audit Trails & Tracing: Every transaction is recorded in a local Room database, logging HTTP status, requested model, payload bytes, token count, latency in milliseconds, and any network or model parsing errors.
- Cosmic-Infused Material Design 3 UI: Provides a visually stunning dark-slate and neon-teal dashboard containing real-time gateway status indicators, capability charts, detailed logs, and a central setting gear panel.
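The authentication token validation named in the feature list can be pictured with a small Python sketch. This is not the app's code: the function name `is_authorized` is hypothetical, and rejecting every request when no token is configured is an assumption, since this document does not say how an empty `gatewayAuthToken` is treated.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True when the header carries the expected bearer token."""
    if not authorization_header or not expected_token:
        # Assumption: a missing header or an unconfigured token means "reject".
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())
```

A gateway would typically answer 401 Unauthorized whenever a check like this returns False.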
Stores global server parameters, cryptographic key overrides, and active backend routing instructions.
| SQLite Column Name | Kotlin Class Variable Type | Description / Constraints |
|---|---|---|
| id | Int (Primary Key, Default: 1) | Enforces a single-row constraint for global system configurations. |
| port | Int (Default: 8080) | The listening socket port for incoming HTTP requests. |
| geminiApiKey | String (Default: "") | Encrypted override API key for Google AI Studio cloud requests. |
| enableNpuBackend | Boolean (Default: false) | Toggles the use of specialized neural processing (NPU) SDKs during execution. |
| bypassGpu | Boolean (Default: true) | Standard GPU driver fallback bypass to enforce robust CPU instructions on complex architectures. |
| gatewayAuthToken | String (Default: "") | Encrypted, secure bearer token used to authenticate local API gateway requests. |
| preferredBackend | String (Default: "AUTO") | Specifies the routing preference for local inference engines (e.g., AUTO, CPU, GPU, NPU). |
| exposeToLan | Boolean (Default: false) | Toggles socket binding between loopback-only (127.0.0.1) and LAN interface wildcard (0.0.0.0). |
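As a rough illustration of the single-row rule and the `exposeToLan` toggle, the table can be mimicked with Python's built-in `sqlite3`. The DDL below is a sketch written from the column list above, not the schema Room generates; the `CHECK (id = 1)` clause and the helper names `open_settings_db` and `bind_address` are assumptions made for the example.

```python
import sqlite3

# Hypothetical DDL mirroring the documented columns and defaults.
# The CHECK clause is one way to enforce "exactly one settings row".
SCHEMA = """
CREATE TABLE proxy_settings (
    id               INTEGER PRIMARY KEY CHECK (id = 1),
    port             INTEGER NOT NULL DEFAULT 8080,
    geminiApiKey     TEXT    NOT NULL DEFAULT '',
    enableNpuBackend INTEGER NOT NULL DEFAULT 0,
    bypassGpu        INTEGER NOT NULL DEFAULT 1,
    gatewayAuthToken TEXT    NOT NULL DEFAULT '',
    preferredBackend TEXT    NOT NULL DEFAULT 'AUTO',
    exposeToLan      INTEGER NOT NULL DEFAULT 0
)
"""


def open_settings_db() -> sqlite3.Connection:
    """Create an in-memory database holding the single default settings row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO proxy_settings (id) VALUES (1)")
    return conn


def bind_address(conn: sqlite3.Connection) -> tuple:
    """Derive the (host, port) pair the server socket would bind to."""
    port, expose = conn.execute(
        "SELECT port, exposeToLan FROM proxy_settings WHERE id = 1"
    ).fetchone()
    host = "0.0.0.0" if expose else "127.0.0.1"
    return host, port
```

With the defaults, `bind_address` yields the loopback address on port 8080; setting `exposeToLan` to 1 switches the host to the LAN wildcard, and inserting a second row fails with `sqlite3.IntegrityError`.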
Tracks transactional requests made to the server, assisting debugging, analysis, and latency/token tracing.
| SQLite Column Name | Kotlin Class Variable Type | Description / Constraints |
|---|---|---|
| id | Long (Primary Key, Auto-Generate) | Unique sequential ID assigned to every network transaction. |
| timestamp | Long (Default: Unix epoch MS) | Precise time indicating when the incoming socket connection was opened. |
| method | String? | HTTP method of the request (GET, POST, OPTIONS). |
| endpoint | String? | Requested path (e.g., /v1/chat/completions). |
| requestSnippet | String? (Default: null) | Trimmed preview of the JSON request payload containing inputs and system configurations. |
| responseSnippet | String? (Default: null) | Trimmed preview of the generated completions response payload. |
| statusCode | Int | Returned HTTP response status (e.g., 200 OK, 401 Unauthorized, 500 Error). |
| latencyMs | Long | Complete handling duration from socket read to socket stream flush. |
| modelUsed | String? | System identifier of the model routed for the completions task. |
| errorMessage | String? (Default: null) | Stores detailed exception messages or routing errors where applicable. |
| tokensCount | Int (Default: 0) | Evaluated token footprint for request and response context. |
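To show how these columns support latency and token tracing, here is a minimal Python sketch that aggregates log rows per model. The `GatewayLog` dataclass only mirrors a subset of the columns above, and `summarize` is a hypothetical helper, not something the app exposes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GatewayLog:
    """Subset of the documented log columns, enough for a summary."""
    endpoint: Optional[str]
    statusCode: int
    latencyMs: int
    modelUsed: Optional[str]
    tokensCount: int = 0
    errorMessage: Optional[str] = None


def summarize(logs: List[GatewayLog]) -> Dict[str, dict]:
    """Group rows by model and report request count, errors, latency, tokens."""
    buckets = defaultdict(list)
    for log in logs:
        buckets[log.modelUsed or "unknown"].append(log)
    summary = {}
    for model, rows in buckets.items():
        summary[model] = {
            "requests": len(rows),
            # Any 4xx or 5xx status counts as an error here.
            "errors": sum(1 for r in rows if r.statusCode >= 400),
            "avgLatencyMs": sum(r.latencyMs for r in rows) / len(rows),
            "tokens": sum(r.tokensCount for r in rows),
        }
    return summary
```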
Maintains individual model weight initialization, range-resume offsets, and checksum status verification.
| SQLite Column Name | Kotlin Class Variable Type | Description / Constraints |
|---|---|---|
| modelId | String (Primary Key) | Standard slug matching the target model in ModelsRegistry. |
| progress | Int (Default: 0) | Current percentage download progression (0 to 100). |
| status | String (Default: "NOT_STARTED") | Active lifecycle status (NOT_STARTED, DOWNLOADING, PAUSED, VERIFYING, COMPLETED, FAILED). |
| downloadedBytes | Long (Default: 0) | Number of bytes correctly retrieved and stored on disk. |
| totalBytes | Long (Default: 0) | Expected file payload footprint derived from the download headers. |
| errorMessage | String? (Default: null) | Captured string of download or checksum error exceptions. |
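The relationship between `downloadedBytes`, `totalBytes`, `progress`, and a range-resume offset can be sketched in a few lines of Python. The helper names are hypothetical, and the assumption is that resuming uses a standard HTTP `Range` request header starting at the bytes already on disk; the document does not show the app's actual download code.

```python
def progress_percent(downloaded_bytes: int, total_bytes: int) -> int:
    """Map byte counters to the 0-100 progress value."""
    if total_bytes <= 0:
        # Total size unknown (no headers received yet): report 0.
        return 0
    return min(100, downloaded_bytes * 100 // total_bytes)


def resume_headers(downloaded_bytes: int) -> dict:
    """Build request headers that continue a partial download."""
    if downloaded_bytes <= 0:
        return {}  # nothing on disk yet, so fetch the whole file
    # Standard HTTP range request: ask for everything from this offset on.
    return {"Range": f"bytes={downloaded_bytes}-"}
```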
The codebase isolates platform elements, UI state models, network routers, and raw on-device execution kernels:

                                            +---------------------------------------+
                                            |             MainActivity              |
                                            +-------------------+-------------------+
                                                                |
                                                                v Uses (Triggers / Observes)
                                            +-------------------+-------------------+
                                            |           GatewayViewModel            |
                                            +---------+-------------------+---------+
                                                      |                   |
                                        Observes Flow |                   | Launches / Binds
                                                      v                   v
    +-----------------------+               +---------+---------+   +-----+-----------------+
    |   GatewayRepository   |<--------------+   GatewayScreen   |   |   GatewayForeground   |
    +-----------+-----------+               +-------------------+   |        Service        |
                |                                                   +-----------+-----------+
                | Reads/Writes                                                  | Runs / Manages
                v                                                               v
    +-----------+-----------+                                       +-----------+-----------+
    |      AppDatabase      |                                       |  ProxyServerManager   |
    | - proxy_settings      |                                       +-----------+-----------+
    | - gateway_logs        |                                                   |
    | - model_down_states   |                                                   v Start / Stop
    +-----------------------+                                       +-----------+-----------+
                                                                    |   HttpGatewayServer   |
                                                                    +-----------+-----------+
                                                                                |
                                                                                v Receives HTTP Requests
                                                                    +-----------+-----------+
                                                                    |      ModelRouter      |
                                                                    +-----+-----------+-----+
                                                                          |           |
                                                           Local Inference|           | Cloud Proxy
                                                                          v           v
                                                        +-----------------+---+   +---+-------------------+
                                                        |   LiteRtLmEngine    |   |   GeminiCloudClient   |
                                                        +-----------+---------+   +-----------------------+
                                                                    |
                                                                    v Uses (Initialize & Execute Stream)
                                                        +-----------+---------+
                                                        | LiteRtLmRepository  |
                                                        +-----------+---------+
                                                                    |
                                                                    v Instantiates
                                                        +-----------+---------+
                                                        |LiteRtLmEngineWrapper|
                                                        +---------------------+

- `MainActivity.kt`
  - Used For: Entry-point of the Android application. Sets up dynamic window edge-to-edge configurations and displays the core Compose viewport.
  - Used By: Local Android OS launcher.
- `GatewayScreen.kt`
  - Used For: Renders the complete futuristic black dashboard, real-time telemetry metrics, log details, and a memory profile optimization card that analyzes on-board memory.
  - Used By: Main layout rendering pipeline within `MainActivity.kt`.
- `GatewayViewModel.kt`
  - Used For: Exposes thread-safe data vectors using read-only `StateFlow` types, letting the UI invoke repository instructions asynchronously.
  - Used By: UI components in `GatewayScreen.kt` to display metrics, logs, and settings overlays.
- `LiteRtLmRepository.kt`
  - Used For: Thread-safe manager for device-level models. Handles lazy-loaded on-device engine creations under `Dispatchers.IO`, and wraps the active generation pipeline into functional cold Kotlin flows.
  - Used By: Coordinates on-device execution routes called from `LiteRtLmEngine.kt`.
- `LiteRtLmEngineWrapper.kt`
  - Used For: Wraps the underlying Google MediaPipe Tasks LLM inference instances. Implements Kotlin's `AutoCloseable` interface to ensure native execution memory arrays are safely cleaned up and released immediately upon execution completion or failure.
  - Used By: Created dynamically within `LiteRtLmRepository.kt`.
- `LiteRtLmEngine.kt`
  - Used For: Primary local router. Attempts real LiteRT on-device inference via `LiteRtLmRepository`. If weight files are incomplete or the device lacks setup files, it seamlessly routes completions dynamically through an optimized local simulation layer.
  - Used By: `ModelRouter.kt`.
- `ModelDownloadManager.kt`
  - Used For: Manages the download of local model parameters. Exposes `getDeviceRamGb` to verify hardware RAM boundaries, and `getRecommendedModel` to suggest the best model profiles.
  - Used By: Renders configuration guidelines inside `GatewayScreen.kt`.
- `GatewayForegroundService.kt`
  - Used For: Keeps the socket backend persistent using Android foreground states protected from reclamation by the OS.
  - Used By: Launched via ViewModel user button binds.
- `ProxyServerManager.kt`
  - Used For: Initializes and stops the raw socket parameters and routes server activities to background threads.
  - Used By: Bound inside `GatewayForegroundService.kt`.
- `HttpGatewayServer.kt`
  - Used For: Spawns an internal HTTP background socket listener, intercepting inbound TCP calls and managing CORS preflights, Keep-Alive frames, and routing flows.
  - Used By: `ProxyServerManager.kt`.
- `ModelRouter.kt`
  - Used For: Maps incoming target models to their correct handlers, deciding between cloud proxy endpoints or local offline engines.
  - Used By: Called by `HttpGatewayServer.kt` during endpoint routing.
- `OpenAiToGeminiTranslator.kt`
  - Used For: Dynamic bidirectional parameters mapping. Maps incoming standard OpenAI payload keys into appropriate Google Gemini attributes, translating downstream streams back to format-compatible chunks.
  - Used By: `HttpGatewayServer.kt`.
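The request-side half of the translation described for `OpenAiToGeminiTranslator.kt` can be approximated in Python as follows. This is a sketch, not the app's Kotlin implementation: the function name `openai_to_gemini` is hypothetical, only three sampling parameters are mapped, and each message's `content` is assumed to be a plain string. The target field names (`contents`, `parts`, `systemInstruction`, `generationConfig`) follow the public Gemini REST request format.

```python
def openai_to_gemini(payload: dict) -> dict:
    """Convert an OpenAI chat-completions body into a Gemini generateContent body."""
    contents, system_parts = [], []
    for msg in payload.get("messages", []):
        part = {"text": msg["content"]}
        if msg["role"] == "system":
            # Gemini carries system prompts outside the turn list.
            system_parts.append(part)
        else:
            # Gemini names the assistant role "model"; everything else is "user".
            role = "model" if msg["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [part]})

    body = {"contents": contents}
    if system_parts:
        body["systemInstruction"] = {"parts": system_parts}

    # Map the OpenAI sampling keys that have direct Gemini equivalents.
    key_map = {"temperature": "temperature", "max_tokens": "maxOutputTokens", "top_p": "topP"}
    config = {gem: payload[oai] for oai, gem in key_map.items() if oai in payload}
    if config:
        body["generationConfig"] = config
    return body
```

The response-side mapping (Gemini chunks back to OpenAI-style chunks) is omitted here because this document does not describe its shape.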
- Model-View-ViewModel (MVVM) Design Pattern: Prevents UI coupling. `GatewayScreen` holds zero business logic, interacting with the system solely via decoupled state parameters exposed as `StateFlow` nodes.
- Resource Scoping (RAII / Auto-Closeable Pattern): Leverages Kotlin’s `.use {}` extensions on `LiteRtLmEngineWrapper`. Because LLM execution weights consume up to 2.5 GB of RAM, native memory handles are automatically released upon task finishing, eliminating native memory leak risks.
- Thread Concurrency Isolation Pattern: Delegates heavy background initialization vectors strictly to `Dispatchers.IO`. This prevents slow engine spin-ups from blocking the Main UI Thread, avoiding Android Application Not Responding (ANR) warnings.
- Strategy Routing Pattern: Maps generic endpoints to dynamic execution paths inside `ModelRouter`. The server redirects queries dynamically to cloud pipelines, real LiteRT hardware, or robust simulators without client-side restructuring.
- Gateway Singleton Pattern: Central engines, database layers, and repositories are structured as unique Singleton patterns, preserving thread integrity across isolated socket threads.
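The strategy routing idea can be illustrated with a short Python sketch. It is a guess at the shape of the decision, not the logic in `ModelRouter.kt`: the `make_router` helper, the handler keys, and the rule that model names starting with `gemini` go to the cloud are all assumptions chosen to match the example requests in this document.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[str], str]


def make_router(
    handlers: Dict[str, Handler],
    local_weights_ready: Callable[[str], bool],
) -> Callable[[str, str], Tuple[str, str]]:
    """Build a route(model, prompt) function that picks one of three strategies."""

    def route(model: str, prompt: str) -> Tuple[str, str]:
        if model.startswith("gemini"):
            key = "cloud"        # assumed rule: Gemini names use the cloud proxy
        elif local_weights_ready(model):
            key = "local"        # weights on disk: run real on-device inference
        else:
            key = "simulation"   # weights missing or incomplete: simulate
        return key, handlers[key](prompt)

    return route
```

Because the caller only ever sees `route(model, prompt)`, the client request stays the same whichever strategy is selected.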
Ready to get your localized AI gateway server running? Follow these steps to set up, profile, download, and test your gateway.
- Launch the Juaravibecoding application on your Android device.
- Ensure you are on the Dashboard panel.
- Tap the turquoise neon Launch Proxy Server button.
- The system status indicator immediately transitions to a pulsing neon circle labeled GATEWAY ACTIVE.
- The local access addresses are displayed under the status display:
  - Chat completions: `http://localhost:8080/v1/chat/completions`
  - Catalog models list: `http://localhost:8080/v1/models`
- Look at the top of the Dashboard or Models & Logs page.
- Locate the neon-framed System Capability Profiler card.
- The app dynamically scans the system memory limits on startup. It displays your available RAM (e.g., `5.64 GB RAM` or `7.41 GB RAM`).
- Read the recommendation:
  - RAM under 3.0 GB: Recommended Model is Gemma3-1B-IT (efficient footprint).
  - RAM above 3.0 GB: Recommended Model is Gemma 4 E2B (optimal intelligence).
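The recommendation rule above reduces to a single threshold check. The Python sketch below is illustrative only: `recommended_model` is a hypothetical name, and treating exactly 3.0 GB as belonging to the larger-model tier is an assumption, since the list above does not cover that boundary.

```python
def recommended_model(ram_gb: float) -> str:
    """Pick a model name from the device's RAM in gigabytes."""
    # Assumption: exactly 3.0 GB falls into the larger-model tier.
    return "Gemma3-1B-IT" if ram_gb < 3.0 else "Gemma 4 E2B"
```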
- Navigate to the Models & Logs tab at the top of the console.
- Underneath the header, ensure the Models Library subtab is selected.
- Review the model registry list:
  - Gemma3-1B-IT: Highly lightweight fallback model.
  - Gemma 4 E2B: Maximum intelligence local model.
- Tap the Install weights button of your chosen model.
- Watch the download indicators update. Once download has successfully completed, the model entry shows a checkmark.
- Want to free memory? Tap the Clear Cache button on any installed model to remove weights safely.
- On the top-right corner of the application bar, tap the Settings Gear Icon.
- An overlay dialog reading Google AI Studio API Secrets will slide onto the screen.
- Paste your custom Google AI Studio Gemini API Key into the input box.
- Tap the Apply Override button. A notification bar verifies your settings have been encrypted and saved securely locally.
To route requests from development tools (like VS Code, Cursor, Python CLI scripts, or terminals) to your Android device, you need to route ports using Android Debug Bridge (ADB).
- Install Android platform-tools containing ADB on your workstation.
- Enable USB Debugging on your phone (found in Settings > Developer Options).
- Connect the phone to your computer with a high-speed USB cable.
Run the following commands in your computer's terminal:

    # Confirm the laptop correctly detects your authorized Android phone
    adb devices
    # Forward laptop port 8080 to port 8080 on the Android device
    # (adb forward maps host -> device; adb reverse would map device -> host)
    adb forward tcp:8080 tcp:8080

Once port forwarding is active, any request sent to http://localhost:8080 on your laptop is bridged over the USB connection to the gateway on your Android phone.
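Before pointing tools at the gateway, it can help to confirm that something is actually listening on the forwarded port. The following Python sketch does a plain TCP connect test; `gateway_reachable` is a hypothetical helper, and a successful connection only shows that the port is open, not that the gateway will accept the request.

```python
import socket


def gateway_reachable(host: str = "localhost", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If this returns False, re-check `adb devices`, the forwarding command, and that the proxy server has been launched in the app.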
You can send requests via terminal on your computer using cURL to query your on-device engine:

    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer test-token" \
      -d '{
        "model": "litert-community/gemma-4-E2B-it-litert-lm",
        "messages": [
          {
            "role": "user",
            "content": "Explain why gravity is a consequence of space-time curvature."
          }
        ],
        "temperature": 0.3
      }'

Create a Python file named local_ai_test.py with the following:

    import openai

    # Bind the OpenAI client directly to the ADB-forwarded port
    client = openai.OpenAI(
        base_url="http://localhost:8080/v1",
        api_key="your_android_local_bearer_token"
    )

    try:
        print("Dispatching request to local Android LiteRT/Gemini Gateway...")
        response = client.chat.completions.create(
            model="gemini-3.5-flash",  # Reroutes automatically to Gemini
            messages=[
                {"role": "user", "content": "Tell me a highly unique programming pun."}
            ],
            temperature=0.7
        )
        print("\n[Android Gateway Server Response]:")
        print(response.choices[0].message.content)
    except Exception as e:
        print(f"Failed to query local Android Gateway server: {e}")

- Switch to the Audit Logs tab under Models & Logs inside the app.
- You will instantly see your query listed in the telemetry stream!
- Tap on the log entry card to slide up a detail viewer. You can inspect exact payload snippets, latency timings, and status codes.
To sync this project with your personal GitHub account, use the platform tools inside the Google AI Studio Build environment:
- Tap the Settings/Project Menu in the top-right corner of the development interface.
- Select Push to GitHub from the dropdown option list.
- Authenticate with your GitHub account when prompted to push the entire repository structure.
- Alternatively, select Export as ZIP to download the Android Studio source archive directly onto your workstation for custom localized extensions.
The application uses secure SQLite encryption to store your custom API keys. However, never distribute production APK release files containing hardcoded key values. Always prefer system environment injection or explicit runtime user configuration screens.