Voice-first, offline-capable form assistant that helps Indian citizens understand and fill government forms in 8 languages — powered by document-grounded AI, AWS cloud services, and accessibility-first design.
- Problem Statement
- Solution
- Key Features
- Architecture Overview
- User Flow
- AWS Service Integration
- Tech Stack
- Project Structure
- Forms System
- AI Safety & Grounding
- Offline & Sync Architecture
- Accessibility
- API Reference
- How to Run
- Environment Variables
- Limitations & Disclaimer
Government and public-service forms in India are difficult for millions of citizens due to:
| Barrier | Who it affects |
|---|---|
| Complex legal language on forms | Low-literacy and first-time users |
| English-only interfaces | 900M+ non-English speakers |
| Small text & dense layouts | Elderly citizens, vision-impaired users |
| No offline access | Rural users with intermittent connectivity |
| AI tools that hallucinate | Users who trust incorrect information |
| Middlemen charging fees | Economically vulnerable populations |
These barriers disproportionately affect rural users, elderly citizens, differently-abled individuals, and first-time digital users — the very populations these government schemes are designed to help.
SarthiAI is a voice-first Progressive Web App (PWA) that walks users through government forms field-by-field using speech. An AI assistant explains each field in simple, regional language — grounded entirely in verified documents so it never fabricates information.
Users dictate answers, hear them read back, scan documents to auto-fill fields, and review everything before confirming. The app works fully offline and syncs when connectivity returns. When deployed with AWS, it gains cloud-grade OCR, speech services, translation, and scalable storage.
- Speech-to-Text — dictate answers via Web Speech Recognition API or AWS Transcribe. Supports continuous and single-utterance modes with real-time interim transcripts.
- Text-to-Speech — the app speaks field labels, help text, and AI responses via Web Speech Synthesis API or Amazon Polly (natural neural voices).
- Voice states — clear animated UI states: Idle → Listening (ripple) → Thinking (spinner) → Speaking (pulse) → Error.
Full UI and form-content translations in English, Hindi (हिन्दी), Marathi (मराठी), Tamil (தமிழ்), Telugu (తెలుగు), Bengali (বাংলা), Gujarati (ગુજરાતી), Kannada (ಕನ್ನಡ).
- Auto-detects browser language on first visit.
- Every field label, placeholder, help text, option, and validation message is translated.
- Locale-aware date/number formatting (Indian numbering: lakhs/crores).
- Dynamic translation powered by Amazon Translate when AWS is enabled.
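As a rough illustration of the first-visit detection and Indian number grouping listed above, the sketch below uses only the standard `Intl` API. The function names `pickLanguage` and `formatIndianNumber` are hypothetical and not taken from the SarthiAI source; the language codes are the eight named in this README.

```typescript
// The eight language codes supported by the app, per this README.
const SUPPORTED_LANGS = ['en', 'hi', 'mr', 'ta', 'te', 'bn', 'gu', 'kn'] as const;
type Lang = (typeof SUPPORTED_LANGS)[number];

// Pick the first supported language from a browser-style preference list
// (for example navigator.languages), falling back to English.
function pickLanguage(preferred: readonly string[]): Lang {
  for (const tag of preferred) {
    const base = tag.toLowerCase().split('-')[0];
    const match = SUPPORTED_LANGS.find((code) => code === base);
    if (match) return match;
  }
  return 'en';
}

// Group digits the Indian way: the last three digits, then pairs (lakhs, crores).
function formatIndianNumber(value: number): string {
  return new Intl.NumberFormat('en-IN').format(value);
}
```

With a full-ICU runtime, `formatIndianNumber(1234567)` produces `12,34,567`, and `pickLanguage(['ta-IN', 'en-US'])` returns `ta`.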
Forms are defined once as typed TypeScript schemas and rendered dynamically:
| Form | Description | Fields |
|---|---|---|
| Old Age Pension (IGNOAPS) | Monthly pension for senior citizens | 14 |
| Housing Subsidy (PMAY) | Housing for All scheme assistance | 17 |
| Ration Card (NFSA) | Subsidised food grains entitlement | 15+ |
| Caste Certificate | SC/ST/OBC category proof | 12+ |
| Income Certificate | Family income verification | 12+ |
| PM-KISAN | ₹6,000/year farmer income support | 14+ |
Each form includes eligibility criteria, required documents list, purpose description, and estimated completion time. New forms are added by creating a definition file and registering it — zero UI code changes needed.
Supported field types: text, date, select, tel, textarea, number, checkbox, radio — with validation rules (required, min/max length, regex pattern, cross-field validation), conditional visibility, and field grouping.
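A minimal sketch of how the simpler validation rules might be evaluated for a text field. The `ValidationRules` shape and the error codes are assumptions for illustration; the project's actual `validation.ts` may differ, and cross-field rules are left out.

```typescript
// Hypothetical subset of the rules named above (required, min/max length, regex pattern).
interface ValidationRules {
  required?: boolean;
  minLength?: number;
  maxLength?: number;
  pattern?: RegExp;
}

// Returns an error code for the first failing rule, or null when the value passes.
function validateField(value: string, rules: ValidationRules): string | null {
  const trimmed = value.trim();
  // An empty value only fails when the field is required.
  if (trimmed === '') return rules.required ? 'required' : null;
  if (rules.minLength !== undefined && trimmed.length < rules.minLength) return 'minLength';
  if (rules.maxLength !== undefined && trimmed.length > rules.maxLength) return 'maxLength';
  if (rules.pattern && !rules.pattern.test(trimmed)) return 'pattern';
  return null;
}
```

Returning a code rather than a message keeps the check independent of language, so the caller can look up the translated validation message for the active locale.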
- Client-side: Tesseract.js for offline OCR
- Server-side: Amazon Textract for high-accuracy extraction
- Auto-detects and extracts: Aadhaar numbers, PAN, dates of birth, phone numbers, IFSC codes, gender, names, PIN codes
- Scanned data auto-fills matching form fields
- RAG architecture — 7+ embedded knowledge documents covering pension, housing, ration card, caste certificate, income certificate, and PM-KISAN schemes.
- Strict grounding — 9-rule system prompt prevents hallucination. Every response tagged as Verified or Unverified.
- Multi-provider support — Ollama (local), LM Studio (local), or Amazon Bedrock (cloud). Supports Claude, Llama, and Titan model families.
- Streaming responses — tokens appear in real-time via SSE.
- Context-aware — knows which field the user is filling and provides targeted help.
- Full PWA with Service Worker caching (Workbox).
- IndexedDB local storage (sessions, submissions, sync queue).
- Incremental sync queue with exponential backoff.
- Bulk sync endpoint for efficient reconciliation on slow networks.
- Slow-connection detection (2G/slow-2G) with user-facing banners.
- Installable to mobile home screens.
- Client-side PDF generation via jsPDF (works offline).
- Includes tracking ID, submission date, all field values, and disclaimer footer.
┌─────────────────────────────────────────────────────────────────────┐
│ Browser (PWA) │
│ │
│ React 19 + Tailwind CSS v4 + Framer Motion │
│ ┌────────────┐ ┌────────────┐ ┌───────────────────────────┐ │
│ │ Voice Layer│ │ Form │ │ Offline Layer │ │
│ │ STT / TTS │ │ Context │ │ IndexedDB + SyncQueue │ │
│ │ (Browser │ │ (Single │ │ ┌─────────┐ ┌──────────┐ │ │
│ │ APIs) │ │ React │ │ │sessions │ │syncQueue │ │ │
│ └─────┬──────┘ │ Context) │ │ └─────────┘ └──────────┘ │ │
│ │ └─────┬──────┘ └────────────┬──────────────┘ │
│ │ │ │ │
│ ┌─────┴───────────────┴──────────────────────┴────────┐ │
│ │ offlineApi wrapper │ │
│ │ (online → API + cache | offline → IndexedDB) │ │
│ └─────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────┴────────────────────────────────┐ │
│ │ Service Worker (Workbox) │ │
│ │ Static: CacheFirst | API: NetworkFirst | Chat: NetworkOnly │
│ └─────────────────────┬────────────────────────────────┘ │
└─────────────────────────┼───────────────────────────────────────────┘
│ /api/*
┌───────────▼───────────┐
│ Express Server │ (port 3001)
│ ┌───────────────┐ │
│ │ DB Provider │──┼──▶ SQLite (local) OR DynamoDB (AWS)
│ └───────────────┘ │
│ ┌───────────────┐ │
│ │ AI Layer │──┼──▶ Ollama / LM Studio (local) OR Bedrock (AWS)
│ │ + Grounding │ │
│ └───────────────┘ │
│ ┌───────────────┐ │
│ │ AWS Services │──┼──▶ S3 · Textract · Polly · Transcribe · Translate
│ └───────────────┘ │
└───────────────────────┘
| Decision | Rationale |
|---|---|
| State-driven routing (no router library) | formPhase × activeTab keeps navigation simple. No URL complexity for target users. |
| Single React Context for all state | Avoids external state management complexity. Single source of truth. |
| Offline-first API wrapper | Every call goes through offlineApi: online, it calls the API and caches the result in IndexedDB; offline, it reads from IndexedDB and queues writes for sync. |
| Local-first + AWS upgrade path | Runs fully offline with local tools. AWS services layer on when USE_AWS=true. |
| Schema-driven form rendering | Adding new forms requires zero UI changes — just a schema file + registration. |
| Document-grounded AI only | 9-rule system prompt ensures no hallucination. Refuses when unsure. |
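To make the state-driven routing decision concrete, here is one way a page could be derived from `formPhase` and `activeTab` without a router. The README names those two pieces of state but not their values, so every union member below (and the `Submitted` page name) is an assumption.

```typescript
// Assumed value sets; only the names formPhase and activeTab come from the README.
type ActiveTab = 'home' | 'forms' | 'activity' | 'help' | 'settings';
type FormPhase = 'browse' | 'info' | 'filling' | 'review' | 'submitted';
type Page =
  | 'Home' | 'Forms' | 'Activity' | 'Help' | 'Settings'
  | 'FormInfo' | 'FieldInput' | 'Review' | 'Submitted';

const TAB_PAGES: Record<Exclude<ActiveTab, 'forms'>, Page> = {
  home: 'Home',
  activity: 'Activity',
  help: 'Help',
  settings: 'Settings',
};

const PHASE_PAGES: Record<FormPhase, Page> = {
  browse: 'Forms',
  info: 'FormInfo',
  filling: 'FieldInput',
  review: 'Review',
  submitted: 'Submitted',
};

// The form phase only matters while the Forms tab is active;
// every other tab maps straight to its page.
function resolvePage(tab: ActiveTab, phase: FormPhase): Page {
  if (tab !== 'forms') return TAB_PAGES[tab];
  return PHASE_PAGES[phase];
}
```

Because the lookup is a pure function of two values, switching tabs mid-form and coming back lands on the same field screen, which fits the auto-save behaviour described in this README.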
┌──────────────┐ ┌───────────────┐ ┌─────────────────┐
│ 1. Welcome │────▶│ 2. Language │────▶│ 3. Home │
│ (First Visit)│ │ Selection │ │ (Voice CTA) │
└──────────────┘ └───────────────┘ └────────┬────────┘
│
┌──────────────────────────┤
▼ ▼
┌─────────────┐ ┌──────────────┐
│ 4a. Browse │ │ 4b. Speak to │
│ Forms Tab │ │ AI on Home │
└──────┬──────┘ └──────────────┘
│
▼
┌──────────────┐
│ 5. Form Info │
│ (Purpose, │
│ Eligibility,│
│ Documents) │
└──────┬───────┘
│ "Start Filling"
▼
┌──────────────┐ ┌──────────────┐
│ 6. Field-by- │────▶│ 7. Scan │
│ Field Input │◀────│ Document │
│ (Voice/Type) │ │ (OCR) │
│ │ └──────────────┘
│ ┌────────┐ │
│ │AI Help │ │ ← Floating AI chat panel
│ │ Panel │ │ (grounded answers)
│ └────────┘ │
└──────┬───────┘
│ All fields complete
▼
┌──────────────┐
│ 8. Review │
│ All Answers │
│ (Edit any) │
└──────┬───────┘
│ "Submit"
▼
┌──────────────┐ ┌──────────────┐
│ 9. Submitted │────▶│ 10. Download │
│ Confirmation │ │ PDF Summary │
│ (Tracking ID)│ └──────────────┘
└──────────────┘
Step-by-step breakdown:
- Welcome Modal — First-time visitors see a language picker. Auto-detects browser language as default.
- Language Selection — Choose from 8 languages. All UI, form content, and AI responses switch instantly.
- Home Screen — Large microphone button (192px) for voice-first interaction. Users can speak questions or navigate to forms.
- Browse Forms — Grid of available government forms with icons, descriptions, and estimated completion times.
- Form Info — Before starting, users see: purpose of the form, eligibility criteria, required documents, and estimated time.
- Field-by-Field Input — One field at a time with large text, help tooltips, voice dictation, and progress indicators (dots + percentage + animated bar).
- Document Scanning — Camera-based OCR scans Aadhaar, PAN, etc. and auto-fills fields. Uses Textract (AWS) or Tesseract.js (offline).
- Review — All answers displayed for final verification. Users can tap any field to edit it.
- Submission — Confirmation screen with a unique tracking ID. Session marked as completed.
- PDF Download — Client-side PDF generated with all form data, tracking ID, and official disclaimer.
Auto-save: Progress is saved automatically when switching tabs or navigating away. Users can resume from the Activity page at any time.
SarthiAI is designed with a dual-mode architecture — it runs fully offline using local tools (SQLite, Ollama, Tesseract.js, Web Speech APIs), and gains cloud-grade capabilities when AWS is enabled via a single environment variable:
USE_AWS=true → All 7 AWS services activate (with automatic local fallbacks on error)
USE_AWS=false → Fully local operation (default)
Every AWS service has a local fallback. If a cloud call fails, the system degrades gracefully to the local alternative — the user never sees a broken experience.
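The single-switch behaviour can be sketched as pure functions over an environment map. The flag names and conditions follow the feature-flag list from `config.ts` given later in this README; passing the environment in as a parameter (rather than reading `process.env` directly) is an assumption made here so the logic is easy to test.

```typescript
// A plain map of environment variables, e.g. process.env on the server.
type Env = Record<string, string | undefined>;

// Master switch: every AWS feature is off unless USE_AWS is exactly "true".
const isAWSEnabled = (env: Env): boolean => env.USE_AWS === 'true';

// DynamoDB is used with AWS enabled unless the database is forced to SQLite.
const useDynamoDB = (env: Env): boolean =>
  isAWSEnabled(env) && env.DB_PROVIDER !== 'sqlite';

// Bedrock is used only when AWS is enabled and explicitly selected.
const useBedrock = (env: Env): boolean =>
  isAWSEnabled(env) && env.AI_PROVIDER === 'bedrock';
```

For example, `useDynamoDB({ USE_AWS: 'true', DB_PROVIDER: 'sqlite' })` is `false`, so the local database stays in use even though the other AWS services are active.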
┌─────────────────────────────────────────────────────────────────────┐
│ AWS Service Map │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ Bedrock │ │ DynamoDB │ │ S3 │ │
│ │ (AI/LLM) │ │ (Database) │ │ (File Storage) │ │
│ │ │ │ │ │ │ │
│ │ Claude 3 │ │ Sessions │ │ Document uploads │ │
│ │ Llama │ │ Submissions │ │ Scanned IDs │ │
│ │ Titan │ │ PAY_PER_REQ │ │ Generated PDFs │ │
│ └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘ │
│ │ │ │ │
│ ┌──────▼───────┐ ┌──────▼───────┐ ┌──────────▼───────────┐ │
│ │ Textract │ │ Polly │ │ Transcribe │ │
│ │ (OCR) │ │ (TTS) │ │ (STT) │ │
│ │ │ │ │ │ │ │
│ │ Aadhaar scan │ │ Neural voice │ │ 8 Indian languages │ │
│ │ PAN scan │ │ 8 languages │ │ Streaming support │ │
│ │ Form fields │ │ SSML rate │ │ PCM/OGG input │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Translate │ │
│ │ (Dynamic Translation) │ │
│ │ │ │
│ │ Single text + batch (up to 25) · Auto-detect source lang │ │
│ │ Supports: en, hi, ta, te, mr, bn, gu, kn │ │
│ └──────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Purpose: Cloud-hosted foundation model inference for the document-grounded AI assistant, replacing local Ollama/LM Studio.
File: server/aws/bedrock.ts
| Aspect | Detail |
|---|---|
| SDK | @aws-sdk/client-bedrock-runtime |
| Operations | InvokeModel (non-streaming), InvokeModelWithResponseStream (streaming SSE) |
| Default Model | anthropic.claude-3-haiku-20240307-v1:0 |
| Supported Families | Claude (Messages API), Llama (prompt-based), Titan (inputText-based) |
| Temperature | 0.3 (low for factual accuracy) |
| Max Tokens | 512 |
| Local Fallback | Ollama (llama3) or LM Studio |
How it works:
- The chat route checks `useBedrock()`; if true, it delegates to `bedrock.ts`, otherwise it uses `ollama.ts`.
- Knowledge documents are selected based on the active form (`getDocumentsForForm(formId)`), or all documents are used.
- A system prompt with 9 grounding rules plus the full knowledge base is built.
- The request body is formatted per model family (Claude Messages API / Llama prompt template / Titan inputText).
- The response is parsed per family, tagged with grounding metadata, and checked for refusal phrases.
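The per-family formatting step might look roughly like the sketch below. The family checks mirror the auto-detection rules in this section, and the temperature and token limit come from the table above. The function names are hypothetical, and joining the system prompt and user message with a blank line for Llama and Titan is a placeholder rather than the project's actual prompt template.

```typescript
type ModelFamily = 'claude' | 'llama' | 'titan';

// Same checks as the auto-detection logic described in this section.
function detectFamily(modelId: string): ModelFamily {
  if (modelId.startsWith('anthropic.claude')) return 'claude';
  if (modelId.includes('llama')) return 'llama';
  if (modelId.startsWith('amazon.titan')) return 'titan';
  throw new Error(`Unsupported model: ${modelId}`);
}

// Build the JSON body passed to InvokeModel for the given model family.
function buildRequestBody(modelId: string, system: string, userMessage: string): string {
  const temperature = 0.3;
  const maxTokens = 512;
  const family = detectFamily(modelId);
  if (family === 'claude') {
    // Anthropic Messages API format as used on Bedrock.
    return JSON.stringify({
      anthropic_version: 'bedrock-2023-05-31',
      max_tokens: maxTokens,
      temperature,
      system,
      messages: [{ role: 'user', content: userMessage }],
    });
  }
  // Placeholder prompt layout for the prompt-based families (an assumption).
  const prompt = `${system}\n\n${userMessage}`;
  if (family === 'llama') {
    return JSON.stringify({ prompt, max_gen_len: maxTokens, temperature });
  }
  return JSON.stringify({
    inputText: prompt,
    textGenerationConfig: { maxTokenCount: maxTokens, temperature },
  });
}
```

Keeping detection separate from body construction means an unsupported model ID fails early with a clear error instead of sending a malformed request.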
Streaming flow:
Client POST /api/chat/stream
→ Server builds Bedrock request
→ InvokeModelWithResponseStreamCommand
→ For each chunk: SSE event "token" with partial text
→ Final SSE event "done" with {reply, grounded, sources}
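The event sequence above can be sketched with a small serialiser. The `token` and `done` event names and the `{reply, grounded, sources}` payload come from the flow; the `{ text }` shape of a token payload and the function names are assumptions.

```typescript
// Serialise one Server-Sent Event: an "event:" line, a "data:" line, then a blank line.
function sseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Turn model output chunks into one "token" event each, followed by a final "done" event.
function buildStreamEvents(chunks: string[], grounded: boolean, sources: string[]): string[] {
  const events: string[] = [];
  let reply = '';
  for (const text of chunks) {
    reply += text;
    events.push(sseEvent('token', { text }));
  }
  events.push(sseEvent('done', { reply, grounded, sources }));
  return events;
}
```

In a real handler each string would be written to the response as soon as its chunk arrives, with `Content-Type: text/event-stream` set on the response, rather than collected into an array first.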
Auto-detection logic:
isClaude(modelId) → modelId.startsWith('anthropic.claude')
isLlama(modelId) → modelId.includes('llama')
isTitan(modelId) → modelId.startsWith('amazon.titan')
Purpose: Scalable NoSQL database replacing local SQLite for session and submission storage.
File: server/aws/dynamodb.ts
| Aspect | Detail |
|---|---|
| SDK | @aws-sdk/client-dynamodb + @aws-sdk/lib-dynamodb (DocumentClient) |
| Tables | {prefix}-Sessions, {prefix}-Submissions |
| Key Schema | Single hash key: id (String) |
| Billing Mode | PAY_PER_REQUEST (on-demand, no capacity planning) |
| Auto-creation | Tables are created at server startup if they don't exist |
| Local Fallback | SQLite (better-sqlite3) at data/sarthi.db |
Table structure:
{prefix}-Sessions
├── id (PK) — UUID
├── form_id — e.g. "pension"
├── form_title — e.g. "Old Age Pension"
├── status — "in-progress" | "completed"
├── form_values — JSON string of all field values
├── current_step — integer (1-based)
├── created_at — ISO timestamp
└── updated_at — ISO timestamp
{prefix}-Submissions
├── id (PK) — UUID
├── session_id — linked session (nullable)
├── form_id — form identifier
├── form_title — form display name
├── form_values — JSON string of final values
└── submitted_at — ISO timestamp
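The two item shapes can be written down as TypeScript types. The attribute names and value sets are taken from the structure above; the interface names and the `newSession` helper are hypothetical.

```typescript
type SessionStatus = 'in-progress' | 'completed';

interface SessionItem {
  id: string;            // UUID, the table's hash key
  form_id: string;       // e.g. "pension"
  form_title: string;    // e.g. "Old Age Pension"
  status: SessionStatus;
  form_values: string;   // JSON string of all field values
  current_step: number;  // 1-based
  created_at: string;    // ISO timestamp
  updated_at: string;    // ISO timestamp
}

interface SubmissionItem {
  id: string;
  session_id: string | null; // linked session, if any
  form_id: string;
  form_title: string;
  form_values: string;       // JSON string of final values
  submitted_at: string;      // ISO timestamp
}

// Build a fresh in-progress session. The id and clock are passed in
// so the helper stays deterministic and easy to test.
function newSession(id: string, formId: string, formTitle: string, now: Date): SessionItem {
  const timestamp = now.toISOString();
  return {
    id,
    form_id: formId,
    form_title: formTitle,
    status: 'in-progress',
    form_values: JSON.stringify({}),
    current_step: 1,
    created_at: timestamp,
    updated_at: timestamp,
  };
}
```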
Database Provider Pattern (server/dbProvider.ts):
The provider abstracts the database layer. All route handlers call dbProvider functions — never db.ts directly. The provider checks useDynamoDB() at runtime and delegates accordingly, with automatic fallback:
export async function createSession(data) {
if (useDynamoDB()) {
try {
return await dynamo.createSession(data); // Cloud
} catch (err) {
console.warn('DynamoDB failed, using SQLite');
}
}
return sqliteDb.createSession(data); // Local fallback
}
Purpose: Cloud storage for document uploads (scanned IDs, photos) and generated PDFs.
File: server/aws/s3.ts
| Aspect | Detail |
|---|---|
| SDK | @aws-sdk/client-s3 + @aws-sdk/s3-request-presigner |
| Default Bucket | sarthi-ai-documents (configurable via S3_BUCKET) |
| Operations | PutObject, GetObject, ListObjectsV2, DeleteObject, HeadBucket, CreateBucket |
| Presigned URLs | 1-hour expiry for secure downloads |
| Max Upload Size | 10 MB |
| Local Fallback | Local filesystem at data/uploads/ |
API routes (server/routes/storage.ts):
| Method | Route | Description |
|---|---|---|
| `POST` | `/api/storage/upload` | Upload file (multipart, `?prefix=sessions/abc`) |
| `GET` | `/api/storage/download/*` | Get presigned URL or direct download |
| `GET` | `/api/storage/list` | List files by prefix |
| `DELETE` | `/api/storage/*` | Delete a file |
Bootstrap: At server startup, ensureBucket() checks if the bucket exists and creates it if not:
await client.send(new HeadBucketCommand({ Bucket }));
// If 404 → CreateBucketCommand with region constraint
Purpose: Server-side document text extraction for scanning Indian government documents (Aadhaar, PAN, etc.).
File: server/aws/textract.ts
| Aspect | Detail |
|---|---|
| SDK | @aws-sdk/client-textract |
| Operations | DetectDocumentText (basic OCR), AnalyzeDocument (FORMS feature for key-value pairs) |
| Input | Raw image bytes (JPEG, PNG, PDF) — up to 10 MB |
| Local Fallback | Tesseract.js in the browser (src/hooks/useOCR.ts) |
Field extraction patterns (regex-based, shared between server and client):
| Field | Pattern | Example |
|---|---|---|
| Aadhaar Number | `\b(\d{4}\s?\d{4}\s?\d{4})\b` | 1234 5678 9012 |
| PAN Number | `\b([A-Z]{5}\d{4}[A-Z])\b` | ABCDE1234F |
| Date of Birth | `(\d{2}[\/\-\.]\d{2}[\/\-\.]\d{4})` | 15/03/1960 |
| Phone Number | `\b([6-9]\d{9})\b` | 9876543210 |
| IFSC Code | `\b([A-Z]{4}0[A-Za-z0-9]{6})\b` | SBIN0001234 |
| Gender | `\b(male\|female\|पुरुष\|महिला)\b` | Male |
| PIN Code | `\b(\d{6})\b` | 226001 |
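A minimal sketch of applying these patterns to OCR output. The patterns are copied from the table; the key names, the `extractFields` helper, and the case-insensitive flag on the gender pattern are assumptions, not the project's actual code.

```typescript
// Patterns from the table above. The `i` flag on gender is an assumption,
// added so that "Male" and "male" both match.
const FIELD_PATTERNS: Record<string, RegExp> = {
  aadhaar: /\b(\d{4}\s?\d{4}\s?\d{4})\b/,
  pan: /\b([A-Z]{5}\d{4}[A-Z])\b/,
  dateOfBirth: /(\d{2}[\/\-\.]\d{2}[\/\-\.]\d{4})/,
  phone: /\b([6-9]\d{9})\b/,
  ifsc: /\b([A-Z]{4}0[A-Za-z0-9]{6})\b/,
  gender: /\b(male|female|पुरुष|महिला)\b/i,
  pinCode: /\b(\d{6})\b/,
};

// Run every pattern against the text and keep the first capture of each match.
function extractFields(text: string): Record<string, string> {
  const found: Record<string, string> = {};
  for (const key of Object.keys(FIELD_PATTERNS)) {
    const match = FIELD_PATTERNS[key].exec(text);
    if (match) found[key] = match[1];
  }
  return found;
}
```

One caveat worth checking against real scans: JavaScript's `\b` only treats ASCII letters, digits, and underscore as word characters, so the Devanagari alternatives in the gender pattern may not match as written when surrounded by spaces.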
API routes (server/routes/ocr.ts):
| Method | Route | Description |
|---|---|---|
| `POST` | `/api/ocr` | Basic text detection |
| `POST` | `/api/ocr/analyze` | FORMS-based document analysis (richer extraction) |
Dual-mode OCR flow:
User takes photo of Aadhaar card
├─ AWS enabled → POST /api/ocr → Textract DetectDocumentText → structured fields
└─ AWS disabled → Client-side Tesseract.js → regex extraction → auto-fill fields
Purpose: Server-side neural text-to-speech in Indian languages, replacing browser Speech Synthesis for higher quality output.
File: server/aws/polly.ts
| Aspect | Detail |
|---|---|
| SDK | @aws-sdk/client-polly |
| Operation | SynthesizeSpeech |
| Output Format | MP3 (audio/mpeg) at 24000 Hz sample rate |
| Engine | Neural (natural-sounding) |
| Default Voice | Kajal (Indian English/Hindi neural voice) |
| Max Input | 3000 characters |
| Local Fallback | Browser Web Speech Synthesis API |
Language-to-voice mapping:
| Language | Voice ID | Language Code | Engine |
|---|---|---|---|
| English | Kajal | `en-IN` | Neural |
| Hindi | Kajal | `hi-IN` | Neural |
| Tamil | Kajal | `en-IN` | Neural |
| Telugu | Kajal | `en-IN` | Neural |
| Marathi | Kajal | `hi-IN` | Neural |
| Bengali | Kajal | `en-IN` | Neural |
| Gujarati | Kajal | `hi-IN` | Neural |
| Kannada | Kajal | `en-IN` | Neural |
SSML support: When speech rate is specified (slow/fast), Polly receives SSML with <prosody rate="..."> tags.
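A small sketch of building that SSML string. The `slow`, `medium`, and `fast` values are standard SSML prosody rates; the function names are hypothetical. User text is XML-escaped so characters such as `&` or `<` in a field label cannot break the markup.

```typescript
type SpeechRate = 'slow' | 'medium' | 'fast';

// Escape the five XML special characters so arbitrary text is safe inside SSML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wrap the text in <speak> and a <prosody> element carrying the requested rate.
function buildSsml(text: string, rate: SpeechRate = 'medium'): string {
  return `<speak><prosody rate="${rate}">${escapeXml(text)}</prosody></speak>`;
}
```

When a string like this is sent to Polly, the `SynthesizeSpeech` request needs `TextType` set to `ssml`; otherwise the tags are read as plain text.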
API route: POST /api/speech/synthesize → returns raw MP3 binary with Content-Type: audio/mpeg.
Purpose: Server-side speech recognition in Indian languages, complementing the browser Web Speech Recognition API.
File: server/aws/transcribe.ts
| Aspect | Detail |
|---|---|
| SDK | @aws-sdk/client-transcribe-streaming |
| Operation | StartStreamTranscription |
| Input | Audio buffer (PCM/WAV or OGG-Opus) up to 5 MB |
| Default Sample Rate | 16000 Hz |
| Local Fallback | Browser Web Speech Recognition API |
Language support:
| Language | Transcribe Code |
|---|---|
| English | en-IN |
| Hindi | hi-IN |
| Tamil | ta-IN |
| Telugu | te-IN |
| Marathi | mr-IN |
| Bengali | bn-IN |
| Gujarati | gu-IN |
| Kannada | kn-IN |
API route: POST /api/speech/transcribe (multipart audio upload) → returns { transcript, confidence, provider }.
Purpose: Dynamic real-time text translation between SarthiAI's 8 supported languages, supplementing static i18n bundles.
File: server/aws/translate.ts
| Aspect | Detail |
|---|---|
| SDK | @aws-sdk/client-translate |
| Operation | TranslateText |
| Auto-detect | Source language can be set to "auto" |
| Batch | Parallel TranslateText calls for up to 25 texts |
| Max Text Length | 5000 characters per request |
| Local Fallback | Static i18n translation bundles (pre-built) |
API routes (server/routes/translate.ts):
| Method | Route | Description |
|---|---|---|
| `POST` | `/api/translate` | Translate single text |
| `POST` | `/api/translate/batch` | Translate up to 25 texts in parallel |
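The batch behaviour (parallel calls, at most 25 texts, 5000 characters each) can be sketched independently of the AWS SDK by injecting the single-text translator. `translateBatch` and the `TranslateFn` type are hypothetical names; rejecting oversized input rather than truncating it is an assumption.

```typescript
const MAX_BATCH_SIZE = 25;
const MAX_TEXT_LENGTH = 5000;

// Any function that translates one text, e.g. a wrapper around TranslateText.
type TranslateFn = (text: string, targetLang: string) => Promise<string>;

// Translate all texts in parallel and return results in input order.
async function translateBatch(
  texts: string[],
  targetLang: string,
  translateOne: TranslateFn,
): Promise<string[]> {
  if (texts.length > MAX_BATCH_SIZE) {
    throw new Error(`Batch limited to ${MAX_BATCH_SIZE} texts`);
  }
  if (texts.some((text) => text.length > MAX_TEXT_LENGTH)) {
    throw new Error(`Each text limited to ${MAX_TEXT_LENGTH} characters`);
  }
  // Promise.all preserves order even when calls finish out of order.
  return Promise.all(texts.map((text) => translateOne(text, targetLang)));
}
```

Note that `Promise.all` rejects as soon as any single call fails, so a caller that wants partial results would need per-item error handling instead.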
Central config: server/aws/config.ts
All AWS services share a common configuration factory:
# Master switch
USE_AWS=true # Enables all AWS services
# Credentials (optional — SDK falls back to instance roles on EC2/ECS/Lambda)
AWS_REGION=ap-south-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
# Service-specific
BEDROCK_MODEL=anthropic.claude-3-haiku-20240307-v1:0
DYNAMO_TABLE_PREFIX=SarthiAI
S3_BUCKET=sarthi-ai-documents
DB_PROVIDER=dynamodb # or "sqlite" to force local DB even with USE_AWS=true
AI_PROVIDER=bedrock # or "ollama" / "lmstudio"
Fallback matrix:
| AWS Service | Local Fallback | Trigger |
|---|---|---|
| Bedrock | Ollama / LM Studio | AI_PROVIDER≠bedrock or Bedrock API error |
| DynamoDB | SQLite (`data/sarthi.db`) | DB_PROVIDER=sqlite or DynamoDB error |
| S3 | Local filesystem (`data/uploads/`) | USE_AWS=false or S3 error |
| Textract | Tesseract.js (client-side) | USE_AWS=false |
| Polly | Browser Speech Synthesis | USE_AWS=false or Polly error |
| Transcribe | Browser Speech Recognition | USE_AWS=false or Transcribe error |
| Translate | Static i18n bundles | USE_AWS=false or Translate error |
Feature flags (from config.ts):
isAWSEnabled() // Master switch: USE_AWS=true?
useDynamoDB() // isAWSEnabled() && DB_PROVIDER !== 'sqlite'
useBedrock() // isAWSEnabled() && AI_PROVIDER === 'bedrock'
useTextract() // isAWSEnabled()
usePolly() // isAWSEnabled()
useTranscribe() // isAWSEnabled()
useTranslateService() // isAWSEnabled()
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 19, TypeScript 5.8 | UI framework |
| Styling | Tailwind CSS v4, Framer Motion | Responsive design & animations |
| Voice (Client) | Web Speech Recognition API, Web Speech Synthesis API | Browser-native speech |
| Voice (Cloud) | Amazon Polly, Amazon Transcribe | Neural TTS & multi-language STT |
| OCR (Client) | Tesseract.js | Offline document scanning |
| OCR (Cloud) | Amazon Textract | High-accuracy document extraction |
| AI (Local) | Ollama (`llama3`) or LM Studio | Local LLM inference |
| AI (Cloud) | Amazon Bedrock (Claude / Llama / Titan) | Cloud LLM inference |
| Translation | Amazon Translate + static i18n bundles | Dynamic & static translations |
| Offline | IndexedDB (`idb`), Workbox, `vite-plugin-pwa` | PWA offline support |
| Backend | Express.js, Node.js | REST API server |
| Database (Local) | SQLite (`better-sqlite3`) | Local persistence |
| Database (Cloud) | Amazon DynamoDB | Scalable cloud persistence |
| Storage (Cloud) | Amazon S3 | Document & file storage |
| PDF | jsPDF | Client-side PDF generation |
| Build | Vite 6, `tsx` | Dev server & TypeScript execution |
| Icons | Lucide React | 30+ accessible icons |
sarthi-ai/
├── index.html # Entry HTML with PWA meta tags & skip link
├── vite.config.ts # Vite + Tailwind + PWA + Workbox config
├── package.json # Dependencies & scripts
├── tsconfig.json # TypeScript config (client)
├── tsconfig.server.json # TypeScript config (server)
├── metadata.json # App metadata (name, permissions)
│
├── public/icons/ # PWA icons (192px, 512px)
├── data/ # Runtime data (auto-created)
│ ├── sarthi.db # SQLite database
│ └── uploads/ # Local file storage fallback
│
├── server/ # ── Express Backend ──────────────
│ ├── index.ts # Server entry, middleware, route mounting
│ ├── db.ts # SQLite schema, prepared statements, CRUD
│ ├── dbProvider.ts # Unified DB interface (SQLite ↔ DynamoDB)
│ │
│ ├── ai/
│ │ ├── grounding.ts # 7+ RAG knowledge documents (verified text)
│ │ └── ollama.ts # Ollama / LM Studio client + Bedrock delegation
│ │
│ ├── aws/
│ │ ├── config.ts # Central AWS config, feature flags, client factory
│ │ ├── bedrock.ts # Bedrock AI (Claude/Llama/Titan) — chat + streaming
│ │ ├── dynamodb.ts # DynamoDB sessions & submissions tables
│ │ ├── s3.ts # S3 file upload, download, presigned URLs
│ │ ├── polly.ts # Polly TTS — neural voice synthesis
│ │ ├── textract.ts # Textract OCR — document text extraction
│ │ ├── transcribe.ts # Transcribe STT — speech recognition
│ │ └── translate.ts # Translate — dynamic text translation
│ │
│ └── routes/
│ ├── sessions.ts # Session CRUD endpoints
│ ├── submissions.ts # Submission endpoints
│ ├── sync.ts # Bulk offline sync endpoint
│ ├── chat.ts # AI chat (non-streaming + SSE streaming)
│ ├── ocr.ts # Textract OCR endpoints
│ ├── speech.ts # Polly TTS + Transcribe STT endpoints
│ ├── translate.ts # Translate endpoints (single + batch)
│ └── storage.ts # S3 / local file storage endpoints
│
└── src/ # ── React Frontend ───────────────
├── main.tsx # App entry, SW registration, sync init
├── App.tsx # State-driven page router (lazy-loaded)
├── index.css # Global styles + Tailwind directives
│
├── api/
│ └── client.ts # Typed fetch wrapper for all API endpoints
│
├── components/
│ ├── Layout.tsx # App shell: top nav + bottom nav + sidebar
│ ├── Navigation.tsx # Top bar + bottom tab bar
│ ├── ContextPanel.tsx # Desktop sidebar (steps) + mobile slide menu
│ ├── AIAssistant.tsx # Floating AI chat panel with streaming
│ ├── OfflineBanner.tsx # Connectivity status banners (amber/blue/green)
│ ├── WelcomeModal.tsx # First-visit language picker modal
│ └── UIElements.tsx # Shared buttons, cards, badges, inputs
│
├── context/
│ └── FormContext.tsx # Single React Context for all app state
│
├── pages/
│ ├── Home.tsx # Voice-first home screen with large mic button
│ ├── Forms.tsx # Form catalog with search
│ ├── FormInfo.tsx # Form details (purpose, eligibility, docs)
│ ├── FieldInput.tsx # Field-by-field input with voice & OCR
│ ├── Review.tsx # Review all answers before submission
│ ├── Activity.tsx # Session history with filters
│ ├── Help.tsx # FAQ and usage help
│ ├── Settings.tsx # Language, theme, voice, text size controls
│ └── Scan.tsx # Document scanning UI
│
├── forms/
│ ├── schema.ts # FormSchema, FormField, validation type defs
│ ├── registry.ts # Central form registry (register + lookup)
│ ├── validation.ts # Field validation logic
│ ├── index.ts # Public API re-exports
│ ├── definitions/ # Form schemas (one file per form)
│ │ ├── pension.ts # Old Age Pension (IGNOAPS)
│ │ ├── housing.ts # Housing Subsidy (PMAY)
│ │ ├── rationCard.ts # Ration Card (NFSA)
│ │ ├── casteCertificate.ts # Caste Certificate
│ │ ├── incomeCertificate.ts# Income Certificate
│ │ ├── kisanSammanNidhi.ts # PM-KISAN
│ │ └── locationFields.ts # Shared location fields (state, district, etc.)
│ └── i18n/ # Per-form translations (7 languages × 6 forms)
│ ├── index.ts # Translation loader
│ ├── types.ts # Translation type definitions
│ ├── pension/ # hi, mr, ta, te, bn, gu, kn
│ ├── housing/
│ ├── rationCard/
│ ├── casteCertificate/
│ ├── incomeCertificate/
│ └── kisanSammanNidhi/
│
├── hooks/
│ ├── useAIChat.ts # AI conversation state + streaming
│ ├── useSpeechRecognition.ts # Web Speech STT hook
│ ├── useSpeechSynthesis.ts # Web Speech TTS hook + Polly integration
│ ├── useOCR.ts # Tesseract.js OCR + Textract fallback
│ ├── useNetworkStatus.ts # Online/offline + connection quality detection
│ ├── useResponsiveLayout.ts # Breakpoint detection
│ └── useTranslatedSchema.ts # FormSchema i18n translation hook
│
├── i18n/ # App-level UI translations
│ ├── index.ts # translate() function
│ ├── en.ts, hi.ts, mr.ts # 8 language files
│ ├── ta.ts, te.ts, bn.ts
│ └── gu.ts, kn.ts
│
├── offline/
│ ├── db.ts # IndexedDB schema (sessions, submissions, syncQueue)
│ ├── offlineApi.ts # Offline-first API wrapper
│ ├── syncManager.ts # Queue drain + reconciliation + backoff
│ ├── schemaCache.ts # Cache form schemas offline
│ └── index.ts # Re-exports
│
├── utils/
│ ├── aadhaarVerhoeff.ts # Aadhaar 12-digit Verhoeff checksum validation
│ ├── announceToScreenReader.ts # ARIA live-region announcements
│ ├── detectLanguage.ts # Browser language auto-detection
│ ├── generatePdf.ts # Client-side PDF generation (jsPDF)
│ ├── localeFormat.ts # Indian number/date formatting (lakhs/crores)
│ └── trackingId.ts # Unique tracking ID generator
│
└── voice/
└── language.ts # BCP-47 locale mappings, voice picker
- Create a schema in `src/forms/definitions/myForm.ts`:
import type { FormSchema } from '../schema';
export const myForm: FormSchema = {
id: 'my-form',
title: 'My Government Form',
description: 'Short description',
icon: 'FileText', // Lucide icon name
iconBgColor: 'bg-teal-50',
iconColor: 'text-teal-600',
purpose: 'What this form is for...',
eligibility: ['Criterion 1', 'Criterion 2'],
requiredDocuments: ['Aadhaar Card', 'Bank Passbook'],
estimatedTime: '5–10 minutes',
fields: [
{
key: 'fullName',
label: 'Full Name',
type: 'text',
placeholder: 'e.g., Ramesh Kumar',
helpText: 'Enter your name as it appears on your Aadhaar.',
validation: { required: true, minLength: 2, maxLength: 100 },
group: 'Personal Details',
},
// ... more fields
],
};
- Register it in `src/forms/registry.ts`:
import { myForm } from './definitions/myForm';
register(myForm);
- (Optional) Add translations in `src/forms/i18n/myForm/hi.ts`, `ta.ts`, etc.
- (Optional) Add grounding documents in `server/ai/grounding.ts` for AI assistance.
That's it. The form appears in the catalog, renders field-by-field, validates, saves sessions, and generates PDFs — all automatically.
| Safeguard | Implementation |
|---|---|
| Grounding-only responses | 9-rule system prompt; RAG retrieval required before any generation |
| Hallucination tagging | Every response marked Verified (shield icon) or Unverified (warning icon) |
| Refusal on uncertainty | 7 refusal-phrase heuristics; refuses rather than guessing |
| No authority claims | System prompt prohibits "I guarantee", "I confirm", "I certify" |
| Scam protection | Detects payment-related queries — warns: "Government form filing is free" |
| Local processing option | Ollama/LM Studio run locally; no user data leaves the device |
| Input limits | 1000-char message limit, 10-message history cap, 512-token response cap |
| No auto-submission | User must review every field and explicitly confirm before submission |
| Rate limiting | 15 req/min for AI chat, 100 req/min for general API |
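The input limits in the table could be enforced before a request reaches the model, along the lines of the sketch below. The limits (1000 characters, 10 history messages) come from the table; the `ChatMessage` shape, the function name, and the choice to truncate rather than reject are assumptions.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

const MAX_MESSAGE_CHARS = 1000;
const MAX_HISTORY_MESSAGES = 10;

// Cut the new message to the character limit and keep only the most recent history.
function clampChatInput(
  message: string,
  history: ChatMessage[],
): { message: string; history: ChatMessage[] } {
  return {
    message: message.slice(0, MAX_MESSAGE_CHARS),
    history: history.slice(-MAX_HISTORY_MESSAGES),
  };
}
```

Bounding both values keeps the prompt size predictable, which matters when the full knowledge base is already included in the system prompt.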
ONLINE OFFLINE
┌──────────────────────┐ ┌──────────────────────┐
│ offlineApi.ts │ │ offlineApi.ts │
│ │ │ │
│ 1. Call real API │ │ 1. Read from IDB │
│ 2. Cache to IndexedDB│ │ 2. Enqueue write to │
│ 3. Return response │ │ syncQueue │
└──────────┬───────────┘ └──────────────────────┘
│ │
│ ┌───────────┐ │
│ │ RECONNECT │◀──────────┘
│ └─────┬─────┘
│ │
│ ┌───────────▼──────────┐
│ │ syncManager.ts │
│ │ │
│ │ 1. Drain queue FIFO │
│ │ 2. POST /api/sync │
│ │ (batched actions) │
│ │ 3. Reconcile server │
│ │ state with local │
│ │ 4. Exponential │
│ │ backoff on fail │
│ │ (1s→2s→4s→…→30s) │
│ │ 5. Max 5 retries │
│ └──────────────────────┘
│
┌──────────▼───────────┐
│ IndexedDB Stores │
│ │
│ sessions │ ← form-filling progress
│ submissions │ ← completed forms
│ syncQueue │ ← pending API calls
└──────────────────────┘
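The retry schedule in the diagram (1s, 2s, 4s, doubling up to a 30s cap, at most 5 retries) can be sketched as follows. The function names are hypothetical, and `sleep` is injected so the schedule can be checked without real waiting.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;
const MAX_RETRIES = 5;

// Delay before retry number `attempt` (0-based): doubles each time, capped at 30s.
function backoffDelay(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

// Run `task`; after each failure wait for the backoff delay and try again,
// giving up (and rethrowing the last error) once MAX_RETRIES retries are used.
async function withRetries<T>(
  task: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= MAX_RETRIES) throw err;
      await sleep(backoffDelay(attempt));
    }
  }
}
```

With five retries the waits are 1s, 2s, 4s, 8s, and 16s, so the 30s cap only comes into play if the retry limit is raised.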
Data persistence layers: localStorage (settings) → IndexedDB (sessions/submissions/queue) → Server (SQLite/DynamoDB).
Conflict resolution: Server wins unless the local record has a newer updatedAt timestamp.
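The conflict rule fits in a few lines. The `updatedAt` field name comes from the sentence above; the `SyncRecord` interface and function name are hypothetical, and treating equal or unparseable timestamps as a server win is an assumption.

```typescript
interface SyncRecord {
  id: string;
  updatedAt: string; // ISO timestamp
}

// Server wins unless the local copy is strictly newer.
// Equal or invalid timestamps compare as "not newer", so the server copy is kept.
function resolveConflict<T extends SyncRecord>(server: T, local: T): T {
  return Date.parse(local.updatedAt) > Date.parse(server.updatedAt) ? local : server;
}
```

Parsing the timestamps rather than comparing strings avoids surprises when the two sides serialise time zones differently.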
| Feature | Implementation |
|---|---|
| Voice as primary input | Large mic button (192px Home, 128px fields) with animated ripple |
| Text size control | Small / Medium / Large, persisted in localStorage |
| Dark mode | System preference detection + manual toggle |
| Volume & speed controls | Adjustable TTS rate (0.5×–2×) with test button |
| Field help tooltips | Every field has a plain-language explanation |
| Large touch targets | All interactive elements ≥ 48px, most 64–80px |
| Press feedback | active:scale-95 on all buttons |
| Step progress | Dot indicators, text percentage, animated progress bar |
| Auto-save | Tab switching during form filling auto-saves |
| ARIA attributes | aria-label, aria-live="polite", role="status" |
| Skip link | "Skip to main content" link in HTML |
| Screen reader | announceToScreenReader() utility for dynamic announcements |
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/sessions` | List all sessions |
| `GET` | `/api/sessions/:id` | Get a single session |
| `POST` | `/api/sessions` | Create a new session |
| `PUT` | `/api/sessions/:id` | Update a session |
| `DELETE` | `/api/sessions/:id` | Delete a session |
| `GET` | `/api/submissions` | List all submissions |
| `GET` | `/api/submissions/:id` | Get a single submission |
| `POST` | `/api/submissions` | Create a submission |
| `POST` | `/api/sync` | Bulk sync (batch actions + full state return) |
| `POST` | `/api/chat` | AI chat (grounded, multi-turn) |
| `POST` | `/api/chat/stream` | AI chat with SSE streaming |
| `GET` | `/api/health` | Health check (includes AWS status) |
| Method | Path | Purpose | AWS Service |
|---|---|---|---|
| `POST` | `/api/ocr` | Document text detection | Textract |
| `POST` | `/api/ocr/analyze` | Document analysis (FORMS) | Textract |
| `POST` | `/api/speech/synthesize` | Text-to-speech (MP3) | Polly |
| `POST` | `/api/speech/transcribe` | Speech-to-text | Transcribe |
| `POST` | `/api/translate` | Single text translation | Translate |
| `POST` | `/api/translate/batch` | Batch translation (≤25) | Translate |
| `POST` | `/api/storage/upload` | File upload | S3 |
| `GET` | `/api/storage/download/*` | File download / presigned URL | S3 |
| `GET` | `/api/storage/list` | List files by prefix | S3 |
| `DELETE` | `/api/storage/*` | Delete file | S3 |
| Endpoint | Limit |
|---|---|
| General API (`/api/*`) | 100 requests/minute |
| AI Chat (`/api/chat`) | 15 requests/minute |
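The README does not say how these limits are implemented. As one possible approach, a fixed-window counter keyed by client could look like the sketch below; the class name and the choice of a fixed window (rather than a sliding window or token bucket) are assumptions. The current time is passed in so the behaviour is easy to verify.

```typescript
interface WindowEntry {
  windowStart: number;
  count: number;
}

class FixedWindowLimiter {
  private entries = new Map<string, WindowEntry>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the key is over its limit.
  allow(key: string, now: number): boolean {
    const entry = this.entries.get(key);
    // Start a new window on first use or once the previous window has expired.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.entries.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

A chat limiter matching the table would be `new FixedWindowLimiter(15, 60000)`, typically keyed by client IP address.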
- Node.js 18+
- AI Model (one of):
# Clone the repository
git clone <repo-url> && cd sarthi-ai
# Install dependencies
npm install
# Start both frontend (port 3000) and backend (port 3001)
npm run dev:all
# Set environment variables
export USE_AWS=true
export AWS_REGION=ap-south-1
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AI_PROVIDER=bedrock
# Start
npm run dev:all
| Command | Description |
|---|---|
| `npm run dev` | Vite dev server on port 3000 |
| `npm run server` | Express API on port 3001 |
| `npm run dev:all` | Both frontend + backend concurrently |
| `npm run build` | Production build |
| `npm run start` | Build + serve production |
| `npm run lint` | TypeScript type check |
| `npm run clean` | Remove `dist/` folder |
| Variable | Default | Description |
|---|---|---|
| `USE_AWS` | `false` | Master switch for all 7 AWS services |
| `AWS_REGION` | `ap-south-1` | AWS region |
| `AWS_ACCESS_KEY_ID` | (none) | IAM access key (optional on EC2/ECS/Lambda) |
| `AWS_SECRET_ACCESS_KEY` | (none) | IAM secret key |
| `AI_PROVIDER` | `ollama` | `ollama`, `lmstudio`, or `bedrock` |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama3` | Ollama model name |
| `LMSTUDIO_URL` | `http://localhost:1234` | LM Studio server URL |
| `LMSTUDIO_MODEL` | `gemma-3-4b` | LM Studio model name |
| `BEDROCK_MODEL` | `anthropic.claude-3-haiku-20240307-v1:0` | Bedrock model ID |
| `DB_PROVIDER` | (none) | Set to `sqlite` to force local DB even with AWS |
| `DYNAMO_TABLE_PREFIX` | `SarthiAI` | DynamoDB table name prefix |
| `S3_BUCKET` | `sarthi-ai-documents` | S3 bucket name |
| `SERVER_PORT` | `3001` | Express server port |
- Prototype / hackathon project — not production-hardened.
- 6 demo forms. New forms require adding a definition + translations.
- No user accounts or authentication.
- AI assistant requires a running LLM (local or Bedrock); unavailable offline.
- Speech recognition browser support varies (Chrome/Edge recommended; limited on Firefox/Safari).
- Does NOT submit forms to any government system — users must submit via official channels.
SarthiAI is a prototype project. It does not replace official guidance from government agencies. The tool does NOT submit forms automatically — users retain full control. The AI assistant refuses to answer when reliable information cannot be found in verified documents. No middleman fee is required for any government form filing.