SarthiAI

Voice-first, offline-capable form assistant that helps Indian citizens understand and fill government forms in 8 languages — powered by document-grounded AI, AWS cloud services, and accessibility-first design.

Problem Statement

Government and public-service forms in India are difficult for millions of citizens due to:

Barrier	Who it affects
Complex legal language on forms	Low-literacy and first-time users
English-only interfaces	900M+ non-English speakers
Small text & dense layouts	Elderly citizens, vision-impaired users
No offline access	Rural users with intermittent connectivity
AI tools that hallucinate	Users who trust incorrect information
Middlemen charging fees	Economically vulnerable populations

These barriers disproportionately affect rural users, elderly citizens, differently-abled individuals, and first-time digital users — the very populations these government schemes are designed to help.

Solution

SarthiAI is a voice-first Progressive Web App (PWA) that walks users through government forms field by field using speech. An AI assistant explains each field in simple regional language. Its answers are grounded in verified documents, and it is instructed to refuse rather than guess when those documents do not cover a question.

Users dictate answers, hear them read back, scan documents to auto-fill fields, and review everything before confirming. The app works fully offline and syncs when connectivity returns. When deployed with AWS, it gains cloud-grade OCR, speech services, translation, and scalable storage.

Key Features

Voice-First Interaction

Speech-to-Text — dictate answers via Web Speech Recognition API or AWS Transcribe. Supports continuous and single-utterance modes with real-time interim transcripts.
Text-to-Speech — the app speaks field labels, help text, and AI responses via Web Speech Synthesis API or Amazon Polly (natural neural voices).
Voice states — clear animated UI states: Idle → Listening (ripple) → Thinking (spinner) → Speaking (pulse) → Error.
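
The voice states above behave like a small finite-state machine. A minimal sketch follows, assuming hypothetical event names (`START_LISTENING`, `TRANSCRIPT_READY`, and so on) and a hypothetical `nextVoiceState` helper; the app's real state handling may be organised differently.

```typescript
// Hypothetical names: VoiceState, VoiceEvent and nextVoiceState are illustrative only.
type VoiceState = "idle" | "listening" | "thinking" | "speaking" | "error";
type VoiceEvent =
  | "START_LISTENING"
  | "TRANSCRIPT_READY"
  | "REPLY_READY"
  | "SPEECH_ENDED"
  | "FAILURE"
  | "RESET";

// Allowed transitions; any (state, event) pair not listed leaves the state unchanged.
const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { START_LISTENING: "listening" },
  listening: { TRANSCRIPT_READY: "thinking", FAILURE: "error", RESET: "idle" },
  thinking: { REPLY_READY: "speaking", FAILURE: "error", RESET: "idle" },
  speaking: { SPEECH_ENDED: "idle", FAILURE: "error", RESET: "idle" },
  error: { RESET: "idle", START_LISTENING: "listening" },
};

function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  return transitions[state][event] ?? state;
}
```

Keeping the transitions in one table means the UI can map each state to exactly one animation (ripple, spinner, pulse) without checking combinations of flags.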

8-Language Support

Full UI and form-content translations in English, Hindi (हिन्दी), Marathi (मराठी), Tamil (தமிழ்), Telugu (తెలుగు), Bengali (বাংলা), Gujarati (ગુજરાતી), Kannada (ಕನ್ನಡ).

Auto-detects browser language on first visit.
Every field label, placeholder, help text, option, and validation message is translated.
Locale-aware date/number formatting (Indian numbering: lakhs/crores).
Dynamic translation powered by Amazon Translate when AWS is enabled.
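
Indian digit grouping can be obtained from the standard `Intl.NumberFormat` API with the `en-IN` locale. The sketch below is illustrative: `formatIndianNumber` and `toLakhsCrores` are hypothetical helper names, not necessarily what `localeFormat.ts` exports.

```typescript
// Formats a number with Indian digit grouping (last three digits, then groups of two).
function formatIndianNumber(value: number): string {
  return new Intl.NumberFormat("en-IN").format(value);
}

// Hypothetical helper: expresses large amounts in lakhs or crores.
function toLakhsCrores(value: number): string {
  const CRORE = 10000000; // 1,00,00,000
  const LAKH = 100000; // 1,00,000
  // The unary plus drops trailing zeros left by toFixed ("2.50" becomes 2.5).
  if (Math.abs(value) >= CRORE) return `${+(value / CRORE).toFixed(2)} crore`;
  if (Math.abs(value) >= LAKH) return `${+(value / LAKH).toFixed(2)} lakh`;
  return formatIndianNumber(value);
}

formatIndianNumber(1234567); // "12,34,567"
toLakhsCrores(250000); // "2.5 lakh"
toLakhsCrores(15000000); // "1.5 crore"
```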

Schema-Driven Forms

Forms are defined once as typed TypeScript schemas and rendered dynamically:

Form	Description	Fields
Old Age Pension (IGNOAPS)	Monthly pension for senior citizens	14
Housing Subsidy (PMAY)	Housing for All scheme assistance	17
Ration Card (NFSA)	Subsidised food grains entitlement	15+
Caste Certificate	SC/ST/OBC category proof	12+
Income Certificate	Family income verification	12+
PM-KISAN	₹6,000/year farmer income support	14+

Each form includes eligibility criteria, required documents list, purpose description, and estimated completion time. New forms are added by creating a definition file and registering it — zero UI code changes needed.

Supported field types: text, date, select, tel, textarea, number, checkbox, radio — with validation rules (required, min/max length, regex pattern, cross-field validation), conditional visibility, and field grouping.
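
As a rough illustration of how validation rules and conditional visibility can interact, the sketch below validates one field against a values map. The `FieldRules` and `FieldDef` shapes and the `visibleWhen` property are assumptions made for this example; the real types live in `src/forms/schema.ts` and may differ.

```typescript
// Hypothetical shapes; the real definitions in src/forms/schema.ts may differ.
interface FieldRules {
  required?: boolean;
  minLength?: number;
  maxLength?: number;
  pattern?: RegExp;
}

interface FieldDef {
  key: string;
  rules?: FieldRules;
  // The field is shown only when this returns true (conditional visibility).
  visibleWhen?: (values: Record<string, string>) => boolean;
}

// Returns a code for the first failed rule, or null when the value is valid.
function validateField(field: FieldDef, values: Record<string, string>): string | null {
  // Hidden fields are skipped so they can never block submission.
  if (field.visibleWhen && !field.visibleWhen(values)) return null;
  const value = (values[field.key] ?? "").trim();
  const rules = field.rules ?? {};
  if (value === "") return rules.required ? "required" : null;
  if (rules.minLength !== undefined && value.length < rules.minLength) return "minLength";
  if (rules.maxLength !== undefined && value.length > rules.maxLength) return "maxLength";
  if (rules.pattern && !rules.pattern.test(value)) return "pattern";
  return null;
}
```

Returning a rule code rather than a message keeps the validator language-neutral, so the caller can look up the translated validation message for the active language.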

Document Scanning (OCR)

Client-side: Tesseract.js for offline OCR
Server-side: Amazon Textract for high-accuracy extraction
Auto-detects and extracts: Aadhaar numbers, PAN, dates of birth, phone numbers, IFSC codes, gender, names, PIN codes
Scanned data auto-fills matching form fields

Document-Grounded AI Assistant

RAG architecture — 7+ embedded knowledge documents covering pension, housing, ration card, caste certificate, income certificate, and PM-KISAN schemes.
Strict grounding — a 9-rule system prompt restricts answers to the knowledge documents to limit hallucination. Every response is tagged as Verified or Unverified.
Multi-provider support — Ollama (local), LM Studio (local), or Amazon Bedrock (cloud). Supports Claude, Llama, and Titan model families.
Streaming responses — tokens appear in real-time via SSE.
Context-aware — knows which field the user is filling and provides targeted help.

Offline & Low-Connectivity

Full PWA with Service Worker caching (Workbox).
IndexedDB local storage (sessions, submissions, sync queue).
Incremental sync queue with exponential backoff.
Bulk sync endpoint for efficient reconciliation on slow networks.
Slow-connection detection (2G/slow-2G) with user-facing banners.
Installable to mobile home screens.
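
The retry delay for the sync queue can be sketched as a pure function. The base delay, cap, and jitter setting below are hypothetical defaults chosen for illustration; this README does not state the values `syncManager.ts` actually uses.

```typescript
// Hypothetical options; the real base delay, cap and retry limit are not stated here.
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  maxMs: number; // upper bound on any single delay
  jitter: boolean; // randomise the delay to avoid synchronised retries
}

// Computes the wait before retry number `attempt` (0-based).
function backoffDelay(
  attempt: number,
  opts: BackoffOptions = { baseMs: 1000, maxMs: 60000, jitter: true },
  random: () => number = Math.random,
): number {
  const exponential = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  // "Full jitter": pick a uniform delay in [0, exponential).
  return opts.jitter ? Math.floor(random() * exponential) : exponential;
}
```

With jitter disabled and the defaults above, attempts 0, 1, 2, 3 wait 1 s, 2 s, 4 s, 8 s, and the delay stops growing once it reaches the 60 s cap. The injectable `random` parameter exists only to make the function testable.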

PDF Generation

Client-side PDF generation via jsPDF (works offline).
Includes tracking ID, submission date, all field values, and disclaimer footer.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                         Browser (PWA)                               │
│                                                                     │
│   React 19 + Tailwind CSS v4 + Framer Motion                       │
│   ┌────────────┐  ┌────────────┐  ┌───────────────────────────┐    │
│   │ Voice Layer│  │   Form     │  │   Offline Layer           │    │
│   │ STT / TTS  │  │  Context   │  │ IndexedDB + SyncQueue     │    │
│   │ (Browser   │  │ (Single    │  │ ┌─────────┐ ┌──────────┐ │    │
│   │  APIs)     │  │  React     │  │ │sessions │ │syncQueue │ │    │
│   └─────┬──────┘  │  Context)  │  │ └─────────┘ └──────────┘ │    │
│         │         └─────┬──────┘  └────────────┬──────────────┘    │
│         │               │                      │                    │
│   ┌─────┴───────────────┴──────────────────────┴────────┐          │
│   │              offlineApi wrapper                      │          │
│   │    (online → API + cache | offline → IndexedDB)      │          │
│   └─────────────────────┬────────────────────────────────┘          │
│                         │                                           │
│   ┌─────────────────────┴────────────────────────────────┐          │
│   │              Service Worker (Workbox)                 │          │
│   │  Static: CacheFirst | API: NetworkFirst              │          │
│   │  Chat: NetworkOnly                                   │          │
│   └─────────────────────┬────────────────────────────────┘          │
└─────────────────────────┼───────────────────────────────────────────┘
                          │  /api/*
              ┌───────────▼───────────┐
              │    Express Server     │ (port 3001)
              │    ┌───────────────┐  │
              │    │  DB Provider  │──┼──▶ SQLite (local) OR DynamoDB (AWS)
              │    └───────────────┘  │
              │    ┌───────────────┐  │
              │    │   AI Layer    │──┼──▶ Ollama / LM Studio (local) OR Bedrock (AWS)
              │    │  + Grounding  │  │
              │    └───────────────┘  │
              │    ┌───────────────┐  │
              │    │  AWS Services │──┼──▶ S3 · Textract · Polly · Transcribe · Translate
              │    └───────────────┘  │
              └───────────────────────┘

Key Design Decisions

Decision	Rationale
State-driven routing (no router library)	`formPhase` × `activeTab` keeps navigation simple. No URL complexity for target users.
Single React Context for all state	Avoids external state management complexity. Single source of truth.
Offline-first API wrapper	Every call goes through `offlineApi`: online, it calls the API and caches the result in IndexedDB; offline, it reads from IndexedDB and queues writes for sync.
Local-first + AWS upgrade path	Runs fully offline with local tools. AWS services layer on when `USE_AWS=true`.
Schema-driven form rendering	Adding new forms requires zero UI changes — just a schema file + registration.
Document-grounded AI only	9-rule system prompt restricts answers to the verified documents. The assistant refuses when unsure.
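
The `online → API + cache | offline → IndexedDB` rule from the architecture diagram can be sketched with injected dependencies. Everything named here (`LocalStore`, `Deps`, `saveRecord`) is a hypothetical stand-in for the real modules in `src/offline/`, and falling back to the local path when an online request fails is an assumption of this sketch.

```typescript
// All names here are hypothetical stand-ins for the real modules in src/offline/.
interface LocalStore {
  get(key: string): Promise<unknown>;
  put(key: string, value: unknown): Promise<void>;
  enqueue(op: { key: string; value: unknown }): Promise<void>;
}

interface Deps {
  isOnline: () => boolean;
  remoteSave: (key: string, value: unknown) => Promise<unknown>;
  store: LocalStore;
}

// Online: call the API and cache the response.
// Offline (or, by assumption, on a failed request): write locally and queue for sync.
async function saveRecord(deps: Deps, key: string, value: unknown): Promise<unknown> {
  if (deps.isOnline()) {
    try {
      const saved = await deps.remoteSave(key, value);
      await deps.store.put(key, saved);
      return saved;
    } catch {
      // Fall through to the offline path if the request fails mid-flight.
    }
  }
  await deps.store.put(key, value);
  await deps.store.enqueue({ key, value });
  return value;
}
```

Passing the store and network call in as parameters keeps the wrapper independent of IndexedDB and `fetch`, which makes the online and offline branches easy to exercise in tests.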

User Flow

┌──────────────┐     ┌───────────────┐     ┌─────────────────┐
│  1. Welcome  │────▶│  2. Language  │────▶│  3. Home        │
│ (First Visit)│     │  Selection    │     │  (Voice CTA)    │
└──────────────┘     └───────────────┘     └────────┬────────┘
                                                     │
                          ┌──────────────────────────┤
                          ▼                          ▼
                   ┌─────────────┐          ┌──────────────┐
                   │ 4a. Browse  │          │ 4b. Speak to │
                   │   Forms Tab │          │   AI on Home │
                   └──────┬──────┘          └──────────────┘
                          │
                          ▼
                   ┌──────────────┐
                   │ 5. Form Info │
                   │ (Purpose,    │
                   │  Eligibility,│
                   │  Documents)  │
                   └──────┬───────┘
                          │ "Start Filling"
                          ▼
                   ┌──────────────┐     ┌──────────────┐
                   │ 6. Field-by- │────▶│  7. Scan     │
                   │ Field Input  │◀────│  Document    │
                   │ (Voice/Type) │     │  (OCR)       │
                   │              │     └──────────────┘
                   │  ┌────────┐  │
                   │  │AI Help │  │     ← Floating AI chat panel
                   │  │ Panel  │  │       (grounded answers)
                   │  └────────┘  │
                   └──────┬───────┘
                          │ All fields complete
                          ▼
                   ┌──────────────┐
                   │ 8. Review    │
                   │ All Answers  │
                   │ (Edit any)   │
                   └──────┬───────┘
                          │ "Submit"
                          ▼
                   ┌──────────────┐     ┌──────────────┐
                   │ 9. Submitted │────▶│ 10. Download │
                   │ Confirmation │     │  PDF Summary │
                   │ (Tracking ID)│     └──────────────┘
                   └──────────────┘

Step-by-step breakdown:

Welcome Modal — First-time visitors see a language picker. Auto-detects browser language as default.
Language Selection — Choose from 8 languages. All UI, form content, and AI responses switch instantly.
Home Screen — Large microphone button (192px) for voice-first interaction. Users can speak questions or navigate to forms.
Browse Forms — Grid of available government forms with icons, descriptions, and estimated completion times.
Form Info — Before starting, users see: purpose of the form, eligibility criteria, required documents, and estimated time.
Field-by-Field Input — One field at a time with large text, help tooltips, voice dictation, and progress indicators (dots + percentage + animated bar).
Document Scanning — Camera-based OCR scans Aadhaar, PAN, etc. and auto-fills fields. Uses Textract (AWS) or Tesseract.js (offline).
Review — All answers displayed for final verification. Users can tap any field to edit it.
Submission — Confirmation screen with a unique tracking ID. Session marked as completed.
PDF Download — Client-side PDF generated with all form data, tracking ID, and official disclaimer.

Auto-save: Progress is saved automatically when switching tabs or navigating away. Users can resume from the Activity page at any time.

AWS Service Integration

Architecture Decision: Local-First with AWS Upgrade

SarthiAI is designed with a dual-mode architecture — it runs fully offline using local tools (SQLite, Ollama, Tesseract.js, Web Speech APIs), and gains cloud-grade capabilities when AWS is enabled via a single environment variable:

USE_AWS=true   →  All 7 AWS services activate (with automatic local fallbacks on error)
USE_AWS=false  →  Fully local operation (default)

Every AWS service has a local fallback. If a cloud call fails, the system degrades to the local alternative so the user can keep working.

┌─────────────────────────────────────────────────────────────────────┐
│                     AWS Service Map                                  │
│                                                                     │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────────┐  │
│  │   Bedrock    │    │  DynamoDB    │    │        S3            │  │
│  │  (AI/LLM)   │    │ (Database)   │    │   (File Storage)     │  │
│  │             │    │              │    │                      │  │
│  │ Claude 3    │    │ Sessions     │    │ Document uploads     │  │
│  │ Llama       │    │ Submissions  │    │ Scanned IDs          │  │
│  │ Titan       │    │ PAY_PER_REQ  │    │ Generated PDFs       │  │
│  └──────┬───────┘    └──────┬───────┘    └──────────┬───────────┘  │
│         │                   │                       │              │
│  ┌──────▼───────┐    ┌──────▼───────┐    ┌──────────▼───────────┐  │
│  │  Textract    │    │   Polly      │    │    Transcribe        │  │
│  │   (OCR)      │    │   (TTS)      │    │     (STT)            │  │
│  │             │    │              │    │                      │  │
│  │ Aadhaar scan │    │ Neural voice │    │ 8 Indian languages   │  │
│  │ PAN scan     │    │ 8 languages  │    │ Streaming support    │  │
│  │ Form fields  │    │ SSML rate    │    │ PCM/OGG input        │  │
│  └──────────────┘    └──────────────┘    └──────────────────────┘  │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                     Translate                                 │   │
│  │                  (Dynamic Translation)                        │   │
│  │                                                               │   │
│  │  Single text + batch (up to 25) · Auto-detect source lang     │   │
│  │  Supports: en, hi, ta, te, mr, bn, gu, kn                    │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

1. Amazon Bedrock — AI/LLM

Purpose: Cloud-hosted foundation model inference for the document-grounded AI assistant, replacing local Ollama/LM Studio.

File: server/aws/bedrock.ts

Aspect	Detail
SDK	`@aws-sdk/client-bedrock-runtime`
Operations	`InvokeModel` (non-streaming), `InvokeModelWithResponseStream` (streaming SSE)
Default Model	`anthropic.claude-3-haiku-20240307-v1:0`
Supported Families	Claude (Messages API), Llama (prompt-based), Titan (inputText-based)
Temperature	0.3 (low for factual accuracy)
Max Tokens	512
Local Fallback	Ollama (`llama3`) or LM Studio

How it works:

The chat route checks useBedrock() — if true, delegates to bedrock.ts; otherwise uses ollama.ts.
Knowledge documents are selected based on the active form (getDocumentsForForm(formId)) or all documents.
A system prompt with 9 grounding rules + the full knowledge base is built.
The request body is formatted per model family (Claude Messages API / Llama prompt template / Titan inputText).
The response is parsed per family, tagged with grounding metadata, and checked for refusal phrases.

Streaming flow:

Client POST /api/chat/stream
  → Server builds Bedrock request
  → InvokeModelWithResponseStreamCommand
  → For each chunk: SSE event "token" with partial text
  → Final SSE event "done" with {reply, grounded, sources}
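
Each SSE event in the flow above is a small text frame. A minimal sketch of the framing follows, assuming a hypothetical `formatSseEvent` helper and a hypothetical `{ text }` payload for `token` events; only the `done` payload fields (`reply`, `grounded`, `sources`) come from this README.

```typescript
// Serialises one Server-Sent Event frame: an "event:" line, a "data:" line, then a blank line.
// JSON.stringify emits no raw newlines by default, so the payload fits on one data line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Hypothetical payloads, modelled on the "token" and "done" events described above.
const tokenFrame = formatSseEvent("token", { text: "Aadhaar" });
const doneFrame = formatSseEvent("done", {
  reply: "Aadhaar is a 12-digit identity number.",
  grounded: true,
  sources: ["pension"],
});
```

On the server, frames like these would be written to a response whose `Content-Type` is `text/event-stream`; on the client they can be read from the streamed response body of the `POST` request and split on the blank line.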

Auto-detection logic:

isClaude(modelId)  → modelId.startsWith('anthropic.claude')
isLlama(modelId)   → modelId.includes('llama')
isTitan(modelId)   → modelId.startsWith('amazon.titan')
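
The detection rules above determine which request body shape is sent to Bedrock. The sketch below shows one way to combine them, using the documented body fields for each family and the temperature and token limits from the table. `detectFamily`, `buildRequestBody`, and the flattened `role: content` transcript are assumptions for illustration; in particular, the Llama prompt here is simplified and is not the official chat template.

```typescript
type Family = "claude" | "llama" | "titan";

interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

// Mirrors the three detection rules listed above.
function detectFamily(modelId: string): Family {
  if (modelId.startsWith("anthropic.claude")) return "claude";
  if (modelId.includes("llama")) return "llama";
  if (modelId.startsWith("amazon.titan")) return "titan";
  throw new Error(`Unsupported model: ${modelId}`);
}

// Builds the JSON body for InvokeModel; the caller would JSON.stringify the result.
function buildRequestBody(modelId: string, system: string, turns: ChatTurn[]): Record<string, unknown> {
  const temperature = 0.3;
  const maxTokens = 512;
  // Simplified transcript for the prompt-based families (not an official chat template).
  const transcript = turns.map((t) => `${t.role}: ${t.content}`).join("\n");
  switch (detectFamily(modelId)) {
    case "claude":
      return { anthropic_version: "bedrock-2023-05-31", max_tokens: maxTokens, temperature, system, messages: turns };
    case "llama":
      return { prompt: `${system}\n\n${transcript}\nassistant:`, max_gen_len: maxTokens, temperature };
    case "titan":
      return { inputText: `${system}\n\n${transcript}`, textGenerationConfig: { maxTokenCount: maxTokens, temperature } };
  }
}
```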

2. Amazon DynamoDB — Database

Purpose: Scalable NoSQL database replacing local SQLite for session and submission storage.

File: server/aws/dynamodb.ts

Aspect	Detail
SDK	`@aws-sdk/client-dynamodb` + `@aws-sdk/lib-dynamodb` (DocumentClient)
Tables	`{prefix}-Sessions`, `{prefix}-Submissions`
Key Schema	Single hash key: `id` (String)
Billing Mode	`PAY_PER_REQUEST` (on-demand, no capacity planning)
Auto-creation	Tables are created at server startup if they don't exist
Local Fallback	SQLite (`better-sqlite3`) at `data/sarthi.db`

Table structure:

{prefix}-Sessions
├── id (PK)        — UUID
├── form_id        — e.g. "pension"
├── form_title     — e.g. "Old Age Pension"
├── status         — "in-progress" | "completed"
├── form_values    — JSON string of all field values
├── current_step   — integer (1-based)
├── created_at     — ISO timestamp
└── updated_at     — ISO timestamp

{prefix}-Submissions
├── id (PK)        — UUID
├── session_id     — linked session (nullable)
├── form_id        — form identifier
├── form_title     — form display name
├── form_values    — JSON string of final values
└── submitted_at   — ISO timestamp

Database Provider Pattern (server/dbProvider.ts):

The provider abstracts the database layer. All route handlers call dbProvider functions — never db.ts directly. The provider checks useDynamoDB() at runtime and delegates accordingly, with automatic fallback:

export async function createSession(data) {
  if (useDynamoDB()) {
    try {
      return await dynamo.createSession(data);  // Cloud
    } catch (err) {
      // Log the underlying error so fallbacks are visible in server logs
      console.warn('DynamoDB failed, falling back to SQLite:', err);
    }
  }
  return sqliteDb.createSession(data);           // Local fallback
}

3. Amazon S3 — Storage

Purpose: Cloud storage for document uploads (scanned IDs, photos) and generated PDFs.

File: server/aws/s3.ts

Aspect	Detail
SDK	`@aws-sdk/client-s3` + `@aws-sdk/s3-request-presigner`
Default Bucket	`sarthi-ai-documents` (configurable via `S3_BUCKET`)
Operations	`PutObject`, `GetObject`, `ListObjectsV2`, `DeleteObject`, `HeadBucket`, `CreateBucket`
Presigned URLs	1-hour expiry for secure downloads
Max Upload Size	10 MB
Local Fallback	Local filesystem at `data/uploads/`

API routes (server/routes/storage.ts):

Method	Route	Description
`POST`	`/api/storage/upload`	Upload file (multipart, `?prefix=sessions/abc`)
`GET`	`/api/storage/download/*`	Get presigned URL or direct download
`GET`	`/api/storage/list`	List files by prefix
`DELETE`	`/api/storage/*`	Delete a file

Bootstrap: At server startup, ensureBucket() checks if the bucket exists and creates it if not:

await client.send(new HeadBucketCommand({ Bucket }));
// If 404 → CreateBucketCommand with region constraint
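
Two details of this bootstrap are easy to get wrong: telling a missing bucket apart from other `HeadBucket` failures, and building the `CreateBucket` input for the default region. The helpers below are hypothetical and only build or inspect plain objects; they do not call AWS. They assume the AWS SDK v3 error shape (`name` and `$metadata.httpStatusCode`).

```typescript
// Hypothetical helpers; they only build or inspect plain objects and never call AWS.
interface CreateBucketInput {
  Bucket: string;
  CreateBucketConfiguration?: { LocationConstraint: string };
}

// us-east-1 is the default region and is created without a LocationConstraint;
// every other region passes its own name as the constraint.
function createBucketInput(bucket: string, region: string): CreateBucketInput {
  return region === "us-east-1"
    ? { Bucket: bucket }
    : { Bucket: bucket, CreateBucketConfiguration: { LocationConstraint: region } };
}

// True when a failed HeadBucket call means "bucket does not exist"
// rather than, for example, 403 Forbidden on a bucket owned by someone else.
function isMissingBucket(err: unknown): boolean {
  const e = err as { name?: string; $metadata?: { httpStatusCode?: number } } | null;
  return e?.name === "NotFound" || e?.$metadata?.httpStatusCode === 404;
}
```

Creating the bucket only when `isMissingBucket` returns true avoids masking permission errors, which would otherwise surface later as confusing upload failures.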

4. Amazon Textract — OCR

Purpose: Server-side document text extraction for scanning Indian government documents (Aadhaar, PAN, etc.).

File: server/aws/textract.ts

Aspect	Detail
SDK	`@aws-sdk/client-textract`
Operations	`DetectDocumentText` (basic OCR), `AnalyzeDocument` (FORMS feature for key-value pairs)
Input	Raw image bytes (JPEG, PNG, PDF) — up to 10 MB
Local Fallback	Tesseract.js in the browser (`src/hooks/useOCR.ts`)

Field extraction patterns (regex-based, shared between server and client):

Field	Pattern	Example
Aadhaar Number	`\b(\d{4}\s?\d{4}\s?\d{4})\b`	`1234 5678 9012`
PAN Number	`\b([A-Z]{5}\d{4}[A-Z])\b`	`ABCDE1234F`
Date of Birth	`(\d{2}[\/\-\.]\d{2}[\/\-\.]\d{4})`	`15/03/1960`
Phone Number	`\b([6-9]\d{9})\b`	`9876543210`
IFSC Code	`\b([A-Z]{4}0[A-Za-z0-9]{6})\b`	`SBIN0001234`
Gender	`\b(male|female|पुरुष|महिला)\b`	`Male`
PIN Code	`\b(\d{6})\b`	`226001`
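
A minimal sketch of applying these patterns to OCR text follows. The `extractFields` function and field keys are hypothetical. Two details are assumptions of this sketch: the `i` flag on the gender pattern, and matching the Hindi words without `\b`, because JavaScript's `\b` only recognises ASCII word characters and so does not treat Devanagari text as words.

```typescript
// Patterns adapted from the table above. The Hindi alternatives sit outside \b
// because JavaScript word boundaries are ASCII-only.
const fieldPatterns: Array<[string, RegExp]> = [
  ["aadhaar", /\b(\d{4}\s?\d{4}\s?\d{4})\b/],
  ["pan", /\b([A-Z]{5}\d{4}[A-Z])\b/],
  ["dob", /(\d{2}[\/\-.]\d{2}[\/\-.]\d{4})/],
  ["phone", /\b([6-9]\d{9})\b/],
  ["ifsc", /\b([A-Z]{4}0[A-Za-z0-9]{6})\b/],
  ["gender", /\b(male|female)\b|(पुरुष|महिला)/i],
  ["pin", /\b(\d{6})\b/],
];

// Returns the first match for each field found in the text; fields with no match are omitted.
function extractFields(text: string): Record<string, string> {
  const found: Record<string, string> = {};
  for (const [key, pattern] of fieldPatterns) {
    const match = pattern.exec(text);
    const value = match ? match[1] ?? match[2] : undefined;
    if (value !== undefined) found[key] = value;
  }
  return found;
}

const sample = "DOB: 15/03/1960\nMale\n1234 5678 9012\nPIN 226001";
const fields = extractFields(sample);
```

Regex matching alone cannot tell a valid Aadhaar number from any other 12-digit run, which is presumably why the project also ships a Verhoeff checksum utility (`aadhaarVerhoeff.ts`).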

API routes (server/routes/ocr.ts):

Method	Route	Description
`POST`	`/api/ocr`	Basic text detection
`POST`	`/api/ocr/analyze`	FORMS-based document analysis (richer extraction)

Dual-mode OCR flow:

User takes photo of Aadhaar card
  ├─ AWS enabled  → POST /api/ocr → Textract DetectDocumentText → structured fields
  └─ AWS disabled → Client-side Tesseract.js → regex extraction → auto-fill fields

5. Amazon Polly — Text-to-Speech

Purpose: Server-side neural text-to-speech in Indian languages, replacing browser Speech Synthesis for higher quality output.

File: server/aws/polly.ts

Aspect	Detail
SDK	`@aws-sdk/client-polly`
Operation	`SynthesizeSpeech`
Output Format	MP3 (`audio/mpeg`) at 24000 Hz sample rate
Engine	Neural (natural-sounding)
Default Voice	`Kajal` (Indian English/Hindi neural voice)
Max Input	3000 characters
Local Fallback	Browser Web Speech Synthesis API

Language-to-voice mapping:

Language	Voice ID	Language Code	Engine
English	Kajal	`en-IN`	Neural
Hindi	Kajal	`hi-IN`	Neural
Tamil	Kajal	`en-IN`	Neural
Telugu	Kajal	`en-IN`	Neural
Marathi	Kajal	`hi-IN`	Neural
Bengali	Kajal	`en-IN`	Neural
Gujarati	Kajal	`hi-IN`	Neural
Kannada	Kajal	`en-IN`	Neural

SSML support: When speech rate is specified (slow/fast), Polly receives SSML with <prosody rate="..."> tags.
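
Building the SSML string safely requires escaping user text before wrapping it. The sketch below is illustrative: `buildSsml` and `escapeXml` are hypothetical names, and clipping the input to 3000 characters (rather than rejecting it) is an assumption.

```typescript
type SpeechRate = "x-slow" | "slow" | "medium" | "fast" | "x-fast";

// Escapes the five XML special characters so user text cannot break the SSML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wraps the text in <speak> and <prosody>, clipping to the input limit first.
function buildSsml(text: string, rate: SpeechRate): string {
  const MAX_CHARS = 3000; // input limit quoted in the table above
  const clipped = text.slice(0, MAX_CHARS);
  return `<speak><prosody rate="${rate}">${escapeXml(clipped)}</prosody></speak>`;
}
```

The resulting string would be passed to `SynthesizeSpeech` as `Text` with `TextType` set to `ssml`. Escaping matters here because form values such as "M/s Kumar & Sons" would otherwise produce invalid SSML.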

API route: POST /api/speech/synthesize → returns raw MP3 binary with Content-Type: audio/mpeg.

6. Amazon Transcribe — Speech-to-Text

Purpose: Server-side speech recognition in Indian languages, complementing the browser Web Speech Recognition API.

File: server/aws/transcribe.ts

Aspect	Detail
SDK	`@aws-sdk/client-transcribe-streaming`
Operation	`StartStreamTranscription`
Input	Audio buffer (PCM/WAV or OGG-Opus) up to 5 MB
Default Sample Rate	16000 Hz
Local Fallback	Browser Web Speech Recognition API

Language support:

Language	Transcribe Code
English	`en-IN`
Hindi	`hi-IN`
Tamil	`ta-IN`
Telugu	`te-IN`
Marathi	`mr-IN`
Bengali	`bn-IN`
Gujarati	`gu-IN`
Kannada	`kn-IN`

API route: POST /api/speech/transcribe (multipart audio upload) → returns { transcript, confidence, provider }.

7. Amazon Translate — Translation

Purpose: Dynamic real-time text translation between SarthiAI's 8 supported languages, supplementing static i18n bundles.

File: server/aws/translate.ts

Aspect	Detail
SDK	`@aws-sdk/client-translate`
Operation	`TranslateText`
Auto-detect	Source language can be set to `"auto"`
Batch	Parallel `TranslateText` calls for up to 25 texts
Max Text Length	5000 characters per request
Local Fallback	Static i18n translation bundles (pre-built)

API routes (server/routes/translate.ts):

Method	Route	Description
`POST`	`/api/translate`	Translate single text
`POST`	`/api/translate/batch`	Translate up to 25 texts in parallel
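
The batch route can be sketched as parallel single-text calls behind a size check. `translateBatch` and the injected `TranslateFn` are hypothetical names, and falling back to the original text when one item fails is an assumption of this sketch rather than documented behaviour.

```typescript
// Hypothetical signature for a single-text translator (for example, a TranslateText wrapper).
type TranslateFn = (text: string, target: string) => Promise<string>;

const MAX_BATCH = 25;

// Translates all texts in parallel and returns results in input order.
async function translateBatch(
  texts: string[],
  target: string,
  translateOne: TranslateFn,
): Promise<string[]> {
  if (texts.length > MAX_BATCH) {
    throw new RangeError(`Batch size ${texts.length} exceeds limit of ${MAX_BATCH}`);
  }
  // Promise.all preserves input order; a failed item falls back to its original text.
  return Promise.all(texts.map((t) => translateOne(t, target).catch(() => t)));
}
```

Catching per item means one failed translation leaves that label untranslated instead of failing the whole batch, which suits a UI where partial translation is better than none.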

AWS Configuration & Fallback Strategy

Central config: server/aws/config.ts

All AWS services share a common configuration factory:

# Master switch
USE_AWS=true              # Enables all AWS services

# Credentials (optional — SDK falls back to instance roles on EC2/ECS/Lambda)
AWS_REGION=ap-south-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

# Service-specific
BEDROCK_MODEL=anthropic.claude-3-haiku-20240307-v1:0
DYNAMO_TABLE_PREFIX=SarthiAI
S3_BUCKET=sarthi-ai-documents
DB_PROVIDER=dynamodb      # or "sqlite" to force local DB even with USE_AWS=true
AI_PROVIDER=bedrock        # or "ollama" / "lmstudio"

Fallback matrix:

AWS Service	Local Fallback	Trigger
Bedrock	Ollama / LM Studio	`AI_PROVIDER≠bedrock` or Bedrock API error
DynamoDB	SQLite (`data/sarthi.db`)	`DB_PROVIDER=sqlite` or DynamoDB error
S3	Local filesystem (`data/uploads/`)	`USE_AWS=false` or S3 error
Textract	Tesseract.js (client-side)	`USE_AWS=false`
Polly	Browser Speech Synthesis	`USE_AWS=false` or Polly error
Transcribe	Browser Speech Recognition	`USE_AWS=false` or Transcribe error
Translate	Static i18n bundles	`USE_AWS=false` or Translate error

Feature flags (from config.ts):

isAWSEnabled()         // Master switch: USE_AWS=true?
useDynamoDB()          // isAWSEnabled() && DB_PROVIDER !== 'sqlite'
useBedrock()           // isAWSEnabled() && AI_PROVIDER === 'bedrock'
useTextract()          // isAWSEnabled()
usePolly()             // isAWSEnabled()
useTranscribe()        // isAWSEnabled()
useTranslateService()  // isAWSEnabled()
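
The flag logic above reduces to a few pure functions. In this sketch the environment is passed in explicitly so the functions can be tested, which is an assumption; the real `config.ts` presumably reads `process.env` directly.

```typescript
// The env parameter is an assumption made for testability.
type Env = Record<string, string | undefined>;

// Master switch: only the exact string "true" enables AWS.
const isAWSEnabled = (env: Env): boolean => env["USE_AWS"] === "true";

// DynamoDB is used unless the local database is forced.
const useDynamoDB = (env: Env): boolean =>
  isAWSEnabled(env) && env["DB_PROVIDER"] !== "sqlite";

// Bedrock must be selected explicitly as the AI provider.
const useBedrock = (env: Env): boolean =>
  isAWSEnabled(env) && env["AI_PROVIDER"] === "bedrock";
```

Note the asymmetry: with `USE_AWS=true` and nothing else set, DynamoDB is on by default (opt-out) while Bedrock stays off until `AI_PROVIDER=bedrock` is set (opt-in).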

Tech Stack

Layer	Technology	Purpose
Frontend	React 19, TypeScript 5.8	UI framework
Styling	Tailwind CSS v4, Framer Motion	Responsive design & animations
Voice (Client)	Web Speech Recognition API, Web Speech Synthesis API	Browser-native speech
Voice (Cloud)	Amazon Polly, Amazon Transcribe	Neural TTS & multi-language STT
OCR (Client)	Tesseract.js	Offline document scanning
OCR (Cloud)	Amazon Textract	High-accuracy document extraction
AI (Local)	Ollama (`llama3`) or LM Studio	Local LLM inference
AI (Cloud)	Amazon Bedrock (Claude / Llama / Titan)	Cloud LLM inference
Translation	Amazon Translate + static i18n bundles	Dynamic & static translations
Offline	IndexedDB (`idb`), Workbox, `vite-plugin-pwa`	PWA offline support
Backend	Express.js, Node.js	REST API server
Database (Local)	SQLite (`better-sqlite3`)	Local persistence
Database (Cloud)	Amazon DynamoDB	Scalable cloud persistence
Storage (Cloud)	Amazon S3	Document & file storage
PDF	jsPDF	Client-side PDF generation
Build	Vite 6, `tsx`	Dev server & TypeScript execution
Icons	Lucide React	30+ accessible icons

Project Structure

sarthi-ai/
├── index.html                      # Entry HTML with PWA meta tags & skip link
├── vite.config.ts                  # Vite + Tailwind + PWA + Workbox config
├── package.json                    # Dependencies & scripts
├── tsconfig.json                   # TypeScript config (client)
├── tsconfig.server.json            # TypeScript config (server)
├── metadata.json                   # App metadata (name, permissions)
│
├── public/icons/                   # PWA icons (192px, 512px)
├── data/                           # Runtime data (auto-created)
│   ├── sarthi.db                   # SQLite database
│   └── uploads/                    # Local file storage fallback
│
├── server/                         # ── Express Backend ──────────────
│   ├── index.ts                    # Server entry, middleware, route mounting
│   ├── db.ts                       # SQLite schema, prepared statements, CRUD
│   ├── dbProvider.ts               # Unified DB interface (SQLite ↔ DynamoDB)
│   │
│   ├── ai/
│   │   ├── grounding.ts            # 7+ RAG knowledge documents (verified text)
│   │   └── ollama.ts               # Ollama / LM Studio client + Bedrock delegation
│   │
│   ├── aws/
│   │   ├── config.ts               # Central AWS config, feature flags, client factory
│   │   ├── bedrock.ts              # Bedrock AI (Claude/Llama/Titan) — chat + streaming
│   │   ├── dynamodb.ts             # DynamoDB sessions & submissions tables
│   │   ├── s3.ts                   # S3 file upload, download, presigned URLs
│   │   ├── polly.ts                # Polly TTS — neural voice synthesis
│   │   ├── textract.ts             # Textract OCR — document text extraction
│   │   ├── transcribe.ts           # Transcribe STT — speech recognition
│   │   └── translate.ts            # Translate — dynamic text translation
│   │
│   └── routes/
│       ├── sessions.ts             # Session CRUD endpoints
│       ├── submissions.ts          # Submission endpoints
│       ├── sync.ts                 # Bulk offline sync endpoint
│       ├── chat.ts                 # AI chat (non-streaming + SSE streaming)
│       ├── ocr.ts                  # Textract OCR endpoints
│       ├── speech.ts               # Polly TTS + Transcribe STT endpoints
│       ├── translate.ts            # Translate endpoints (single + batch)
│       └── storage.ts              # S3 / local file storage endpoints
│
└── src/                            # ── React Frontend ───────────────
    ├── main.tsx                    # App entry, SW registration, sync init
    ├── App.tsx                     # State-driven page router (lazy-loaded)
    ├── index.css                   # Global styles + Tailwind directives
    │
    ├── api/
    │   └── client.ts               # Typed fetch wrapper for all API endpoints
    │
    ├── components/
    │   ├── Layout.tsx              # App shell: top nav + bottom nav + sidebar
    │   ├── Navigation.tsx          # Top bar + bottom tab bar
    │   ├── ContextPanel.tsx        # Desktop sidebar (steps) + mobile slide menu
    │   ├── AIAssistant.tsx         # Floating AI chat panel with streaming
    │   ├── OfflineBanner.tsx       # Connectivity status banners (amber/blue/green)
    │   ├── WelcomeModal.tsx        # First-visit language picker modal
    │   └── UIElements.tsx          # Shared buttons, cards, badges, inputs
    │
    ├── context/
    │   └── FormContext.tsx          # Single React Context for all app state
    │
    ├── pages/
    │   ├── Home.tsx                # Voice-first home screen with large mic button
    │   ├── Forms.tsx               # Form catalog with search
    │   ├── FormInfo.tsx            # Form details (purpose, eligibility, docs)
    │   ├── FieldInput.tsx          # Field-by-field input with voice & OCR
    │   ├── Review.tsx              # Review all answers before submission
    │   ├── Activity.tsx            # Session history with filters
    │   ├── Help.tsx                # FAQ and usage help
    │   ├── Settings.tsx            # Language, theme, voice, text size controls
    │   └── Scan.tsx                # Document scanning UI
    │
    ├── forms/
    │   ├── schema.ts               # FormSchema, FormField, validation type defs
    │   ├── registry.ts             # Central form registry (register + lookup)
    │   ├── validation.ts           # Field validation logic
    │   ├── index.ts                # Public API re-exports
    │   ├── definitions/            # Form schemas (one file per form)
    │   │   ├── pension.ts          # Old Age Pension (IGNOAPS)
    │   │   ├── housing.ts          # Housing Subsidy (PMAY)
    │   │   ├── rationCard.ts       # Ration Card (NFSA)
    │   │   ├── casteCertificate.ts # Caste Certificate
    │   │   ├── incomeCertificate.ts# Income Certificate
    │   │   ├── kisanSammanNidhi.ts # PM-KISAN
    │   │   └── locationFields.ts   # Shared location fields (state, district, etc.)
    │   └── i18n/                   # Per-form translations (7 languages × 6 forms)
    │       ├── index.ts            # Translation loader
    │       ├── types.ts            # Translation type definitions
    │       ├── pension/            # hi, mr, ta, te, bn, gu, kn
    │       ├── housing/
    │       ├── rationCard/
    │       ├── casteCertificate/
    │       ├── incomeCertificate/
    │       └── kisanSammanNidhi/
    │
    ├── hooks/
    │   ├── useAIChat.ts            # AI conversation state + streaming
    │   ├── useSpeechRecognition.ts # Web Speech STT hook
    │   ├── useSpeechSynthesis.ts   # Web Speech TTS hook + Polly integration
    │   ├── useOCR.ts               # Tesseract.js OCR + Textract fallback
    │   ├── useNetworkStatus.ts     # Online/offline + connection quality detection
    │   ├── useResponsiveLayout.ts  # Breakpoint detection
    │   └── useTranslatedSchema.ts  # FormSchema i18n translation hook
    │
    ├── i18n/                       # App-level UI translations
    │   ├── index.ts                # translate() function
    │   ├── en.ts, hi.ts, mr.ts     # 8 language files
    │   ├── ta.ts, te.ts, bn.ts
    │   └── gu.ts, kn.ts
    │
    ├── offline/
    │   ├── db.ts                   # IndexedDB schema (sessions, submissions, syncQueue)
    │   ├── offlineApi.ts           # Offline-first API wrapper
    │   ├── syncManager.ts          # Queue drain + reconciliation + backoff
    │   ├── schemaCache.ts          # Cache form schemas offline
    │   └── index.ts                # Re-exports
    │
    ├── utils/
    │   ├── aadhaarVerhoeff.ts      # Aadhaar 12-digit Verhoeff checksum validation
    │   ├── announceToScreenReader.ts # ARIA live-region announcements
    │   ├── detectLanguage.ts       # Browser language auto-detection
    │   ├── generatePdf.ts          # Client-side PDF generation (jsPDF)
    │   ├── localeFormat.ts         # Indian number/date formatting (lakhs/crores)
    │   └── trackingId.ts           # Unique tracking ID generator
    │
    └── voice/
        └── language.ts             # BCP-47 locale mappings, voice picker

Forms System

Adding a New Form

Create a schema in src/forms/definitions/myForm.ts:

import type { FormSchema } from '../schema';

export const myForm: FormSchema = {
  id: 'my-form',
  title: 'My Government Form',
  description: 'Short description',
  icon: 'FileText',               // Lucide icon name
  iconBgColor: 'bg-teal-50',
  iconColor: 'text-teal-600',
  purpose: 'What this form is for...',
  eligibility: ['Criterion 1', 'Criterion 2'],
  requiredDocuments: ['Aadhaar Card', 'Bank Passbook'],
  estimatedTime: '5–10 minutes',
  fields: [
    {
      key: 'fullName',
      label: 'Full Name',
      type: 'text',
      placeholder: 'e.g., Ramesh Kumar',
      helpText: 'Enter your name as it appears on your Aadhaar.',
      validation: { required: true, minLength: 2, maxLength: 100 },
      group: 'Personal Details',
    },
    // ... more fields
  ],
};

Register it in src/forms/registry.ts:

import { myForm } from './definitions/myForm';
register(myForm);

(Optional) Add translations in src/forms/i18n/myForm/hi.ts, ta.ts, etc.
(Optional) Add grounding documents in server/ai/grounding.ts for AI assistance.

That's it. The form appears in the catalog, renders field-by-field, validates, saves sessions, and generates PDFs — all automatically.
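
The `validation` object in the schema above is what drives per-field checks. As a rough illustration of how such rules can be interpreted, here is a minimal sketch that covers only the three rules shown (`required`, `minLength`, `maxLength`). The `FieldValidation` type, the `validateField` function, and the message strings are hypothetical names chosen for this example, not the project's actual validator.

```typescript
// Hypothetical subset of the validation rules shown in the schema above.
interface FieldValidation {
  required?: boolean;
  minLength?: number;
  maxLength?: number;
}

// Returns a list of problems; an empty list means the value passes.
function validateField(label: string, value: string, rules: FieldValidation): string[] {
  const problems: string[] = [];
  const trimmed = value.trim();

  if (trimmed.length === 0) {
    // An empty value only fails when the field is required.
    if (rules.required) problems.push(label + ' is required.');
    return problems;
  }
  if (rules.minLength !== undefined && trimmed.length < rules.minLength) {
    problems.push(label + ' must be at least ' + rules.minLength + ' characters.');
  }
  if (rules.maxLength !== undefined && trimmed.length > rules.maxLength) {
    problems.push(label + ' must be at most ' + rules.maxLength + ' characters.');
  }
  return problems;
}
```

Calling `validateField('Full Name', 'R', { required: true, minLength: 2, maxLength: 100 })` returns a single minimum-length problem, which a UI could read aloud before moving to the next field.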

AI Safety & Grounding

Safeguard	Implementation
Grounding-only responses	9-rule system prompt; RAG retrieval required before any generation
Hallucination tagging	Every response marked Verified (shield icon) or Unverified (warning icon)
Refusal on uncertainty	7 refusal-phrase heuristics; refuses rather than guessing
No authority claims	System prompt prohibits "I guarantee", "I confirm", "I certify"
Scam protection	Detects payment-related queries — warns: "Government form filing is free"
Local processing option	Ollama/LM Studio run locally; no user data leaves the device
Input limits	1000-char message limit, 10-message history cap, 512-token response cap
No auto-submission	User must review every field and explicitly confirm before submission
Rate limiting	15 req/min for AI chat, 100 req/min for general API
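
The input limits in the table can be enforced before a request ever reaches the model. The sketch below assumes that over-long messages are rejected (rather than truncated) and that the history cap keeps the most recent turns; both are assumptions, and `ChatTurn`, `PreparedChat`, and `prepareChat` are hypothetical names.

```typescript
// Limits taken from the table above; the enforcement strategy is an assumption.
const MAX_MESSAGE_CHARS = 1000;
const MAX_HISTORY_MESSAGES = 10;

interface ChatTurn {
  role: 'user' | 'assistant';
  content: string;
}

interface PreparedChat {
  error: string | null; // null when the input is acceptable
  message: string;
  history: ChatTurn[];
}

function prepareChat(message: string, history: ChatTurn[]): PreparedChat {
  const trimmed = message.trim();
  if (trimmed.length === 0) {
    return { error: 'Message is empty.', message: '', history: [] };
  }
  if (trimmed.length > MAX_MESSAGE_CHARS) {
    return { error: 'Message exceeds ' + MAX_MESSAGE_CHARS + ' characters.', message: '', history: [] };
  }
  // Keep only the most recent turns so the prompt stays bounded.
  return { error: null, message: trimmed, history: history.slice(-MAX_HISTORY_MESSAGES) };
}
```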

Offline & Sync Architecture

              ONLINE                              OFFLINE
    ┌──────────────────────┐           ┌──────────────────────┐
    │ offlineApi.ts        │           │ offlineApi.ts        │
    │                      │           │                      │
    │ 1. Call real API     │           │ 1. Read from IDB     │
    │ 2. Cache to IndexedDB│           │ 2. Enqueue write to  │
    │ 3. Return response   │           │    syncQueue         │
    └──────────┬───────────┘           └──────────────────────┘
               │                                  │
               │          ┌───────────┐           │
               │          │ RECONNECT │◀──────────┘
               │          └─────┬─────┘
               │                │
               │    ┌───────────▼──────────┐
               │    │   syncManager.ts     │
               │    │                      │
               │    │ 1. Drain queue FIFO  │
               │    │ 2. POST /api/sync    │
               │    │    (batched actions) │
               │    │ 3. Reconcile server  │
               │    │    state with local  │
               │    │ 4. Exponential       │
               │    │    backoff on fail   │
               │    │    (1s→2s→4s→…→30s)  │
               │    │ 5. Max 5 retries     │
               │    └──────────────────────┘
               │
    ┌──────────▼───────────┐
    │   IndexedDB Stores   │
    │                      │
    │  sessions            │  ← form-filling progress
    │  submissions         │  ← completed forms
    │  syncQueue           │  ← pending API calls
    └──────────────────────┘

Data persistence layers: localStorage (settings) → IndexedDB (sessions/submissions/queue) → Server (SQLite/DynamoDB).
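
The retry policy in the diagram (delays doubling from 1 s, a 30 s ceiling, at most 5 retries) can be expressed as a small pure function. This is a sketch under those stated numbers only; `backoffDelayMs` and `retrySchedule` are hypothetical names, and the real `syncManager.ts` may add jitter or count attempts differently.

```typescript
const BASE_DELAY_MS = 1000;  // first retry waits 1 s
const MAX_DELAY_MS = 30000;  // ceiling shown in the diagram
const MAX_RETRIES = 5;

// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, ... capped at 30s.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

// The full list of waits a single queue item would go through before giving up.
function retrySchedule(): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}
```

Under these assumptions the five waits are 1 s, 2 s, 4 s, 8 s, and 16 s; the 30 s ceiling only comes into play if the retry limit is raised.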

Conflict resolution: Server wins unless the local record has a newer updatedAt timestamp.
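
A minimal sketch of that rule follows, assuming `updatedAt` is a numeric timestamp (epoch milliseconds) and that records are matched by `id`. `SyncRecord`, `resolveConflict`, and `reconcile` are hypothetical names; keeping local-only records is also an assumption about how not-yet-synced data is treated.

```typescript
interface SyncRecord {
  id: string;
  updatedAt: number; // assumed: epoch milliseconds
}

// Server wins unless the local copy is strictly newer.
function resolveConflict<T extends SyncRecord>(local: T, server: T): T {
  return local.updatedAt > server.updatedAt ? local : server;
}

// Merge two record lists by id, applying the rule above to every overlap.
function reconcile<T extends SyncRecord>(localRecords: T[], serverRecords: T[]): T[] {
  const merged: { [id: string]: T } = {};
  for (const serverRecord of serverRecords) {
    merged[serverRecord.id] = serverRecord;
  }
  for (const localRecord of localRecords) {
    const existing = merged[localRecord.id];
    // Records that exist only locally are kept (assumed to be pending sync).
    merged[localRecord.id] = existing === undefined ? localRecord : resolveConflict(localRecord, existing);
  }
  return Object.keys(merged).map((id) => merged[id]);
}
```

Note that a tie on `updatedAt` goes to the server, which matches the "server wins unless newer" wording.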

Accessibility

Feature	Implementation
Voice as primary input	Large mic button (192px Home, 128px fields) with animated ripple
Text size control	Small / Medium / Large, persisted in localStorage
Dark mode	System preference detection + manual toggle
Volume & speed controls	Adjustable TTS rate (0.5×–2×) with test button
Field help tooltips	Every field has a plain-language explanation
Large touch targets	All interactive elements ≥ 48px, most 64–80px
Press feedback	`active:scale-95` on all buttons
Step progress	Dot indicators, text percentage, animated progress bar
Auto-save	Progress is saved automatically when the user switches tabs during form filling
ARIA attributes	`aria-label`, `aria-live="polite"`, `role="status"`
Skip link	"Skip to main content" link in HTML
Screen reader	`announceToScreenReader()` utility for dynamic announcements

API Reference

Core APIs

Method	Path	Purpose
`GET`	`/api/sessions`	List all sessions
`GET`	`/api/sessions/:id`	Get a single session
`POST`	`/api/sessions`	Create a new session
`PUT`	`/api/sessions/:id`	Update a session
`DELETE`	`/api/sessions/:id`	Delete a session
`GET`	`/api/submissions`	List all submissions
`GET`	`/api/submissions/:id`	Get a single submission
`POST`	`/api/submissions`	Create a submission
`POST`	`/api/sync`	Bulk sync (batch actions + full state return)
`POST`	`/api/chat`	AI chat (grounded, multi-turn)
`POST`	`/api/chat/stream`	AI chat with SSE streaming
`GET`	`/api/health`	Health check (includes AWS status)
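
Clients of `/api/chat/stream` need to split the response into Server-Sent Events as chunks arrive. The sketch below handles only the standard SSE framing (events separated by a blank line, payload carried on `data:` lines). What each payload contains for this endpoint is not documented here, so the sketch returns raw strings; `SseParseResult` and `parseSseBuffer` are hypothetical names.

```typescript
interface SseParseResult {
  events: string[]; // data payload of each complete event
  rest: string;     // trailing text that does not yet form a complete event
}

function parseSseBuffer(buffer: string): SseParseResult {
  // Events are separated by a blank line; normalize CRLF first.
  const blocks = buffer.replace(/\r\n/g, '\n').split('\n\n');
  const rest = blocks.pop() || '';
  const events: string[] = [];

  for (const block of blocks) {
    const dataLines: string[] = [];
    for (const line of block.split('\n')) {
      if (line.indexOf('data:') === 0) {
        // Drop the field name and at most one leading space.
        const value = line.slice(5);
        dataLines.push(value.charAt(0) === ' ' ? value.slice(1) : value);
      }
      // Comment lines (starting with ':') and other fields are ignored here.
    }
    if (dataLines.length > 0) events.push(dataLines.join('\n'));
  }
  return { events: events, rest: rest };
}
```

A caller would prepend `rest` to the next network chunk before parsing again, so an event split across two chunks is delivered once it is complete.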

AWS-Powered APIs

Method	Path	Purpose	AWS Service
`POST`	`/api/ocr`	Document text detection	Textract
`POST`	`/api/ocr/analyze`	Document analysis (FORMS)	Textract
`POST`	`/api/speech/synthesize`	Text-to-speech (MP3)	Polly
`POST`	`/api/speech/transcribe`	Speech-to-text	Transcribe
`POST`	`/api/translate`	Single text translation	Translate
`POST`	`/api/translate/batch`	Batch translation (≤25)	Translate
`POST`	`/api/storage/upload`	File upload	S3
`GET`	`/api/storage/download/*`	File download / presigned URL	S3
`GET`	`/api/storage/list`	List files by prefix	S3
`DELETE`	`/api/storage/*`	Delete file	S3
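
Because `/api/translate/batch` is capped (≤25), a caller with more strings has to split them first. A minimal sketch, assuming the cap means at most 25 texts per request; `MAX_TRANSLATE_BATCH`, `chunkItems`, and `toTranslateBatches` are hypothetical names, and the request body shape is deliberately left out because it is not specified here.

```typescript
const MAX_TRANSLATE_BATCH = 25; // assumed to mean 25 texts per request

// Split a list into consecutive slices of at most `size` items.
function chunkItems<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError('size must be a positive integer');
  }
  const slices: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    slices.push(items.slice(start, start + size));
  }
  return slices;
}

// Group texts into request-sized batches, preserving order.
function toTranslateBatches(texts: string[]): string[][] {
  return chunkItems(texts, MAX_TRANSLATE_BATCH);
}
```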

Rate Limits

Endpoint	Limit
General API (`/api/*`)	100 requests/minute
AI Chat (`/api/chat`)	15 requests/minute
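
The table does not say which limiting algorithm is used. As one possible reading, the sketch below implements a fixed-window counter per client key, with the clock passed in so the behaviour is easy to reason about. `FixedWindowLimiter` is a hypothetical class, not the project's middleware.

```typescript
interface WindowEntry {
  windowStart: number;
  count: number;
}

class FixedWindowLimiter {
  private limit: number;
  private windowMs: number;
  private entries: { [key: string]: WindowEntry } = {};

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` is allowed at time `now` (ms).
  allow(key: string, now: number): boolean {
    const entry = this.entries[key];
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      // First request from this key, or the previous window has expired.
      this.entries[key] = { windowStart: now, count: 1 };
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false;
  }
}
```

With `new FixedWindowLimiter(15, 60000)` for chat and `new FixedWindowLimiter(100, 60000)` for the general API, a server could call `allow(clientKey, Date.now())` on each request and answer with HTTP 429 when it returns false.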

How to Run

Prerequisites

Node.js 18+
AI Model (one of):
- Ollama with llama3 pulled — ollama pull llama3
- LM Studio with a model loaded and local server running
- AWS account with Bedrock access (if using cloud AI)

Quick Start (Local Mode)

# Clone the repository
git clone <repo-url> && cd sarthi-ai

# Install dependencies
npm install

# Start both frontend (port 3000) and backend (port 3001)
npm run dev:all

Quick Start (AWS Mode)

# Set environment variables
export USE_AWS=true
export AWS_REGION=ap-south-1
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AI_PROVIDER=bedrock

# Start
npm run dev:all

All Commands

Command	Description
`npm run dev`	Vite dev server on port 3000
`npm run server`	Express API on port 3001
`npm run dev:all`	Both frontend + backend concurrently
`npm run build`	Production build
`npm run start`	Build + serve production
`npm run lint`	TypeScript type check
`npm run clean`	Remove `dist/` folder

Environment Variables

Variable	Default	Description
`USE_AWS`	`false`	Master switch for all 7 AWS services
`AWS_REGION`	`ap-south-1`	AWS region
`AWS_ACCESS_KEY_ID`	—	IAM access key (optional on EC2/ECS/Lambda)
`AWS_SECRET_ACCESS_KEY`	—	IAM secret key
`AI_PROVIDER`	`ollama`	`ollama`, `lmstudio`, or `bedrock`
`OLLAMA_URL`	`http://localhost:11434`	Ollama server URL
`OLLAMA_MODEL`	`llama3`	Ollama model name
`LMSTUDIO_URL`	`http://localhost:1234`	LM Studio server URL
`LMSTUDIO_MODEL`	`gemma-3-4b`	LM Studio model name
`BEDROCK_MODEL`	`anthropic.claude-3-haiku-20240307-v1:0`	Bedrock model ID
`DB_PROVIDER`	—	Set to `sqlite` to force local DB even with AWS
`DYNAMO_TABLE_PREFIX`	`SarthiAI`	DynamoDB table name prefix
`S3_BUCKET`	`sarthi-ai-documents`	S3 bucket name
`SERVER_PORT`	`3001`	Express server port

Limitations & Disclaimer

Limitations

Prototype / hackathon project — not production-hardened.
6 demo forms. New forms require adding a definition + translations.
No user accounts or authentication.
AI assistant requires a running LLM (local or Bedrock); unavailable offline.
Speech recognition browser support varies (Chrome/Edge recommended; limited on Firefox/Safari).
Does NOT submit forms to any government system — users must submit via official channels.

Disclaimer

SarthiAI is a prototype project. It does not replace official guidance from government agencies. The tool does NOT submit forms automatically — users retain full control. The AI assistant refuses to answer when reliable information cannot be found in verified documents. No middleman fee is required for any government form filing.
