CodeMaster11000/SarthiAi

SarthiAI

Voice-first, offline-capable form assistant that helps Indian citizens understand and fill government forms in 8 languages — powered by document-grounded AI, AWS cloud services, and accessibility-first design.


Problem Statement

Government and public-service forms in India are difficult for millions of citizens due to:

| Barrier | Who it affects |
| --- | --- |
| Complex legal language on forms | Low-literacy and first-time users |
| English-only interfaces | 900M+ non-English speakers |
| Small text & dense layouts | Elderly citizens, vision-impaired users |
| No offline access | Rural users with intermittent connectivity |
| AI tools that hallucinate | Users who trust incorrect information |
| Middlemen charging fees | Economically vulnerable populations |

These barriers disproportionately affect rural users, elderly citizens, differently-abled individuals, and first-time digital users — the very populations these government schemes are designed to help.


Solution

SarthiAI is a voice-first Progressive Web App (PWA) that walks users through government forms field-by-field using speech. An AI assistant explains each field in simple, regional language. Its answers are grounded in verified documents, and it is instructed to refuse rather than guess when the documents do not cover a question.

Users dictate answers, hear them read back, scan documents to auto-fill fields, and review everything before confirming. The app works fully offline and syncs when connectivity returns. When deployed with AWS, it gains cloud-grade OCR, speech services, translation, and scalable storage.


Key Features

Voice-First Interaction

  • Speech-to-Text — dictate answers via Web Speech Recognition API or AWS Transcribe. Supports continuous and single-utterance modes with real-time interim transcripts.
  • Text-to-Speech — the app speaks field labels, help text, and AI responses via Web Speech Synthesis API or Amazon Polly (natural neural voices).
  • Voice states — clear animated UI states: Idle → Listening (ripple) → Thinking (spinner) → Speaking (pulse) → Error.

8-Language Support

Full UI and form-content translations in English, Hindi (हिन्दी), Marathi (मराठी), Tamil (தமிழ்), Telugu (తెలుగు), Bengali (বাংলা), Gujarati (ગુજરાતી), Kannada (ಕನ್ನಡ).

  • Auto-detects browser language on first visit.
  • Every field label, placeholder, help text, option, and validation message is translated.
  • Locale-aware date/number formatting (Indian numbering: lakhs/crores).
  • Dynamic translation powered by Amazon Translate when AWS is enabled.
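
As an illustration of the locale-aware number formatting mentioned above, the standard `Intl.NumberFormat` API with the `en-IN` locale produces lakh/crore digit grouping. This is a minimal sketch: `formatIndianNumber` is a hypothetical helper name and is not necessarily what `src/utils/localeFormat.ts` exports.

```typescript
// Hypothetical helper; the name is illustrative, not taken from the repository.

/** Format a number with Indian digit grouping (last three digits, then groups of two). */
function formatIndianNumber(value: number): string {
  return new Intl.NumberFormat('en-IN').format(value);
}
```

With a runtime that ships full ICU data (the default in current Node.js and browsers), `formatIndianNumber(1234567)` returns `"12,34,567"` and `formatIndianNumber(10000000)` returns `"1,00,00,000"`.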

Schema-Driven Forms

Forms are defined once as typed TypeScript schemas and rendered dynamically:

| Form | Description | Fields |
| --- | --- | --- |
| Old Age Pension (IGNOAPS) | Monthly pension for senior citizens | 14 |
| Housing Subsidy (PMAY) | Housing for All scheme assistance | 17 |
| Ration Card (NFSA) | Subsidised food grains entitlement | 15+ |
| Caste Certificate | SC/ST/OBC category proof | 12+ |
| Income Certificate | Family income verification | 12+ |
| PM-KISAN | ₹6,000/year farmer income support | 14+ |

Each form includes eligibility criteria, required documents list, purpose description, and estimated completion time. New forms are added by creating a definition file and registering it — zero UI code changes needed.

Supported field types: text, date, select, tel, textarea, number, checkbox, radio — with validation rules (required, min/max length, regex pattern, cross-field validation), conditional visibility, and field grouping.
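
To make the validation rules and conditional visibility above concrete, here is a minimal sketch of a per-field validator. The `FieldDef`, `ValidationRules`, and `visibleWhen` names are assumptions made for illustration; the actual types in `src/forms/schema.ts` and the logic in `src/forms/validation.ts` may differ.

```typescript
// Illustrative types only; the real definitions in src/forms/schema.ts may differ.
interface ValidationRules {
  required?: boolean;
  minLength?: number;
  maxLength?: number;
  pattern?: RegExp;
}

type FormValues = Record<string, string>;

interface FieldDef {
  key: string;
  label: string;
  validation?: ValidationRules;
  // Hypothetical: the field is shown (and validated) only when this returns true.
  visibleWhen?: (values: FormValues) => boolean;
}

/** Returns an error message, or null when the field is valid or hidden. */
function validateField(field: FieldDef, values: FormValues): string | null {
  // Hidden fields are skipped so they never block submission.
  if (field.visibleWhen && !field.visibleWhen(values)) return null;

  const value = (values[field.key] ?? '').trim();
  const rules = field.validation ?? {};

  if (value === '') {
    return rules.required ? `${field.label} is required` : null;
  }
  if (rules.minLength !== undefined && value.length < rules.minLength) {
    return `${field.label} must be at least ${rules.minLength} characters`;
  }
  if (rules.maxLength !== undefined && value.length > rules.maxLength) {
    return `${field.label} must be at most ${rules.maxLength} characters`;
  }
  if (rules.pattern && !rules.pattern.test(value)) {
    return `${field.label} has an invalid format`;
  }
  return null;
}
```

Returning a message instead of throwing lets the caller translate or speak the error before moving to the next field.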

Document Scanning (OCR)

  • Client-side: Tesseract.js for offline OCR
  • Server-side: Amazon Textract for high-accuracy extraction
  • Auto-detects and extracts: Aadhaar numbers, PAN, dates of birth, phone numbers, IFSC codes, gender, names, PIN codes
  • Scanned data auto-fills matching form fields

Document-Grounded AI Assistant

  • RAG architecture: 7+ embedded knowledge documents covering pension, housing, ration card, caste certificate, income certificate, and PM-KISAN schemes.
  • Strict grounding: a 9-rule system prompt is designed to prevent hallucination, and every response is tagged as Verified or Unverified.
  • Multi-provider support: Ollama (local), LM Studio (local), or Amazon Bedrock (cloud). Supports Claude, Llama, and Titan model families.
  • Streaming responses: tokens appear in real time via SSE.
  • Context-aware: knows which field the user is filling and provides targeted help.

Offline & Low-Connectivity

  • Full PWA with Service Worker caching (Workbox).
  • IndexedDB local storage (sessions, submissions, sync queue).
  • Incremental sync queue with exponential backoff.
  • Bulk sync endpoint for efficient reconciliation on slow networks.
  • Slow-connection detection (2G/slow-2G) with user-facing banners.
  • Installable to mobile home screens.
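
The exponential backoff used by the sync queue can be sketched as follows. The base delay, the cap, and the `QueueItem` shape are assumptions chosen for illustration; the values actually used in `src/offline/syncManager.ts` are not stated in this README.

```typescript
// Assumed defaults: 1 s base delay, capped at 60 s. Real values may differ.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  // Delay doubles with each failed attempt until it reaches the cap.
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical shape of an entry in the sync queue.
interface QueueItem {
  id: string;
  attempts: number;       // number of failed attempts so far
  nextAttemptAt: number;  // epoch milliseconds
}

/** Record a failed attempt and schedule the next retry. */
function scheduleRetry(item: QueueItem, now: number): QueueItem {
  const delay = backoffDelayMs(item.attempts);
  return { ...item, attempts: item.attempts + 1, nextAttemptAt: now + delay };
}
```

Capping the delay keeps a long-offline device from waiting many minutes after connectivity returns.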

PDF Generation

  • Client-side PDF generation via jsPDF (works offline).
  • Includes tracking ID, submission date, all field values, and disclaimer footer.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                         Browser (PWA)                               │
│                                                                     │
│   React 19 + Tailwind CSS v4 + Framer Motion                       │
│   ┌────────────┐  ┌────────────┐  ┌───────────────────────────┐    │
│   │ Voice Layer│  │   Form     │  │   Offline Layer           │    │
│   │ STT / TTS  │  │  Context   │  │ IndexedDB + SyncQueue     │    │
│   │ (Browser   │  │ (Single    │  │ ┌─────────┐ ┌──────────┐ │    │
│   │  APIs)     │  │  React     │  │ │sessions │ │syncQueue │ │    │
│   └─────┬──────┘  │  Context)  │  │ └─────────┘ └──────────┘ │    │
│         │         └─────┬──────┘  └────────────┬──────────────┘    │
│         │               │                      │                    │
│   ┌─────┴───────────────┴──────────────────────┴────────┐          │
│   │              offlineApi wrapper                      │          │
│   │    (online → API + cache | offline → IndexedDB)      │          │
│   └─────────────────────┬────────────────────────────────┘          │
│                         │                                           │
│   ┌─────────────────────┴────────────────────────────────┐          │
│   │              Service Worker (Workbox)                 │          │
│   │  Static: CacheFirst | API: NetworkFirst | Chat: NetworkOnly     │
│   └─────────────────────┬────────────────────────────────┘          │
└─────────────────────────┼───────────────────────────────────────────┘
                          │  /api/*
              ┌───────────▼───────────┐
              │    Express Server     │ (port 3001)
              │    ┌───────────────┐  │
              │    │  DB Provider  │──┼──▶ SQLite (local) OR DynamoDB (AWS)
              │    └───────────────┘  │
              │    ┌───────────────┐  │
              │    │   AI Layer    │──┼──▶ Ollama / LM Studio (local) OR Bedrock (AWS)
              │    │  + Grounding  │  │
              │    └───────────────┘  │
              │    ┌───────────────┐  │
              │    │  AWS Services │──┼──▶ S3 · Textract · Polly · Transcribe · Translate
              │    └───────────────┘  │
              └───────────────────────┘

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| State-driven routing (no router library) | formPhase × activeTab keeps navigation simple. No URL complexity for target users. |
| Single React Context for all state | Avoids external state management complexity. Single source of truth. |
| Offline-first API wrapper | Every call goes through offlineApi, which calls the API and caches to IndexedDB when online, and reads from IndexedDB and queues writes for sync when offline. |
| Local-first + AWS upgrade path | Runs fully offline with local tools. AWS services layer on when USE_AWS=true. |
| Schema-driven form rendering | Adding new forms requires zero UI changes: just a schema file + registration. |
| Document-grounded AI only | The 9-rule system prompt is designed to prevent hallucination. The assistant refuses when unsure. |

User Flow

┌──────────────┐     ┌───────────────┐     ┌─────────────────┐
│  1. Welcome  │────▶│  2. Language   │────▶│  3. Home        │
│  (First Visit)│     │  Selection    │     │  (Voice CTA)    │
└──────────────┘     └───────────────┘     └────────┬────────┘
                                                     │
                          ┌──────────────────────────┤
                          ▼                          ▼
                   ┌─────────────┐          ┌──────────────┐
                   │ 4a. Browse  │          │ 4b. Speak to │
                   │   Forms Tab │          │   AI on Home │
                   └──────┬──────┘          └──────────────┘
                          │
                          ▼
                   ┌──────────────┐
                   │ 5. Form Info │
                   │ (Purpose,    │
                   │  Eligibility,│
                   │  Documents)  │
                   └──────┬───────┘
                          │ "Start Filling"
                          ▼
                   ┌──────────────┐     ┌──────────────┐
                   │ 6. Field-by- │────▶│  7. Scan     │
                   │ Field Input  │◀────│  Document    │
                   │ (Voice/Type) │     │  (OCR)       │
                   │              │     └──────────────┘
                   │  ┌────────┐  │
                   │  │AI Help │  │     ← Floating AI chat panel
                   │  │ Panel  │  │       (grounded answers)
                   │  └────────┘  │
                   └──────┬───────┘
                          │ All fields complete
                          ▼
                   ┌──────────────┐
                   │ 8. Review    │
                   │ All Answers  │
                   │ (Edit any)   │
                   └──────┬───────┘
                          │ "Submit"
                          ▼
                   ┌──────────────┐     ┌──────────────┐
                   │ 9. Submitted │────▶│ 10. Download │
                   │ Confirmation │     │  PDF Summary │
                   │ (Tracking ID)│     └──────────────┘
                   └──────────────┘

Step-by-step breakdown:

  1. Welcome Modal — First-time visitors see a language picker. Auto-detects browser language as default.
  2. Language Selection — Choose from 8 languages. All UI, form content, and AI responses switch instantly.
  3. Home Screen — Large microphone button (192px) for voice-first interaction. Users can speak questions or navigate to forms.
  4. Browse Forms — Grid of available government forms with icons, descriptions, and estimated completion times.
  5. Form Info — Before starting, users see: purpose of the form, eligibility criteria, required documents, and estimated time.
  6. Field-by-Field Input — One field at a time with large text, help tooltips, voice dictation, and progress indicators (dots + percentage + animated bar).
  7. Document Scanning — Camera-based OCR scans Aadhaar, PAN, etc. and auto-fills fields. Uses Textract (AWS) or Tesseract.js (offline).
  8. Review — All answers displayed for final verification. Users can tap any field to edit it.
  9. Submission — Confirmation screen with a unique tracking ID. Session marked as completed.
  10. PDF Download — Client-side PDF generated with all form data, tracking ID, and official disclaimer.

Auto-save: Progress is saved automatically when switching tabs or navigating away. Users can resume from the Activity page at any time.


AWS Service Integration

Architecture Decision: Local-First with AWS Upgrade

SarthiAI is designed with a dual-mode architecture — it runs fully offline using local tools (SQLite, Ollama, Tesseract.js, Web Speech APIs), and gains cloud-grade capabilities when AWS is enabled via a single environment variable:

USE_AWS=true   →  All 7 AWS services activate (with automatic local fallbacks on error)
USE_AWS=false  →  Fully local operation (default)

Every AWS service has a local fallback. If a cloud call fails, the system degrades gracefully to the local alternative — the user never sees a broken experience.

┌─────────────────────────────────────────────────────────────────────┐
│                     AWS Service Map                                  │
│                                                                     │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────────┐  │
│  │   Bedrock    │    │  DynamoDB    │    │        S3            │  │
│  │  (AI/LLM)   │    │ (Database)   │    │   (File Storage)     │  │
│  │             │    │              │    │                      │  │
│  │ Claude 3    │    │ Sessions     │    │ Document uploads     │  │
│  │ Llama       │    │ Submissions  │    │ Scanned IDs          │  │
│  │ Titan       │    │ PAY_PER_REQ  │    │ Generated PDFs       │  │
│  └──────┬───────┘    └──────┬───────┘    └──────────┬───────────┘  │
│         │                   │                       │              │
│  ┌──────▼───────┐    ┌──────▼───────┐    ┌──────────▼───────────┐  │
│  │  Textract    │    │   Polly      │    │    Transcribe        │  │
│  │   (OCR)      │    │   (TTS)      │    │     (STT)            │  │
│  │             │    │              │    │                      │  │
│  │ Aadhaar scan │    │ Neural voice │    │ 8 Indian languages   │  │
│  │ PAN scan     │    │ 8 languages  │    │ Streaming support    │  │
│  │ Form fields  │    │ SSML rate    │    │ PCM/OGG input        │  │
│  └──────────────┘    └──────────────┘    └──────────────────────┘  │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                     Translate                                 │   │
│  │                  (Dynamic Translation)                        │   │
│  │                                                               │   │
│  │  Single text + batch (up to 25) · Auto-detect source lang     │   │
│  │  Supports: en, hi, ta, te, mr, bn, gu, kn                    │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

1. Amazon Bedrock — AI/LLM

Purpose: Cloud-hosted foundation model inference for the document-grounded AI assistant, replacing local Ollama/LM Studio.

File: server/aws/bedrock.ts

| Aspect | Detail |
| --- | --- |
| SDK | @aws-sdk/client-bedrock-runtime |
| Operations | InvokeModel (non-streaming), InvokeModelWithResponseStream (streaming SSE) |
| Default Model | anthropic.claude-3-haiku-20240307-v1:0 |
| Supported Families | Claude (Messages API), Llama (prompt-based), Titan (inputText-based) |
| Temperature | 0.3 (low for factual accuracy) |
| Max Tokens | 512 |
| Local Fallback | Ollama (llama3) or LM Studio |

How it works:

  1. The chat route checks useBedrock() — if true, delegates to bedrock.ts; otherwise uses ollama.ts.
  2. Knowledge documents are selected based on the active form (getDocumentsForForm(formId)) or all documents.
  3. A system prompt with 9 grounding rules + the full knowledge base is built.
  4. The request body is formatted per model family (Claude Messages API / Llama prompt template / Titan inputText).
  5. The response is parsed per family, tagged with grounding metadata, and checked for refusal phrases.

Streaming flow:

Client POST /api/chat/stream
  → Server builds Bedrock request
  → InvokeModelWithResponseStreamCommand
  → For each chunk: SSE event "token" with partial text
  → Final SSE event "done" with {reply, grounded, sources}
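
The SSE framing in the flow above can be sketched with two small helpers. The event names `token` and `done` and the `done` payload fields come from the flow; the `text` field on token events and the helper names are assumptions for illustration.

```typescript
// Payload of the final "done" event, per the flow above (sources assumed to be strings).
interface DonePayload {
  reply: string;
  grounded: boolean;
  sources: string[];
}

/** Serialize one Server-Sent Event. JSON.stringify keeps the payload on a single line. */
function formatSseEvent(event: 'token' | 'done', data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

/** Parse one event block (the text between blank lines) on the client. */
function parseSseEvent(block: string): { event: string; data: unknown } | null {
  let event = 'message'; // SSE default when no "event:" line is present
  const dataLines: string[] = [];
  for (const line of block.split('\n')) {
    if (line.startsWith('event:')) event = line.slice(6).trim();
    else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
  }
  if (dataLines.length === 0) return null;
  return { event, data: JSON.parse(dataLines.join('\n')) };
}
```

On the server, each formatted string would be written to a response whose Content-Type is `text/event-stream`.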

Auto-detection logic:

isClaude(modelId)   modelId.startsWith('anthropic.claude')
isLlama(modelId)    modelId.includes('llama')
isTitan(modelId)    modelId.startsWith('amazon.titan')
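
Combining the detection rules above with step 4 of "How it works", a per-family request body might be built as in the sketch below. The body field names follow the Bedrock request formats for each family; the function names and the simplified Llama prompt (plain concatenation rather than the model's chat template) are assumptions, and `server/aws/bedrock.ts` may structure this differently.

```typescript
type ModelFamily = 'claude' | 'llama' | 'titan';

/** Mirrors the isClaude / isLlama / isTitan checks listed above. */
function detectFamily(modelId: string): ModelFamily {
  if (modelId.startsWith('anthropic.claude')) return 'claude';
  if (modelId.includes('llama')) return 'llama';
  if (modelId.startsWith('amazon.titan')) return 'titan';
  throw new Error(`Unsupported Bedrock model: ${modelId}`);
}

const TEMPERATURE = 0.3; // from the table above
const MAX_TOKENS = 512;  // from the table above

/** Build the JSON body for InvokeModel (hypothetical helper). */
function buildRequestBody(
  modelId: string,
  systemPrompt: string,
  userMessage: string,
): Record<string, unknown> {
  switch (detectFamily(modelId)) {
    case 'claude':
      return {
        anthropic_version: 'bedrock-2023-05-31',
        max_tokens: MAX_TOKENS,
        temperature: TEMPERATURE,
        system: systemPrompt,
        messages: [{ role: 'user', content: userMessage }],
      };
    case 'llama':
      // Simplified: a real prompt would use the Llama chat template.
      return {
        prompt: `${systemPrompt}\n\n${userMessage}`,
        max_gen_len: MAX_TOKENS,
        temperature: TEMPERATURE,
      };
    case 'titan':
      return {
        inputText: `${systemPrompt}\n\n${userMessage}`,
        textGenerationConfig: { maxTokenCount: MAX_TOKENS, temperature: TEMPERATURE },
      };
  }
}
```

The serialized body would then be passed as the `body` of an `InvokeModelCommand` or `InvokeModelWithResponseStreamCommand`.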

2. Amazon DynamoDB — Database

Purpose: Scalable NoSQL database replacing local SQLite for session and submission storage.

File: server/aws/dynamodb.ts

| Aspect | Detail |
| --- | --- |
| SDK | @aws-sdk/client-dynamodb + @aws-sdk/lib-dynamodb (DocumentClient) |
| Tables | {prefix}-Sessions, {prefix}-Submissions |
| Key Schema | Single hash key: id (String) |
| Billing Mode | PAY_PER_REQUEST (on-demand, no capacity planning) |
| Auto-creation | Tables are created at server startup if they don't exist |
| Local Fallback | SQLite (better-sqlite3) at data/sarthi.db |

Table structure:

{prefix}-Sessions
├── id (PK)        — UUID
├── form_id        — e.g. "pension"
├── form_title     — e.g. "Old Age Pension"
├── status         — "in-progress" | "completed"
├── form_values    — JSON string of all field values
├── current_step   — integer (1-based)
├── created_at     — ISO timestamp
└── updated_at     — ISO timestamp

{prefix}-Submissions
├── id (PK)        — UUID
├── session_id     — linked session (nullable)
├── form_id        — form identifier
├── form_title     — form display name
├── form_values    — JSON string of final values
└── submitted_at   — ISO timestamp

Database Provider Pattern (server/dbProvider.ts):

The provider abstracts the database layer. All route handlers call dbProvider functions — never db.ts directly. The provider checks useDynamoDB() at runtime and delegates accordingly, with automatic fallback:

export async function createSession(data) {
  if (useDynamoDB()) {
    try {
      return await dynamo.createSession(data);   // Cloud path
    } catch (err) {
      // Log the cause so silent fallbacks can be diagnosed, then fall through
      console.warn('DynamoDB failed, using SQLite', err);
    }
  }
  return sqliteDb.createSession(data);           // Local fallback
}

3. Amazon S3 — Storage

Purpose: Cloud storage for document uploads (scanned IDs, photos) and generated PDFs.

File: server/aws/s3.ts

| Aspect | Detail |
| --- | --- |
| SDK | @aws-sdk/client-s3 + @aws-sdk/s3-request-presigner |
| Default Bucket | sarthi-ai-documents (configurable via S3_BUCKET) |
| Operations | PutObject, GetObject, ListObjectsV2, DeleteObject, HeadBucket, CreateBucket |
| Presigned URLs | 1-hour expiry for secure downloads |
| Max Upload Size | 10 MB |
| Local Fallback | Local filesystem at data/uploads/ |

API routes (server/routes/storage.ts):

| Method | Route | Description |
| --- | --- | --- |
| POST | /api/storage/upload | Upload file (multipart, ?prefix=sessions/abc) |
| GET | /api/storage/download/* | Get presigned URL or direct download |
| GET | /api/storage/list | List files by prefix |
| DELETE | /api/storage/* | Delete a file |

Bootstrap: At server startup, ensureBucket() checks if the bucket exists and creates it if not:

await client.send(new HeadBucketCommand({ Bucket }));
// If 404 → CreateBucketCommand with region constraint

4. Amazon Textract — OCR

Purpose: Server-side document text extraction for scanning Indian government documents (Aadhaar, PAN, etc.).

File: server/aws/textract.ts

| Aspect | Detail |
| --- | --- |
| SDK | @aws-sdk/client-textract |
| Operations | DetectDocumentText (basic OCR), AnalyzeDocument (FORMS feature for key-value pairs) |
| Input | Raw image bytes (JPEG, PNG, PDF), up to 10 MB |
| Local Fallback | Tesseract.js in the browser (src/hooks/useOCR.ts) |

Field extraction patterns (regex-based, shared between server and client):

| Field | Pattern | Example |
| --- | --- | --- |
| Aadhaar Number | `\b(\d{4}\s?\d{4}\s?\d{4})\b` | 1234 5678 9012 |
| PAN Number | `\b([A-Z]{5}\d{4}[A-Z])\b` | ABCDE1234F |
| Date of Birth | `(\d{2}[\/\-\.]\d{2}[\/\-\.]\d{4})` | 15/03/1960 |
| Phone Number | `\b([6-9]\d{9})\b` | 9876543210 |
| IFSC Code | `\b([A-Z]{4}0[A-Za-z0-9]{6})\b` | SBIN0001234 |
| Gender | `\b(male\|female\|पुरुष\|महिला)\b` | Male |
| PIN Code | `\b(\d{6})\b` | 226001 |
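
As a rough sketch of how the patterns in the table above can turn raw OCR text into field values, the function below runs each regex and keeps the first match. `FIELD_PATTERNS` and `extractFields` are hypothetical names. The gender pattern is left out of the sketch: in JavaScript, `\b` only recognises ASCII word characters, so the Devanagari alternatives would need different boundary handling, and matching "Male" needs the case-insensitive flag.

```typescript
// Patterns copied from the table above (gender omitted, see the note before this block).
const FIELD_PATTERNS: Record<string, RegExp> = {
  aadhaar: /\b(\d{4}\s?\d{4}\s?\d{4})\b/,
  pan: /\b([A-Z]{5}\d{4}[A-Z])\b/,
  dateOfBirth: /(\d{2}[\/\-\.]\d{2}[\/\-\.]\d{4})/,
  phone: /\b([6-9]\d{9})\b/,
  ifsc: /\b([A-Z]{4}0[A-Za-z0-9]{6})\b/,
  pinCode: /\b(\d{6})\b/,
};

/** Return the first match for every pattern found in the OCR text. */
function extractFields(ocrText: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [name, pattern] of Object.entries(FIELD_PATTERNS)) {
    const value = pattern.exec(ocrText)?.[1];
    if (value !== undefined) result[name] = value;
  }
  return result;
}
```

Because only the first match per pattern is kept, a document containing several dates or phone numbers would need extra context (for example, a nearby "DOB" label) to pick the right one.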

API routes (server/routes/ocr.ts):

| Method | Route | Description |
| --- | --- | --- |
| POST | /api/ocr | Basic text detection |
| POST | /api/ocr/analyze | FORMS-based document analysis (richer extraction) |

Dual-mode OCR flow:

User takes photo of Aadhaar card
  ├─ AWS enabled  → POST /api/ocr → Textract DetectDocumentText → structured fields
  └─ AWS disabled → Client-side Tesseract.js → regex extraction → auto-fill fields

5. Amazon Polly — Text-to-Speech

Purpose: Server-side neural text-to-speech in Indian languages, replacing browser Speech Synthesis for higher quality output.

File: server/aws/polly.ts

| Aspect | Detail |
| --- | --- |
| SDK | @aws-sdk/client-polly |
| Operation | SynthesizeSpeech |
| Output Format | MP3 (audio/mpeg) at 24000 Hz sample rate |
| Engine | Neural (natural-sounding) |
| Default Voice | Kajal (Indian English/Hindi neural voice) |
| Max Input | 3000 characters |
| Local Fallback | Browser Web Speech Synthesis API |

Language-to-voice mapping:

| Language | Voice ID | Language Code | Engine |
| --- | --- | --- | --- |
| English | Kajal | en-IN | Neural |
| Hindi | Kajal | hi-IN | Neural |
| Tamil | Kajal | en-IN | Neural |
| Telugu | Kajal | en-IN | Neural |
| Marathi | Kajal | hi-IN | Neural |
| Bengali | Kajal | en-IN | Neural |
| Gujarati | Kajal | hi-IN | Neural |
| Kannada | Kajal | en-IN | Neural |

SSML support: When speech rate is specified (slow/fast), Polly receives SSML with <prosody rate="..."> tags.
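
A minimal sketch of building that SSML is shown below. `buildSsml` and `escapeXml` are hypothetical helper names; the rate keywords are standard SSML prosody values. When sending SSML, the SynthesizeSpeech request also needs `TextType` set to `ssml`.

```typescript
type SpeechRate = 'slow' | 'medium' | 'fast';

/** Escape characters that would otherwise break the SSML document. */
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

/** Wrap plain text in <speak><prosody rate="..."> for Polly. */
function buildSsml(text: string, rate: SpeechRate): string {
  return `<speak><prosody rate="${rate}">${escapeXml(text)}</prosody></speak>`;
}
```

Escaping matters here because field labels and AI replies can contain `&` or `<`, which would otherwise make the request fail as invalid SSML.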

API route: POST /api/speech/synthesize → returns raw MP3 binary with Content-Type: audio/mpeg.


6. Amazon Transcribe — Speech-to-Text

Purpose: Server-side speech recognition in Indian languages, complementing the browser Web Speech Recognition API.

File: server/aws/transcribe.ts

| Aspect | Detail |
| --- | --- |
| SDK | @aws-sdk/client-transcribe-streaming |
| Operation | StartStreamTranscription |
| Input | Audio buffer (PCM/WAV or OGG-Opus) up to 5 MB |
| Default Sample Rate | 16000 Hz |
| Local Fallback | Browser Web Speech Recognition API |

Language support:

| Language | Transcribe Code |
| --- | --- |
| English | en-IN |
| Hindi | hi-IN |
| Tamil | ta-IN |
| Telugu | te-IN |
| Marathi | mr-IN |
| Bengali | bn-IN |
| Gujarati | gu-IN |
| Kannada | kn-IN |

API route: POST /api/speech/transcribe (multipart audio upload) → returns { transcript, confidence, provider }.


7. Amazon Translate — Translation

Purpose: Dynamic real-time text translation between SarthiAI's 8 supported languages, supplementing static i18n bundles.

File: server/aws/translate.ts

| Aspect | Detail |
| --- | --- |
| SDK | @aws-sdk/client-translate |
| Operation | TranslateText |
| Auto-detect | Source language can be set to "auto" |
| Batch | Parallel TranslateText calls for up to 25 texts |
| Max Text Length | 5000 characters per request |
| Local Fallback | Static i18n translation bundles (pre-built) |

API routes (server/routes/translate.ts):

| Method | Route | Description |
| --- | --- | --- |
| POST | /api/translate | Translate single text |
| POST | /api/translate/batch | Translate up to 25 texts in parallel |
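
The batch behaviour described above (up to 25 texts, 5000 characters each, translated in parallel) could look roughly like this. The single-text translator is injected as a parameter so the sketch stays independent of the AWS SDK; `TranslateFn`, `validateBatch`, and `translateBatch` are hypothetical names.

```typescript
const MAX_BATCH_SIZE = 25;    // from the table above
const MAX_TEXT_LENGTH = 5000; // from the table above

// Hypothetical signature for a single-text translator (e.g. a TranslateText wrapper).
type TranslateFn = (text: string, source: string, target: string) => Promise<string>;

/** Throws when the batch exceeds the documented limits. */
function validateBatch(texts: string[]): void {
  if (texts.length > MAX_BATCH_SIZE) {
    throw new Error(`Batch too large: ${texts.length} > ${MAX_BATCH_SIZE}`);
  }
  const tooLong = texts.findIndex((t) => t.length > MAX_TEXT_LENGTH);
  if (tooLong !== -1) {
    throw new Error(`Text at index ${tooLong} exceeds ${MAX_TEXT_LENGTH} characters`);
  }
}

/** Translate all texts in parallel; results keep the input order. */
async function translateBatch(
  texts: string[],
  source: string,
  target: string,
  translate: TranslateFn,
): Promise<string[]> {
  validateBatch(texts);
  return Promise.all(texts.map((text) => translate(text, source, target)));
}
```

`Promise.all` rejects as soon as any single call fails, so a caller that wants partial results would use `Promise.allSettled` and fall back to the static i18n bundle per item.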

AWS Configuration & Fallback Strategy

Central config: server/aws/config.ts

All AWS services share a common configuration factory:

# Master switch
USE_AWS=true              # Enables all AWS services

# Credentials (optional — SDK falls back to instance roles on EC2/ECS/Lambda)
AWS_REGION=ap-south-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

# Service-specific
BEDROCK_MODEL=anthropic.claude-3-haiku-20240307-v1:0
DYNAMO_TABLE_PREFIX=SarthiAI
S3_BUCKET=sarthi-ai-documents
DB_PROVIDER=dynamodb      # or "sqlite" to force local DB even with USE_AWS=true
AI_PROVIDER=bedrock        # or "ollama" / "lmstudio"

Fallback matrix:

| AWS Service | Local Fallback | Trigger |
| --- | --- | --- |
| Bedrock | Ollama / LM Studio | AI_PROVIDER≠bedrock or Bedrock API error |
| DynamoDB | SQLite (data/sarthi.db) | DB_PROVIDER=sqlite or DynamoDB error |
| S3 | Local filesystem (data/uploads/) | USE_AWS=false or S3 error |
| Textract | Tesseract.js (client-side) | USE_AWS=false |
| Polly | Browser Speech Synthesis | USE_AWS=false or Polly error |
| Transcribe | Browser Speech Recognition | USE_AWS=false or Transcribe error |
| Translate | Static i18n bundles | USE_AWS=false or Translate error |

Feature flags (from config.ts):

isAWSEnabled()         // Master switch: USE_AWS=true?
useDynamoDB()          // isAWSEnabled() && DB_PROVIDER !== 'sqlite'
useBedrock()           // isAWSEnabled() && AI_PROVIDER === 'bedrock'
useTextract()          // isAWSEnabled()
usePolly()             // isAWSEnabled()
useTranscribe()        // isAWSEnabled()
useTranslateService()  // isAWSEnabled()
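
The flag logic listed above can be sketched as pure functions. In this sketch the environment is passed in as a parameter (an assumption made so the functions are easy to test); the real `config.ts` presumably reads `process.env` directly and takes no arguments.

```typescript
type Env = Record<string, string | undefined>;

/** Master switch: true only when USE_AWS is exactly "true". */
function isAWSEnabled(env: Env): boolean {
  return env['USE_AWS'] === 'true';
}

/** DynamoDB is used unless the local database is forced with DB_PROVIDER=sqlite. */
function useDynamoDB(env: Env): boolean {
  return isAWSEnabled(env) && env['DB_PROVIDER'] !== 'sqlite';
}

/** Bedrock must be selected explicitly with AI_PROVIDER=bedrock. */
function useBedrock(env: Env): boolean {
  return isAWSEnabled(env) && env['AI_PROVIDER'] === 'bedrock';
}
```

Note the asymmetry: with `USE_AWS=true` and no other variables set, DynamoDB is on but Bedrock is off, because the first check is an opt-out and the second is an opt-in.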

Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Frontend | React 19, TypeScript 5.8 | UI framework |
| Styling | Tailwind CSS v4, Framer Motion | Responsive design & animations |
| Voice (Client) | Web Speech Recognition API, Web Speech Synthesis API | Browser-native speech |
| Voice (Cloud) | Amazon Polly, Amazon Transcribe | Neural TTS & multi-language STT |
| OCR (Client) | Tesseract.js | Offline document scanning |
| OCR (Cloud) | Amazon Textract | High-accuracy document extraction |
| AI (Local) | Ollama (llama3) or LM Studio | Local LLM inference |
| AI (Cloud) | Amazon Bedrock (Claude / Llama / Titan) | Cloud LLM inference |
| Translation | Amazon Translate + static i18n bundles | Dynamic & static translations |
| Offline | IndexedDB (idb), Workbox, vite-plugin-pwa | PWA offline support |
| Backend | Express.js, Node.js | REST API server |
| Database (Local) | SQLite (better-sqlite3) | Local persistence |
| Database (Cloud) | Amazon DynamoDB | Scalable cloud persistence |
| Storage (Cloud) | Amazon S3 | Document & file storage |
| PDF | jsPDF | Client-side PDF generation |
| Build | Vite 6, tsx | Dev server & TypeScript execution |
| Icons | Lucide React | 30+ accessible icons |

Project Structure

sarthi-ai/
├── index.html                      # Entry HTML with PWA meta tags & skip link
├── vite.config.ts                  # Vite + Tailwind + PWA + Workbox config
├── package.json                    # Dependencies & scripts
├── tsconfig.json                   # TypeScript config (client)
├── tsconfig.server.json            # TypeScript config (server)
├── metadata.json                   # App metadata (name, permissions)
│
├── public/icons/                   # PWA icons (192px, 512px)
├── data/                           # Runtime data (auto-created)
│   ├── sarthi.db                   # SQLite database
│   └── uploads/                    # Local file storage fallback
│
├── server/                         # ── Express Backend ──────────────
│   ├── index.ts                    # Server entry, middleware, route mounting
│   ├── db.ts                       # SQLite schema, prepared statements, CRUD
│   ├── dbProvider.ts               # Unified DB interface (SQLite ↔ DynamoDB)
│   │
│   ├── ai/
│   │   ├── grounding.ts            # 7+ RAG knowledge documents (verified text)
│   │   └── ollama.ts               # Ollama / LM Studio client + Bedrock delegation
│   │
│   ├── aws/
│   │   ├── config.ts               # Central AWS config, feature flags, client factory
│   │   ├── bedrock.ts              # Bedrock AI (Claude/Llama/Titan) — chat + streaming
│   │   ├── dynamodb.ts             # DynamoDB sessions & submissions tables
│   │   ├── s3.ts                   # S3 file upload, download, presigned URLs
│   │   ├── polly.ts                # Polly TTS — neural voice synthesis
│   │   ├── textract.ts             # Textract OCR — document text extraction
│   │   ├── transcribe.ts           # Transcribe STT — speech recognition
│   │   └── translate.ts            # Translate — dynamic text translation
│   │
│   └── routes/
│       ├── sessions.ts             # Session CRUD endpoints
│       ├── submissions.ts          # Submission endpoints
│       ├── sync.ts                 # Bulk offline sync endpoint
│       ├── chat.ts                 # AI chat (non-streaming + SSE streaming)
│       ├── ocr.ts                  # Textract OCR endpoints
│       ├── speech.ts               # Polly TTS + Transcribe STT endpoints
│       ├── translate.ts            # Translate endpoints (single + batch)
│       └── storage.ts              # S3 / local file storage endpoints
│
└── src/                            # ── React Frontend ───────────────
    ├── main.tsx                    # App entry, SW registration, sync init
    ├── App.tsx                     # State-driven page router (lazy-loaded)
    ├── index.css                   # Global styles + Tailwind directives
    │
    ├── api/
    │   └── client.ts               # Typed fetch wrapper for all API endpoints
    │
    ├── components/
    │   ├── Layout.tsx              # App shell: top nav + bottom nav + sidebar
    │   ├── Navigation.tsx          # Top bar + bottom tab bar
    │   ├── ContextPanel.tsx        # Desktop sidebar (steps) + mobile slide menu
    │   ├── AIAssistant.tsx         # Floating AI chat panel with streaming
    │   ├── OfflineBanner.tsx       # Connectivity status banners (amber/blue/green)
    │   ├── WelcomeModal.tsx        # First-visit language picker modal
    │   └── UIElements.tsx          # Shared buttons, cards, badges, inputs
    │
    ├── context/
    │   └── FormContext.tsx          # Single React Context for all app state
    │
    ├── pages/
    │   ├── Home.tsx                # Voice-first home screen with large mic button
    │   ├── Forms.tsx               # Form catalog with search
    │   ├── FormInfo.tsx            # Form details (purpose, eligibility, docs)
    │   ├── FieldInput.tsx          # Field-by-field input with voice & OCR
    │   ├── Review.tsx              # Review all answers before submission
    │   ├── Activity.tsx            # Session history with filters
    │   ├── Help.tsx                # FAQ and usage help
    │   ├── Settings.tsx            # Language, theme, voice, text size controls
    │   └── Scan.tsx                # Document scanning UI
    │
    ├── forms/
    │   ├── schema.ts               # FormSchema, FormField, validation type defs
    │   ├── registry.ts             # Central form registry (register + lookup)
    │   ├── validation.ts           # Field validation logic
    │   ├── index.ts                # Public API re-exports
    │   ├── definitions/            # Form schemas (one file per form)
    │   │   ├── pension.ts          # Old Age Pension (IGNOAPS)
    │   │   ├── housing.ts          # Housing Subsidy (PMAY)
    │   │   ├── rationCard.ts       # Ration Card (NFSA)
    │   │   ├── casteCertificate.ts # Caste Certificate
    │   │   ├── incomeCertificate.ts# Income Certificate
    │   │   ├── kisanSammanNidhi.ts # PM-KISAN
    │   │   └── locationFields.ts   # Shared location fields (state, district, etc.)
    │   └── i18n/                   # Per-form translations (7 languages × 6 forms)
    │       ├── index.ts            # Translation loader
    │       ├── types.ts            # Translation type definitions
    │       ├── pension/            # hi, mr, ta, te, bn, gu, kn
    │       ├── housing/
    │       ├── rationCard/
    │       ├── casteCertificate/
    │       ├── incomeCertificate/
    │       └── kisanSammanNidhi/
    │
    ├── hooks/
    │   ├── useAIChat.ts            # AI conversation state + streaming
    │   ├── useSpeechRecognition.ts # Web Speech STT hook
    │   ├── useSpeechSynthesis.ts   # Web Speech TTS hook + Polly integration
    │   ├── useOCR.ts               # Tesseract.js OCR + Textract fallback
    │   ├── useNetworkStatus.ts     # Online/offline + connection quality detection
    │   ├── useResponsiveLayout.ts  # Breakpoint detection
    │   └── useTranslatedSchema.ts  # FormSchema i18n translation hook
    │
    ├── i18n/                       # App-level UI translations
    │   ├── index.ts                # translate() function
    │   ├── en.ts, hi.ts, mr.ts     # 8 language files
    │   ├── ta.ts, te.ts, bn.ts
    │   └── gu.ts, kn.ts
    │
    ├── offline/
    │   ├── db.ts                   # IndexedDB schema (sessions, submissions, syncQueue)
    │   ├── offlineApi.ts           # Offline-first API wrapper
    │   ├── syncManager.ts          # Queue drain + reconciliation + backoff
    │   ├── schemaCache.ts          # Cache form schemas offline
    │   └── index.ts                # Re-exports
    │
    ├── utils/
    │   ├── aadhaarVerhoeff.ts      # Aadhaar 12-digit Verhoeff checksum validation
    │   ├── announceToScreenReader.ts # ARIA live-region announcements
    │   ├── detectLanguage.ts       # Browser language auto-detection
    │   ├── generatePdf.ts          # Client-side PDF generation (jsPDF)
    │   ├── localeFormat.ts         # Indian number/date formatting (lakhs/crores)
    │   └── trackingId.ts           # Unique tracking ID generator
    │
    └── voice/
        └── language.ts             # BCP-47 locale mappings, voice picker
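
The tree above lists `src/utils/aadhaarVerhoeff.ts` for Aadhaar checksum validation. For readers unfamiliar with the Verhoeff scheme, here is a self-contained sketch of the standard algorithm; the function names are illustrative and the repository's implementation may be organised differently.

```typescript
// Standard Verhoeff tables: D is the dihedral-group multiplication table,
// P is the position-dependent permutation table.
const D: number[][] = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
  [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
  [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
  [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
  [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
  [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
  [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
  [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
  [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
];

const P: number[][] = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
  [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
  [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
  [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
  [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
  [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
  [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
];

/** True when the digit string (check digit last) passes the Verhoeff checksum. */
function isValidVerhoeff(input: string): boolean {
  const digits = input.replace(/\s/g, '');
  if (!/^\d+$/.test(digits)) return false;
  let c = 0;
  // Process digits right to left; position i selects the permutation row.
  digits.split('').reverse().map(Number).forEach((digit, i) => {
    c = D[c][P[i % 8][digit]];
  });
  return c === 0;
}

/** Aadhaar numbers are 12 digits with a Verhoeff check digit. */
function isValidAadhaar(input: string): boolean {
  return input.replace(/\s/g, '').length === 12 && isValidVerhoeff(input);
}
```

The checksum catches all single-digit errors and adjacent transpositions, which are the most common mistakes when a number is dictated or typed.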

Forms System

Adding a New Form

  1. Create a schema in src/forms/definitions/myForm.ts:

```typescript
import type { FormSchema } from '../schema';

export const myForm: FormSchema = {
  id: 'my-form',
  title: 'My Government Form',
  description: 'Short description',
  icon: 'FileText',               // Lucide icon name
  iconBgColor: 'bg-teal-50',
  iconColor: 'text-teal-600',
  purpose: 'What this form is for...',
  eligibility: ['Criterion 1', 'Criterion 2'],
  requiredDocuments: ['Aadhaar Card', 'Bank Passbook'],
  estimatedTime: '5–10 minutes',
  fields: [
    {
      key: 'fullName',
      label: 'Full Name',
      type: 'text',
      placeholder: 'e.g., Ramesh Kumar',
      helpText: 'Enter your name as it appears on your Aadhaar.',
      validation: { required: true, minLength: 2, maxLength: 100 },
      group: 'Personal Details',
    },
    // ... more fields
  ],
};
```

  2. Register it in src/forms/registry.ts:

```typescript
import { myForm } from './definitions/myForm';
register(myForm);
```

  3. (Optional) Add translations in src/forms/i18n/myForm/hi.ts, ta.ts, etc.

  4. (Optional) Add grounding documents in server/ai/grounding.ts for AI assistance.

That's it. The form appears in the catalog, renders field-by-field, validates, saves sessions, and generates PDFs — all automatically.
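
To make the validation step concrete, here is a minimal sketch of how the validation object on a field (required, minLength, maxLength) might be enforced. The FieldValidation interface, the validateField function, and the error messages are hypothetical; the real schema types live in src/forms/schema and may support more rules.

```typescript
// Hypothetical subset of the validation rules used in the schema example.
interface FieldValidation {
  required?: boolean;
  minLength?: number;
  maxLength?: number;
}

// Returns an error message, or null when the value passes every rule.
function validateField(value: string, rules: FieldValidation): string | null {
  const trimmed = value.trim();
  if (trimmed.length === 0) {
    // Empty values only fail when the field is required.
    return rules.required ? 'This field is required.' : null;
  }
  if (rules.minLength !== undefined && trimmed.length < rules.minLength) {
    return `Enter at least ${rules.minLength} characters.`;
  }
  if (rules.maxLength !== undefined && trimmed.length > rules.maxLength) {
    return `Enter at most ${rules.maxLength} characters.`;
  }
  return null;
}

// Usage with the fullName rules from the schema example.
const fullNameRules: FieldValidation = { required: true, minLength: 2, maxLength: 100 };
const fullNameError = validateField('Ramesh Kumar', fullNameRules);
```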


AI Safety & Grounding

| Safeguard | Implementation |
|---|---|
| Grounding-only responses | 9-rule system prompt; RAG retrieval required before any generation |
| Hallucination tagging | Every response marked Verified (shield icon) or Unverified (warning icon) |
| Refusal on uncertainty | 7 refusal-phrase heuristics; refuses rather than guessing |
| No authority claims | System prompt prohibits "I guarantee", "I confirm", "I certify" |
| Scam protection | Detects payment-related queries and warns: "Government form filing is free" |
| Local processing option | Ollama/LM Studio run locally; no user data leaves the device |
| Input limits | 1000-char message limit, 10-message history cap, 512-token response cap |
| No auto-submission | User must review every field and explicitly confirm before submission |
| Rate limiting | 15 req/min for AI chat, 100 req/min for general API |
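
The input limits above could be applied before a chat request reaches the model. The sketch below is one possible shape: the ChatMessage type and prepareChatInput function are hypothetical, and whether the actual server rejects or truncates an over-long message is an assumption (this sketch rejects it).

```typescript
// Hypothetical message shape for the chat history.
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

const MAX_MESSAGE_CHARS = 1000;   // message limit from the table above
const MAX_HISTORY_MESSAGES = 10;  // history cap from the table above

// Validates the new message and keeps only the most recent history entries.
function prepareChatInput(
  history: ChatMessage[],
  message: string,
): { history: ChatMessage[]; message: string } {
  const trimmed = message.trim();
  if (trimmed.length === 0) {
    throw new Error('Message is empty.');
  }
  if (trimmed.length > MAX_MESSAGE_CHARS) {
    // Assumption: over-long input is rejected rather than silently cut.
    throw new Error(`Message exceeds ${MAX_MESSAGE_CHARS} characters.`);
  }
  return { history: history.slice(-MAX_HISTORY_MESSAGES), message: trimmed };
}
```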

Offline & Sync Architecture

              ONLINE                              OFFLINE
    ┌──────────────────────┐           ┌──────────────────────┐
    │ offlineApi.ts        │           │ offlineApi.ts        │
    │                      │           │                      │
    │ 1. Call real API     │           │ 1. Read from IDB     │
    │ 2. Cache to IndexedDB│           │ 2. Enqueue write to  │
    │ 3. Return response   │           │    syncQueue          │
    └──────────┬───────────┘           └──────────────────────┘
               │                                  │
               │          ┌───────────┐           │
               │          │  RECONNECT │◀──────────┘
               │          └─────┬─────┘
               │                │
               │    ┌───────────▼──────────┐
               │    │   syncManager.ts     │
               │    │                      │
               │    │ 1. Drain queue FIFO  │
               │    │ 2. POST /api/sync    │
               │    │    (batched actions) │
               │    │ 3. Reconcile server  │
               │    │    state with local  │
               │    │ 4. Exponential       │
               │    │    backoff on fail   │
               │    │    (1s→2s→4s→…→30s)  │
               │    │ 5. Max 5 retries     │
               │    └──────────────────────┘
               │
    ┌──────────▼───────────┐
    │   IndexedDB Stores   │
    │                      │
    │  sessions            │  ← form-filling progress
    │  submissions         │  ← completed forms
    │  syncQueue           │  ← pending API calls
    └──────────────────────┘
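
The retry schedule in the diagram (1s→2s→4s→…→30s, at most 5 retries) can be sketched as a pure function. The names below are hypothetical, and plain doubling from a 1-second base is an assumption about how syncManager.ts computes its delays.

```typescript
const BASE_DELAY_MS = 1000;   // first retry waits 1 second
const MAX_DELAY_MS = 30000;   // delays are capped at 30 seconds
const MAX_RETRIES = 5;        // give up after 5 retries

// Delay before retry number `attempt` (0-based): doubles each time, capped.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

// True while another retry is still permitted.
function shouldRetry(attempt: number): boolean {
  return attempt < MAX_RETRIES;
}

// The full list of delays a failing sync would wait through.
function retrySchedule(): number[] {
  const delays: number[] = [];
  for (let attempt = 0; shouldRetry(attempt); attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}
```

Under this doubling assumption, five retries would wait 1, 2, 4, 8, and 16 seconds; the 30-second cap would only come into play if more attempts were allowed.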

Data persistence layers: localStorage (settings) → IndexedDB (sessions/submissions/queue) → Server (SQLite/DynamoDB).

Conflict resolution: Server wins unless the local record has a newer updatedAt timestamp.
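
The conflict rule can be expressed in a few lines. This is a sketch only: the SyncRecord shape and resolveConflict function are hypothetical, and it assumes updatedAt is stored as an ISO 8601 string.

```typescript
// Hypothetical minimal record shape; real sessions carry more fields.
interface SyncRecord {
  id: string;
  updatedAt: string; // assumed ISO 8601 timestamp
}

// Server wins unless the local copy is strictly newer.
// An unparsable local timestamp yields NaN, so the server copy is kept.
function resolveConflict<T extends SyncRecord>(local: T, server: T): T {
  const localTime = Date.parse(local.updatedAt);
  const serverTime = Date.parse(server.updatedAt);
  return localTime > serverTime ? local : server;
}
```

Ties go to the server, which matches the "server wins unless local is newer" wording.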


Accessibility

| Feature | Implementation |
|---|---|
| Voice as primary input | Large mic button (192px Home, 128px fields) with animated ripple |
| Text size control | Small / Medium / Large, persisted in localStorage |
| Dark mode | System preference detection + manual toggle |
| Volume & speed controls | Adjustable TTS rate (0.5×–2×) with test button |
| Field help tooltips | Every field has a plain-language explanation |
| Large touch targets | All interactive elements ≥ 48px, most 64–80px |
| Press feedback | active:scale-95 on all buttons |
| Step progress | Dot indicators, text percentage, animated progress bar |
| Auto-save | Switching tabs while filling a form triggers an auto-save |
| ARIA attributes | aria-label, aria-live="polite", role="status" |
| Skip link | "Skip to main content" link in HTML |
| Screen reader | announceToScreenReader() utility for dynamic announcements |

API Reference

Core APIs

| Method | Path | Purpose |
|---|---|---|
| GET | /api/sessions | List all sessions |
| GET | /api/sessions/:id | Get a single session |
| POST | /api/sessions | Create a new session |
| PUT | /api/sessions/:id | Update a session |
| DELETE | /api/sessions/:id | Delete a session |
| GET | /api/submissions | List all submissions |
| GET | /api/submissions/:id | Get a single submission |
| POST | /api/submissions | Create a submission |
| POST | /api/sync | Bulk sync (batch actions + full state return) |
| POST | /api/chat | AI chat (grounded, multi-turn) |
| POST | /api/chat/stream | AI chat with SSE streaming |
| GET | /api/health | Health check (includes AWS status) |
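
A client consuming /api/chat/stream has to split the incoming text into server-sent events. The sketch below handles the standard SSE framing (events separated by a blank line, payload on data: lines). The parseSseChunk name is hypothetical, the payload format of each event is not specified here, and the sketch assumes the server uses \n line endings.

```typescript
// Splits buffered SSE text into complete event payloads plus any
// trailing partial event that should be kept for the next chunk.
function parseSseChunk(buffer: string): { events: string[]; rest: string } {
  const blocks = buffer.split('\n\n');
  // The last block is incomplete until its terminating blank line arrives.
  const rest = blocks.pop() ?? '';
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split('\n')
      .filter((line) => line.indexOf('data:') === 0)
      // Drop the field name and the single optional space after the colon.
      .map((line) => line.slice(5).replace(/^ /, ''));
    if (dataLines.length > 0) {
      // Multiple data lines in one event are joined with newlines.
      events.push(dataLines.join('\n'));
    }
  }
  return { events, rest };
}
```

A caller would append each network chunk to `rest`, call the function again, and render the returned events as they arrive.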

AWS-Powered APIs

| Method | Path | Purpose | AWS Service |
|---|---|---|---|
| POST | /api/ocr | Document text detection | Textract |
| POST | /api/ocr/analyze | Document analysis (FORMS) | Textract |
| POST | /api/speech/synthesize | Text-to-speech (MP3) | Polly |
| POST | /api/speech/transcribe | Speech-to-text | Transcribe |
| POST | /api/translate | Single text translation | Translate |
| POST | /api/translate/batch | Batch translation (≤25) | Translate |
| POST | /api/storage/upload | File upload | S3 |
| GET | /api/storage/download/* | File download / presigned URL | S3 |
| GET | /api/storage/list | List files by prefix | S3 |
| DELETE | /api/storage/* | Delete file | S3 |
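
If "≤25" means that /api/translate/batch accepts at most 25 texts per request (an interpretation of the table, not something confirmed here), a caller with more strings would need to split them first. A minimal sketch with a hypothetical chunkForBatch helper:

```typescript
const MAX_BATCH_SIZE = 25; // assumed per-request limit for batch translation

// Splits a list into consecutive batches of at most `size` items.
function chunkForBatch<T>(items: T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (size < 1) {
    throw new Error('Batch size must be at least 1.');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch could then be sent as its own POST request, with the results concatenated in order.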

Rate Limits

| Endpoint | Limit |
|---|---|
| General API (/api/*) | 100 requests/minute |
| AI Chat (/api/chat) | 15 requests/minute |
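
One simple way to enforce limits like these is a fixed-window counter keyed by client. The class below is a hypothetical sketch, not the server's actual middleware; the project may use a different algorithm (for example a sliding window) or an off-the-shelf library.

```typescript
interface WindowState {
  windowStart: number; // timestamp (ms) when the current window began
  count: number;       // requests seen in the current window
}

class FixedWindowLimiter {
  private windows: Record<string, WindowState> = {};
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request is allowed. `now` is injectable for tests.
  allow(key: string, now: number = Date.now()): boolean {
    const state = this.windows[key];
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      // First request from this key, or the previous window has expired.
      this.windows[key] = { windowStart: now, count: 1 };
      return true;
    }
    if (state.count < this.limit) {
      state.count += 1;
      return true;
    }
    return false;
  }
}

// Limits from the table above, per one-minute window.
const chatLimiter = new FixedWindowLimiter(15, 60000);
const apiLimiter = new FixedWindowLimiter(100, 60000);
```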

How to Run

Prerequisites

  • Node.js 18+
  • AI Model (one of):
    • Ollama with llama3 pulled — ollama pull llama3
    • LM Studio with a model loaded and local server running
    • AWS account with Bedrock access (if using cloud AI)

Quick Start (Local Mode)

```bash
# Clone the repository
git clone <repo-url> && cd sarthi-ai

# Install dependencies
npm install

# Start both frontend (port 3000) and backend (port 3001)
npm run dev:all
```

Quick Start (AWS Mode)

```bash
# Set environment variables
export USE_AWS=true
export AWS_REGION=ap-south-1
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
export AI_PROVIDER=bedrock

# Start
npm run dev:all
```

All Commands

| Command | Description |
|---|---|
| npm run dev | Vite dev server on port 3000 |
| npm run server | Express API on port 3001 |
| npm run dev:all | Both frontend + backend concurrently |
| npm run build | Production build |
| npm run start | Build + serve production |
| npm run lint | TypeScript type check |
| npm run clean | Remove dist/ folder |

Environment Variables

| Variable | Default | Description |
|---|---|---|
| USE_AWS | false | Master switch for all 7 AWS services |
| AWS_REGION | ap-south-1 | AWS region |
| AWS_ACCESS_KEY_ID | | IAM access key (optional on EC2/ECS/Lambda) |
| AWS_SECRET_ACCESS_KEY | | IAM secret key |
| AI_PROVIDER | ollama | ollama, lmstudio, or bedrock |
| OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_MODEL | llama3 | Ollama model name |
| LMSTUDIO_URL | http://localhost:1234 | LM Studio server URL |
| LMSTUDIO_MODEL | gemma-3-4b | LM Studio model name |
| BEDROCK_MODEL | anthropic.claude-3-haiku-20240307-v1:0 | Bedrock model ID |
| DB_PROVIDER | | Set to sqlite to force local DB even with AWS |
| DYNAMO_TABLE_PREFIX | SarthiAI | DynamoDB table name prefix |
| S3_BUCKET | sarthi-ai-documents | S3 bucket name |
| SERVER_PORT | 3001 | Express server port |
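
As a sketch of how these variables and their defaults might be read at startup, the function below builds a typed config object from an environment map. The ServerConfig shape and the loadConfig and parseAiProvider names are hypothetical. Two behaviours are assumptions rather than documented facts: only the exact string true enables USE_AWS, and DynamoDB is selected whenever AWS is enabled and DB_PROVIDER is not sqlite.

```typescript
type AiProvider = 'ollama' | 'lmstudio' | 'bedrock';

interface ServerConfig {
  useAws: boolean;
  awsRegion: string;
  aiProvider: AiProvider;
  ollamaUrl: string;
  ollamaModel: string;
  lmStudioUrl: string;
  lmStudioModel: string;
  bedrockModel: string;
  dbProvider: 'sqlite' | 'dynamodb';
  dynamoTablePrefix: string;
  s3Bucket: string;
  serverPort: number;
}

type Env = Record<string, string | undefined>;

// Falls back to ollama when unset; rejects values outside the documented set.
function parseAiProvider(value: string | undefined): AiProvider {
  if (value === undefined || value === '') return 'ollama';
  if (value === 'ollama' || value === 'lmstudio' || value === 'bedrock') return value;
  throw new Error(`Unsupported AI_PROVIDER: ${value}`);
}

// Pass the process environment (or any plain object) as `env`.
function loadConfig(env: Env): ServerConfig {
  const useAws = env.USE_AWS === 'true'; // assumption: exact match only
  const serverPort = Number(env.SERVER_PORT ?? '3001');
  if (!(serverPort > 0) || Math.floor(serverPort) !== serverPort) {
    throw new Error(`Invalid SERVER_PORT: ${env.SERVER_PORT}`);
  }
  return {
    useAws,
    awsRegion: env.AWS_REGION ?? 'ap-south-1',
    aiProvider: parseAiProvider(env.AI_PROVIDER),
    ollamaUrl: env.OLLAMA_URL ?? 'http://localhost:11434',
    ollamaModel: env.OLLAMA_MODEL ?? 'llama3',
    lmStudioUrl: env.LMSTUDIO_URL ?? 'http://localhost:1234',
    lmStudioModel: env.LMSTUDIO_MODEL ?? 'gemma-3-4b',
    bedrockModel: env.BEDROCK_MODEL ?? 'anthropic.claude-3-haiku-20240307-v1:0',
    // Assumption: sqlite unless AWS is on and DB_PROVIDER does not force sqlite.
    dbProvider: useAws && env.DB_PROVIDER !== 'sqlite' ? 'dynamodb' : 'sqlite',
    dynamoTablePrefix: env.DYNAMO_TABLE_PREFIX ?? 'SarthiAI',
    s3Bucket: env.S3_BUCKET ?? 'sarthi-ai-documents',
    serverPort,
  };
}
```

Validating the provider and port once at startup surfaces a misconfigured environment immediately instead of at the first request.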

Limitations & Disclaimer

Limitations

  • Prototype / hackathon project — not production-hardened.
  • 6 demo forms. New forms require adding a definition + translations.
  • No user accounts or authentication.
  • AI assistant requires a running LLM (local or Bedrock); unavailable offline.
  • Speech recognition browser support varies (Chrome/Edge recommended; limited on Firefox/Safari).
  • Does NOT submit forms to any government system — users must submit via official channels.

Disclaimer

SarthiAI is a prototype project. It does not replace official guidance from government agencies. The tool does NOT submit forms automatically — users retain full control. The AI assistant refuses to answer when reliable information cannot be found in verified documents. No middleman fee is required for any government form filing.
