A production-style Retrieval-Augmented Generation (RAG) system built with FastAPI + React, designed for intelligent PDF understanding through advanced retrieval engineering.
Unlike simple "upload PDF → ask GPT" projects, OctoVector-AI implements a complete retrieval pipeline with:
- Dense Retrieval
- Sparse Retrieval
- Hybrid Search
- Reciprocal Rank Fusion (RRF)
- Cross Encoder Re-ranking
- Grounded Prompt Generation
- Hallucination Reduction
The system focuses on one principle:
Better retrieval = better answers.
- PDF text extraction
- Text cleaning and normalization
- Semantic chunking
- Sentence-aware segmentation
- SentenceTransformer embeddings
- Precomputed embedding storage
- Efficient lifecycle management
- Dense retrieval using FAISS
- Sparse retrieval using BM25
- Hybrid retrieval pipeline
- Reciprocal Rank Fusion (RRF)
- Cross Encoder reranking
- Heuristic relevance boosting
- Precision-focused retrieval optimization
- Gemini-powered response generation
- Grounded prompt construction
- Context injection
- Hallucination reduction
- FastAPI architecture
- Async APIs
- CORS support
- Logging
- Persistent upload storage
- React upload interface
- Question-answer workflow
- Real-time interaction
              ┌─────────────────────────┐
              │     React Frontend      │
              │ Upload + Ask Questions  │
              └────────────┬────────────┘
                           │
                           ▼
              ┌─────────────────────────┐
              │     FastAPI Backend     │
              └────────────┬────────────┘
                           │
           ┌───────────────┴───────────────┐
           │                               │
           ▼                               ▼
┌─────────────────────┐        ┌──────────────────────┐
│     /upload API     │        │      /query API      │
└──────────┬──────────┘        └───────────┬──────────┘
           │                               │
           ▼                               ▼
┌─────────────────────┐        ┌──────────────────────┐
│    PDF Ingestion    │        │   Query Processing   │
│      Cleaning       │        │   Hybrid Retrieval   │
│      Chunking       │        │      Reranking       │
└──────────┬──────────┘        └───────────┬──────────┘
           │                               │
           ▼                               ▼
┌─────────────────────┐        ┌──────────────────────┐
│   Embedding Model   │        │  Context Selection   │
│ SentenceTransformer │        └───────────┬──────────┘
└──────────┬──────────┘                    │
           ▼                               ▼
┌─────────────────────┐        ┌──────────────────────┐
│   FAISS Vector DB   │        │ Prompt Construction  │
└──────────┬──────────┘        └───────────┬──────────┘
           │                               │
           ▼                               ▼
┌─────────────────────┐        ┌──────────────────────┐
│  Stored Embeddings  │        │  Gemini Generation   │
└─────────────────────┘        └───────────┬──────────┘
                                           ▼
                               ┌──────────────────────┐
                               │  Grounded Response   │
                               └──────────────────────┘
POST /upload handles:
- PDF upload
- Text extraction
- Cleaning
- Semantic chunking
- Embedding generation
- Vector preparation
POST /query handles:
- Hybrid retrieval
- Reranking
- Context selection
- Prompt construction
- Gemini generation
- Answer delivery
PDF Upload
↓
Disk Persistence
↓
Text Extraction
↓
Cleaning
↓
Chunking
↓
Embedding Generation
↓
Store Chunks + Embeddings
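The cleaning step in the ingestion flow above is not spelled out, so the following is only a minimal sketch of what PDF text normalization might involve. The function name `clean_text` and the specific rules (Unicode normalization, rejoining hyphenated line breaks, collapsing whitespace) are assumptions for illustration, not the project's actual code.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize text extracted from a PDF (illustrative rules only)."""
    # Unify Unicode forms so ligatures such as U+FB01 become plain "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words hyphenated across a line break: "retrie-\nval" -> "retrieval".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single newlines into spaces but keep blank-line paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into a single paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Running cleanup before chunking matters because sentence boundaries and BM25 tokens are both sensitive to stray line breaks and split words.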
User Question
↓
retrieve_chunks()
↓
Hybrid Retrieval
↓
RRF Fusion
↓
Cross Encoder Reranking
↓
generate_response()
↓
Gemini Answer
Responsible for:
- PDF extraction
- Normalization
- Semantic chunking
Responsible for:
- Dense retrieval
- Sparse retrieval
- FAISS search
- BM25 search
Responsible for:
- RRF Fusion
- Cross Encoder reranking
- Heuristic boosting
Responsible for:
- Prompt construction
- Citation-aware context injection
- Hallucination reduction
Responsible for:
- Gemini interaction
- Response synthesis
Responsible for:
- API orchestration
- Frontend integration
- Lifecycle management
Combines:
- Semantic search
- Lexical search
- Retrieval fusion
Hybrid retrieval typically gives better results than dense or sparse retrieval alone, because semantic and lexical matching catch different relevant chunks.
Cross-encoder reranking improves ranking quality after retrieval and increases answer relevance.
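To illustrate the reranking stage, here is a hedged sketch in which the scorer is passed in as a function. The names `rerank`, `score_pairs`, and `overlap_scorer` are hypothetical, and the word-overlap scorer is only a toy stand-in so the example runs without any model download; it is not a cross-encoder.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, chunk) pair jointly and keep the top_k chunks."""
    pairs = [(query, chunk) for chunk in candidates]
    scores = score_pairs(pairs)
    # Sort best-first; Python's sort is stable, so ties keep retrieval order.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(chunk, float(score)) for chunk, score in ranked[:top_k]]


def overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    # Toy stand-in for a cross-encoder: fraction of query words found in the chunk.
    scores = []
    for query, chunk in pairs:
        q_words = set(query.lower().split())
        c_words = set(chunk.lower().split())
        scores.append(len(q_words & c_words) / max(len(q_words), 1))
    return scores
```

With the sentence-transformers library installed, the `predict` method of a `CrossEncoder` instance accepts a list of (query, passage) pairs and could plausibly be passed as `score_pairs`. Which cross-encoder model the project uses is not stated here.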
Semantic chunking preserves more context than fixed-size chunking.
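One way to see why sentence-aware chunking preserves context is to pack whole sentences into chunks instead of cutting at a fixed character offset. The sketch below assumes a naive regex sentence splitter and hypothetical parameters `max_chars` and `overlap`; the project's real chunker may use different rules.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars characters.

    The last `overlap` sentences of each chunk are repeated at the start of
    the next one, so context carries across chunk boundaries. Because of
    that repetition a chunk can slightly exceed max_chars.
    """
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk and seed the next with the overlap.
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

No sentence is ever split in the middle, which is the main property that fixed-size chunking lacks.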
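Reciprocal Rank Fusion (RRF) combines the outputs of multiple ranking systems using only rank positions, so dense and sparse scores do not need to be normalized against each other.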
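As an illustration of Reciprocal Rank Fusion, each chunk receives the score sum over rankings of 1 / (k + rank), where rank is its 1-based position in a ranking. The sketch below assumes chunk IDs are strings and uses k = 60, a commonly used default; the constant and the function name `reciprocal_rank_fusion` are assumptions, not values confirmed by this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs; each list is ordered best-first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk missing from a ranking simply contributes nothing.
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from a dense (FAISS) and a sparse (BM25) search.
dense = ["c3", "c1", "c7"]
sparse = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([dense, sparse])
```

Here `c1` ends up first because it ranks highly in both lists, even though neither retriever's raw scores were compared directly.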
Grounded prompting reduces hallucinations and improves answer reliability.
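A grounded prompt can be sketched as a template that numbers the retrieved chunks and tells the model to answer only from them. The function name `build_grounded_prompt` and the exact instruction wording are assumptions for illustration; the prompt the project actually sends to Gemini is not shown in this document.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt that restricts the model to the numbered context."""
    # Number each chunk so the answer can cite its sources as [1], [2], ...
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using ONLY the context below.\n"
        "Cite the supporting chunk numbers in square brackets, e.g. [1].\n"
        "If the context does not contain the answer, say that the document "
        "does not cover it.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Giving the model an explicit way to decline is the part that most directly targets hallucination: without it, the model is nudged to produce an answer even when retrieval found nothing relevant.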
- FastAPI
- Python
- SentenceTransformers
- FAISS
- BM25
- Gemini API
- React
- Dense Retrieval
- Sparse Retrieval
- Hybrid Search
- Cross Encoder Reranking
- RRF Fusion
project/
│
├── frontend/
│   └── React UI
│
├── main.py
├── ingestion/
├── retrieval/
├── generation/
├── embeddings/
├── uploads/
│
├── vector_store/
│
└── README.md
This system follows a retrieval-first architecture:
The quality of generated answers depends primarily on retrieval quality.
Instead of relying solely on an LLM, the system prioritizes strong information retrieval before generation.
- Multi-document support
- Streaming responses
- Persistent vector database
- User authentication
- Conversation memory
- Citation highlighting
- Kubernetes deployment
- Semantic chunking
- Embeddings
- Text preprocessing
- BM25
- FAISS
- Hybrid Retrieval
- RRF
- Reranking
- Prompt engineering
- Grounding
- Hallucination control
- FastAPI
- Modular architecture
- Logging
- Persistence
- React integration
- Async workflows
Built with retrieval engineering at the core.
The name OctoVector AI combines two ideas that represent the system's core architecture.
Octo is inspired by the octopus, symbolizing intelligence, adaptability, and multiple components working together in parallel, similar to the system's multi-stage pipeline of retrieval, fusion, reranking, and generation.
Vector represents embedding vectors, the foundation of semantic search, enabling the system to understand meaning and context beyond simple keyword matching.
Together, OctoVector AI represents:
- Parallel Intelligence: multiple retrieval strategies working together
- Semantic Depth: understanding meaning through embeddings
- Precision & Adaptability: refining and retrieving the most relevant information
The name reflects a system designed to intelligently navigate and refine knowledge across multiple layers to deliver accurate, context-aware answers.