Bytes is a hybrid AI assistant designed to streamline access to company knowledge across Jira, GitHub, Notion, and internal website content. Users can also upload PDFs for ad-hoc Q&A. It is built in Python with Solara UI, LangChain retrieval, and an agentic runtime that can orchestrate runtime tool calls.
- Multi-source Data Retrieval: Access data from Jira, Github, Notion, and your company's website seamlessly within the chat interface. Users can enable/disable specific sources as per their preference.
- Natural Language Understanding: Utilizes advanced NLP techniques to understand user queries and provide relevant responses.
- Intuitive UI: Built on Solara for a user-friendly interface, making interaction with the assistant smooth and intuitive.
- Hybrid Retrieval + Agentic Orchestration: Combines FAISS-based RAG with agentic tool orchestration for live, freshness-sensitive queries.
- Multiple Open Source Models: Supports multiple open source models from the Llama Family by Meta, Qwen (
qwen/qwen3-32b), and Kimi (moonshotai/kimi-k2-instruct) via Groq, allowing users to switch between LLMs based on their requirements. - Source Information URLs: Provides URLs for the source information the assistant uses for its answer to promote transparency and traceability.
- Agentic Mode (RAG + MCP-style tools): Adds an orchestrated agent loop that can route between local corpus retrieval and live read-only tools for GitHub, Jira, and Notion.
- Release Readiness Brief: Specialized agent workflow to summarize release status, blockers, risks, and next actions with evidence links.
- Tool Timeline Transparency: UI panel shows planning/tool/synthesis steps with status and duration for interview walkthroughs.
- RAG is snapshot-based: A local vector store is excellent for stable documentation, but it can lag behind current Jira/GitHub/Notion state unless constantly re-indexed.
- Operational questions are cross-system: Queries like release readiness require combining multiple systems (tickets, PRs, commits, docs), not just nearest text chunks.
- Freshness matters: Questions containing "latest", "recent", or "today" are better answered via runtime reads from live sources.
- Traceability matters: Agentic mode provides tool timeline, confidence, and citation output so you can explain how the answer was produced.
- Best of both worlds: RAG remains the fast, grounded path for static/internal knowledge; agentic mode is used when orchestration and live evidence are required.
After cloning the repository, create a .env file in the project directory with the following environment variables:
JIRA_API_TOKENOPENAI_API_KEYJIRA_USERNAMEJIRA_INSTANCE_URLGOOGLE_API_KEYGITHUB_ACCESS_TOKENNOTION_API_KEYGROQ_API_KEY
Obtaining Environment Variables
JIRA_API_TOKENandJIRA_USERNAMEcan be obtained from any Jira company account with read access.OPENAI_API_KEYcan be obtained from the OpenAI API website.GITHUB_ACCESS_TOKENcan be obtained from any GitHub account.NOTION_API_KEYcan be obtained by creating an integration in Notion as an admin and using the secret key. Integrate the integration in all Notion pages you want to retrieve data from.GROQ_API_KEYcan be obtained from Groq Cloud (for LLaMA, Qwen, and Kimi models).
Create a Conda environment using the environment.yml file in the project directory:
conda env create -f environment.yml
conda activate <env_name>
Run the following scripts to extract data from various sources:
python data_extraction/extract_corpus_notion.py
python data_extraction/extract_corpus_github.py
python data_extraction/extract_corpus_jira.py
Run the following script to create and store vector stores and embeddings for all data sources locally:
python save_vector_store.py
Run the assistant using Solara:
solara run ai_assistant.py
That's it! You should now have the chatbot up and running.
Run deterministic interview scenarios and export a markdown demo report:
python evaluate_agentic.py --model gemini --sources notion jira github --output agentic_demo_report.md
This report tracks tool-choice correctness, citation presence, freshness behavior, and hallucination heuristics.
Bytes now follows a hybrid architecture:
- RAG mode for high-precision answers from local indexed knowledge.
- Agentic mode for runtime tool orchestration across Jira, GitHub, and Notion with citation transparency.
This makes the assistant suitable for both:
- stable documentation Q&A (RAG),
- freshness-sensitive operational queries like release readiness (Agentic).
Bytes is composed of modular components that support both retrieval and agent orchestration.
- Corpus Extraction Scripts: Source-specific scripts fetch data from Notion, GitHub, and Jira and store it as JSON.
- Flattening and Normalization: Helper utilities convert nested API payloads into searchable textual chunks with metadata.
- Chunking: Text is chunked before embedding to improve retrieval quality.
- Embedding Generation:
OpenAIEmbeddingstransforms chunks into vectors. - FAISS Vector Store: Embeddings are stored locally and loaded at runtime for retrieval.
- Source Filtering: Documents retain source metadata (
notion,jira,github,website) used for filter-aware retrieval.
- Retrieval Chain: LangChain retrieval chain fetches relevant chunks from FAISS.
- Strict Context Prompting: Primary prompt enforces context-only answers and rejects unsupported responses.
- Fallback Context Synthesis: If strict mode returns "no context" despite retrieved chunks, a fallback context-grounded answer pass is used.
- URL Extraction and Ranking: Related links are deduplicated and ranked before rendering.
- Intent Classification: Query is classified into categories such as knowledge lookup, freshness lookup, release readiness, or write-like intent.
- Strategy Selection: Runtime chooses
corpus_first,live_first,hybrid, orreject(for write requests in read-only mode). - Tool Execution Layer:
search_corpusfor local retrieval,mcp_jira_read,mcp_github_read,mcp_notion_readfor live reads.
- MCP-style Client Wrapper: External tool calls include allowlisting, retries, and timeout control.
- Synthesis and Citations: Final answer is generated from collected evidence with links, confidence, and an optional timeline of steps.
- Guardrails: Write-like actions are blocked by policy and converted into safe read-only responses.
- Mode Switching: User can switch between
ragandagenticmodes from the Solara sidebar. - Model Switching: Runtime model switching supports Gemini, Llama family, Qwen, and Kimi options.
- Tool Timeline View: Agentic mode can show planning/tool/synthesis steps with durations.
- PDF Path: Uploaded PDF creates an in-memory/local vector retrieval chain for ad-hoc Q&A.
- Startup
- Load environment variables and initialize embeddings + default LLM.
- Load FAISS vector store and prepare retrieval chain.
- User Query
- User selects model, data sources, and assistant mode (RAG or Agentic).
- Execution Path
- RAG mode: retrieve -> answer -> fallback (if needed) -> link ranking.
- Agentic mode: classify -> decide strategy -> call tools -> synthesize -> cite -> timeline.
- Response Rendering
- UI displays answer text, related links, optional timeline, and confidence metadata (agentic path).
- Deterministic Scenario Pack:
evaluate_agentic.pyruns structured prompts that validate routing/tool behavior. - Scoring Dimensions:
- tool-choice correctness,
- citation presence,
- freshness correctness,
- hallucination heuristics.
- Report Export: Generates a markdown report for interview demonstration.
This methodology enables Bytes to operate as a practical production-style assistant: reliable for static knowledge retrieval while also capable of runtime, multi-tool reasoning for dynamic business questions.