This project enhances a Reactome-based RAG chatbot by adding query un…#9

Open
shivanshuyadav921 wants to merge 1 commit into reactome:main from shivanshuyadav921:ragllm

@shivanshuyadav921 shivanshuyadav921 commented Mar 29, 2026

understanding for better retrieval and a citation verification layer to ensure scientifically accurate, verifiable answers.

This project develops an enhanced RAG-based chatbot for biological pathway analysis using Reactome as the knowledge source. The system improves traditional RAG pipelines by introducing two key components:

Query Understanding Layer: Uses an LLM to classify user intent and decompose complex queries into smaller, independent sub-queries. This enables more precise, multi-step retrieval of relevant biological pathways, reactions, and relationships.
Citation & Verification Layer: Ensures that all generated answers are grounded in Reactome data by enforcing citation of pathway IDs (e.g., R-HSA-XXXXX). Extracted IDs are validated using the Reactome API, and missing citations are automatically injected based on retrieved context.
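As a rough illustration of the Query Understanding Layer described above, here is a minimal Python sketch. It is not taken from the PR: the intent labels, the prompt wording, the `QueryPlan` type, and the `understand_query` function are all hypothetical, and the LLM is passed in as a plain callable so that any client could be plugged in.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical intent labels; the PR does not list the actual categories.
INTENTS = ("pathway_lookup", "reaction_detail", "relationship", "other")

# Hypothetical prompt; doubled braces are literal braces after .format().
PROMPT_TEMPLATE = (
    "Classify the intent of the question as one of: {intents}. "
    "Then split it into independent sub-queries. Reply with JSON only, "
    'in the form {{"intent": "...", "sub_queries": ["..."]}}.\n'
    "Question: {query}"
)


@dataclass
class QueryPlan:
    intent: str
    sub_queries: List[str]


def understand_query(query: str, llm: Callable[[str], str]) -> QueryPlan:
    """Ask the LLM for an intent and sub-queries, with safe fallbacks."""
    prompt = PROMPT_TEMPLATE.format(intents=", ".join(INTENTS), query=query)
    try:
        data = json.loads(llm(prompt))
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    # Unknown or missing intents fall back to the catch-all label.
    intent = data.get("intent")
    if intent not in INTENTS:
        intent = "other"

    # Keep only non-empty string sub-queries.
    raw = data.get("sub_queries")
    if not isinstance(raw, list):
        raw = []
    subs = [s.strip() for s in raw if isinstance(s, str) and s.strip()]

    # If decomposition failed, retrieve with the original query unchanged.
    return QueryPlan(intent=intent, sub_queries=subs or [query])
```

Each entry in `sub_queries` would then be sent to the retriever separately. Falling back to the original query keeps the pipeline usable when the model returns malformed output.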

By combining LLM reasoning with structured retrieval and verification against Reactome, the system aims to produce accurate, explainable answers that can be traced to specific Reactome entries, addressing two key limitations of standard LLM-based systems: hallucination and lack of traceability.
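The citation and verification step described above could be sketched as follows. This is one possible reading, not the PR's code: `verify_citations` and `reactome_id_exists` are hypothetical names, the `R-HSA-<digits>` pattern is inferred from the example ID format, and the lookup URL assumes the Reactome Content Service `data/query/{id}` endpoint. "Injecting" missing citations is interpreted here as appending validated IDs from the retrieved context that the answer did not cite.

```python
import re
import urllib.request
from typing import Callable, Iterable, List, Tuple

# Human stable IDs in the form used by the PR's example (R-HSA-XXXXX).
STABLE_ID = re.compile(r"\bR-HSA-\d+\b")


def reactome_id_exists(stable_id: str, timeout: float = 10.0) -> bool:
    """Assumed lookup against the Reactome Content Service."""
    url = f"https://reactome.org/ContentService/data/query/{stable_id}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers HTTP errors (e.g. 404), network failures, and timeouts.
        return False


def verify_citations(
    answer: str,
    context_ids: Iterable[str],
    exists: Callable[[str], bool] = reactome_id_exists,
) -> Tuple[str, List[str], List[str]]:
    """Return (answer with injected sources, valid cited IDs, invalid cited IDs)."""
    # De-duplicate cited IDs while preserving their order in the answer.
    cited = list(dict.fromkeys(STABLE_ID.findall(answer)))
    valid = [i for i in cited if exists(i)]
    invalid = [i for i in cited if i not in valid]

    # Context IDs the answer did not cite, kept only if they validate.
    missing = [
        i for i in dict.fromkeys(context_ids) if i not in cited and exists(i)
    ]
    if missing:
        answer = f"{answer.rstrip()}\n\nSources: {', '.join(missing)}"
    return answer, valid, invalid
```

Passing `exists` as a parameter lets the check run offline in tests or against a cache. What to do with the invalid IDs (remove them, flag them, or regenerate the answer) is left to the caller.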
