feat: grep_rag implementation #428

Open
mBerasategui-ehu wants to merge 1 commit into Lamb-Project:dev from mBerasategui-ehu:grep_RAG
Grep RAG — Implementation


Overview

Grep RAG is a complementary search layer that works alongside any embedding-based RAG processor. A nano LLM (small-fast-model) drives iterative grep/egrep/ripgrep-style searches across all files in the assistant's knowledge bases. The tool names only label the search style: every search runs in-memory via Python's re module, with no shelling out.

While embeddings find semantically similar chunks, grep finds exact keyword matches — including terms the embedding model might miss, synonyms the nano model iteratively discovers, and content in rarely-accessed sections of the KB.

Two Operating Modes

| Mode | Behavior | Companion RAG |
| --- | --- | --- |
| hybrid | Run grep AND simple_rag in parallel, merge results | Always simple_rag (hardcoded: grep handles precision, so simple semantic coverage suffices) |
| primary | Grep runs first. If zero matches after max tries → fall back to user-selected RAG | User picks from KB-based RAGs (simple_rag, context_aware_rag, hierarchical_rag) |

Backend: grep_rag.py

Architecture

rag_processor(messages, assistant, request)
  │
  ├─ _parse_config()           → Read grep_mode, max_tries, etc. from metadata JSON
  ├─ _extract_user_question()  → Get last user message from conversation
  ├─ _resolve_kb_documents()   → Fetch KB file list + content via KB Server API
  │
  ├─ MODE: primary
  │   ├─ _run_grep_search()    → Iterative nano LLM + regex loop
  │   ├─ matches found?        → _build_grep_response()
  │   └─ no matches?           → _run_fallback_rag(fallback_rag_name)
  │
  └─ MODE: hybrid
      ├─ _run_grep_search()    → Async grep loop
      ├─ _run_fallback_rag("simple_rag") → Parallel embedding RAG (always simple_rag)
      └─ _merge_contexts()     → Combine both result sets
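
The two branches of the tree above can be sketched as follows. This is only an illustration of the control flow: `run_grep_search` and `run_fallback_rag` are hypothetical stand-ins for the real helpers, and the returned dict shape is an assumption, not the processor's actual response format.

```python
import asyncio

async def run_grep_search(question):
    # Hypothetical stand-in for the nano-model grep loop.
    return [f"grep hit for {question!r}"] if "token" in question else []

async def run_fallback_rag(name, question):
    # Hypothetical stand-in for delegating to an embedding-based RAG processor.
    return [f"{name} chunk for {question!r}"]

async def dispatch(mode, question, fallback_rag="simple_rag"):
    if mode == "primary":
        # Grep first; only call the user-selected RAG when grep finds nothing.
        matches = await run_grep_search(question)
        if matches:
            return {"grep": matches, "rag": []}
        return {"grep": [], "rag": await run_fallback_rag(fallback_rag, question)}
    # hybrid: run both concurrently; the companion is always simple_rag.
    grep, rag = await asyncio.gather(
        run_grep_search(question),
        run_fallback_rag("simple_rag", question),
    )
    return {"grep": grep, "rag": rag}

print(asyncio.run(dispatch("hybrid", "reset token")))
```

Using `asyncio.gather` here means hybrid latency is bounded by the slower of the two searches rather than their sum.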

Key Functions

_resolve_kb_documents(assistant)

  1. Calls GET /collections/{id}/files on the KB server for each attached KB
  2. For each file entry, tries to fetch content from:
    • processing_stats.output_files.markdown_url (converted markdown)
    • file_url (original file, may be binary)
    • processing_stats.markdown_preview (first ~2000 chars, last resort)
  3. Filters out binary content via _is_text_content() heuristic
  4. Returns list of {file_path, original_filename, content, metadata, collection_id}
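
A rough sketch of the source priority in step 2 and the text filter in step 3. The field names come from the list above; the function names and the NUL-byte/UTF-8 check are assumptions made for illustration, since the PR does not spell out the actual `_is_text_content()` heuristic.

```python
def pick_content_sources(file_entry):
    """Return candidate content sources in the priority order listed above."""
    stats = file_entry.get("processing_stats") or {}
    output_files = stats.get("output_files") or {}
    candidates = [
        ("url", output_files.get("markdown_url")),  # converted markdown
        ("url", file_entry.get("file_url")),         # original file, may be binary
        ("inline", stats.get("markdown_preview")),   # truncated preview, last resort
    ]
    return [(kind, value) for kind, value in candidates if value]

def is_text_content(data: bytes) -> bool:
    """Assumed heuristic: text has no NUL bytes and decodes as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

entry = {"file_url": "http://x/a.pdf",
         "processing_stats": {"markdown_preview": "# Title"}}
print(pick_content_sources(entry))
```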

_fetch_text_content(url, headers)

Handles three URL types:

  • localhost:* URLs: the Docker container can't reach them, so the function tries the local filesystem first, then rewrites the host to kb:9090 (the KB server's internal Docker hostname)
  • Relative URLs (/static/...) — prefixed with KB server base URL
  • Absolute URLs — fetched directly via HTTP
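
The URL handling above might look roughly like this. It is a sketch under assumptions: `resolve_fetch_url` is a hypothetical name, the local-filesystem attempt for localhost URLs is left out, and only the host rewrite to kb:9090 is shown.

```python
from urllib.parse import urlsplit, urlunsplit

KB_INTERNAL_NETLOC = "kb:9090"  # internal Docker hostname named in this PR

def resolve_fetch_url(url, kb_base_url):
    """Map a file URL to one that is reachable from inside the container."""
    parts = urlsplit(url)
    if not parts.scheme and not parts.netloc:
        # Relative URL such as /static/...: prefix the KB server base URL.
        return kb_base_url.rstrip("/") + "/" + url.lstrip("/")
    if parts.hostname == "localhost":
        # Keep scheme, path and query; swap only the host:port part.
        return urlunsplit(parts._replace(netloc=KB_INTERNAL_NETLOC))
    return url  # absolute URL, fetched as-is

print(resolve_fetch_url("http://localhost:8000/static/a.md", "http://kb:9090"))
```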

_run_grep_search(user_question, documents, ...)

The nano model search loop:

For each attempt (1..grep_max_tries):
  1. _ask_nano_model()         → LLM proposes TOOL + PATTERN + FLAGS + REASON
  2. Skip duplicate (tool, pattern) combos
  3. _search_across_documents() → Python re.search across all KB file content
  4. Accumulate matches
  5. If nano says DONE → break

If no nano model is configured, the processor falls back to a single grep using keywords extracted from the user's question.
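
A minimal sketch of the loop, assuming a `propose` callable in place of `_ask_nano_model()` and a plain `{path: text}` dict in place of the document list. Both are simplifications for illustration; in particular, checking for DONE at proposal time and counting a skipped duplicate as a used attempt are assumptions about the real behavior.

```python
import re

def grep_loop(question, documents, propose, max_tries=5):
    """propose(question, history) returns {'tool', 'pattern'} or {'done': True}."""
    seen, history, matches = set(), [], []
    for _ in range(max_tries):
        step = propose(question, history)
        if step.get("done"):
            break
        key = (step["tool"], step["pattern"])
        if key in seen:
            continue  # duplicate (tool, pattern): skip without searching again
        seen.add(key)
        try:
            regex = re.compile(step["pattern"], re.IGNORECASE | re.MULTILINE)
        except re.error:
            history.append((key, "invalid regex"))  # lets the model try again
            continue
        found = [(path, m.group(0))
                 for path, text in documents.items()
                 for m in regex.finditer(text)]
        history.append((key, len(found)))
        matches.extend(found)
    return matches

docs = {"a.md": "Reset your password\nUse the reset token", "b.md": "nothing here"}
script = iter([{"tool": "grep", "pattern": "reset"}, {"done": True}])
print(grep_loop("how do I reset?", docs, lambda q, h: next(script)))
```

The `history` list is what a real prompt builder would render into the search history shown to the nano model on the next try.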

_search_across_documents(documents, pattern, tool, context_lines)

  • Compiles regex with re.IGNORECASE (equivalent to grep -i) and re.MULTILINE (so ^ and $ anchor at line boundaries)
  • All three tools (grep, egrep, ripgrep) use Python re internally — no shelling out
  • Invalid regex → caught, returns [], nano model retries
  • Context lines extracted around each match

_deduplicate_matches(matches, context_lines)

  • Groups matches by file_path
  • Merges overlapping context ranges within each file
  • Cross-file matches kept separate
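
The per-file merge can be illustrated with a standard interval merge. This sketch assumes each match already carries its 1-based inclusive context range (`start`, `end`), and it also merges directly adjacent ranges, which is an assumption rather than something the PR states.

```python
from collections import defaultdict

def merge_match_ranges(matches):
    """matches: dicts with 'file_path', 'start', 'end' (inclusive line ranges)."""
    by_file = defaultdict(list)
    for m in matches:
        by_file[m["file_path"]].append((m["start"], m["end"]))
    merged = {}
    for path, ranges in by_file.items():
        ranges.sort()
        out = [list(ranges[0])]
        for start, end in ranges[1:]:
            if start <= out[-1][1] + 1:  # overlapping or directly adjacent
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[path] = [tuple(r) for r in out]
    return merged

matches = [
    {"file_path": "a.md", "start": 1, "end": 5},
    {"file_path": "a.md", "start": 4, "end": 8},
    {"file_path": "b.md", "start": 4, "end": 8},
]
print(merge_match_ranges(matches))
```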

_build_grep_response(matches, documents, max_total_chars)

Formats grep results as markdown blocks with source headers:

### {source_label} ({source_url})
{context_lines}

Truncates if total exceeds grep_max_total_chars.
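
A sketch of the formatting and truncation, assuming the input is already reduced to `(source_label, source_url, context_text)` tuples and that truncation is a hard character cut. The real function may cut at block boundaries instead.

```python
def build_grep_response(blocks, max_total_chars=8000):
    """blocks: list of (source_label, source_url, context_text) tuples."""
    parts, used = [], 0
    for label, url, context in blocks:
        remaining = max_total_chars - used
        if remaining <= 0:
            break  # budget exhausted: drop the remaining blocks
        # Source header followed by the context lines, cut to the budget.
        section = f"### {label} ({url})\n{context}\n\n"[:remaining]
        parts.append(section)
        used += len(section)
    return "".join(parts)

blocks = [("a.md", "http://kb/a.md", "line1\nline2")]
print(build_grep_response(blocks))
```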

_merge_contexts(grep_response, rag_response, max_total_chars)

In hybrid mode, concatenates RAG chunks first, then appends grep results under:

---
## Exact Keyword Matches

Deduplicates sources by URL.
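
As a rough illustration of the merge, assuming both responses are dicts with a `context` string and a `sources` list of dicts keyed by `url` (this shape is hypothetical, not taken from the code):

```python
def merge_contexts(rag, grep, max_total_chars=8000):
    """Concatenate RAG context first, then grep results under a separator."""
    context = rag["context"]
    if grep["context"]:
        context += "\n\n---\n## Exact Keyword Matches\n\n" + grep["context"]
    seen, sources = set(), []
    for src in rag["sources"] + grep["sources"]:
        if src["url"] not in seen:  # keep the first occurrence of each URL
            seen.add(src["url"])
            sources.append(src)
    return {"context": context[:max_total_chars], "sources": sources}

rag = {"context": "chunk A", "sources": [{"url": "u1"}]}
grep = {"context": "### a.md (u1)\nline", "sources": [{"url": "u1"}, {"url": "u2"}]}
print(merge_contexts(rag, grep)["sources"])
```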

Nano Model Prompts

System prompt: Instructs the nano model to respond with structured TOOL:, PATTERN:, FLAGS:, REASON: lines, or DONE: when satisfied. Provides regex tips and tool selection guidance.

User prompt: Includes the user's question and a formatted search history showing previous tries, match counts, and content previews.
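
Parsing the structured reply could look like the sketch below. It assumes each field sits on its own line as `KEY: value`; the exact wire format, the `parse_nano_reply` name, and the returned dict are assumptions. Anything that does not fit is treated as unparseable and ignored.

```python
import re

FIELD = re.compile(r"^(TOOL|PATTERN|FLAGS|REASON|DONE):[ \t]*(.*)$", re.MULTILINE)
ALLOWED_TOOLS = {"grep", "egrep", "ripgrep"}

def parse_nano_reply(reply):
    """Return a parsed step, a DONE marker, or None for unstructured output."""
    fields = {key.lower(): value.strip() for key, value in FIELD.findall(reply)}
    if "done" in fields:
        return {"done": True, "reason": fields["done"]}
    tool = fields.get("tool", "").lower()
    if tool not in ALLOWED_TOOLS or not fields.get("pattern"):
        return None  # free-form or incomplete reply: ignore it
    return {
        "done": False,
        "tool": tool,
        "pattern": fields["pattern"],
        "flags": fields.get("flags", ""),
        "reason": fields.get("reason", ""),
    }

print(parse_nano_reply("TOOL: egrep\nPATTERN: reset|recover\nFLAGS: -i\nREASON: synonyms"))
```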

Configuration (assistant metadata)

{
    "rag_processor": "grep_rag",
    "grep_mode": "hybrid",
    "grep_fallback_rag": "simple_rag",
    "grep_max_tries": 5,
    "grep_context_lines": 3,
    "grep_max_total_chars": 8000
}

| Field | Default | Description |
| --- | --- | --- |
| grep_mode | "hybrid" | "hybrid" or "primary" |
| grep_fallback_rag | "simple_rag" | Used only in primary mode; hybrid always uses simple_rag |
| grep_max_tries | 5 | Max search iterations (1 LLM call each) |
| grep_context_lines | 3 | Lines of context before/after each match |
| grep_max_total_chars | 8000 | Max total chars sent to main LLM |
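
Reading these fields with their defaults might be sketched as follows. The defaults are the ones in the table; clamping to 1-10 and 1000-32000 mirrors the ranges the frontend form enforces and is an assumption about the backend, as is the `parse_config` name.

```python
import json

DEFAULTS = {
    "grep_mode": "hybrid",
    "grep_fallback_rag": "simple_rag",
    "grep_max_tries": 5,
    "grep_context_lines": 3,
    "grep_max_total_chars": 8000,
}
# Assumed limits, taken from the ranges used by the form inputs.
LIMITS = {
    "grep_max_tries": (1, 10),
    "grep_context_lines": (1, 10),
    "grep_max_total_chars": (1000, 32000),
}

def parse_config(metadata_json):
    try:
        raw = json.loads(metadata_json or "{}")
    except (TypeError, ValueError):
        raw = {}
    if not isinstance(raw, dict):
        raw = {}
    cfg = {**DEFAULTS, **{k: raw[k] for k in DEFAULTS if k in raw}}
    if cfg["grep_mode"] not in ("hybrid", "primary"):
        cfg["grep_mode"] = DEFAULTS["grep_mode"]
    for key, (low, high) in LIMITS.items():
        try:
            cfg[key] = min(high, max(low, int(cfg[key])))
        except (TypeError, ValueError):
            cfg[key] = DEFAULTS[key]  # non-numeric value: use the default
    return cfg

print(parse_config('{"grep_mode": "primary", "grep_max_tries": 50}'))
```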

Plugin Auto-Discovery

No changes to main.py needed. The existing load_plugins('rag') system auto-discovers .py files in backend/lamb/completions/rag/. The function rag_processor() is detected as async and awaited accordingly.
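
One common way for a loader to handle both sync and async processors is `inspect.iscoroutinefunction`. The sketch below shows that pattern only; it is not the actual `load_plugins` code, and `call_processor` is a hypothetical helper.

```python
import asyncio
import inspect

async def call_processor(func, *args, **kwargs):
    """Await coroutine functions, call plain functions directly."""
    if inspect.iscoroutinefunction(func):
        return await func(*args, **kwargs)
    return func(*args, **kwargs)

async def async_rag_processor(messages):
    return {"context": f"async:{len(messages)}"}

def sync_rag_processor(messages):
    return {"context": f"sync:{len(messages)}"}

print(asyncio.run(call_processor(async_rag_processor, ["hi"])))
print(asyncio.run(call_processor(sync_rag_processor, ["hi"])))
```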

Edge Cases

| Case | Behavior |
| --- | --- |
| No KBs attached | Returns error, falls back to embedding RAG |
| No small-fast-model | Single grep with keyword extraction |
| Nano returns garbage | Only structured responses parsed; invalid ignored |
| Max tries exhausted (primary) | Falls back to configured grep_fallback_rag |
| Max tries exhausted (hybrid) | Returns found matches + RAG results |
| No matches at all | Primary → fallback RAG; Hybrid → RAG only |
| Invalid regex from nano | re.error caught → nano retries |
| Duplicate pattern | Tracked (tool, pattern) set → skipped |
| Results > max chars | Truncated |
| Binary content | Filtered by _is_text_content() |
| localhost URLs in Docker | Rewritten to kb:9090 |
| File deleted from KB | Handled gracefully |

Frontend

Files Changed

| File | Changes |
| --- | --- |
| src/lib/utils/ragProcessorHelpers.js | Added GREP: ['grep_rag'] type, isGrepRag(), isGrepBasedRag() helpers |
| src/lib/stores/assistantConfigStore.js | Added grep_rag to fallback capabilities |
| src/lib/components/assistants/logic/assistantFormState.svelte.js | 5 grep state fields + populate/clear logic + isGrepRag import |
| src/lib/components/assistants/logic/assistantFormSubmit.js | Serializes grep config into metadata; sends RAG_collections for grep_rag |
| src/lib/components/assistants/logic/assistantFormFetchers.js | KB fetch also triggered for grep_rag |
| src/lib/components/assistants/components/RagOptionsPanel.svelte | Full grep config UI + conditional fallback selector |
| src/lib/components/assistants/components/ConfigurationPanel.svelte | Passes grep props through |
| src/lib/components/assistants/AssistantForm.svelte | Wires grep state; KB fetch on grep_rag select; import handler for grep_rag |
| src/routes/assistants/+page.svelte | Detail view shows grep config + KBs; hides Top K for single_file_rag/rubric_rag |
| src/lib/components/AssistantsList.svelte | Table expansion shows KB info for grep_rag |
| src/lib/locales/{en,es,ca,eu}.json | 11 new i18n keys under assistants.form.grepRag.* |

RAG Processor Classification (ragProcessorHelpers.js)

export const RAG_TYPES = Object.freeze({
    KB_BASED:   ['simple_rag', 'context_aware_rag', 'hierarchical_rag'],
    SINGLE_FILE: ['single_file_rag'],
    RUBRIC:      ['rubric_rag'],
    GREP:        ['grep_rag'],        // ← NEW
    NONE:        ['no_rag']
});

export function isGrepRag(processor)   { return RAG_TYPES.GREP.includes(processor); }
export function isGrepBasedRag(processor) { return isGrepRag(processor); }

Form State (assistantFormState.svelte.js)

Five new reactive fields added to the form state object:

grepMode:          'hybrid',       // 'hybrid' | 'primary'
grepFallbackRag:   'simple_rag',   // Used only in primary mode
grepMaxTries:      5,              // 1-10
grepContextLines:  3,              // 1-10
grepMaxTotalChars: 8000,           // 1000-32000

  • populateFormFields(): reads grep metadata when editing an existing assistant
  • clearRagDependentState(): resets grep fields to defaults when switching away from grep_rag

Form Submission (assistantFormSubmit.js)

When isGrepRag() returns true:

  • Serializes all 5 grep fields into metadataObj
  • Sends RAG_collections (KBs are shared with the companion RAG)

KB Fetching (assistantFormFetchers.js)

Guard updated so KB fetching also triggers for grep_rag:

if (!isKbBasedRag(form.selectedRagProcessor) && !isGrepRag(form.selectedRagProcessor)) return;

Configuration UI (RagOptionsPanel.svelte)

When grep_rag is selected, shows:

| Field | Control | Visibility |
| --- | --- | --- |
| Mode | Dropdown: Hybrid / Primary | Always |
| Fallback RAG | Dropdown (KB-based RAGs only) | Only in Primary mode |
| Max Search Tries | Number input (1–10) | Always |
| Context Lines | Number input (1–10) | Always |
| Max Result Characters | Number input (1000–32000, step 1000) | Always |

Additionally, the KB selector and RAG Top K are shown (both are shared with the companion RAG).

Detail View (+page.svelte)

In the read-only detail view:

  • Knowledge bases displayed for grep_rag (same as simple_rag)
  • RAG Top K shown (companion RAG uses it)
  • Grep configuration section displays mode, fallback RAG, max tries, context lines, max chars

Knowledge Bases Integration

When grep_rag is selected:

  • KB fetching is triggered (same flow as simple_rag)
  • KB selections are stored in RAG_collections (same DB field)
  • The detail view renders KB names

i18n Strings

11 new keys added in all 4 locale files:

| Key | English |
| --- | --- |
| assistants.form.grepRag.sectionTitle | "Grep RAG Configuration" |
| assistants.form.grepRag.mode.label | "Mode" |
| assistants.form.grepRag.mode.description | "How grep interacts with embedding-based RAG" |
| assistants.form.grepRag.mode.hybrid | "Hybrid (grep + RAG in parallel)" |
| assistants.form.grepRag.mode.primary | "Primary (grep first, RAG fallback)" |
| assistants.form.grepRag.fallbackRag.label | "Fallback RAG" |
| assistants.form.grepRag.fallbackRag.description | "Embedding RAG to fall back to when grep finds no matches" |
| assistants.form.grepRag.maxTries.label | "Max Search Tries" |
| assistants.form.grepRag.maxTries.description | "Maximum number of search iterations (1-10)" |
| assistants.form.grepRag.contextLines.label | "Context Lines" |
| assistants.form.grepRag.contextLines.description | "Lines of context before/after each match (1-10)" |
| assistants.form.grepRag.maxTotalChars.label | "Max Result Characters" |
| assistants.form.grepRag.maxTotalChars.description | "Max total characters of grep results sent to main LLM (1000-32000)" |
