Pull request overview
This PR introduces a provider-based abstraction in HealthLLM.jl to support multiple LLM backends for text generation and embeddings, plus a small registry and new demos/tests to exercise provider construction and switching.
Changes:
- Added `Providers` and `Registry` modules and wired them into the main `HealthLLM` module exports.
- Updated query/index-building entry points to accept provider objects (with some backward-compat overloads).
- Added RAG helpers (`PromptTemplate`, simple embedding/indexing, retrieval, and answer generation) plus demos and basic provider/registry tests.
Reviewed changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| src/providers.jl | Defines model + embedding provider abstractions and concrete providers (HF/Groq/Ollama/Gemini/OpenAI/Anthropic) with retry helper. |
| src/registry.jl | Adds a simple global provider registry with register/get/list helpers. |
| src/rag.jl | Adds simple RAG utilities (chunking, hashing embedder, retrieval, prompt templating, answer generation). |
| src/query.jl | Refactors query generation to accept providers and adds a backward-compat overload. |
| src/embedding.jl | Changes `build_index_rag` to require an `EmbeddingProvider`. |
| src/HealthLLM.jl | Includes new modules and exports provider/registry/RAG APIs. |
| test/runtests.jl | Runs new provider/registry tests. |
| test/providers_test.jl | Adds basic tests for provider construction + registry. |
| demo/openai.jl | Adds OpenAI demo script. |
| demo/anthropic.jl | Adds Anthropic demo script. |
| demo/groq.jl | Adds Groq demo script. |
| demo/gemini.jl | Adds Gemini demo script. |
| README.md | Documents provider usage and prompt templating. |
| Project.toml | Adds HTTP dependency for direct API calls. |
```julia
# Default to some embedder, but since RAGTools might expect different, perhaps error
error("Please provide an embedder::EmbeddingProvider")
```
The previous build_index_rag(cfg, files; embedder_kwargs=...) method now unconditionally throws, which is a breaking change despite the “backward compatibility” comment. If you want to deprecate this overload, consider keeping the old behavior (or a reasonable default embedder) and emitting a deprecation warning, or introduce a new function name for the provider-based API.
Suggested change (replace the unconditional error with a deprecation fallback):

```julia
Base.depwarn(
    "build_index_rag(cfg, files; embedder_kwargs=...) is deprecated; " *
    "pass embedder::EmbeddingProvider explicitly to build_index_rag(cfg, files; embedder=..., embedder_kwargs=...)",
    :build_index_rag,
)
return RAGTools.build_index(cfg, files; embedder_kwargs=embedder_kwargs)
```
Set your API keys in a `.env` file or as environment variables:

```bash
HUGGINGFACE_TOKEN=your_hf_token
GROQ_API_KEY=your_groq_key
GEMINI_API_KEY=your_gemini_key
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```
README instructs setting HUGGINGFACE_TOKEN, but the implementation reads ENV["HF_TOKEN"] for HuggingFace providers/embedders. This mismatch will lead to authentication failures when users follow the docs. Update the README to match the code’s env var name (or change the code to accept HUGGINGFACE_TOKEN).
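Until the README and code agree, a small compatibility shim can paper over the mismatch. This is a hedged sketch: the helper name `hf_token` is hypothetical, and it simply prefers `HF_TOKEN` (what the code reads today) while falling back to `HUGGINGFACE_TOKEN` (what the README documents):

```julia
# Hypothetical shim: prefer HF_TOKEN (what the code reads) but fall back to
# HUGGINGFACE_TOKEN (what the README documents), so either name works.
hf_token() = get(ENV, "HF_TOKEN", get(ENV, "HUGGINGFACE_TOKEN", ""))
```

Either way, picking one canonical name and documenting it is the cleaner long-term fix.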
```julia
end

if !isinteractive()
    main()
```
This demo calls main() under both if abspath(PROGRAM_FILE) == @__FILE__ and if !isinteractive(). When executed as a script, both conditions are typically true, so main() runs twice. Keep only one of these entry-point guards to avoid double execution. The same pattern appears in each of the four demo scripts.
Suggested change: delete the `if !isinteractive()` block, keeping only the `if abspath(PROGRAM_FILE) == @__FILE__` guard.
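A minimal sketch of the single-guard pattern the review suggests; `main` here is a stand-in for the demo's real `main()`:

```julia
# Stand-in for the demo's real main(); defined only so this sketch runs.
main() = println("demo")

# The @__FILE__ comparison fires only when the file is executed as a script
# (e.g. `julia demo/openai.jl`), not when it is include()d or loaded in a REPL.
if abspath(PROGRAM_FILE) == @__FILE__
    main()
end
```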
```julia
chunks = String[]
chars = collect(text)
pos = 1
while pos <= length(chars)
    endpos = min(pos + chunk_size - 1, length(chars))
    push!(chunks, strip(join(chars[pos:endpos])))
    pos = endpos + 1
```
chunk_text accepts an overlap argument but never uses it when advancing pos, so there is no actual overlap between chunks. This makes the API misleading and reduces retrieval quality. Use overlap when computing the next pos (e.g., advance by chunk_size - overlap) or remove the parameter.
Suggested change (validate arguments and advance by `chunk_size - overlap`):

```julia
chunk_size > 0 || throw(ArgumentError("chunk_size must be positive"))
overlap >= 0 || throw(ArgumentError("overlap must be non-negative"))
overlap < chunk_size || throw(ArgumentError("overlap must be smaller than chunk_size"))
chunks = String[]
chars = collect(text)
pos = 1
step = chunk_size - overlap
while pos <= length(chars)
    endpos = min(pos + chunk_size - 1, length(chars))
    push!(chunks, strip(join(chars[pos:endpos])))
    pos += step
```
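To make the fix concrete, here is a self-contained sketch of the overlapping chunker, assuming the signature `chunk_text(text, chunk_size, overlap)`:

```julia
# Hypothetical standalone version of chunk_text with working overlap.
function chunk_text(text::String, chunk_size::Int, overlap::Int=0)
    chunk_size > 0 || throw(ArgumentError("chunk_size must be positive"))
    0 <= overlap < chunk_size || throw(ArgumentError("need 0 <= overlap < chunk_size"))
    chunks = String[]
    chars = collect(text)        # index by character, not byte
    step = chunk_size - overlap  # each window starts `step` chars after the last
    pos = 1
    while pos <= length(chars)
        endpos = min(pos + chunk_size - 1, length(chars))
        push!(chunks, strip(join(chars[pos:endpos])))
        pos += step
    end
    return chunks
end

chunk_text("abcdefghij", 4, 2)
# windows 1:4, 3:6, 5:8, 7:10, 9:10 -> ["abcd", "cdef", "efgh", "ghij", "ij"]
```

With `overlap = 2`, consecutive chunks share two characters, which is the behavior callers would expect from the parameter name.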
```julia
function with_retry(f::Function, max_retries::Int=3, backoff::Float64=1.0)
    for attempt in 1:max_retries
        try
            return f()
        catch e
            if attempt == max_retries
                rethrow(e)
            end
            if isa(e, HTTP.ExceptionRequest.StatusError) && e.status in [429, 500, 502, 503, 504]
                sleep(backoff * attempt)
            else
                rethrow(e)
            end
        end
    end
```
with_retry checks isa(e, HTTP.ExceptionRequest.StatusError), but HTTP.jl status exceptions are typically HTTP.Exceptions.StatusError. As written, this branch likely never matches, so rate-limit / 5xx responses won't be retried. Update the exception type check to the correct HTTP exception type (and keep the status filter) so retries actually occur.
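A corrected sketch might look like the following. The key change is checking `HTTP.Exceptions.StatusError` (the type HTTP.jl v1 actually throws; it is also exported as `HTTP.StatusError`); factoring the check into a `retryable` predicate is an extra assumption added here to make the retry logic testable in isolation:

```julia
using HTTP

# Retry only on rate limits and transient server errors.
is_retryable(e) = e isa HTTP.Exceptions.StatusError && e.status in (429, 500, 502, 503, 504)

function with_retry(f::Function, max_retries::Int=3, backoff::Float64=1.0;
                    retryable::Function=is_retryable)
    for attempt in 1:max_retries
        try
            return f()
        catch e
            # Give up on the last attempt or on non-retryable errors.
            (attempt == max_retries || !retryable(e)) && rethrow()
            sleep(backoff * attempt)  # linear backoff between attempts
        end
    end
end
```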
```julia
export ModelProvider, EmbeddingProvider, HuggingFaceProvider, GroqProvider, OllamaProvider, GeminiProvider, OpenAIProvider, AnthropicProvider, HuggingFaceEmbedder

abstract type ModelProvider end
abstract type EmbeddingProvider end

# Abstract methods
function generate(provider::ModelProvider, prompt::String; kwargs...)
    error("generate not implemented for $(typeof(provider))")
end

function embed(provider::EmbeddingProvider, texts::Vector{String}; kwargs...)
    error("embed not implemented for $(typeof(provider))")
end
```
generate/embed are used unqualified in other modules (e.g., Query and RAG), but they are not exported from Providers. Either export generate and embed from this module or require callers to reference them as Providers.generate / Providers.embed; otherwise downstream modules will hit UndefVarError at load time.
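The export-based fix could look like this sketch (a stripped-down `Providers` module, not the PR's full source), after which `using .Providers` in downstream modules brings `generate`/`embed` into scope:

```julia
module Providers

# Export the generic functions alongside the abstract types so that
# downstream modules get them via `using .Providers`.
export ModelProvider, EmbeddingProvider, generate, embed

abstract type ModelProvider end
abstract type EmbeddingProvider end

generate(p::ModelProvider, prompt::String; kwargs...) =
    error("generate not implemented for $(typeof(p))")
embed(p::EmbeddingProvider, texts::Vector{String}; kwargs...) =
    error("embed not implemented for $(typeof(p))")

end # module
```

The alternative, fully qualifying calls as `Providers.generate(provider, prompt)`, also works but is easy to forget in new call sites.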
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
This PR adds comprehensive support for multiple model APIs in HealthLLM.jl, enabling users to easily switch between different LLM providers for text generation and embeddings.
Key Changes
- `ModelProvider` and `EmbeddingProvider` interface for consistent API usage.
- `ModelRegistry` for managing and switching providers dynamically.

Usage Example
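The example body is not shown in this extract; a hedged sketch based only on the exported names might look like the following. Every constructor keyword and registry helper name below is a guess, so consult the actual source before copying it:

```julia
using HealthLLM

# Construct a provider (field names such as `model` and `api_key` are
# assumptions, not the package's confirmed constructor signature).
provider = OpenAIProvider(model = "gpt-4o-mini", api_key = ENV["OPENAI_API_KEY"])

# Register it under a name, then generate through the registry; the
# register/get helper names here are hypothetical.
register_provider!("openai", provider)
answer = generate(get_provider("openai"), "Summarize the latest lab results.")
println(answer)
```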
This enhancement significantly expands HealthLLM.jl's capabilities by supporting a wide range of LLM services.