Repo-aware engineering assistant workflows plus utilities for preparing codebase-specific LoRA datasets.
The goal is not to dump a repository directly into training. The goal is to curate examples that teach a model how to work in a repo: conventions, tool patterns, feature-request-to-patch behavior, bug-fix behavior, tests, and explanations.
The LoRA model is not the whole app. The app is a deterministic workflow around repo scanning, retrieval, patch planning, review, test execution, CI reporting, and adapter metadata. LoRA adapters are used as reasoning/review components inside that workflow.
Problem: General code models often miss local repo conventions, overlook test evidence, and produce generic reviews. This project explores how to make code assistants more reliable by combining deterministic repo workflows, retrieval, test execution, and targeted LoRA reviewers.
What I built: A repo-aware assistant toolkit with workflows for patch planning, structured review, changed-file testing, SFT dataset preparation, reviewer evaluation, and CI reporting. The workflow treats fine-tuned adapters as one component inside a larger engineering loop, not as a replacement for retrieval or tests.
Architecture: A source repo is scanned into structured chunks, curated examples are converted into train/eval JSONL, adapters are trained for specific review or style tasks, and CI workflows run deterministic review/test checks that emit JSON or GitHub-style findings.
Technical depth: The repo includes dataset builders, reviewer SFT preparation, audit scripts, regression evals, Spark-specific setup, a Gradio dashboard, GitHub Actions, Jenkins support, and tests for SFT generation, reviewer-data auditing, and adapter evaluation.
Proof: Evaluation fixtures and logs are versioned under evals/ and logs/, with summaries for Python, TypeScript, React, Vue, SQL, Docker/CI, orchestration, and testing reviewer experiments. The repo also includes CI entry points and a Jenkinsfile for repeatable checks.
Tradeoffs: Retrieval and test execution remain the source of truth. LoRA is used for repeated style and review patterns where smaller specialist models can cheaply flag likely issues before a larger model or human integrates the final change.
The CLI exposes three first-class workflows:
code-assistant write --repo . --task "Add support for X"
code-assistant review --repo . --base origin/main --head HEAD
code-assistant test --repo . --base origin/main --target changed
write accepts a repo path, task, and optional constraints file. It retrieves relevant repository context and writes:
artifacts/code-assistant/write-context.json
artifacts/code-assistant/patch-plan.md
This is intentionally deterministic first. The retrieved context and patch plan are the boundary where a patch-author model or LoRA-backed assistant can be inserted.
review accepts a branch, commit range, working tree diff, or raw diff file and emits structured findings:
code-assistant review --repo . --base origin/main --format json
code-assistant review --repo . --diff-file patch.diff --format github
The JSON report includes severity, file, line, rationale, and suggested fix. CI can upload artifacts/code-assistant/review.json or print GitHub annotations.
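As a rough illustration of how the two output formats relate, the sketch below turns one finding into a GitHub Actions workflow-command annotation. The key names (`severity`, `file`, `line`, `rationale`, `suggested_fix`) and the severity-to-level mapping are assumptions made for this example; the actual schema of review.json may differ.

```python
import json

# Assumed mapping from finding severity to GitHub annotation level.
LEVELS = {"high": "error", "medium": "warning", "low": "notice"}


def to_github_annotation(finding: dict) -> str:
    """Render one finding as a GitHub Actions workflow command."""
    level = LEVELS.get(finding.get("severity", ""), "notice")
    message = finding["rationale"]
    if finding.get("suggested_fix"):
        message += " Suggested fix: " + finding["suggested_fix"]
    # Workflow commands are single-line, so escape %, CR, and LF in the message.
    message = message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    return f"::{level} file={finding['file']},line={finding['line']}::{message}"


if __name__ == "__main__":
    # Hypothetical report content, not real tool output.
    sample = json.loads(
        '[{"severity": "high", "file": "src/app.py", "line": 12,'
        ' "rationale": "Unvalidated input.", "suggested_fix": "Validate first."}]'
    )
    for finding in sample:
        print(to_github_annotation(finding))
```

Printing one such line per finding from a CI step is enough for GitHub to attach the message to the matching file and line in the pull request view.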
test detects common package managers and test frameworks, then records test results and logs:
code-assistant test --repo . --target changed
code-assistant test --repo . --target changed --dry-run
Reports are written to:
artifacts/code-assistant/test-report.json
artifacts/code-assistant/test-*.log
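One plausible way to detect a test runner is to look for marker files at the repo root. The sketch below shows that idea only; the marker table and the `detect_test_command` helper are hypothetical and are not taken from the tool's source.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical marker-file table; the first match wins, so lockfiles
# are listed before the more generic package.json.
MARKERS = [
    ("pnpm-lock.yaml", ["pnpm", "test"]),
    ("yarn.lock", ["yarn", "test"]),
    ("package.json", ["npm", "test"]),
    ("pyproject.toml", ["python", "-m", "pytest"]),
    ("pytest.ini", ["python", "-m", "pytest"]),
    ("go.mod", ["go", "test", "./..."]),
    ("Cargo.toml", ["cargo", "test"]),
]


def detect_test_command(repo: Path) -> Optional[List[str]]:
    """Return the test command for the first marker file found, or None."""
    for marker, command in MARKERS:
        if (repo / marker).is_file():
            return command
    return None


if __name__ == "__main__":
    command = detect_test_command(Path("."))
    # A dry run would stop here and only report the command it would execute.
    print("would run:", " ".join(command) if command else "no test runner detected")
```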
The repository includes GitHub Actions and Jenkins pipeline support for:
- linting and unit tests
- deterministic reviewer checks
- reviewer regression fixtures
- code-assistant artifacts for CI review/test output
Workflows and CI entry points:
.github/workflows/ci.yml
.github/workflows/reviewer-eval.yml
Jenkinsfile
docs/jenkins.md
Regression fixtures live in:
evals/reviewer_regression/
Adapter metadata should be versioned with the checkpoint being promoted. See:
configs/adapter.metadata.example.json
- Scan a source repo for candidate files:
python -m code_assistant_lora.scan_repo /path/to/repo --out data/raw/repo_chunks.jsonl
- Curate examples by hand in data/curated/examples.jsonl.
Each record should describe a real task and the desired answer or patch:
{"instruction":"Add a MAUDE tool that lists shared files.","input":"Relevant existing tool conventions and file context...","output":"Correct implementation or patch..."}
- Convert curated examples into train/eval JSONL:
python -m code_assistant_lora.build_sft data/curated/examples.jsonl --train data/processed/train.jsonl --eval data/processed/eval.jsonl
- Train a LoRA with a code-capable base model using PyTorch, Transformers, PEFT, Accelerate, and optionally bitsandbytes for QLoRA.
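The curation and conversion steps above can be pictured with a small sketch that validates one curated record and assigns it to a split. This is an illustration under assumptions, not the behavior of build_sft: the hash-based split, the `eval_fraction` parameter, and both helper names are hypothetical. Hashing the record content keeps an example in the same split across rebuilds, which stops eval examples from drifting into training data when the file is reordered.

```python
import hashlib
import json

# The three keys shown in the example record above.
REQUIRED_KEYS = ("instruction", "input", "output")


def parse_record(line: str) -> dict:
    """Parse one JSONL line and reject records with missing keys."""
    record = json.loads(line)
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"record is missing keys: {missing}")
    return record


def assign_split(record: dict, eval_fraction: float = 0.1) -> str:
    """Hash instruction and input so a record always lands in the same split."""
    key = (record["instruction"] + "\n" + record["input"]).encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 1000
    return "eval" if bucket < eval_fraction * 1000 else "train"


if __name__ == "__main__":
    line = '{"instruction": "Fix the bug.", "input": "def f(): ...", "output": "patch"}'
    record = parse_record(line)
    print(assign_split(record))
```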
The priority is correct code, not just a fast model that sounds right. This project should treat the code model as one component in a larger orchestration process.
Initial model direction:
- Coding model: google/gemma-4-31B-it
- Adapter goal: teach repo-specific coding style, conventions, naming, structure, and review behavior
- Verification model: a separate model or route that reviews proposed changes for correctness, missing tests, broken assumptions, and regressions
- Grounding: repo retrieval, direct file context, tool use, and test execution remain part of the workflow
The intended loop:
- Retrieve relevant files, docs, and tests.
- Ask the 31B coding model to propose the implementation.
- Run formatters, type checks, unit tests, and targeted validation.
- Ask a verifier model to review the patch and test evidence.
- Revise until the implementation and verification both pass.
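The loop above can be sketched as a small driver that takes the model and tooling steps as plain callables. Everything here is hypothetical scaffolding: `run_patch_loop`, its parameters, and the `max_rounds` limit are names invented for the example, and the callables stand in for the coding model, the automated checks, and the verifier.

```python
from typing import Callable, Optional, Tuple


def run_patch_loop(
    propose: Callable[[str, str], str],
    run_checks: Callable[[str], Tuple[bool, str]],
    verify: Callable[[str, str], Tuple[bool, str]],
    task: str,
    context: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first patch that passes both checks and verification."""
    feedback = ""
    for _ in range(max_rounds):
        # The proposer sees the retrieved context plus feedback from the last round.
        patch = propose(task, context + feedback)
        checks_ok, check_log = run_checks(patch)
        if not checks_ok:
            feedback = "\nCheck failures:\n" + check_log
            continue
        approved, review = verify(patch, check_log)
        if approved:
            return patch
        feedback = "\nVerifier feedback:\n" + review
    # Give up after max_rounds so a stuck loop escalates to a human.
    return None
```

Bounding the number of rounds is a design choice in this sketch: it keeps a model that cannot satisfy the checks from looping indefinitely.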
Speed optimizations are acceptable only when they do not materially reduce code accuracy. For now, keep the 31B model as the baseline coding model and use smaller models only for experiments, triage, or helper tasks.
Planned fast-agent architecture:
router -> retriever -> planner -> patcher -> language specialists -> tests -> verifier
The general coding model remains the main patch author. It should understand the request, use retrieved repo context, plan the change, write the first patch, and integrate feedback. Smaller specialist models can be trained and routed by language or responsibility:
- Python/backend specialist: env boundaries, service structure, tests, error handling.
- TypeScript/React specialist: components, hooks, typed API clients, UI state boundaries.
- SQL/Supabase specialist: schema wording, Postgres behavior, RLS, data-access boundaries.
- Docker/CI specialist: runtime environment, secrets, build/test/deploy checks.
- Test specialist: missing coverage, failure reproduction, targeted test plans.
Specialists should be fast reviewers and patch advisors, not final authorities. The orchestrator should send them the task, retrieved context, and current diff, then ask narrow questions such as "does this React hook/API-client boundary match repo style?" or "are Docker secrets handled safely?" The general coding model resolves conflicting advice and applies revisions.
Target loop:
- Route the task by touched files, language, and risk.
- Retrieve current files, docs, tests, and conventions.
- Let the 31B coding model produce a candidate patch.
- Send the diff and evidence to the relevant specialist model or models.
- Apply specialist feedback through the general model.
- Run automated checks.
- Have the verifier review the final diff, test output, and remaining risk.
This keeps broad reasoning in the larger model while using smaller LoRAs for cheap, opinionated language-specific review.
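The routing step could look roughly like the sketch below, which picks specialists from the paths a diff touches. The specialist labels, the suffix table, and the `route` function are assumptions for illustration; risk-based routing is left out to keep the example short.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical routing table from file suffix to specialist label.
SPECIALISTS_BY_SUFFIX = {
    ".py": "python-backend",
    ".ts": "typescript-react",
    ".tsx": "typescript-react",
    ".sql": "sql-supabase",
    ".yml": "docker-ci",
    ".yaml": "docker-ci",
}


def route(changed_files: List[str]) -> List[str]:
    """Return the sorted set of specialists relevant to the changed files."""
    selected = set()
    for name in changed_files:
        path = PurePosixPath(name)
        if path.name in ("Dockerfile", "Jenkinsfile"):
            selected.add("docker-ci")
        # Test files also go to the test specialist, whatever their language.
        if "test" in path.name.lower() or "tests" in path.parts:
            selected.add("testing")
        specialist = SPECIALISTS_BY_SUFFIX.get(path.suffix)
        if specialist:
            selected.add(specialist)
    return sorted(selected)


if __name__ == "__main__":
    print(route(["src/api/client.ts", "Dockerfile", "tests/test_app.py"]))
```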
Problems this architecture is meant to solve:
- Stale repo knowledge: retrieval supplies current files, docs, tests, and APIs instead of relying on training memory.
- Cross-language mistakes: specialists can review the parts they understand best while the general model keeps the full task in view.
- Generic coding advice: LoRA-trained specialists can enforce local conventions such as typed API clients, startup config validation, RLS wording, and Docker secret handling.
- Silent regressions: automated checks and verifier review make tests, type errors, and missing coverage part of the loop.
- Overloaded context windows: routing lets each specialist inspect a small, relevant slice instead of forcing one model to carry every file and convention.
- Slow all-purpose review: small specialists can give fast narrow feedback before the expensive general model revises the patch.
- Conflicting recommendations: the general coding model remains responsible for integrating specialist feedback and resolving tradeoffs.
- Overfitting one model to every domain: language-specific adapters can be trained on cleaner, narrower data without making the main coding model brittle.
DGX Spark runs CUDA 13, so do not rely on a generic PyTorch install. Use the Spark-specific environment script:
scripts/setup_spark_env.sh
This creates .venv-spark, installs the project with the spark, qlora, and dev extras, and verifies that PyTorch can see the GB10 GPU.
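To repeat the GPU check by hand, a minimal sketch along the following lines should work from inside the environment. The `describe_gpu` helper is hypothetical and is not part of the setup script; it only uses the standard `torch.cuda` calls.

```python
def describe_gpu() -> str:
    """Report whether PyTorch is importable and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed but no CUDA device is visible"
    return f"CUDA device 0: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(describe_gpu())
```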
To run tools from that environment:
source .venv-spark/bin/activate
python -m code_assistant_lora.build_sft data/curated/python_style_round_001.jsonl \
--train data/processed/python_train.jsonl \
--eval data/eval/python_eval.jsonl
The Gradio dashboard presents the training process as a demo-friendly workflow: curated data, train/eval split generation, Spark environment checks, and the training command.
Install dashboard dependencies into the Spark environment:
.venv-spark/bin/python -m pip install -e ".[dashboard]"
Start the dashboard:
scripts/start_dashboard.sh
Open http://localhost:7860.
code-assistant-lora/
  scripts/
    setup_spark_env.sh
    start_dashboard.sh
  configs/
    dataset.example.json
  data/
    curated/
    raw/
    processed/
    eval/
  src/code_assistant_lora/
    scan_repo.py
    build_sft.py
  examples/
    curated_examples.jsonl
Good examples:
- Feature request -> patch
- Bug report -> fix
- Test failure -> corrected code
- Existing file context -> edited file
- Repo convention question -> answer
- Code snippet -> explanation
- Review comment -> improved implementation
Avoid:
- Secrets, tokens, .env files, credentials
- node_modules, virtualenvs, caches, generated build output
- Low-quality experiments or abandoned code
- Repeated examples
- Whole-repo dumps with no task framing
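Part of the avoid list can be enforced mechanically before curation. The sketch below shows one way to filter paths during a scan; the directory names, suffixes, and the `should_skip` helper are assumptions for illustration and do not describe what scan_repo does. A path filter like this does not catch secrets embedded inside otherwise ordinary files, so manual review of curated examples still matters.

```python
from pathlib import PurePosixPath

# Assumed exclusion lists; adjust to the repo being scanned.
EXCLUDED_DIRS = {"node_modules", ".venv", "venv", "__pycache__", ".git", "dist", "build"}
SECRET_SUFFIXES = {".pem", ".key"}


def should_skip(relative_path: str) -> bool:
    """Return True for paths that should never reach the training data."""
    path = PurePosixPath(relative_path)
    # Skip anything inside a dependency, cache, or build directory.
    if any(part in EXCLUDED_DIRS for part in path.parts):
        return True
    # Skip .env and variants such as .env.local.
    if path.name == ".env" or path.name.startswith(".env."):
        return True
    return path.suffix in SECRET_SUFFIXES


if __name__ == "__main__":
    for candidate in ["src/app.py", "node_modules/x/index.js", "config/.env.local"]:
        print(candidate, "skip" if should_skip(candidate) else "keep")
```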
For a code assistant, retrieval and tools usually matter more than fine-tuning. RAG keeps the assistant grounded in the current repo. LoRA is useful for teaching style, conventions, and repeated task patterns. The strongest system combines repo retrieval, tool access, tests, and optional LoRA.