code-assistant-LoRA

Repo-aware engineering assistant workflows plus utilities for preparing codebase-specific LoRA datasets.

The goal is not to dump a repository directly into training. The goal is to curate examples that teach a model how to work in a repo: conventions, tool patterns, feature-request-to-patch behavior, bug-fix behavior, tests, and explanations.

The LoRA model is not the whole app. The app is a deterministic workflow around repo scanning, retrieval, patch planning, review, test execution, CI reporting, and adapter metadata. LoRA adapters are used as reasoning/review components inside that workflow.

Case Study

Problem: General code models often miss local repo conventions, overlook test evidence, and produce generic reviews. This project explores how to make code assistants more reliable by combining deterministic repo workflows, retrieval, test execution, and targeted LoRA reviewers.

What I built: A repo-aware assistant toolkit with workflows for patch planning, structured review, changed-file testing, SFT dataset preparation, reviewer evaluation, and CI reporting. The workflow treats fine-tuned adapters as one component inside a larger engineering loop, not as a replacement for retrieval or tests.

Architecture: A source repo is scanned into structured chunks, curated examples are converted into train/eval JSONL, adapters are trained for specific review or style tasks, and CI workflows run deterministic review/test checks that emit JSON or GitHub-style findings.

Technical depth: The repo includes dataset builders, reviewer SFT preparation, audit scripts, regression evals, Spark-specific setup, a Gradio dashboard, GitHub Actions, Jenkins support, and tests for SFT generation, reviewer-data auditing, and adapter evaluation.

Proof: Evaluation fixtures and logs are versioned under evals/ and logs/, with summaries for Python, TypeScript, React, Vue, SQL, Docker/CI, orchestration, and testing reviewer experiments. The repo also includes CI entry points and a Jenkinsfile for repeatable checks.

Tradeoffs: Retrieval and test execution remain the source of truth. LoRA is used for repeated style and review patterns where smaller specialist models can cheaply flag likely issues before a larger model or human integrates the final change.

App Workflows

The CLI exposes three first-class workflows:

code-assistant write --repo . --task "Add support for X"
code-assistant review --repo . --base origin/main --head HEAD
code-assistant test --repo . --base origin/main --target changed
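A minimal sketch of how the three subcommands could be wired with argparse. Only the flags shown in this README are modeled. The `--constraints` flag name and the `all` value for `--target` are hypothetical, and the real CLI may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch: mirrors the flags documented above; the real CLI may differ.
    parser = argparse.ArgumentParser(prog="code-assistant")
    sub = parser.add_subparsers(dest="command", required=True)

    write = sub.add_parser("write", help="retrieve context and draft a patch plan")
    write.add_argument("--repo", default=".")
    write.add_argument("--task", required=True)
    write.add_argument("--constraints", help="optional constraints file (hypothetical flag name)")

    review = sub.add_parser("review", help="emit structured findings for a diff")
    review.add_argument("--repo", default=".")
    review.add_argument("--base")
    review.add_argument("--head", default="HEAD")
    review.add_argument("--diff-file")
    review.add_argument("--format", choices=["json", "github"], default="json")

    test = sub.add_parser("test", help="run detected test commands")
    test.add_argument("--repo", default=".")
    test.add_argument("--base")
    # "all" is a hypothetical second target; only "changed" appears in this README.
    test.add_argument("--target", choices=["changed", "all"], default="changed")
    test.add_argument("--dry-run", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```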

Writing

write accepts a repo path, a task description, and an optional constraints file. It retrieves relevant repository context and writes:

artifacts/code-assistant/write-context.json
artifacts/code-assistant/patch-plan.md

This is intentionally deterministic first. The retrieved context and patch plan are the boundary where a patch-author model or LoRA-backed assistant can be inserted.
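To illustrate the deterministic boundary, here is a sketch that ranks pre-scanned chunks by token overlap with the task and writes the two artifacts. The overlap scoring, the chunk shape (`path`, `text`), and the plan layout are assumptions standing in for whatever retrieval the real `write` workflow uses.

```python
from __future__ import annotations

import json
import re
from pathlib import Path


def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric words; underscores split identifiers like add_tool.
    return set(re.findall(r"[a-z][a-z0-9]+", text.lower()))


def retrieve(task: str, chunks: list[dict], k: int = 3) -> list[dict]:
    """Rank chunks by token overlap with the task (hypothetical scoring)."""
    query = tokens(task)
    scored = [(len(query & tokens(c["text"])), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:k]]


def write_artifacts(task: str, chunks: list[dict], out_dir: str = "artifacts/code-assistant") -> list[dict]:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    context = retrieve(task, chunks)
    (out / "write-context.json").write_text(json.dumps({"task": task, "context": context}, indent=2))
    plan = ["# Patch plan\n", f"Task: {task}\n", "## Files to inspect\n"]
    plan += [f"- {c['path']}\n" for c in context]
    (out / "patch-plan.md").write_text("".join(plan))
    return context
```

Because nothing here calls a model, the same task and chunks always produce the same artifacts, which is what makes this a clean seam for inserting a patch-author model later.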

Review

review accepts a branch, commit range, working tree diff, or raw diff file and emits structured findings:

code-assistant review --repo . --base origin/main --format json
code-assistant review --repo . --diff-file patch.diff --format github

The JSON report includes severity, file, line, rationale, and suggested fix. CI can upload artifacts/code-assistant/review.json or print GitHub annotations.
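A sketch of one finding record and its two renderings. The field names follow the list above; the `high`/`medium`/`low` severity names and their mapping to GitHub's `error`/`warning`/`notice` levels are assumptions. The `::level file=...,line=...::message` form and the `%25`/`%0D`/`%0A` escaping follow GitHub Actions workflow-command syntax.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical mapping from finding severity to GitHub workflow-command level.
LEVELS = {"high": "error", "medium": "warning", "low": "notice"}


@dataclass
class Finding:
    severity: str
    file: str
    line: int
    rationale: str
    suggested_fix: str


def to_json(findings: list) -> str:
    return json.dumps([asdict(f) for f in findings], indent=2)


def to_github(f: Finding) -> str:
    # GitHub Actions workflow command: ::LEVEL file=PATH,line=N::MESSAGE
    level = LEVELS.get(f.severity, "warning")
    message = f"{f.rationale} Suggested fix: {f.suggested_fix}"
    # Escape %, CR, and LF so multi-line messages stay on one command line.
    message = message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    return f"::{level} file={f.file},line={f.line}::{message}"
```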

Testing

test detects common package managers and test frameworks, then records test results and logs:

code-assistant test --repo . --target changed
code-assistant test --repo . --target changed --dry-run

Reports are written to:

artifacts/code-assistant/test-report.json
artifacts/code-assistant/test-*.log
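Framework detection can be approximated by marker files at the repo root. The rules below are a hypothetical sketch, not the project's actual detector, which may inspect lockfiles, config sections, or changed paths as well. The `__main__` block mimics `--dry-run` by printing commands instead of running them.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical marker-file rules, checked in order.
RULES = [
    ("package.json", ["npm", "test"]),
    ("pyproject.toml", ["python", "-m", "pytest"]),
    ("pytest.ini", ["python", "-m", "pytest"]),
    ("go.mod", ["go", "test", "./..."]),
    ("Cargo.toml", ["cargo", "test"]),
]


def detect_test_commands(repo: str) -> list[list[str]]:
    """Return one command per detected framework, without duplicates."""
    root = Path(repo)
    commands: list[list[str]] = []
    for marker, cmd in RULES:
        if (root / marker).exists() and cmd not in commands:
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Dry-run style output: show what would run without executing anything.
    for cmd in detect_test_commands("."):
        print(" ".join(cmd))
```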

CI/CD

The repository includes GitHub Actions and Jenkins pipeline support for:

linting and unit tests
deterministic reviewer checks
reviewer regression fixtures
code-assistant artifacts for CI review/test output

Workflows and CI entry points:

.github/workflows/ci.yml
.github/workflows/reviewer-eval.yml
Jenkinsfile
docs/jenkins.md

Regression fixtures live in:

evals/reviewer_regression/

Adapter metadata should be versioned with the checkpoint being promoted. See:

configs/adapter.metadata.example.json
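One way to tie metadata to the exact checkpoint being promoted is to record a content hash of the adapter directory. The field names below (`base_model`, `checkpoint_sha256`, `eval_summary`) are hypothetical; the example config file defines the real schema. Write the metadata outside the checkpoint directory, otherwise rewriting it changes the hash.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def checkpoint_digest(checkpoint_dir: str) -> str:
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    h = hashlib.sha256()
    root = Path(checkpoint_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(path.relative_to(root).as_posix().encode("utf-8"))
        h.update(b"\0")  # separator so path and content bytes cannot run together
        h.update(path.read_bytes())
    return h.hexdigest()


def write_metadata(checkpoint_dir: str, base_model: str, eval_summary: dict, out_path: str) -> dict:
    # Field names are hypothetical; see configs/adapter.metadata.example.json for the real schema.
    meta = {
        "base_model": base_model,
        "checkpoint_sha256": checkpoint_digest(checkpoint_dir),
        "eval_summary": eval_summary,
    }
    Path(out_path).write_text(json.dumps(meta, indent=2))
    return meta
```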

Workflow

Scan a source repo for candidate files:

python -m code_assistant_lora.scan_repo /path/to/repo --out data/raw/repo_chunks.jsonl
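A rough sketch of what a scan step could do: walk the repo, skip directories and files that should never reach training data, and emit fixed-size line chunks as JSONL. The skip lists, extension filter, chunk size, and record fields (`path`, `start_line`, `text`) are assumptions, not the actual behavior of `scan_repo`.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

# Hypothetical filters, loosely based on the "Avoid" list in this README.
SKIP_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__", "dist", "build"}
EXTENSIONS = {".py", ".ts", ".tsx", ".vue", ".sql", ".md", ".yml", ".yaml"}


def scan(repo: str, max_lines: int = 80):
    """Yield chunk records of at most max_lines lines from candidate files."""
    root = Path(repo)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if name.startswith(".env") or path.suffix not in EXTENSIONS:
                continue
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
            for start in range(0, len(lines), max_lines):
                yield {
                    "path": path.relative_to(root).as_posix(),
                    "start_line": start + 1,
                    "text": "\n".join(lines[start:start + max_lines]),
                }


def write_jsonl(records, out_path: str) -> int:
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
            count += 1
    return count
```

These chunks are raw candidates only; they still need the hand curation described next before they become training examples.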

Curate examples by hand in data/curated/examples.jsonl.

Each record should describe a real task and the desired answer or patch:

{"instruction":"Add a MAUDE tool that lists shared files.","input":"Relevant existing tool conventions and file context...","output":"Correct implementation or patch..."}
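Before conversion, each curated line can be checked against the three-field shape above. This validator is a hypothetical helper, not part of `build_sft`; it assumes `input` may be an empty string while `instruction` and `output` may not.

```python
import json

REQUIRED = ("instruction", "input", "output")


def validate_record(line: str) -> list:
    """Return a list of problems with one JSONL line (empty list means valid)."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(rec, dict):
        return ["record must be a JSON object"]
    problems = []
    for key in REQUIRED:
        value = rec.get(key)
        if not isinstance(value, str):
            problems.append(f"missing or non-string field: {key}")
        elif key != "input" and not value.strip():
            # Assumption: only "input" may legitimately be empty.
            problems.append(f"empty field: {key}")
    return problems
```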

Convert curated examples into train/eval JSONL:

python -m code_assistant_lora.build_sft data/curated/examples.jsonl --train data/processed/train.jsonl --eval data/processed/eval.jsonl
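A sketch of a deterministic train/eval split, assuming exact duplicates should be dropped and each example should land in the same split on every run. Hash bucketing is one common way to get that property; it is not necessarily how `build_sft` splits, and the 10% default is arbitrary.

```python
import hashlib
import json


def record_key(rec: dict) -> str:
    # Hash the fields that define an example so exact repeats collapse to one key.
    payload = json.dumps([rec["instruction"], rec["input"], rec["output"]])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def split(records, eval_percent: int = 10):
    """Drop exact duplicates, then assign each record to a split by its hash."""
    seen, train, evals = set(), [], []
    for rec in records:
        key = record_key(rec)
        if key in seen:
            continue
        seen.add(key)
        bucket = int(key[:8], 16) % 100
        (evals if bucket < eval_percent else train).append(rec)
    return train, evals
```

Since assignment depends only on an example's content, adding new curated examples later does not move existing ones between train and eval, which avoids quietly leaking eval items into training.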

Train a LoRA with a code-capable base model using PyTorch, Transformers, PEFT, Accelerate, and optionally bitsandbytes for QLoRA.
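For sizing, it helps to remember what LoRA actually trains: for a frozen weight of shape d_out x d_in, it learns B (d_out x r) and A (r x d_in), so each adapted matrix adds r * (d_in + d_out) parameters. The helper below just evaluates that formula; the 4096 and rank-16 values are illustrative and not taken from any specific base model.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return r * (d_in + d_out)


def adapter_fraction(d_in: int, d_out: int, r: int) -> float:
    """Adapter size relative to the frozen full weight matrix."""
    return lora_params(d_in, d_out, r) / (d_in * d_out)


if __name__ == "__main__":
    # Illustrative dimensions only.
    print(lora_params(4096, 4096, 16))       # 131072
    print(adapter_fraction(4096, 4096, 16))  # 0.0078125
```

That is under 1% of the adapted matrix, which is why narrow style and review adapters are cheap to train and version compared with full fine-tuning.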

Project Plan

The priority is correct code, not just a fast model that sounds right. This project should treat the code model as one component in a larger orchestration process.

Initial model direction:

Coding model: google/gemma-4-31B-it
Adapter goal: teach repo-specific coding style, conventions, naming, structure, and review behavior
Verification model: a separate model or route that reviews proposed changes for correctness, missing tests, broken assumptions, and regressions
Grounding: repo retrieval, direct file context, tool use, and test execution remain part of the workflow

The intended loop:

Retrieve relevant files, docs, and tests.
Ask the 31B coding model to propose the implementation.
Run formatters, type checks, unit tests, and targeted validation.
Ask a verifier model to review the patch and test evidence.
Revise until the implementation and verification both pass.
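The loop above can be sketched with the model and tool calls passed in as plain callables. Everything here is hypothetical scaffolding: the callable signatures, the feedback record shape, and the `max_rounds` cap, which keeps "revise until both pass" from running forever.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outcome:
    patch: str
    checks_passed: bool
    verifier_approved: bool
    rounds: int
    feedback: list = field(default_factory=list)


def run_loop(task, retrieve: Callable, propose: Callable, check: Callable,
             verify: Callable, max_rounds: int = 3) -> Outcome:
    """Retrieve, propose, check, verify; feed failures back until both gates pass."""
    context = retrieve(task)
    feedback: list = []
    patch, checks_ok, approved = "", False, False
    for rounds in range(1, max_rounds + 1):
        patch = propose(task, context, feedback)   # coding model drafts or revises
        checks_ok, check_log = check(patch)        # formatters, type checks, tests
        approved, review = verify(patch, check_log)  # verifier sees patch plus test evidence
        if checks_ok and approved:
            return Outcome(patch, True, True, rounds, feedback)
        feedback.append({"round": rounds, "checks": check_log, "review": review})
    return Outcome(patch, checks_ok, approved, max_rounds, feedback)
```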

Speed optimizations are acceptable only when they do not materially reduce code accuracy. For now, keep the 31B model as the baseline coding model and use smaller models only for experiments, triage, or helper tasks.

Planned fast-agent architecture:

router -> retriever -> planner -> patcher -> language specialists -> tests -> verifier

The general coding model remains the main patch author. It should understand the request, use retrieved repo context, plan the change, write the first patch, and integrate feedback. Smaller specialist models can be trained and routed by language or responsibility:

Python/backend specialist: env boundaries, service structure, tests, error handling.
TypeScript/React specialist: components, hooks, typed API clients, UI state boundaries.
SQL/Supabase specialist: schema wording, Postgres behavior, RLS, data-access boundaries.
Docker/CI specialist: runtime environment, secrets, build/test/deploy checks.
Test specialist: missing coverage, failure reproduction, targeted test plans.

Specialists should be fast reviewers and patch advisors, not final authorities. The orchestrator should send them the task, retrieved context, and current diff, then ask narrow questions such as "does this React hook/API-client boundary match repo style?" or "are Docker secrets handled safely?" The general coding model resolves conflicting advice and applies revisions.
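A sketch of routing by touched files and packaging one narrow question per specialist. The specialist keys, extension table, CI-path checks, test-file heuristics, and question wording are all hypothetical; a real router would also weigh language and risk.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical routing table keyed on the specialist roles listed above.
ROUTES = {
    "python": {".py"},
    "typescript-react": {".ts", ".tsx", ".jsx"},
    "sql": {".sql"},
}
NARROW_QUESTIONS = {
    "python": "Do env boundaries, service structure, and error handling match repo style?",
    "typescript-react": "Does this React hook/API-client boundary match repo style?",
    "sql": "Do schema wording, Postgres behavior, and RLS policies match repo conventions?",
    "docker-ci": "Are Docker secrets handled safely?",
    "tests": "What coverage is missing for this change?",
}


def route(changed_files: list[str]) -> dict[str, list[str]]:
    """Map each specialist to the changed files it should review."""
    routed: dict[str, list[str]] = {}
    for f in changed_files:
        p = PurePosixPath(f)
        names = [name for name, exts in ROUTES.items() if p.suffix in exts]
        if p.name.startswith("Dockerfile") or p.name == "Jenkinsfile" or p.parts[:2] == (".github", "workflows"):
            names.append("docker-ci")
        if p.name.startswith("test_") or ".test." in p.name or ".spec." in p.name or "tests" in p.parts:
            names.append("tests")
        for name in names:
            routed.setdefault(name, []).append(f)
    return routed


def build_requests(task, context, diff, changed_files) -> list[dict]:
    # Each specialist gets the task, retrieved context, current diff, and one narrow question.
    return [
        {"specialist": name, "files": files, "task": task, "context": context,
         "diff": diff, "question": NARROW_QUESTIONS[name]}
        for name, files in route(changed_files).items()
    ]
```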

Target loop:

Route the task by touched files, language, and risk.
Retrieve current files, docs, tests, and conventions.
Let the 31B coding model produce a candidate patch.
Send the diff and evidence to the relevant specialist model or models.
Apply specialist feedback through the general model.
Run automated checks.
Have the verifier review the final diff, test output, and remaining risk.

This keeps broad reasoning in the larger model while using smaller LoRAs for cheap, opinionated language-specific review.

Problems this architecture is meant to solve:

Stale repo knowledge: retrieval supplies current files, docs, tests, and APIs instead of relying on training memory.
Cross-language mistakes: specialists can review the parts they understand best while the general model keeps the full task in view.
Generic coding advice: LoRA-trained specialists can enforce local conventions such as typed API clients, startup config validation, RLS wording, and Docker secret handling.
Silent regressions: automated checks and verifier review make tests, type errors, and missing coverage part of the loop.
Overloaded context windows: routing lets each specialist inspect a small, relevant slice instead of forcing one model to carry every file and convention.
Slow all-purpose review: small specialists can give fast narrow feedback before the expensive general model revises the patch.
Conflicting recommendations: the general coding model remains responsible for integrating specialist feedback and resolving tradeoffs.
Overfitting one model to every domain: language-specific adapters can be trained on cleaner, narrower data without making the main coding model brittle.

Spark Setup

DGX Spark runs CUDA 13, so do not rely on a generic PyTorch install. Use the Spark-specific environment script:

scripts/setup_spark_env.sh

This creates .venv-spark, installs the project with the spark, qlora, and dev extras, and verifies that PyTorch can see the GB10 GPU.

To run tools from that environment:

source .venv-spark/bin/activate
python -m code_assistant_lora.build_sft data/curated/python_style_round_001.jsonl \
  --train data/processed/python_train.jsonl \
  --eval data/eval/python_eval.jsonl

Training Dashboard

The Gradio dashboard presents the training process as a demo-friendly workflow: curated data, train/eval split generation, Spark environment checks, and the training command.

Install dashboard dependencies into the Spark environment:

.venv-spark/bin/python -m pip install -e ".[dashboard]"

Start the dashboard:

scripts/start_dashboard.sh

Open http://localhost:7860.

Repo Layout

code-assistant-lora/
  scripts/
    setup_spark_env.sh
    start_dashboard.sh
  configs/
    dataset.example.json
  data/
    curated/
    raw/
    processed/
    eval/
  src/code_assistant_lora/
    scan_repo.py
    build_sft.py
  examples/
    curated_examples.jsonl

What Belongs In Training Data

Good examples:

Feature request -> patch
Bug report -> fix
Test failure -> corrected code
Existing file context -> edited file
Repo convention question -> answer
Code snippet -> explanation
Review comment -> improved implementation

Avoid:

Secrets, tokens, .env files, credentials
node_modules, virtualenvs, caches, generated build output
Low-quality experiments or abandoned code
Repeated examples
Whole-repo dumps with no task framing
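As a last gate before examples enter a dataset, curated text can be screened for obvious credential shapes. This is a hypothetical sketch covering only a few well-known formats (AWS access key IDs, GitHub token prefixes, PEM private-key headers, and env-style secret assignments); it complements, and does not replace, a dedicated secret scanner.

```python
import re

# A few well-known credential shapes; not an exhaustive list.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "env_assignment": re.compile(
        r"^\s*[A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|API_KEY)[A-Z0-9_]*\s*=\s*\S+",
        re.MULTILINE,
    ),
}


def find_secrets(text: str) -> list:
    """Return the names of secret patterns that match the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
```

Any record with a non-empty result can be dropped or sent back for redaction rather than trained on.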

Design Notes

For a code assistant, retrieval and tools usually matter more than fine-tuning. RAG keeps the assistant grounded in the current repo. LoRA is useful for teaching style, conventions, and repeated task patterns. The strongest system combines repo retrieval, tool access, tests, and optional LoRA.

