Convert raw user feedback (app reviews, support tickets, NPS comments) into structured engineering insights using a two-agent AI pipeline. The project evaluates baseline performance with DeepEval, fine-tunes both agents independently, and measures quality improvements after training.
Important Links:
https://github.com/aws-samples/amazon-sagemaker-generativeai
https://huggingface.co/docs/transformers/en/training
https://deepeval.com/docs/metrics-llm-evals
This project implements a two-agent workflow in n8n:
User Feedback
↓
Feedback Classifier (Agent 1)
↓
Engineering Insight Writer (Agent 2)
↓
Engineering Ticket
Reads raw user feedback and extracts:
- Category (bug, feature request, performance issue, unclear)
- Severity
- Affected system
- Platform
- Environment
- Exact key phrases
- Reproduction hints
- Missing technical context
Transforms the classification into a structured engineering ticket containing:
- Title
- Ticket type
- Technical summary
- Affected components
- Reproduction steps
- Investigation checklist
- Data gaps
- Priority recommendation
When the base model (meta-llama/Llama-3.1-8B-Instruct) receives vague feedback such as:
"This app is garbage"
it frequently hallucinates technical details, including:
- Fabricated system names
- Imaginary reproduction steps
- Made-up root causes
- Unsupported engineering conclusions
This produces professional-looking but inaccurate engineering tickets.
Fine-tuning teaches each agent the correct behavior.
Instead of inventing information, it learns to:
- Mark feedback as
UNCLEAR - Set
AFFECTED_SYSTEM=UNKNOWN - Identify missing context
Instead of fabricating tickets, it learns to:
- Output "Insufficient feedback for engineering action"
- Highlight missing information
- Recommend contacting the user for clarification
Before starting, ensure you have:
- Python 3.12 or higher
- pip
Run:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"Run:
brew install uvRun:
curl -LsSf https://astral.sh/uv/install.sh | shNavigate to your project directory:
cd <project_directory>Initialize the project:
uv venvActivate the virtual environment:
source .venv/bin/activateuv pip install huggingface-hub torch transformers fastapi typing-extensions typing-inspection peft trl datasets deepeval openai python-dotenv bitsandbytes uvicorn- Meta requires users to accept licensing terms before downloading or deploying the model.
- Llama models are gated on Hugging Face and require approval before download.
- Approval enables Hugging Face access tokens to authenticate model downloads and usage.
-
Visit the model page:
-
Review the license agreement.
-
Complete the required form.
-
Submit and wait for approval.
export OPENAI_API_KEY=<YOUR_OPENAI_TOKEN>
export HF_TOKEN=<YOUR_HF_TOKEN>import os
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_TOKEN>"
os.environ["HF_TOKEN"] = "<YOUR_HF_TOKEN>"python download_hf_model.pyEstimated download time: ~15 minutes
python load_run_local_model.pyReference:
https://dashboard.ngrok.com/get-started/setup/mac-os
Install:
brew install ngrokConfigure:
ngrok config add-authtoken $YOUR_AUTHTOKENExpose the API:
ngrok http --host-header=rewrite http://127.0.0.1:8005-
Create a free hosted instance:
No local installation is required.
-
Import the n8n workflow from
Fine Tuning n8n - using local LLM.jsonfile -
Set the credential that points to the local model for
Classifier LLM&Insight LLMnodes. Use the ngrok exposed url. E.g.,https://spinner-tighten-jeep.ngrok-free.dev -
Test the n8n setup by posting "hello" message.
- Submit all 8 feedback examples through the n8n chat interface.
- Wait for each run to finish (approximately 30–60 seconds).
- Save every generated output for later evaluation.
-
Open:
test_feedback_engineering.py -
Locate the
PIPELINE_OUTPUTSsection. -
Replace every placeholder with the corresponding n8n output.
-
Save the file.
python test_feedback_engineering.pyFine-tuning uses:
- LoRA
- 4-bit quantization
python fine_tune_training.pyEstimated training time: ~2 hours for each model
python merge_to_base_model.pyThis step:
- Loads the base model
- Merges LoRA adapters
- Saves the merged model
- Optionally uploads the model to Hugging Face
python load_run_local_finetune_model.pyngrok http --host-header=rewrite http://127.0.0.1:8006Update the Classifier LLM credential to: E.g.,
https://spinner-tighten-jeep.ngrok-free.dev/v2
Update the Insight LLM credential to:
https://spinner-tighten-jeep.ngrok-free.dev/v3
- Run feedback items F1–F8 again.
- Save all outputs.
- Open
test_feedback_engineering.py. - Replace all entries in
PIPELINE_OUTPUTSwith the new responses.
Run:
python test_feedback_engineering.pyAfter fine-tuning:
- Fewer hallucinated engineering details
- Improved handling of ambiguous feedback
- Better identification of missing information
- More reliable engineering ticket generation
- Higher DeepEval evaluation scores
- Python
- Hugging Face Transformers
- Meta Llama 3.1
- PEFT
- LoRA
- TRL
- Datasets
- DeepEval
- FastAPI
- ngrok
- n8n
Raw Feedback
↓
Feedback Classifier
↓
Structured Classification
↓
Engineering Insight Writer
↓
Engineering Ticket
↓
DeepEval Evaluation
↓
Fine-Tuning
↓
Re-Evaluation