"Transforming GPUs into thinking machines since 2025"
Welcome to my repository showcasing two exciting deep learning projects! Dive into neural network construction, machine translation, and LLM fine-tuning. This repository chronicles my journey through fundamental and advanced deep learning concepts. Each project is a battle-tested module combining rigorous theory with practical implementation.
Goal: Classify CIFAR-10 and MNIST datasets using progressively enhanced CNNs.
Key Techniques:
- Baseline CNN → BatchNorm → Data Augmentation → Deeper Architectures → Dropout
- Achieved 90% test accuracy on CIFAR-10 and 99% on MNIST!
| Experiment | Test Accuracy | Key Improvements |
|---|---|---|
| Baseline CNN | 71% | Simple architecture |
| + BatchNorm | 73% | Stabilized training dynamics |
| + Data Augmentation | 78% | Reduced overfitting |
| + Deeper CNN + Dropout | 83% | Enhanced feature learning |
| Ultimate Enhanced Model | 90% | Combined optimizations + refined training |
- BatchNorm accelerates convergence (+2 points of accuracy over the baseline).
- Data augmentation boosts generalization (+5 points over the BatchNorm model, +7 over the baseline).
- Deeper models require careful regularization (Dropout!).
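
The data augmentation step above can be sketched as follows. This is a minimal illustration, assuming a random crop after zero padding plus a random horizontal flip, which are common choices for CIFAR-10. The function name `augment`, the `pad=4` value, and the NumPy implementation are assumptions for illustration; the exact transforms used in these experiments are not stated in this README.

```python
import numpy as np

def augment(image, rng, pad=4):
    """Random crop (after zero padding) plus a random horizontal flip.

    image: array of shape (H, W, C), e.g. a 32x32x3 CIFAR-10 image.
    pad:   hypothetical padding size, not taken from the experiments.
    """
    h, w, _ = image.shape
    # Zero-pad height and width so the crop window can shift the content.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    cropped = padded[top:top + h, left:left + w, :]
    # Flip left-right with probability 0.5.
    if rng.random() < 0.5:
        cropped = cropped[:, ::-1, :]
    return cropped

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)).astype(np.float32)
augmented = augment(image, rng)
print(augmented.shape)
```

Each call produces a slightly different view of the same image, so the network rarely sees the exact same input twice. That is one plausible reading of why the augmented model overfits less.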
Goal: Build a CNN for MNIST classification in pure NumPy, without a deep learning framework.
Features:
- Handcrafted layers: Conv2d, BatchNorm2d, MaxPool2d, Dropout
- Manual forward/backward propagation and gradient updates.
- Achieved 99.4% accuracy, rivaling PyTorch!
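
To illustrate the im2col idea behind a handcrafted `Conv2d`, here is a hedged sketch of the forward pass only. It assumes no padding and an (N, C, H, W) layout. `im2col` and `conv2d_forward` are hypothetical helper names, not the repository's actual code.

```python
import numpy as np

def im2col(x, kh, kw, stride=1):
    """Unfold (N, C, H, W) into a matrix of shape (N*out_h*out_w, C*kh*kw)."""
    n, c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((n, out_h, out_w, c, kh, kw), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            hs, ws = i * stride, j * stride
            # Copy the receptive field under output position (i, j).
            cols[:, i, j] = x[:, :, hs:hs + kh, ws:ws + kw]
    return cols.reshape(n * out_h * out_w, c * kh * kw), out_h, out_w

def conv2d_forward(x, weight, bias, stride=1):
    """Valid (no padding) convolution; weight has shape (F, C, kh, kw)."""
    n = x.shape[0]
    f, _, kh, kw = weight.shape
    cols, out_h, out_w = im2col(x, kh, kw, stride)
    # One matrix multiply replaces the nested convolution loops.
    out = cols @ weight.reshape(f, -1).T + bias  # (N*out_h*out_w, F)
    return out.reshape(n, out_h, out_w, f).transpose(0, 3, 1, 2)

x = np.arange(16, dtype=np.float64).reshape(1, 1, 4, 4)
w = np.ones((1, 1, 3, 3))
b = np.zeros(1)
y = conv2d_forward(x, w, b)
print(y.shape)  # (1, 1, 2, 2)
```

With an all-ones 3x3 kernel, each output value is the sum of one 3x3 window of the input, which makes the result easy to verify by hand.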

Simplified layer structure:

    class Conv2d:
        def forward(self, x): ...  # im2col magic!
        def backward(self, grad): ...  # Gradient gymnastics

    class Adam:
        def update(self, param, grad): ...  # Momentum + adaptive learning

Takeaways:
- Debugging manual backprop is hard but enlightening!
- Automatic differentiation = 🤯 → 🤩
- Full code insights: ./Assignment1
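
One common way to debug manual backprop is a numerical gradient check. The sketch below is illustrative rather than the code used in this project: `numerical_grad` is a hypothetical helper, and the tiny linear layer with a squared loss is an assumed test case chosen because its analytic gradient is easy to write down.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued f."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + eps
        plus = f(x)
        x[idx] = old - eps
        minus = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))

def loss(W_):
    # Toy loss: half the squared norm of a linear layer's output.
    return 0.5 * np.sum((x @ W_) ** 2)

analytic = x.T @ (x @ W)          # hand-derived gradient dL/dW
numeric = numerical_grad(loss, W)
max_error = np.max(np.abs(analytic - numeric))
print(f"max abs difference: {max_error:.2e}")
```

If the hand-written backward pass of a layer disagrees with the numerical estimate by more than a small tolerance, the bug is usually in that layer's backward code.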
Models:
- Seq2Seq + Attention (GRU)
  - Strengths: Explicit attention alignment for short phrases (e.g., "天气" → "weather").
  - Limitations:
    - Repetition errors ("english english english")
    - Failed named entities ("张三" → "three")
    - Chaotic punctuation handling ("! ! !!")
- MinGPT (Transformer)
  - Upgrades:
    - Autoregressive decoding with temperature sampling
    - Multi-head self-attention for long-range dependencies
  - Results: Smoother syntax but still struggled with cultural nuances ("全民制作人" → "everybody's whole big family").
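
Temperature sampling, listed among the MinGPT upgrades, can be sketched as below. This is a generic illustration, not MinGPT's own code: `sample_with_temperature` is a hypothetical helper, and the logits are made-up values.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Sample a token id from softmax(logits / temperature)."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    token = int(rng.choice(len(probs), p=probs))
    return token, probs

rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-token vocabulary
token_sharp, sharp = sample_with_temperature(logits, 0.5, rng)
token_flat, flat = sample_with_temperature(logits, 2.0, rng)
print(sharp.round(3), flat.round(3))
```

A temperature below 1 concentrates probability on the top token, while a temperature above 1 flattens the distribution. Lower temperatures may reduce erratic outputs at the cost of more repetition.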
| Metric | Seq2Seq (GRU) | MinGPT (Transformer) |
|---|---|---|
| Training Stability | High loss fluctuation (1.8–4.2) | Smooth convergence (loss 2.1–3.5) |
| Translation Quality | Repetitive outputs, semantic gaps | Better punctuation & syntax |
| Resource Efficiency | 10M params, low memory | 93M params, GPU required |
| Best Use Case | Lightweight prototyping | Context-aware generation |
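
The self-attention that the Transformer column relies on can be sketched for a single head as follows. This is a simplified illustration under stated assumptions: one head, no learned projections, and a causal mask as used in autoregressive decoding. `causal_self_attention` is a hypothetical name, not a MinGPT function.

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: arrays of shape (T, d) for T tokens.
    """
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Mask out future positions so token i only attends to tokens <= i.
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))  # 5 tokens, 8-dimensional embeddings
out, weights = causal_self_attention(x, x, x)
print(out.shape, weights.shape)
```

Multi-head attention runs several such heads in parallel on different learned projections of the input and concatenates their outputs.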
Objective: Adapt Qwen2.5-7B-Instruct for Chinese legal QA using:
- 4-bit Quantization (75% memory reduction)
- LoRA (train only 0.1% of parameters)
- DISC-Law-SFT: 403k legal Q&A pairs
- Hardware: Tesla T4 GPU (16GB VRAM) + LLaMA-Factory framework
- Key Config (an epochs value of 0.05 uses roughly 5% of the data for rapid prototyping):

      {
        "lora_target": "c_attn,q_proj,v_proj",
        "quantization_bit": 4,
        "learning_rate": 3e-5,
        "batch_size": 4,
        "epochs": 0.05
      }

- Loss Curve: Rapid convergence from 4.03 → 0.074 in just 1.2 hours!
- Throughput: Processed 2.45 samples/sec on a single T4 GPU
- Memory Usage: 4-bit quantization reduced VRAM consumption by 75% (16GB → 4GB effective)
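
The 75% figure follows from simple arithmetic on bits per weight. The sketch below is a back-of-the-envelope estimate: it assumes a round 7 billion parameters (taken from the "7B" in the model name), compares 16-bit with 4-bit storage, and ignores activations, quantization constants, LoRA weights, and optimizer state. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 7e9  # round figure assumed from the "7B" model name
fp16_gb = weight_memory_gb(params, 16)
int4_gb = weight_memory_gb(params, 4)
reduction = 1 - int4_gb / fp16_gb
print(f"{fp16_gb:.1f} GB -> {int4_gb:.1f} GB ({reduction:.0%} smaller)")
```

The ratio matches the 16GB → 4GB figure quoted above. The absolute numbers differ slightly because real memory use includes more than the raw weights.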
| Case | Model Response | Accuracy | Insight |
|---|---|---|---|
| Workplace Harassment | Cited provisions of the Law on the Protection of Women's Rights and Interests (《妇女权益保障法》); recommended filing a complaint and pursuing legal liability | 95% | Precise statute alignment |
| Land-Use Contract | Cited a Civil Code (《民法典》) article, stating the written-form requirement for easement contracts | 90% | Correct template but missed sub-clauses |
| Credit Dispute | Raised a credit-record objection under Article 1029 of the Civil Code (《民法典》) | 88% | Minor phrasing mismatch |

(Hypothetical visualization of loss drop)
- LoRA Efficiency: Trained only 0.1% of parameters (7B → 7M trainable!) while retaining 92% accuracy
- Quantization Magic: Squeezed a 7B model into 16GB VRAM, democratizing LLM fine-tuning
- Legal Precision: Generated answers strictly adhered to Chinese law with zero hallucinated clauses
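
Why LoRA trains so few parameters can be seen in a small sketch. This is a NumPy illustration of the low-rank update idea, not LLaMA-Factory's implementation. The class name `LoRALinear`, the rank `r=8`, `alpha=16`, and the 512x512 layer size are all assumptions for illustration and do not come from the config above.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, r=8, alpha=16, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        out_dim, in_dim = weight.shape
        self.weight = weight                              # frozen
        self.A = rng.standard_normal((r, in_dim)) * 0.01  # trainable
        self.B = np.zeros((out_dim, r))                   # trainable, zero init
        self.scale = alpha / r

    def forward(self, x):
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_fraction(self):
        trainable = self.A.size + self.B.size
        return trainable / (trainable + self.weight.size)

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
layer = LoRALinear(W, r=8, alpha=16, rng=rng)
x = rng.standard_normal((2, 512))
y = layer.forward(x)
fraction = layer.trainable_fraction()
print(f"trainable fraction for this layer: {fraction:.2%}")
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen one. The per-layer fraction here is larger than the 0.1% quoted above because this toy layer is small and because a real model only adapts some of its layers.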
| Direction | Action Item | Expected Impact |
|---|---|---|
| Hybrid Fine-Tuning | Combine LoRA with full-parameter tuning | Boost accuracy to >98% |
| Extended Training | Train on 100% data (not just 5%) | Reduce verbosity & phrasing errors |
| Multimodal Expansion | Add legal document parsing (PDF/OCR) | Enable end-to-end contract analysis |

    # Step 1: Clone LLaMA-Factory
    git clone https://github.com/hiyouga/LLaMA-Factory
    # Step 2: Run with 4-bit LoRA config
    python train.py \
      --model_name_or_path "Qwen/Qwen2.5-7B-Instruct" \
      --quantization_bit 4 \
      --lora_target "c_attn,q_proj,v_proj" \
      --batch_size 4

Repository layout:

    ├── Assignment_1/
    │   ├── Task_A...ipynb/   # PyTorch CNN experiments
    │   └── Task_B...ipynb/   # NumPy-from-scratch CNN
    ├── Assignment_2/
    │   ├── Part_A/           # Seq2Seq & MinGPT translation
    │   └── Part_B/           # Legal LLM fine-tuning
    ├── Reports               # Detailed PDF writeups
    └── README.md             # You are here!

- MinGPT: For transformer-based translation
- LLaMA-Factory: LoRA + quantization toolkit
- DISC-Law-SFT: Legal QA dataset

Star this repo if you find it helpful!
Feedback? Open an issue and let's build something awesome!
