VisionForge-AI is a diffusion-based AI image generation studio with self-refining generation, automatic output scoring, reference-image matching, portrait evaluation, benchmark tools, session exploration, and experiment reporting.
The project started as a portfolio-oriented image generation app and evolved into a broader image-generation system that can generate, evaluate, compare, refine, and document visual outputs.
VisionForge-AI is designed as a local AI image-generation workflow for software project visuals, creative image experiments, portrait reference matching, and iterative output improvement.
Instead of only generating one image from one prompt, the system can:
- Generate multiple candidates
- Evaluate each output automatically
- Select the best result
- Use the best result as the next input
- Mutate prompts to explore different visual directions
- Keep a reference image as an anchor
- Score portrait and face-like outputs
- Compare model presets
- Export experiment reports
- Explore self-refinement sessions visually
Text-to-image generation creates images from structured prompts using diffusion models.
Features:
- Project-specific prompt presets
- Style presets
- Output-type presets
- Negative prompt builder
- Seed-based reproducibility
- Batch variations
- Metadata saving
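Seed-based reproducibility, batch variations, and metadata saving fit together naturally. The sketch below is only an illustration: `GenerationRecord`, `batch_seeds`, and `to_metadata_json` are hypothetical names, not taken from the project's code, and the rule of deriving variation seeds by incrementing a base seed is an assumption.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class GenerationRecord:
    """Hypothetical metadata stored next to each generated image."""
    prompt: str
    negative_prompt: str
    seed: int
    width: int
    height: int


def batch_seeds(base_seed: int, count: int) -> list[int]:
    """Derive one deterministic seed per variation from a base seed (assumed rule)."""
    return [base_seed + offset for offset in range(count)]


def to_metadata_json(record: GenerationRecord) -> str:
    """Serialize the inputs needed to rerun a generation with the same seed."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Three variations of one prompt, each with its own recorded seed.
records = [
    GenerationRecord("isometric dashboard, clean lines", "blurry, watermark", seed, 512, 512)
    for seed in batch_seeds(1234, 3)
]
print(to_metadata_json(records[0]))
```

Storing the seed with the prompt and size means a variation can be rerun later with the same inputs, provided the model and sampler settings are unchanged.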
Image-to-image generation takes an uploaded reference image and transforms it under the guidance of a prompt.
Features:
- Strength control
- Reference-image transformation
- Metadata tracking
- Local output storage
The self-refining generation loop is one of the main advanced features of this project.
Workflow:
- Generate multiple candidate images
- Score each candidate
- Select the best output
- Use the selected output as the next image-to-image input
- Repeat the process for multiple iterations
The system stores each iteration, candidate, score, prompt variant, parent source, and selected output.
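The workflow above can be sketched as a small loop. This is a minimal illustration, not the project's implementation: `Candidate`, `self_refine`, and the `generate` and `score` callables are hypothetical, and the stand-in functions at the bottom replace the diffusion model and the evaluator so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    iteration: int
    prompt: str
    parent: Optional[str]  # image the candidate was refined from; None means text-to-image
    image_id: str
    score: float


def self_refine(
    prompt: str,
    generate: Callable[[str, Optional[str], int, int], str],
    score: Callable[[str], float],
    iterations: int = 3,
    candidates_per_iteration: int = 4,
) -> Tuple[Optional[Candidate], List[Candidate]]:
    """Run the generate, score, select, refine loop and keep the full history."""
    history: List[Candidate] = []
    best: Optional[Candidate] = None
    for iteration in range(iterations):
        parent = best.image_id if best else None
        batch = []
        for index in range(candidates_per_iteration):
            image_id = generate(prompt, parent, iteration, index)
            batch.append(Candidate(iteration, prompt, parent, image_id, score(image_id)))
        history.extend(batch)
        # The best candidate of this batch becomes the next image-to-image input.
        best = max(batch, key=lambda c: c.score)
    return best, history


# Stand-ins so the sketch runs without a diffusion model or evaluator.
def fake_generate(prompt, parent, iteration, index):
    return f"iter{iteration}_cand{index}"


def fake_score(image_id):
    return int(image_id[-1]) / 10


best, history = self_refine("control dashboard", fake_generate, fake_score)
print(best.image_id, len(history))
```

Here the selection is the best candidate of the current batch. Carrying forward the best candidate across all iterations would be an equally plausible variant.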
To avoid getting stuck in one visual pattern, VisionForge-AI can generate prompt variants automatically.
Example prompt mutation directions:
- Network-control
- Molecular-glucose
- Minimal-control
- Data-field
- Holographic-grid
- Portrait-clean
- Portrait-studio
- Reference-structure
- Reference-colors
- Reference-detail
This makes the generation loop more exploratory and helps the system search for better visual directions.
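One simple way to picture prompt mutation is a lookup from direction label to a prompt fragment. In the sketch below the labels come from the list above, but the fragments, the `MUTATIONS` table, and `mutate_prompt` are assumptions made for illustration only.

```python
# Direction labels are from the list above; the fragments are invented examples.
MUTATIONS = {
    "Minimal-control": "minimal composition, generous negative space",
    "Holographic-grid": "holographic grid overlay, glowing lines",
    "Portrait-studio": "studio lighting, neutral backdrop",
}


def mutate_prompt(base_prompt, labels):
    """Return (label, prompt variant) pairs, one per requested direction."""
    return [(label, f"{base_prompt}, {MUTATIONS[label]}") for label in labels]


variants = mutate_prompt("control dashboard", ["Minimal-control", "Holographic-grid"])
for label, prompt in variants:
    print(f"{label}: {prompt}")
```

Keeping the label with each variant is what would let a later step report which direction produced the best score.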
VisionForge-AI supports optional CLIP-based scoring for:
- Prompt-image alignment
- Reference-image similarity
- Semantic matching
This helps the system rank outputs using both visual heuristics and semantic similarity.
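CLIP-style scoring usually reduces to a cosine similarity between two embedding vectors, which can then be blended with a heuristic score. The sketch below assumes the embeddings are already computed and uses plain lists in their place. `combined_score` and its default weight are hypothetical, not the project's actual formula.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_score(heuristic, semantic, semantic_weight=0.5):
    """Blend a visual heuristic score with a semantic score (assumed weighting)."""
    return (1 - semantic_weight) * heuristic + semantic_weight * semantic


# Toy vectors standing in for a prompt embedding and an image embedding.
prompt_embedding = [0.2, 0.9, 0.1]
image_embedding = [0.25, 0.8, 0.05]
semantic = cosine_similarity(prompt_embedding, image_embedding)
print(round(combined_score(0.6, semantic), 3))
```

The same similarity function applies to reference-image matching: compare the image embedding of a candidate with the image embedding of the reference instead of a text embedding.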
Reference Match V2 improves reference-based generation by keeping the original image as an anchor during refinement.
The system can generate candidates from:
- The best previous output
- The original reference image
- A fresh prompt-based exploration path
This helps the loop stay closer to the target image instead of drifting too far away.
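The three candidate sources can be thought of as a per-iteration plan that assigns a parent source to each candidate. The following sketch is an assumption about how such a plan might be built. `plan_sources`, its parameters, and the source labels are hypothetical names.

```python
def plan_sources(total, reference_anchor=2, fresh=1, has_previous_best=True):
    """Return the parent source label for each candidate in one iteration."""
    # Candidates generated directly from the original reference image.
    plan = ["reference"] * min(reference_anchor, total)
    # Candidates generated from the prompt alone, as a fresh exploration path.
    plan += ["fresh"] * min(fresh, total - len(plan))
    # Everything else refines the best previous output, if one exists yet.
    remaining = total - len(plan)
    plan += ["best_previous" if has_previous_best else "reference"] * remaining
    return plan


print(plan_sources(4))
print(plan_sources(4, fresh=0, has_previous_best=False))
```

In the first iteration there is no previous best, so in this sketch the remaining slots fall back to the reference image.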
Portrait Reference Studio is a dedicated workflow for portrait and face-like reference matching.
Features:
- Upload portrait reference image
- Generate multiple candidates
- Keep reference anchor in every iteration
- Use portrait-specific prompt mutation
- Score face quality
- Score face-reference similarity
- Track the best output over iterations
The Face Evaluator is a dedicated page for evaluating portrait outputs.
It estimates:
- Face count
- Face quality
- Face-reference similarity
- Visual quality
- Final ranking score
The evaluator uses OpenCV-based face detection and heuristic visual scoring.
Note: this tool estimates visual similarity and face quality. It does not identify people.
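To make the final ranking score concrete, here is one way the estimated components could be combined. The weights, the face-count rule, and the name `final_ranking_score` are all assumptions for illustration. The source does not state the actual formula.

```python
def final_ranking_score(face_count, face_quality, face_similarity, visual_quality):
    """Blend component scores (each expected in [0, 1]) into one ranking value."""
    # Hypothetical weights; they sum to 1 so the result stays in [0, 1].
    blended = 0.3 * face_quality + 0.4 * face_similarity + 0.3 * visual_quality
    # Hypothetical rule: a portrait should contain exactly one detected face.
    count_factor = 1.0 if face_count == 1 else 0.5
    return blended * count_factor


print(final_ranking_score(1, 0.8, 0.7, 0.9))
print(final_ranking_score(2, 0.8, 0.7, 0.9))
```

With these assumed weights, an output with two detected faces ranks at half the score of an otherwise identical single-face output.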
Benchmark Studio compares different model presets or settings on the same prompt.
It can:
- Run the same prompt on multiple model presets
- Generate several candidates per model
- Score all outputs
- Show the best output per model
- Compare model performance in a table
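The comparison table can be derived from a flat list of scored outputs. The sketch below is illustrative only: `summarize` is a hypothetical helper, and the scores are made-up placeholders, not measured benchmark results.

```python
from statistics import mean


def summarize(results):
    """Group (model_preset, score) pairs and build one summary row per model."""
    by_model = {}
    for model, score in results:
        by_model.setdefault(model, []).append(score)
    rows = [
        {"model": model, "candidates": len(scores), "mean": mean(scores), "best": max(scores)}
        for model, scores in by_model.items()
    ]
    # Highest best score first, as a comparison table would typically show it.
    return sorted(rows, key=lambda row: row["best"], reverse=True)


# Placeholder scores for two presets on the same prompt (not real measurements).
results = [
    ("SD 1.5 Quality", 0.71),
    ("SD Turbo Fast", 0.55),
    ("SD 1.5 Quality", 0.64),
    ("SD Turbo Fast", 0.60),
]
rows = summarize(results)
for row in rows:
    print(f"{row['model']}: n={row['candidates']} mean={row['mean']:.3f} best={row['best']:.2f}")
```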
VisionForge-AI can export experiment reports from generated outputs and metadata.
Supported exports:
- Markdown report
- JSON report
The report includes:
- Best output
- Top scored outputs
- Prompts
- Scores
- Reference similarity
- Face quality
- Session information
- Image paths
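Both export formats can come from one report dictionary. This is a sketch under assumptions: the field names (`image_path`, `prompt`, `score`) and the helpers `build_report` and `to_markdown` are hypothetical and may differ from the project's metadata schema.

```python
import json


def build_report(outputs, top_n=3):
    """Rank outputs by score and keep the best one plus the top N."""
    if not outputs:
        raise ValueError("no outputs to report")
    ranked = sorted(outputs, key=lambda o: o["score"], reverse=True)
    return {"best_output": ranked[0], "top_outputs": ranked[:top_n]}


def to_markdown(report):
    """Render the report dictionary as a short Markdown document."""
    best = report["best_output"]
    lines = [
        "# Experiment Report",
        "",
        f"Best output: {best['image_path']} (score {best['score']:.2f})",
        "",
        "## Top scored outputs",
        "",
    ]
    for item in report["top_outputs"]:
        lines.append(f"- {item['image_path']}: {item['score']:.2f}, prompt: {item['prompt']}")
    return "\n".join(lines)


# Placeholder records standing in for saved generation metadata.
outputs = [
    {"image_path": "outputs/a.png", "prompt": "control dashboard", "score": 0.62},
    {"image_path": "outputs/b.png", "prompt": "control dashboard, holographic grid", "score": 0.81},
]
report = build_report(outputs)
markdown_text = to_markdown(report)
json_text = json.dumps(report, indent=2)
print(markdown_text)
```

Building the dictionary first and rendering it twice keeps the Markdown and JSON exports consistent with each other.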
Session Explorer visualizes self-refinement runs.
It shows:
- Session IDs
- Iteration timeline
- Best output per iteration
- Candidate outputs
- Score changes
- Prompt labels
- Parent sources
- Raw metadata
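The iteration timeline and score changes can be computed from stored candidate records. The sketch below assumes each record carries `session_id`, `iteration`, `score`, and `prompt_label` keys. These key names and both helper functions are hypothetical.

```python
def iteration_timeline(records, session_id):
    """Return the best-scoring record of each iteration for one session, in order."""
    best_by_iteration = {}
    for record in records:
        if record["session_id"] != session_id:
            continue
        iteration = record["iteration"]
        current = best_by_iteration.get(iteration)
        if current is None or record["score"] > current["score"]:
            best_by_iteration[iteration] = record
    return [best_by_iteration[i] for i in sorted(best_by_iteration)]


def score_changes(timeline):
    """Difference in best score between consecutive iterations."""
    scores = [record["score"] for record in timeline]
    return [round(later - earlier, 3) for earlier, later in zip(scores, scores[1:])]


# Placeholder records for two sessions (not real run data).
records = [
    {"session_id": "s1", "iteration": 0, "score": 0.50, "prompt_label": "Minimal-control"},
    {"session_id": "s1", "iteration": 0, "score": 0.60, "prompt_label": "Data-field"},
    {"session_id": "s1", "iteration": 1, "score": 0.75, "prompt_label": "Holographic-grid"},
    {"session_id": "s2", "iteration": 0, "score": 0.90, "prompt_label": "Portrait-clean"},
]
timeline = iteration_timeline(records, "s1")
print([record["prompt_label"] for record in timeline], score_changes(timeline))
```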
The Project Dashboard provides a high-level overview of the whole system.
It shows:
- Total generated outputs
- Scored outputs
- Self-refinement sessions
- Face-aware outputs
- Best current output
- Project summaries
- Prompt variant summaries
- Session summaries
- System health table
VisionForge-AI currently includes presets for:
- GlucoPilot-RL
- ChessRL-Agent
- Habit Tracker
- MarketBoard
- VisionForge-AI
- Custom AI projects
The app supports the following model presets:
- SD 1.5 Quality
- SD Turbo Fast
- Small SD model
- Technical test model
- Custom Hugging Face model ID
The technical test model is only for verifying that the pipeline works. For real outputs, use a production-quality model such as SD 1.5 Quality or another compatible diffusion model.
Available size presets include:
- GPU Quick Test
- Square Cover
- GitHub README Banner
- LinkedIn Post
- App Icon
- Website Hero
- Custom Size
The Streamlit app includes the following pages:
- Project Dashboard
- Generate
- Image-to-Image
- Output Gallery
- Prompt Lab
- Experiments
- Self-Refining Generation
- Face Evaluator
- Portrait Reference Studio
- Experiment Report
- Benchmark Studio
- Session Explorer
The project is built with:
- Python
- PyTorch
- Hugging Face Diffusers
- Transformers
- CLIP
- Streamlit
- Pillow
- NumPy
- OpenCV
- Python-dotenv
Clone the repository:
git clone https://github.com/mohammad-azimi/VisionForge-AI.git
cd VisionForge-AI
Create and activate a virtual environment (Windows syntax shown):
python -m venv .venv
.venv\Scripts\activate
Install dependencies:
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
For NVIDIA GPU acceleration, install a CUDA-compatible PyTorch build.
Example:
python -m pip uninstall torch torchvision torchaudio -y
python -m pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu118
Verify CUDA:
python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU')"
Run the app:
python -m streamlit run app.py
The app opens locally in the browser.
Settings for a quick GPU test:
Model preset: SD Turbo Fast
Size preset: GPU Quick Test
Inference steps: 4
Guidance scale: 0
Settings for higher-quality generation:
Model preset: SD 1.5 Quality
Size preset: Square Cover
Inference steps: 30-35
Guidance scale: 7.5
Settings for self-refining generation:
Refinement iterations: 3
Candidates per iteration: 4
Use CLIP semantic evaluator: enabled
Enable prompt mutation: enabled
Fresh exploration candidates per iteration: 1-2
Settings for reference and portrait matching:
Evaluation profile: Reference Match
Use CLIP semantic evaluator: enabled
Portrait reference mode: enabled
Keep reference anchor every iteration: enabled
Reference-anchor candidates per iteration: 2
Fresh exploration candidates per iteration: 0
Generated images and metadata are saved locally in:
outputs/
This folder is ignored by Git because generated outputs can become large.
Selected showcase examples can be copied into:
assets/examples/
VisionForge-AI/
├── app.py
├── README.md
├── requirements.txt
├── assets/
│   └── examples/
├── pages/
│   ├── 0_Project_Dashboard.py
│   ├── 1_Self_Refining_Generation.py
│   ├── 2_Face_Evaluator.py
│   ├── 3_Portrait_Reference_Studio.py
│   ├── 4_Experiment_Report.py
│   ├── 5_Benchmark_Studio.py
│   └── 6_Session_Explorer.py
└── src/
    └── visionforge/
        ├── evaluator.py
        ├── face_evaluator.py
        ├── generator.py
        ├── history.py
        ├── presets.py
        ├── prompt_builder.py
        ├── prompt_mutator.py
        ├── prompt_tools.py
        ├── reporting.py
        └── self_refiner.py
A typical self-refining workflow:
- Select a project or write a custom prompt
- Generate several candidates
- Automatically score the outputs
- Select the best candidate
- Refine the selected image through image-to-image
- Mutate prompts to explore alternatives
- Repeat for several iterations
- Review the session in Session Explorer
- Export the experiment report
VisionForge-AI is currently a functional MVP+ project.
It supports real diffusion models, GPU-accelerated generation, iterative self-refinement, prompt mutation, reference-aware refinement, portrait evaluation, benchmarking, session exploration, and report exporting.
Planned improvements:
- Face embedding similarity using a dedicated face-recognition model
- ControlNet support for layout-guided generation
- LoRA fine-tuning for a custom visual style
- Better artifact detection
- More advanced automatic prompt rewriting
- FastAPI backend version
- Model performance tracking over time
- Exportable HTML reports
- Optional cloud deployment
This project is intended for experimentation, portfolio visuals, and research-style image generation workflows.
For real-person portraits, use only images you own or have permission to use.
This project is licensed under the MIT License. See the LICENSE file for details.
