PortraitRetouch is an intelligent portrait retouching agent powered by Gemini-compatible image generation endpoints. It combines an interactive browser workspace with agent-style planning, aesthetic analysis, visual prompting, local retouch processors, and short-term session memory.
The system is designed for iterative portrait improvement rather than one-shot filter application. A user can describe an edit, mark a local region, select aesthetic priorities, or provide a reference style; the backend then builds the appropriate prompt and processing path for the selected workflow.
The main application is the FastAPI backend in server.py together with the static frontend in frontend/. The older demo.py Gradio interface is kept as an early prototype and is not the primary runtime.
- Instruction-driven retouching: Convert natural language requests into professional portrait-editing prompts.
- Brush-guided visual prompting: Draw over target regions to guide localized cleanup or retouching.
- Aesthetic agent mode: Analyze portraits across HumanAesExpert-style dimensions and improve low-scoring areas.
- Region-aware enhancement: Prioritize facial brightness, facial skin tone, body skin, facial structure, outfit, body shape, and environment.
- Reference-based style transfer: Apply the visual style of a reference image while preserving the subject's identity.
- Iterative editing loop: Chain each result as the next input for multi-step refinement.
- Short-term session memory: Carry recent edit context forward during a retouching session.
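As a rough illustration of the short-term session memory feature listed above, the sketch below keeps only the most recent edits and renders them as prompt context. It is not the code in src/agents/interaction_memory.py; the `SessionMemory` and `EditRecord` names, the `max_items` limit, and the output format are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class EditRecord:
    """One completed edit step (hypothetical structure)."""
    instruction: str
    mode: str


class SessionMemory:
    """Bounded memory: once full, the oldest edit is dropped first."""

    def __init__(self, max_items: int = 3) -> None:
        self._items: deque[EditRecord] = deque(maxlen=max_items)

    def remember(self, instruction: str, mode: str) -> None:
        self._items.append(EditRecord(instruction, mode))

    def render_context(self) -> str:
        """Format remembered edits as text that could be prepended to a prompt."""
        if not self._items:
            return ""
        lines = [
            f"{index}. [{record.mode}] {record.instruction}"
            for index, record in enumerate(self._items, start=1)
        ]
        return "Recent edits:\n" + "\n".join(lines)


memory = SessionMemory(max_items=2)
memory.remember("smooth skin slightly", "visual")
memory.remember("brighten the face", "visual")
memory.remember("warm up the background", "visual")
context = memory.render_context()  # only the two most recent edits remain
```

A bounded deque keeps the context short, which matters when every remembered edit is added to the next prompt.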
The project is organized around a small agent pipeline:
- Frontend interaction layer: frontend/ collects user instructions, brush masks, selected regions, reference images, and session state.
- Request router: server.py chooses the correct workflow endpoint: visual retouching, aesthetic enhancement, style transfer, or compatibility standard mode.
- Planning and analysis: src/agents/ builds retouch prompts, evaluates portrait aesthetics, loads HumanAesExpert skill guidance, and renders memory context.
- Image processing tools: src/processors/ handles resizing, parsing, local crop retouching, visual-mask processing, and deterministic body-skin correction.
- Model client: src/clients/gemini_client.py sends Gemini-compatible generateContent requests and normalizes returned images.
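To make the model client step more concrete, here is a minimal sketch of what a generateContent request with one text part and one inline image can look like. It follows the publicly documented Gemini REST request shape; the function name `build_generate_content_request` is hypothetical, and a Gemini-compatible provider may expect a different authentication header or URL layout than the one assumed here. The sketch only builds the request and does not send it.

```python
import base64
import json


def build_generate_content_request(api_base, model, api_key, prompt,
                                   image_bytes, mime_type="image/jpeg"):
    """Return (url, headers, body) for a generateContent call.

    Assumption: the provider accepts the Google-style `x-goog-api-key`
    header; some compatible gateways use a bearer token instead.
    """
    url = f"{api_base.rstrip('/')}/models/{model}:generateContent"
    headers = {
        "Content-Type": "application/json",
        "x-goog-api-key": api_key,
    }
    body = {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # Image bytes travel as base64 text inside the JSON body.
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
    return url, headers, json.dumps(body)


url, headers, body = build_generate_content_request(
    "https://your-provider.example/v1beta/",
    "gemini-2.5-flash-image",
    "your_api_key_here",
    "Soften harsh shadows on the face.",
    b"\xff\xd8fake-jpeg-bytes",
)
```

The returned tuple can be passed to any HTTP client. Keeping request construction separate from sending makes the payload easy to inspect and test offline.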
PortraitRetouch/
|-- server.py                          # FastAPI backend and API routes
|-- frontend/                          # Browser UI served by FastAPI
|   |-- index.html
|   |-- app.js
|   `-- style.css
|-- src/
|   |-- clients/
|   |   `-- gemini_client.py           # Gemini-compatible API wrapper
|   |-- agents/
|   |   |-- retouch_planner.py         # Prompt planning for standard edits
|   |   |-- aesthetic_analyzer.py      # Aesthetic scoring logic
|   |   |-- aesthetic_analyzer_with_skills.py
|   |   `-- interaction_memory.py      # Short-term session memory
|   |-- processors/
|   |   |-- visual_crop_retouch.py     # Mask/crop-based visual prompting
|   |   |-- face_local_retouch.py
|   |   |-- body_skin_adjust.py
|   |   |-- image_handler.py
|   |   `-- result_parser.py
|   `-- prompts/
|       `-- system_prompts.py
|-- skills/
|   `-- human-aes-expert/              # Local aesthetic analysis skill material
|-- eval/
|   `-- compute_metrics_by_mode.py     # Evaluation summary utilities
|-- parallel_batch_processing.py       # Batch generation workflow
|-- demo.py                            # Legacy Gradio prototype
|-- requirements.txt
`-- .env                               # Local API configuration, not committed
- Python 3.10 or newer
- A Gemini-compatible image generation API endpoint
- A valid API key for that endpoint
Install Python dependencies:
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
Copy .env.example to .env, then replace the placeholder values.
Recommended variable names:
GEMINI_API_KEY=your_api_key_here
GEMINI_API_BASE=https://your-provider.example/v1beta
GEMINI_MODEL=gemini-2.5-flash-image
GEMINI_TIMEOUT_SECONDS=90
GEMINI_CONNECT_TIMEOUT_SECONDS=10
GEMINI_RETRIES=1
GEMINI_MAX_SIDE=1024
GEMINI_IMAGE_FORMAT=JPEG
GEMINI_IMAGE_QUALITY=90
The loader also supports the legacy names used by earlier experiments:
API_KEY=your_api_key_here
ENDPOINT_BASE_URL=https://your-provider.example/v1beta
GeneralModel=gemini-2.5-flash-image
Do not commit real API keys. Keep provider-specific keys and endpoints in your local .env.
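One way to support both the recommended and the legacy variable names is an alias table that is checked in order. The sketch below is illustrative and is not the project's loader; `ENV_ALIASES` and `resolve_settings` are hypothetical names, and the rule that a recommended name wins over its legacy counterpart is an assumption rather than documented behavior.

```python
import os

# Recommended name first, legacy name second (assumed priority order).
ENV_ALIASES = {
    "api_key": ("GEMINI_API_KEY", "API_KEY"),
    "api_base": ("GEMINI_API_BASE", "ENDPOINT_BASE_URL"),
    "model": ("GEMINI_MODEL", "GeneralModel"),
}


def resolve_settings(environ=None):
    """Pick the first non-empty value among each setting's aliases."""
    env = os.environ if environ is None else environ
    settings = {}
    for field, names in ENV_ALIASES.items():
        settings[field] = next((env[name] for name in names if env.get(name)), None)
    return settings


example = resolve_settings({
    "API_KEY": "legacy-key",             # only the legacy name is set
    "GEMINI_MODEL": "new-model",         # both names set: recommended one wins
    "GeneralModel": "old-model",
})
```

Passing a plain dictionary instead of `os.environ` keeps the resolution logic testable without touching the real environment.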
Start the FastAPI server:
.\.venv\Scripts\activate
python .\server.py
Open:
http://127.0.0.1:8000
The frontend is served directly by FastAPI when the frontend/ directory exists.
| Endpoint | Method | Purpose |
|---|---|---|
| /api/visual | POST | Main normal-mode endpoint. Handles full-image retouching and brush-guided visual prompting. |
| /api/aesthetic | POST | Runs aesthetic analysis and optional automatic enhancement. |
| /api/style-transfer | POST | Transfers style from one or more reference images to the target portrait. |
| /api/standard | POST | Earlier standard retouching route retained for compatibility. |
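For scripts that call the backend directly, the routes in the table can be captured in a small lookup. This is a sketch only: the workflow keys and the `endpoint_url` helper are hypothetical, the default base URL assumes the local server address used in this README, and the request fields each route expects are not shown because they are not documented here.

```python
# Paths are taken from the endpoint table; the dictionary keys are illustrative.
WORKFLOW_ENDPOINTS = {
    "visual": "/api/visual",
    "aesthetic": "/api/aesthetic",
    "style_transfer": "/api/style-transfer",
    "standard": "/api/standard",
}


def endpoint_url(workflow, base_url="http://127.0.0.1:8000"):
    """Return the full POST URL for a workflow, or raise for unknown names."""
    try:
        path = WORKFLOW_ENDPOINTS[workflow]
    except KeyError:
        known = ", ".join(sorted(WORKFLOW_ENDPOINTS))
        raise ValueError(f"Unknown workflow {workflow!r}; expected one of: {known}") from None
    return base_url.rstrip("/") + path


style_url = endpoint_url("style_transfer")
```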
For a normal-mode retouch:
- Upload a target portrait.
- Enter the desired edit, such as skin smoothing, blemish removal, lighting correction, or a stylistic adjustment.
- Optionally choose a style preset.
- Apply retouching.
If no marks are drawn, the agent performs a full-image edit. If marks are drawn on the canvas, the backend treats the canvas as a visual prompt and focuses on the marked region.
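The full-image versus marked-region decision described above can be sketched as a check for any painted pixel in the brush mask, followed by a bounding box around the marks. The mask representation (a 2D list where 0 means untouched) and the function names are assumptions for illustration; the actual logic lives in the backend processors and may differ.

```python
def marked_bounding_box(mask):
    """Return (left, top, right, bottom) around painted pixels, or None.

    `mask` is a 2D list of ints; any value above 0 counts as a brush mark.
    The right and bottom edges are exclusive, as in typical crop boxes.
    """
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value > 0
    ]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def choose_edit_scope(mask):
    """No mask or an empty mask means a full-image edit."""
    if mask is None or marked_bounding_box(mask) is None:
        return "full_image"
    return "marked_region"


brush_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
scope = choose_edit_scope(brush_mask)
box = marked_bounding_box(brush_mask)
```

A bounding box in this form could feed a crop-based step, which is one plausible reading of the mask/crop processing named in the project layout.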
For aesthetic agent mode:
- Upload a target portrait.
- Enable or disable automatic enhancement.
- Set the threshold and intensity.
- Optionally select priority regions.
- Apply retouching.
The agent analyzes portrait quality dimensions, constructs a targeted improvement prompt, and applies deterministic post-processing for body-skin matching when that region is explicitly selected.
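The threshold step can be pictured as filtering the analyzed dimensions and turning the low scorers into a targeted prompt. The sketch below is an assumption-heavy illustration: the 0 to 10 score scale, the 0 to 1 intensity scale, the dimension keys, the ordering rule, and the prompt wording are all invented for the example and are not taken from the analyzer.

```python
def select_low_scoring(scores, threshold, priority_regions=()):
    """Return dimensions scoring below the threshold.

    User-selected priority regions come first; within each group the
    weakest dimension is listed first.
    """
    low = [name for name, score in scores.items() if score < threshold]
    return sorted(low, key=lambda name: (name not in priority_regions, scores[name]))


def build_improvement_prompt(targets, intensity):
    """Turn target dimensions into a single instruction string."""
    if not targets:
        return ""
    readable = ", ".join(name.replace("_", " ") for name in targets)
    return (
        f"Improve the following aspects at {intensity:.0%} intensity: "
        f"{readable}. Preserve the subject's identity."
    )


scores = {
    "facial_brightness": 4.0,
    "outfit": 8.5,
    "environment": 5.5,
    "body_skin": 6.5,
}
targets = select_low_scoring(scores, threshold=6.0, priority_regions=("environment",))
prompt = build_improvement_prompt(targets, intensity=0.5)
```

Here `outfit` and `body_skin` sit above the threshold and are left alone, which reflects the idea of improving only low-scoring areas.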
For reference-based style transfer:
- Upload the target portrait.
- Upload a reference image.
- Add optional instructions.
- Apply style transfer.
The target image remains the identity anchor, while reference images provide style guidance.
parallel_batch_processing.py supports batch generation workflows for dataset creation and controlled experiments.
Evaluation outputs and helpers are stored in:
- eval/compute_metrics_by_mode.py
- evaluation_by_mode.csv
- evaluation_by_mode_summary.csv
- experiment_log.csv
- server.py is the main local entry point for the intelligent retouching agent.
- frontend/ contains the maintained browser interface.
- src/clients/gemini_client.py builds Gemini-compatible generateContent requests and handles retries, timeouts, response parsing, and image resizing.
- demo.py is retained only as a historical Gradio prototype.
- .env values are loaded through utils/env_loader.py, with .env taking precedence over existing shell variables for supported Gemini-related keys.
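The image resizing mentioned for the model client, together with the GEMINI_MAX_SIDE setting, suggests a cap on the longest image side before upload. A minimal sketch of that calculation follows; it assumes the setting limits the longer side while preserving aspect ratio and never upscales, which is a plausible reading rather than confirmed behavior. The function name is hypothetical.

```python
def fit_within_max_side(width, height, max_side=1024):
    """Return (width, height) scaled so the longer side is at most max_side.

    Images already within the limit are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Guard against a zero-sized dimension on extremely narrow images.
    return max(1, round(width * scale)), max(1, round(height * scale))


landscape = fit_within_max_side(4000, 3000)   # downscaled to fit 1024
small = fit_within_max_side(800, 600)         # already small enough
```

The resulting size could then be handed to an image library's resize call before the image is encoded for the request.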
If Python reports that it cannot open a file named run, use:
python .\server.py
Do not use python run .\server.py; Python will interpret run as the script name.
If python-dotenv is missing, make sure the virtual environment is active and dependencies are installed:
.\.venv\Scripts\activate
pip install -r requirements.txt