Intelligent Portrait Retouching Agent

PortraitRetouch is an intelligent portrait retouching agent powered by Gemini-compatible image generation endpoints. It combines an interactive browser workspace with agent-style planning, aesthetic analysis, visual prompting, local retouch processors, and short-term session memory.

The system is designed for iterative portrait improvement rather than one-shot filter application. A user can describe an edit, mark a local region, select aesthetic priorities, or provide a reference style; the backend then builds the appropriate prompt and processing path for the selected workflow.

The main application is served by server.py and the static frontend in frontend/. The older demo.py Gradio interface is kept as an early prototype and is not the primary runtime.

Features

Instruction-driven retouching: Convert natural language requests into professional portrait-editing prompts.
Brush-guided visual prompting: Draw over target regions to guide localized cleanup or retouching.
Aesthetic agent mode: Analyze portraits across HumanAesExpert-style dimensions and improve low-scoring areas.
Region-aware enhancement: Prioritize facial brightness, facial skin tone, body skin, facial structure, outfit, body shape, and environment.
Reference-based style transfer: Apply the visual style of a reference image while preserving the subject's identity.
Iterative editing loop: Chain each result as the next input for multi-step refinement.
Short-term session memory: Carry recent edit context forward during a retouching session.

Agent Design

The project is organized around a small agent pipeline:

Frontend interaction layer: frontend/ collects user instructions, brush masks, selected regions, reference images, and session state.
Request router: server.py chooses the correct workflow endpoint: visual retouching, aesthetic enhancement, style transfer, or compatibility standard mode.
Planning and analysis: src/agents/ builds retouch prompts, evaluates portrait aesthetics, loads HumanAesExpert skill guidance, and renders memory context.
Image processing tools: src/processors/ handles resizing, parsing, local crop retouching, visual-mask processing, and deterministic body-skin correction.
Model client: src/clients/gemini_client.py sends Gemini-compatible generateContent requests and normalizes returned images.

Project Structure

PortraitRetouch/
|-- server.py                         # FastAPI backend and API routes
|-- frontend/                         # Browser UI served by FastAPI
|   |-- index.html
|   |-- app.js
|   `-- style.css
|-- src/
|   |-- clients/
|   |   `-- gemini_client.py           # Gemini-compatible API wrapper
|   |-- agents/
|   |   |-- retouch_planner.py         # Prompt planning for standard edits
|   |   |-- aesthetic_analyzer.py      # Aesthetic scoring logic
|   |   |-- aesthetic_analyzer_with_skills.py
|   |   `-- interaction_memory.py      # Short-term session memory
|   |-- processors/
|   |   |-- visual_crop_retouch.py     # Mask/crop-based visual prompting
|   |   |-- face_local_retouch.py
|   |   |-- body_skin_adjust.py
|   |   |-- image_handler.py
|   |   `-- result_parser.py
|   `-- prompts/
|       `-- system_prompts.py
|-- skills/
|   `-- human-aes-expert/              # Local aesthetic analysis skill material
|-- eval/
|   `-- compute_metrics_by_mode.py     # Evaluation summary utilities
|-- parallel_batch_processing.py       # Batch generation workflow
|-- demo.py                            # Legacy Gradio prototype
|-- requirements.txt
`-- .env                              # Local API configuration, not committed

Requirements

Python 3.10 or newer
A Gemini-compatible image generation API endpoint
A valid API key for that endpoint

Install Python dependencies:

python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt

Configuration

Copy .env.example to .env, then replace the placeholder values.

Recommended variable names:

GEMINI_API_KEY=your_api_key_here
GEMINI_API_BASE=https://your-provider.example/v1beta
GEMINI_MODEL=gemini-2.5-flash-image
GEMINI_TIMEOUT_SECONDS=90
GEMINI_CONNECT_TIMEOUT_SECONDS=10
GEMINI_RETRIES=1
GEMINI_MAX_SIDE=1024
GEMINI_IMAGE_FORMAT=JPEG
GEMINI_IMAGE_QUALITY=90

The loader also supports the legacy names used by earlier experiments:

API_KEY=your_api_key_here
ENDPOINT_BASE_URL=https://your-provider.example/v1beta
GeneralModel=gemini-2.5-flash-image

Do not commit real API keys. Keep provider-specific keys and endpoints in your local .env.

Running The Application

Start the FastAPI server:

.\.venv\Scripts\activate
python .\server.py

Open:

http://127.0.0.1:8000

The frontend is served directly by FastAPI when the frontend/ directory exists.

API Endpoints

Endpoint	Method	Purpose
`/api/visual`	`POST`	Main normal-mode endpoint. Handles full-image retouching and brush-guided visual prompting.
`/api/aesthetic`	`POST`	Runs aesthetic analysis and optional automatic enhancement.
`/api/style-transfer`	`POST`	Transfers style from one or more reference images to the target portrait.
`/api/standard`	`POST`	Earlier standard retouching route retained for compatibility.

Agent Workflows

Normal Retouching

Upload a target portrait.
Enter the desired edit, such as skin smoothing, blemish removal, lighting correction, or a stylistic adjustment.
Optionally choose a style preset.
Apply retouching.

If no marks are drawn, the agent performs a full-image edit. If marks are drawn on the canvas, the backend treats the canvas as a visual prompt and focuses on the marked region.

Aesthetic Enhancement

Upload a target portrait.
Enable or disable automatic enhancement.
Set the threshold and intensity.
Optionally select priority regions.
Apply retouching.

The agent analyzes portrait quality dimensions, constructs a targeted improvement prompt, and applies deterministic post-processing for body-skin matching when that region is explicitly selected.

Style Transfer

Upload the target portrait.
Upload a reference image.
Add optional instructions.
Apply style transfer.

The target image remains the identity anchor, while reference images provide style guidance.

Batch Processing And Evaluation

parallel_batch_processing.py supports batch generation workflows for dataset creation and controlled experiments.

Evaluation outputs and helpers are stored in:

eval/compute_metrics_by_mode.py
evaluation_by_mode.csv
evaluation_by_mode_summary.csv
experiment_log.csv

Development Notes

server.py is the main local entry point for the intelligent retouching agent.
frontend/ contains the maintained browser interface.
src/clients/gemini_client.py builds Gemini-compatible generateContent requests and handles retries, timeouts, response parsing, and image resizing.
demo.py is retained only as a historical Gradio prototype.
.env values are loaded through utils/env_loader.py, with .env taking precedence over existing shell variables for supported Gemini-related keys.

Troubleshooting

If Python reports that it cannot open a file named run, use:

python .\server.py

Do not use python run .\server.py; Python will interpret run as the script name.

If python-dotenv is missing, make sure the virtual environment is active and dependencies are installed:

.\.venv\Scripts\activate
pip install -r requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Intelligent Portrait Retouching Agent

Features

Agent Design

Project Structure

Requirements

Configuration

Running The Application

API Endpoints

Agent Workflows

Normal Retouching

Aesthetic Enhancement

Style Transfer

Batch Processing And Evaluation

Development Notes

Troubleshooting

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 87 Commits
eval		eval
frontend		frontend
skills/human-aes-expert		skills/human-aes-expert
src		src
utils		utils
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
demo.py		demo.py
evaluation_by_mode.csv		evaluation_by_mode.csv
evaluation_by_mode_summary.csv		evaluation_by_mode_summary.csv
experiment_log.csv		experiment_log.csv
requirements.txt		requirements.txt
server.py		server.py

Folders and files

Latest commit

History

Repository files navigation

Intelligent Portrait Retouching Agent

Features

Agent Design

Project Structure

Requirements

Configuration

Running The Application

API Endpoints

Agent Workflows

Normal Retouching

Aesthetic Enhancement

Style Transfer

Batch Processing And Evaluation

Development Notes

Troubleshooting

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages