Skip to content

WWIIITT/PortraitRetouch

Repository files navigation

Intelligent Portrait Retouching Agent

PortraitRetouch is an intelligent portrait retouching agent powered by Gemini-compatible image generation endpoints. It combines an interactive browser workspace with agent-style planning, aesthetic analysis, visual prompting, local retouch processors, and short-term session memory.

The system is designed for iterative portrait improvement rather than one-shot filter application. A user can describe an edit, mark a local region, select aesthetic priorities, or provide a reference style; the backend then builds the appropriate prompt and processing path for the selected workflow.

The main application is served by server.py and the static frontend in frontend/. The older demo.py Gradio interface is kept as an early prototype and is not the primary runtime.

Features

  • Instruction-driven retouching: Convert natural language requests into professional portrait-editing prompts.
  • Brush-guided visual prompting: Draw over target regions to guide localized cleanup or retouching.
  • Aesthetic agent mode: Analyze portraits across HumanAesExpert-style dimensions and improve low-scoring areas.
  • Region-aware enhancement: Prioritize facial brightness, facial skin tone, body skin, facial structure, outfit, body shape, and environment.
  • Reference-based style transfer: Apply the visual style of a reference image while preserving the subject's identity.
  • Iterative editing loop: Chain each result as the next input for multi-step refinement.
  • Short-term session memory: Carry recent edit context forward during a retouching session.

Agent Design

The project is organized around a small agent pipeline:

  1. Frontend interaction layer: frontend/ collects user instructions, brush masks, selected regions, reference images, and session state.
  2. Request router: server.py chooses the correct workflow endpoint: visual retouching, aesthetic enhancement, style transfer, or compatibility standard mode.
  3. Planning and analysis: src/agents/ builds retouch prompts, evaluates portrait aesthetics, loads HumanAesExpert skill guidance, and renders memory context.
  4. Image processing tools: src/processors/ handles resizing, parsing, local crop retouching, visual-mask processing, and deterministic body-skin correction.
  5. Model client: src/clients/gemini_client.py sends Gemini-compatible generateContent requests and normalizes returned images.

Project Structure

PortraitRetouch/
|-- server.py                         # FastAPI backend and API routes
|-- frontend/                         # Browser UI served by FastAPI
|   |-- index.html
|   |-- app.js
|   `-- style.css
|-- src/
|   |-- clients/
|   |   `-- gemini_client.py           # Gemini-compatible API wrapper
|   |-- agents/
|   |   |-- retouch_planner.py         # Prompt planning for standard edits
|   |   |-- aesthetic_analyzer.py      # Aesthetic scoring logic
|   |   |-- aesthetic_analyzer_with_skills.py
|   |   `-- interaction_memory.py      # Short-term session memory
|   |-- processors/
|   |   |-- visual_crop_retouch.py     # Mask/crop-based visual prompting
|   |   |-- face_local_retouch.py
|   |   |-- body_skin_adjust.py
|   |   |-- image_handler.py
|   |   `-- result_parser.py
|   `-- prompts/
|       `-- system_prompts.py
|-- skills/
|   `-- human-aes-expert/              # Local aesthetic analysis skill material
|-- eval/
|   `-- compute_metrics_by_mode.py     # Evaluation summary utilities
|-- parallel_batch_processing.py       # Batch generation workflow
|-- demo.py                            # Legacy Gradio prototype
|-- requirements.txt
`-- .env                              # Local API configuration, not committed

Requirements

  • Python 3.10 or newer
  • A Gemini-compatible image generation API endpoint
  • A valid API key for that endpoint

Install Python dependencies:

python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt

Configuration

Copy .env.example to .env, then replace the placeholder values.

Recommended variable names:

GEMINI_API_KEY=your_api_key_here
GEMINI_API_BASE=https://your-provider.example/v1beta
GEMINI_MODEL=gemini-2.5-flash-image
GEMINI_TIMEOUT_SECONDS=90
GEMINI_CONNECT_TIMEOUT_SECONDS=10
GEMINI_RETRIES=1
GEMINI_MAX_SIDE=1024
GEMINI_IMAGE_FORMAT=JPEG
GEMINI_IMAGE_QUALITY=90

The loader also supports the legacy names used by earlier experiments:

API_KEY=your_api_key_here
ENDPOINT_BASE_URL=https://your-provider.example/v1beta
GeneralModel=gemini-2.5-flash-image

Do not commit real API keys. Keep provider-specific keys and endpoints in your local .env.

Running The Application

Start the FastAPI server:

.\.venv\Scripts\activate
python .\server.py

Open:

http://127.0.0.1:8000

The frontend is served directly by FastAPI when the frontend/ directory exists.

API Endpoints

Endpoint Method Purpose
/api/visual POST Main normal-mode endpoint. Handles full-image retouching and brush-guided visual prompting.
/api/aesthetic POST Runs aesthetic analysis and optional automatic enhancement.
/api/style-transfer POST Transfers style from one or more reference images to the target portrait.
/api/standard POST Earlier standard retouching route retained for compatibility.

Agent Workflows

Normal Retouching

  1. Upload a target portrait.
  2. Enter the desired edit, such as skin smoothing, blemish removal, lighting correction, or a stylistic adjustment.
  3. Optionally choose a style preset.
  4. Apply retouching.

If no marks are drawn, the agent performs a full-image edit. If marks are drawn on the canvas, the backend treats the canvas as a visual prompt and focuses on the marked region.

Aesthetic Enhancement

  1. Upload a target portrait.
  2. Enable or disable automatic enhancement.
  3. Set the threshold and intensity.
  4. Optionally select priority regions.
  5. Apply retouching.

The agent analyzes portrait quality dimensions, constructs a targeted improvement prompt, and applies deterministic post-processing for body-skin matching when that region is explicitly selected.

Style Transfer

  1. Upload the target portrait.
  2. Upload a reference image.
  3. Add optional instructions.
  4. Apply style transfer.

The target image remains the identity anchor, while reference images provide style guidance.

Batch Processing And Evaluation

parallel_batch_processing.py supports batch generation workflows for dataset creation and controlled experiments.

Evaluation outputs and helpers are stored in:

  • eval/compute_metrics_by_mode.py
  • evaluation_by_mode.csv
  • evaluation_by_mode_summary.csv
  • experiment_log.csv

Development Notes

  • server.py is the main local entry point for the intelligent retouching agent.
  • frontend/ contains the maintained browser interface.
  • src/clients/gemini_client.py builds Gemini-compatible generateContent requests and handles retries, timeouts, response parsing, and image resizing.
  • demo.py is retained only as a historical Gradio prototype.
  • .env values are loaded through utils/env_loader.py, with .env taking precedence over existing shell variables for supported Gemini-related keys.

Troubleshooting

If Python reports that it cannot open a file named run, use:

python .\server.py

Do not use python run .\server.py; Python will interpret run as the script name.

If python-dotenv is missing, make sure the virtual environment is active and dependencies are installed:

.\.venv\Scripts\activate
pip install -r requirements.txt

About

An intelligent portrait retouching agent with a FastAPI backend and browser UI. It supports instruction-based portrait editing, brush-guided local retouching, aesthetic analysis-driven enhancement, style transfer, region-specific improvements, and iterative session memory using Gemini-compatible image generation APIs.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors