
RAM+

Python 3.12+ | License: MIT | Paper

A minimal, inference-only Python package for RAM++ (Recognize Anything Plus Model), an open-set image tagger that recognizes both its predefined categories and categories it was never trained on, using zero-shot generalization.

Based on the original recognize-anything by Xinyu Huang et al.

What is RAM++?

RAM++ is a vision-language model that generates semantic tags for images. It covers 4,585 common categories out of the box and generalizes to open-set categories it never saw during training. The RAM++ paper reports that it significantly outperforms CLIP on tag recognition tasks.
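
Conceptually, a tagger of this kind scores every candidate tag independently and keeps the ones that clear a threshold. The sketch below illustrates that multi-label selection step only. The `select_tags` helper, the logits, and the single shared threshold are made up for illustration and are not this package's implementation.

```python
import math


def select_tags(logits: dict, threshold: float = 0.5) -> list:
    """Keep tags whose sigmoid score exceeds the threshold, most confident first.

    Hypothetical helper: `logits` maps tag name -> raw score for one image.
    """
    # Each tag gets its own independent probability (multi-label, not softmax).
    scores = {tag: 1.0 / (1.0 + math.exp(-z)) for tag, z in logits.items()}
    kept = [tag for tag, p in scores.items() if p > threshold]
    return sorted(kept, key=lambda tag: scores[tag], reverse=True)


print(select_tags({"grass": 0.5, "dog": 2.0, "car": -3.0}, threshold=0.6))
# ['dog', 'grass']
```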

This package strips out the training, fine-tuning, and demo code, leaving a clean API for running inference.

Installation

pip install git+https://github.com/aka-vm/ram.git

With uv:

uv pip install git+https://github.com/aka-vm/ram.git

From source:

git clone https://github.com/aka-vm/ram
cd ram
uv sync          # or: pip install -e .
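
After installing, a quick sanity check can confirm that the interpreter meets the Python 3.12+ requirement and that the `ram_plus` module is importable. This is a minimal sketch using only the standard library. `check_environment` is a hypothetical helper, not something the package provides.

```python
import importlib.util
import sys


def check_environment(package: str = "ram_plus", min_python: tuple = (3, 12)) -> dict:
    """Report whether the interpreter version and the package look usable."""
    return {
        # sys.version_info compares element-wise against a plain tuple.
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when a top-level module cannot be located.
        "package_found": importlib.util.find_spec(package) is not None,
    }
```

Calling `check_environment()` in the environment you installed into should report `True` for both keys.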

Model Download

The model checkpoint (~850 MB) is hosted on Hugging Face and can be downloaded automatically:

from ram_plus import download_model

model_path = download_model()          # saves to ~/.cache/ram_plus/
model_path = download_model("./models")  # or a custom directory

Or download it manually from Hugging Face.
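
If you want to prefer a local checkpoint and only fall back to downloading, the choice can be wrapped in a small function. The sketch below is one possible way to do that. `resolve_checkpoint` is a hypothetical helper, and it assumes the downloader (for example `download_model`) returns the path of the saved file.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_checkpoint(model_path: Optional[str], downloader: Callable[[], str]) -> Path:
    """Return a usable checkpoint path, downloading only when none is given.

    Hypothetical helper: `downloader` is any zero-argument callable that
    returns the path of the downloaded file.
    """
    if model_path is None:
        return Path(downloader())
    path = Path(model_path).expanduser()
    if not path.is_file():
        # Fail early with a clear message instead of inside model loading.
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    return path
```

With the package installed, usage might look like `resolve_checkpoint(None, download_model)`, passing the result on as `model_path`.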

Usage

import cv2
from ram_plus import RamTagGenerator

# Auto-downloads model if model_path is not provided
generator = RamTagGenerator(device="cuda")

# Or point to a local checkpoint
generator = RamTagGenerator(model_path="./models/ram_plus_swin_large_14m.pth", device="cuda")

# Run on a single image (numpy HWC BGR, as returned by cv2)
image = cv2.imread("photo.jpg")
tags = generator(image)
print(tags)  # ['dog', 'grass', 'outdoors', ...]

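# Batch inference
image_paths = ["photo1.jpg", "photo2.jpg"]  # example paths; replace with your own
images = [cv2.imread(p) for p in image_paths]
batch_tags = generator(images)

# Sort tags by confidence
generator = RamTagGenerator(device="cuda", sort_tags=True)

# Pass a pre-normalized torch.Tensor directly (NCHW, float32, ImageNet-normalized)
# tensor_batch must already exist, e.g. produced by your own preprocessing
tags = generator(tensor_batch)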
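
For more images than comfortably fit in memory at once, the batch call can be fed in fixed-size chunks. The sketch below assumes the generator returns one list of tags per input image, in order. `chunked` and `tag_in_batches` are hypothetical helpers, and `load` and `tagger` are passed in so the logic does not depend on any particular library.

```python
from typing import Callable, Dict, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def tag_in_batches(
    paths: Sequence[str],
    load: Callable[[str], object],
    tagger: Callable[[list], List[List[str]]],
    batch_size: int = 8,
) -> Dict[str, List[str]]:
    """Load and tag images batch by batch, returning {path: tags}."""
    results: Dict[str, List[str]] = {}
    for batch_paths in chunked(paths, batch_size):
        images = [load(p) for p in batch_paths]  # only one batch held in memory
        # Assumption: tagger returns one tag list per image, in input order.
        for path, tags in zip(batch_paths, tagger(images)):
            results[path] = tags
    return results
```

With the objects from the usage example, this could be called as `tag_in_batches(image_paths, cv2.imread, generator)`.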

Input Formats

| Input | Interpreted as |
|---|---|
| np.ndarray (HWC, BGR, uint8) | Single image |
| List[np.ndarray] | Batch of images |
| torch.Tensor (NCHW, float32, ImageNet-normalized) | Pre-processed batch |
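
To make the tensor format concrete, the sketch below converts cv2-style images into an ImageNet-normalized NCHW float32 array with NumPy. It assumes the images are already resized to a common size that the model accepts, and that the model expects RGB channel order with the standard ImageNet mean and standard deviation. `bgr_to_normalized_nchw` is a hypothetical helper, not part of the package.

```python
import numpy as np

# Standard ImageNet per-channel statistics, in RGB order.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def bgr_to_normalized_nchw(images: list) -> np.ndarray:
    """Convert same-sized HWC BGR uint8 images to a float32 (N, 3, H, W) array."""
    batch = np.stack(images).astype(np.float32) / 255.0  # (N, H, W, 3), values in [0, 1]
    batch = batch[..., ::-1]                             # BGR -> RGB (assumed order)
    batch = (batch - IMAGENET_MEAN) / IMAGENET_STD       # broadcast over the channel axis
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))  # NHWC -> NCHW
```

The result can be wrapped with `torch.from_numpy(...)` to obtain the tensor form listed in the table.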

