5 changes: 5 additions & 0 deletions .changeset/gemini-omni-flash-video.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai-gemini': minor
---

Add Gemini Omni Flash (`gemini-omni-flash-preview`) video generation via the Interactions API. Omni is served only through the Interactions API (`generateContent` rejects it), so the video adapter now routes by model: Veo models keep the `:predictLongRunning` operations flow, while `geminiVideo('gemini-omni-flash-preview')` creates a background interaction with `response_modalities: ['video']`, polls it by id, and returns the inline base64 MP4 as a `data:` URL (Files API URI delivery passes through unchanged). Usage is mapped from the interaction's `output_tokens_by_modality`. Image and video prompt parts are sent as interaction content blocks, and `modelOptions.previous_interaction_id` chains a new prompt onto a prior Omni generation for conversational video editing. The top-level `size` option maps onto `response_format.aspect_ratio` (`'16:9' | '9:16'`) and `duration` onto `response_format.duration`, which accepts any value from 3 to 10 seconds (fractional seconds included, verified against the live API) and defaults to a 10-second clip when omitted. Raises the `@google/genai` floor to `^2.10.0` for the Interactions API surface.
5 changes: 5 additions & 0 deletions .changeset/video-adapter-duration-constraint.md
@@ -0,0 +1,5 @@
---
'@tanstack/ai': patch
---

Fix a type-level error where `generateVideo` / `getVideoJobStatus` rejected video adapters that declare a narrowed per-model duration union (e.g. Gemini's `4 | 6 | 8` for Veo). The activity's `TAdapter extends VideoAdapter<string, any, any, any>` constraints left the input-modality and duration generics at their defaults, so the constraint's `duration?: number` was not assignable to the adapter's literal union and failed the contravariant parameter check. All video-activity constraints and helper conditionals now span all six `VideoAdapter` generics.
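
The failure can be reproduced in isolation. The sketch below uses a simplified, hypothetical stand-in for `VideoAdapter` (the generic names, their order and the `createVideoJob` member are assumptions, not the library's real definition) and assumes `strictFunctionTypes` (enabled by `strict: true`), which is what makes function-typed properties compare their parameters contravariantly.

```typescript
// Hypothetical, simplified stand-in for VideoAdapter: six generics, the last
// two defaulted. Not the real @tanstack/ai definition.
interface VideoAdapter<
  TModel extends string,
  TModelOptions,
  TSize extends string,
  TProviderOptions,
  TInputModality extends string = string,
  TDuration extends number = number,
> {
  readonly model: TModel
  readonly inputModalities: ReadonlyArray<TInputModality>
  readonly providerOptions?: TProviderOptions
  // A function-typed property (not method shorthand), so under
  // strictFunctionTypes its parameter is checked contravariantly.
  createVideoJob: (options: {
    prompt: string
    size?: TSize
    duration?: TDuration
    modelOptions?: TModelOptions
  }) => Promise<{ jobId: string }>
}

// An adapter whose model only accepts 4, 6 or 8 second clips.
type VeoLikeAdapter = VideoAdapter<
  'veo-3.1-generate-preview',
  Record<string, unknown>,
  '16:9',
  unknown,
  'text',
  4 | 6 | 8
>

// Old-style constraint: the last two generics fall back to string / number.
type OldConstraint = VideoAdapter<string, any, any, any>
// Fixed constraint: every generic is spelled out.
type NewConstraint = VideoAdapter<string, any, any, any, any, any>

type Accepts<T, C> = T extends C ? true : false

// Under strictFunctionTypes this resolves to `false`: `number` is not
// assignable to `4 | 6 | 8` in the contravariant parameter position.
type OldResult = Accepts<VeoLikeAdapter, OldConstraint>

// `any` is assignable to the literal union, so this resolves to `true`.
const newResult: Accepts<VeoLikeAdapter, NewConstraint> = true
```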
2 changes: 1 addition & 1 deletion docs/config.json
@@ -282,7 +282,7 @@
"label": "Video Generation",
"to": "media/video-generation",
"addedAt": "2026-04-15",
"updatedAt": "2026-07-01"
"updatedAt": "2026-07-02"
},
{
"label": "Generation Hooks",
85 changes: 83 additions & 2 deletions docs/media/video-generation.md
@@ -2,12 +2,14 @@
title: Video Generation
id: video-generation
order: 6
description: "Generate video from text prompts with OpenAI Sora, Google Veo, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API."
description: "Generate video from text prompts with OpenAI Sora, Google Veo, Gemini Omni Flash, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API."
keywords:
- tanstack ai
- video generation
- sora
- veo
- omni flash
- interactions api
- gemini
- grok imagine
- fal
@@ -40,7 +42,7 @@ TanStack AI provides experimental support for video generation through dedicated

Currently supported:
- **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API)
- **Google Gemini**: Veo 3.1 models (via the long-running operations API) and Gemini Omni Flash (via the Interactions API)
- **Grok (xAI)**: grok-imagine-video (text-to-video + image-to-video) and grok-imagine-video-1.5 (image-to-video only) models
- **fal.ai**: MiniMax, Luma, Kling, Hunyuan, and other hosted video models

@@ -569,6 +571,85 @@ Adapters that haven't declared a per-model duration map keep the plain
> Files API and requires your API key to download (send it as an
> `x-goog-api-key` header or `key` query parameter).
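
A minimal sketch of downloading such a Files API URI with the standard `fetch` API, sending the key as the `x-goog-api-key` header as the note describes. It assumes the URI serves the raw video bytes on `GET`. The helper name `downloadGeminiVideo` and the injectable `fetchImpl` parameter (there only so the sketch can be exercised without network access) are hypothetical, not part of `@tanstack/ai-gemini`.

```typescript
// Minimal fetch signature so a fake can be injected (assumption for testing).
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>

// Hypothetical helper: download a Gemini Files API video URI as raw bytes,
// authenticating with the x-goog-api-key header instead of a ?key= query.
async function downloadGeminiVideo(
  uri: string,
  apiKey: string,
  fetchImpl: FetchLike = fetch,
): Promise<Uint8Array> {
  const res = await fetchImpl(uri, {
    headers: { 'x-goog-api-key': apiKey },
  })
  if (!res.ok) {
    // Surface HTTP failures (e.g. 403 for a missing or wrong key).
    throw new Error(`Video download failed: ${res.status} ${res.statusText}`)
  }
  return new Uint8Array(await res.arrayBuffer())
}
```

Keep this on the server: the API key should never be shipped to the browser just to fetch a video.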

### Gemini Omni Flash (Interactions API) Model Options

Gemini Omni Flash (`gemini-omni-flash-preview`) is Google's multimodal
video-generation model with conversational editing. It is served only
through the [Interactions API](https://ai.google.dev/gemini-api/docs/omni),
and the same `geminiVideo()` adapter routes it there automatically:
`generateVideo` creates a background interaction, `getVideoJobStatus` polls
it by id, and the finished clip comes back **inline as a
`data:video/mp4;base64,…` URL**. When Google delivers the clip by reference
instead, the Files API URI passes through unchanged and, as with Veo, needs
your API key to download.
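
The routing and delivery rules can be summarised in a few lines. This is a hypothetical sketch, not the adapter's source: the `gemini-omni` prefix check, the `pickTransport` / `toVideoUrl` names and the `{ data, mime_type, uri }` output shape are assumptions made for illustration.

```typescript
type VideoTransport = 'interactions' | 'predictLongRunning'

// Hypothetical routing rule: Omni models go through the Interactions API,
// everything else (Veo) keeps the long-running operations flow.
function pickTransport(model: string): VideoTransport {
  return model.startsWith('gemini-omni') ? 'interactions' : 'predictLongRunning'
}

// Assumed shape of a finished video output: inline base64 or a Files API URI.
interface VideoOutput {
  data?: string
  mime_type?: string
  uri?: string
}

// Inline bytes become a data: URL; a by-reference URI passes through
// unchanged (downloading it still requires the API key).
function toVideoUrl(output: VideoOutput): string {
  if (output.data) {
    return `data:${output.mime_type ?? 'video/mp4'};base64,${output.data}`
  }
  if (output.uri) return output.uri
  throw new Error('Video output contained neither inline data nor a URI')
}
```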

Clips are 720p at 24 FPS, and `duration` accepts any value in the **3–10
second** range (fractional seconds included), defaulting to 10 seconds when
omitted. `availableDurations()` reports
`{ kind: 'range', min: 3, max: 10, unit: 'seconds' }`; out-of-range
`duration` values are rejected at job creation, and `snapDuration(n)` snaps
raw seconds into the range (clamping to its bounds and rounding to whole
seconds). The `size` option maps onto the interaction's output aspect
ratio:

```typescript ignore
import { generateVideo, getVideoJobStatus } from '@tanstack/ai'
import { geminiVideo } from '@tanstack/ai-gemini'

const adapter = geminiVideo('gemini-omni-flash-preview')

const { jobId } = await generateVideo({
adapter,
prompt: 'A woman playing violin outdoors at golden hour',
size: '9:16', // aspect ratio: '16:9' (default) or '9:16'
duration: 6, // 3-10 seconds; omit for the 10s default
})

const status = await getVideoJobStatus({ adapter, jobId })
// status.url → 'data:video/mp4;base64,…' once completed
```
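
The duration and `size` rules described for Omni reduce to a clamp, a range check and two defaults. The sketch below illustrates them under stated assumptions rather than reproducing the adapter's code: `snapOmniDuration`, `buildResponseFormat` and the constant names are hypothetical; only the `aspect_ratio` and `duration` fields of `response_format`, the 3 to 10 second range and the defaults come from this PR.

```typescript
type OmniAspectRatio = '16:9' | '9:16'

const OMNI_MIN_SECONDS = 3
const OMNI_MAX_SECONDS = 10
const OMNI_DEFAULT_SECONDS = 10

// Clamp raw seconds into [3, 10], then round to whole seconds, mirroring the
// documented snapDuration behavior. Bounds are integers, so rounding after
// clamping cannot leave the range.
function snapOmniDuration(seconds: number): number {
  const clamped = Math.min(OMNI_MAX_SECONDS, Math.max(OMNI_MIN_SECONDS, seconds))
  return Math.round(clamped)
}

interface OmniResponseFormat {
  aspect_ratio: OmniAspectRatio
  duration: number
}

// Map the top-level size/duration options onto response_format, rejecting
// out-of-range durations at job creation (fractional values are allowed).
function buildResponseFormat(options: {
  size?: OmniAspectRatio
  duration?: number
}): OmniResponseFormat {
  const duration = options.duration ?? OMNI_DEFAULT_SECONDS
  if (!Number.isFinite(duration) || duration < OMNI_MIN_SECONDS || duration > OMNI_MAX_SECONDS) {
    throw new RangeError(
      `duration must be between ${OMNI_MIN_SECONDS} and ${OMNI_MAX_SECONDS} seconds, got ${duration}`,
    )
  }
  return { aspect_ratio: options.size ?? '16:9', duration }
}
```

Rejecting rather than silently snapping in this sketch keeps the request explicit: a caller who wants the nearest valid length can run the value through the snap helper first.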

Image and video prompt parts are sent to the interaction as content
blocks, grouped as images first, then videos, then the text prompt (Omni
doesn't use Veo's `metadata.role` routing), so you can condition the
generation on stills or short reference clips. `data` sources are sent
inline as base64, while `url` sources pass through as-is. The adapter never
downloads them, so use Gemini Files API URIs (upload large media through
the Files API first).
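
A hypothetical sketch of that ordering and source handling. The prompt-part shape (`MediaPart`) and the content-block field names (`data`, `uri`, `mime_type`) are assumptions for illustration, not the exact TanStack AI or Interactions API types.

```typescript
// Assumed prompt-part shape (illustrative only).
type MediaSource =
  | { type: 'data'; value: string; mimeType: string } // base64 payload
  | { type: 'url'; value: string; mimeType?: string }

interface MediaPart {
  type: 'image' | 'video'
  source: MediaSource
}

// Assumed interaction content-block shape.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image' | 'video'; data?: string; uri?: string; mime_type?: string }

function toContentBlock(part: MediaPart): ContentBlock {
  const { source } = part
  // data sources go inline as base64; url sources are forwarded untouched
  // (never downloaded), so they should already be Files API URIs.
  return source.type === 'data'
    ? { type: part.type, data: source.value, mime_type: source.mimeType }
    : { type: part.type, uri: source.value, mime_type: source.mimeType }
}

// Images first, then videos, then the text prompt.
function buildInteractionInput(prompt: string, parts: Array<MediaPart>): Array<ContentBlock> {
  const images = parts.filter((p) => p.type === 'image').map(toContentBlock)
  const videos = parts.filter((p) => p.type === 'video').map(toContentBlock)
  return [...images, ...videos, { type: 'text', text: prompt }]
}
```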

#### Conversational video editing

Omni's headline capability is iterative refinement. Pass the interaction id
of a prior generation (its `jobId`) as
`modelOptions.previous_interaction_id` and describe the change; the model
edits the video while preserving everything you didn't mention:

```typescript ignore
import { generateVideo } from '@tanstack/ai'
import { geminiVideo } from '@tanstack/ai-gemini'

const adapter = geminiVideo('gemini-omni-flash-preview')

// Turn 1: generate
const first = await generateVideo({
adapter,
prompt: 'A woman playing violin outdoors at golden hour',
})

// …poll first.jobId to completion, then…

// Turn 2: edit the result conversationally
const second = await generateVideo({
adapter,
prompt: 'Make the violin invisible',
modelOptions: { previous_interaction_id: first.jobId },
})
```
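
The "poll `first.jobId` to completion" step can be any loop over `getVideoJobStatus`. Below is a minimal, generic sketch: `pollUntilDone` is a hypothetical helper, and the `status` values it checks (`'completed'`, `'failed'`) are assumptions about the job-status shape, so adjust them to whatever your adapter returns. Taking the status fetcher as a function keeps the sketch runnable without an API key.

```typescript
// Assumed job-status shape (illustrative; check the real return type).
interface JobStatus {
  status: 'pending' | 'processing' | 'completed' | 'failed'
  url?: string
  error?: string
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Hypothetical helper: call fetchStatus until the job completes or fails,
// waiting intervalMs between attempts and giving up after maxAttempts.
async function pollUntilDone(
  fetchStatus: () => Promise<JobStatus>,
  { intervalMs = 5_000, maxAttempts = 120 } = {},
): Promise<JobStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus()
    if (status.status === 'completed') return status
    if (status.status === 'failed') {
      throw new Error(status.error ?? 'Video generation failed')
    }
    if (attempt < maxAttempts) await sleep(intervalMs)
  }
  throw new Error(`Job did not finish after ${maxAttempts} attempts`)
}
```

With the real adapter this would be called along the lines of `await pollUntilDone(() => getVideoJobStatus({ adapter, jobId: first.jobId }))`, assuming its result matches `JobStatus`.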

`modelOptions` also passes other Interactions API request fields through
unchanged, e.g. `generation_config.video_config.task` to pin the task mode
to `'text_to_video' | 'image_to_video' | 'reference_to_video' | 'edit'`
instead of letting the model infer it.
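
As a rough illustration of how those pieces could combine into one Interactions API request body: `buildInteractionRequest` is a hypothetical helper, not the adapter's implementation, and the `background: true` flag and exact nesting are assumptions based on the behavior described in this PR (a background interaction with `response_modalities: ['video']`). In this sketch, adapter-owned fields are written after the `modelOptions` spread so a pass-through option cannot silently override them.

```typescript
type OmniVideoTask = 'text_to_video' | 'image_to_video' | 'reference_to_video' | 'edit'

// Assumed modelOptions shape: the two documented fields plus arbitrary
// pass-through request fields.
interface OmniModelOptions {
  previous_interaction_id?: string
  generation_config?: { video_config?: { task?: OmniVideoTask } }
  [field: string]: unknown
}

function buildInteractionRequest(
  model: string,
  input: Array<unknown>,
  responseFormat: { aspect_ratio: '16:9' | '9:16'; duration: number },
  modelOptions: OmniModelOptions = {},
) {
  return {
    ...modelOptions, // pass-through fields (previous_interaction_id, generation_config, ...)
    model,
    input,
    background: true, // assumption: run as a background interaction, poll by id
    response_modalities: ['video'],
    response_format: responseFormat,
  }
}
```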

### Grok (xAI Imagine) Model Options

Based on the [xAI video generation API](https://docs.x.ai/docs/guides/video-generations). Two models are available: `grok-imagine-video` (v1.0) supports **text-to-video and image-to-video**, while `grok-imagine-video-1.5` is **image-to-video only** (a text-only prompt is rejected by the API; the adapter throws a clear error pointing you at `grok-imagine-video`). Both are aspect-ratio sized — the generic `size` option takes an `aspectRatio_resolution` template (like the Grok Imagine image models), and clips can be 1–15 seconds long.
8 changes: 4 additions & 4 deletions examples/ts-react-media/src/components/ImageGenerator.tsx
@@ -6,8 +6,8 @@ import type { MediaPrompt } from '@tanstack/ai/client'
import { generateImageFn } from '@/lib/server-functions'
import { getRandomImagePrompt } from '@/lib/prompts'
import { IMAGE_MODELS } from '@/lib/models'
import { readImageFile, toImagePart } from '@/lib/media'
import type { AttachedImage } from '@/lib/media'
import { readMediaFile, toImagePart } from '@/lib/media'
import type { AttachedMedia } from '@/lib/media'

interface ImageGeneratorProps {
onImageGenerated?: (imageUrl: string) => void
@@ -36,7 +36,7 @@ export default function ImageGenerator({
const [selectedModel, setSelectedModel] = useState<string>('all')
const [isLoading, setIsLoading] = useState(false)
const [results, setResults] = useState<Record<string, ModelResult>>({})
const [images, setImages] = useState<Array<AttachedImage>>([])
const [images, setImages] = useState<Array<AttachedMedia>>([])
const fileInputRef = useRef<HTMLInputElement>(null)

const currentModel = IMAGE_MODELS.find((m) => m.id === selectedModel)
@@ -56,7 +56,7 @@ export default function ImageGenerator({
const files = Array.from(e.target.files ?? [])
if (fileInputRef.current) fileInputRef.current.value = ''
if (files.length === 0) return
const attached = await Promise.all(files.map((file) => readImageFile(file)))
const attached = await Promise.all(files.map((file) => readMediaFile(file)))
setImages((prev) => [...prev, ...attached])
}
