TanStack · tombeckenham · Jul 2, 2026 · Jul 2, 2026 · Jul 2, 2026 · Jul 3, 2026
diff --git a/.changeset/gemini-omni-flash-video.md b/.changeset/gemini-omni-flash-video.md
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai-gemini': minor
+---
+
+Add Gemini Omni Flash (`gemini-omni-flash-preview`) video generation via the Interactions API. Omni is only available through the Interactions API (`generateContent` rejects it), so the video adapter now routes by model: Veo models keep the `:predictLongRunning` operations flow, while `geminiVideo('gemini-omni-flash-preview')` creates a background interaction with `response_modalities: ['video']`, polls it by id, and returns the inline base64 MP4 as a `data:` URL (Files-API URI delivery passes through). Usage is mapped from the interaction's `output_tokens_by_modality`. Image and video prompt parts are sent as interaction content blocks, and `modelOptions.previous_interaction_id` chains a new prompt onto a prior Omni generation for conversational video editing. The top-level `size` option maps onto `response_format.aspect_ratio` (`'16:9' | '9:16'`) and `duration` onto `response_format.duration`, which accepts any value in the 3–10 second range (fractional seconds included, verified against the live API) and defaults to a 10-second clip when omitted. Raises the `@google/genai` floor to `^2.10.0` for the Interactions API surface.
diff --git a/.changeset/video-adapter-duration-constraint.md b/.changeset/video-adapter-duration-constraint.md
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai': patch
+---
+
+Fix `generateVideo` / `getVideoJobStatus` rejecting video adapters that declare a narrowed per-model duration union (e.g. Gemini's `4 | 6 | 8` for Veo or `10` for Omni Flash) at the type level. The activity's `TAdapter extends VideoAdapter<string, any, any, any>` constraints left the input-modality and duration generics at their defaults, so `duration?: number` failed contravariance against the adapter's literal union. All video-activity constraints and helper conditionals now span all six `VideoAdapter` generics.
diff --git a/docs/config.json b/docs/config.json
@@ -282,7 +282,7 @@
           "label": "Video Generation",
           "to": "media/video-generation",
           "addedAt": "2026-04-15",
-          "updatedAt": "2026-07-01"
+          "updatedAt": "2026-07-02"
         },
         {
           "label": "Generation Hooks",

diff --git a/docs/media/video-generation.md b/docs/media/video-generation.md
@@ -2,12 +2,14 @@
 title: Video Generation
 id: video-generation
 order: 6
-description: "Generate video from text prompts with OpenAI Sora, Google Veo, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API."
+description: "Generate video from text prompts with OpenAI Sora, Google Veo, Gemini Omni Flash, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API."
 keywords:
   - tanstack ai
   - video generation
   - sora
   - veo
+  - omni flash
+  - interactions api
   - gemini
   - grok imagine
   - fal
@@ -40,7 +42,7 @@ TanStack AI provides experimental support for video generation through dedicated
 
 Currently supported:
 - **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
-- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API)
+- **Google Gemini**: Veo 3.1 models (via the long-running operations API), and Gemini Omni Flash (via the Interactions API)
 - **Grok (xAI)**: grok-imagine-video (text-to-video + image-to-video) and grok-imagine-video-1.5 (image-to-video only) models
 - **fal.ai**: MiniMax, Luma, Kling, Hunyuan, and other hosted video models
 
@@ -569,6 +571,85 @@ Adapters that haven't declared a per-model duration map keep the plain
 > Files API and requires your API key to download (send it as an
 > `x-goog-api-key` header or `key` query parameter).
 
+### Gemini Omni Flash (Interactions API) Model Options
+
+Gemini Omni Flash (`gemini-omni-flash-preview`) is Google's multimodal
+video-generation model with conversational editing. It is only available
+through the [Interactions API](https://ai.google.dev/gemini-api/docs/omni),
+and the same `geminiVideo()` adapter routes it automatically: `generateVideo`
+creates a background interaction, `getVideoJobStatus` polls it by id, and the
+finished clip comes back **inline as a `data:video/mp4;base64,…` URL**.
+When Google delivers by reference instead, the Files API URI passes
+through and, like Veo's, needs your API key to download.
+
+Clips are 720p at 24 FPS, and `duration` accepts any value in the **3–10
+second** range (fractional seconds included), defaulting to 10 seconds when
+omitted. `availableDurations()` reports
+`{ kind: 'range', min: 3, max: 10, unit: 'seconds' }`; out-of-range
+`duration` values are rejected at job creation, and `snapDuration(n)` snaps
+raw seconds into the range (clamping to its bounds and rounding to whole
+seconds). The `size` option maps onto the interaction's output aspect
+ratio:
+
+```typescript ignore
+import { generateVideo, getVideoJobStatus } from '@tanstack/ai'
+import { geminiVideo } from '@tanstack/ai-gemini'
+
+const adapter = geminiVideo('gemini-omni-flash-preview')
+
+const { jobId } = await generateVideo({
+  adapter,
+  prompt: 'A woman playing violin outdoors at golden hour',
+  size: '9:16', // aspect ratio: '16:9' (default) or '9:16'
+  duration: 6, // 3-10 seconds; omit for the 10s default
+})
+
+const status = await getVideoJobStatus({ adapter, jobId })
+// status.url → 'data:video/mp4;base64,…' once completed
+```
+
+Image and video prompt parts are sent to the interaction as content blocks,
+grouped as images, then videos, then the text prompt (Omni doesn't use
+Veo's `metadata.role` routing), so you can condition the generation on
+stills or short reference clips. `data` sources are sent inline as
+base64, while `url` sources pass through as-is. The adapter never
+downloads them, so use Gemini Files API URIs (upload large media via
+the Files API first).
+
+#### Conversational video editing
+
+Omni's headline capability is iterative refinement: pass the interaction id
+of a prior generation (its `jobId`) as
+`modelOptions.previous_interaction_id` and describe the change — the model
+edits the video while preserving everything you didn't mention:
+
+```typescript ignore
+import { generateVideo } from '@tanstack/ai'
+import { geminiVideo } from '@tanstack/ai-gemini'
+
+const adapter = geminiVideo('gemini-omni-flash-preview')
+
+// Turn 1: generate
+const first = await generateVideo({
+  adapter,
+  prompt: 'A woman playing violin outdoors at golden hour',
+})
+
+// …poll first.jobId to completion, then…
+
+// Turn 2: edit the result conversationally
+const second = await generateVideo({
+  adapter,
+  prompt: 'Make the violin invisible',
+  modelOptions: { previous_interaction_id: first.jobId },
+})
+```
+
+`modelOptions` also passes through the Interactions API's request fields
+(e.g. `generation_config.video_config.task` to pin
+`'text_to_video' | 'image_to_video' | 'reference_to_video' | 'edit'`
+instead of letting the model infer the task mode).
+
 ### Grok (xAI Imagine) Model Options
 
 Based on the [xAI video generation API](https://docs.x.ai/docs/guides/video-generations). Two models are available: `grok-imagine-video` (v1.0) supports **text-to-video and image-to-video**, while `grok-imagine-video-1.5` is **image-to-video only** (a text-only prompt is rejected by the API; the adapter throws a clear error pointing you at `grok-imagine-video`). Both are aspect-ratio sized — the generic `size` option takes an `aspectRatio_resolution` template (like the Grok Imagine image models), and clips can be 1–15 seconds long.

diff --git a/examples/ts-react-media/src/components/ImageGenerator.tsx b/examples/ts-react-media/src/components/ImageGenerator.tsx
@@ -6,8 +6,8 @@ import type { MediaPrompt } from '@tanstack/ai/client'
 import { generateImageFn } from '@/lib/server-functions'
 import { getRandomImagePrompt } from '@/lib/prompts'
 import { IMAGE_MODELS } from '@/lib/models'
-import { readImageFile, toImagePart } from '@/lib/media'
-import type { AttachedImage } from '@/lib/media'
+import { readMediaFile, toImagePart } from '@/lib/media'
+import type { AttachedMedia } from '@/lib/media'
 
 interface ImageGeneratorProps {
   onImageGenerated?: (imageUrl: string) => void
@@ -36,7 +36,7 @@ export default function ImageGenerator({
   const [selectedModel, setSelectedModel] = useState<string>('all')
   const [isLoading, setIsLoading] = useState(false)
   const [results, setResults] = useState<Record<string, ModelResult>>({})
-  const [images, setImages] = useState<Array<AttachedImage>>([])
+  const [images, setImages] = useState<Array<AttachedMedia>>([])
   const fileInputRef = useRef<HTMLInputElement>(null)
 
   const currentModel = IMAGE_MODELS.find((m) => m.id === selectedModel)
@@ -56,7 +56,7 @@ export default function ImageGenerator({
     const files = Array.from(e.target.files ?? [])
     if (fileInputRef.current) fileInputRef.current.value = ''
     if (files.length === 0) return
-    const attached = await Promise.all(files.map((file) => readImageFile(file)))
+    const attached = await Promise.all(files.map((file) => readMediaFile(file)))
     setImages((prev) => [...prev, ...attached])
   }