From deb1bc7183a6fdf8bfa14e44de97f8fa2796e31d Mon Sep 17 00:00:00 2001
From: Tom Beckenham <34339192+tombeckenham@users.noreply.github.com>
Date: Thu, 2 Jul 2026 20:30:33 +1000
Subject: [PATCH 1/4] feat(ai-gemini): support Gemini Omni Flash video
 generation via the Interactions API

Add gemini-omni-flash-preview to the Gemini video adapter. Omni is only
available through the Interactions API (generateContent rejects it with
400), so the adapter now routes by model: Veo models keep the
:predictLongRunning operations flow, while Omni creates a background
interaction with response_modalities: ['video'], polls it by id, and
returns the inline base64 MP4 as a data: URL (a Files API URI is passed
through unchanged when the server delivers by reference). Usage maps
from output_tokens_by_modality, size maps onto
response_format.aspect_ratio, and modelOptions.previous_interaction_id
chains conversational video edits.

- model-meta: GEMINI_OMNI_FLASH_PREVIEW ($0.10/sec video+audio output)
  + GEMINI_INTERACTIONS_VIDEO_MODELS
- provider options: GeminiOmniVideoProviderOptions derived from the
  SDK's CreateModelInteractionParamsNonStreaming; per-model input
  modalities (Omni accepts image+video parts) and fixed 10s duration
- @google/genai floor bumped to ^2.10.0 for the interactions surface
- 17 new unit tests; new interactions-video E2E feature backed by a
  dedicated aimock mount (native interactions text handling untouched)
- docs/media/video-generation.md + media-generation skill updates

Verified live against the Gemini API: background job completed in ~45s
and returned a valid MP4 with video-modality usage; the SDK's typed
interactions.create works with Step-list input, so no raw REST
fallback is needed.

Closes #871

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 .changeset/gemini-omni-flash-video.md         |   5 +
 docs/config.json                              |   2 +-
 docs/media/video-generation.md                |  78 +++-
 packages/ai-gemini/package.json               |   2 +-
 packages/ai-gemini/src/adapters/video.ts      | 273 ++++++++++++-
 packages/ai-gemini/src/index.ts               |   8 +-
 packages/ai-gemini/src/model-meta.ts          |  47 ++-
 .../src/video/video-provider-options.ts       | 102 ++++-
 .../ai-gemini/tests/video-adapter.test.ts     | 368 ++++++++++++++++++
 .../skills/ai-core/media-generation/SKILL.md  |  27 ++
 pnpm-lock.yaml                                |   2 +-
 testing/e2e/global-setup.ts                   |  92 +++++
 testing/e2e/src/components/VideoGenUI.tsx     |   9 +-
 testing/e2e/src/lib/feature-support.ts        |   5 +
 testing/e2e/src/lib/features.ts               |   4 +
 testing/e2e/src/lib/media-providers.ts        |  13 +
 testing/e2e/src/lib/server-functions.ts       |   2 +
 testing/e2e/src/lib/types.ts                  |   2 +
 testing/e2e/src/routes/$provider/$feature.tsx |  11 +
 testing/e2e/src/routes/api.video.stream.ts    |  12 +-
 testing/e2e/src/routes/api.video.ts           |  12 +-
 testing/e2e/tests/interactions-video.spec.ts  |  76 ++++
 22 files changed, 1107 insertions(+), 45 deletions(-)
 create mode 100644 .changeset/gemini-omni-flash-video.md
 create mode 100644 testing/e2e/tests/interactions-video.spec.ts

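Reviewer note: the Omni polling path described in the commit message can be sketched as a single pure function. This is a minimal illustration, not the adapter's code: the `Sketch*` types and `sketchResolveJob` are hypothetical names, the interaction shape is trimmed down to the fields discussed here, and the adapter itself splits this logic across `getVideoStatus` and `getVideoUrl`.

```typescript
// Hypothetical, trimmed-down shapes for illustration only. The real types
// come from the SDK's Interactions namespace.
type SketchVideo = { data?: string; uri?: string; mime_type?: string }
type SketchInteraction = { status: string; output_video?: SketchVideo }
type SketchJobStatus =
  | { status: 'processing' }
  | { status: 'completed'; url: string }
  | { status: 'failed'; error: string }

// Map an interaction snapshot onto a job status, building the video URL
// once the interaction has completed.
function sketchResolveJob(interaction: SketchInteraction): SketchJobStatus {
  const { status, output_video: video } = interaction

  // Still running: keep polling.
  if (status === 'in_progress' || status === 'requires_action') {
    return { status: 'processing' }
  }
  // Any other non-completed status is treated as a failure.
  if (status !== 'completed') {
    return { status: 'failed', error: `ended with status "${status}"` }
  }
  // Completed but empty (for example, filtered output) is also a failure.
  if (!video || !(video.data || video.uri)) {
    return { status: 'failed', error: 'completed without a video' }
  }
  // Delivery by reference passes the URI through; inline base64 becomes a
  // data: URL with the reported MIME type (video/mp4 when absent).
  const mimeType = video.mime_type || 'video/mp4'
  const url = video.uri || `data:${mimeType};base64,${video.data}`
  return { status: 'completed', url }
}
```

Merging both steps into one function keeps the sketch short. Keeping them separate, as the adapter does, lets callers poll status without pulling the (potentially large) base64 payload into a URL on every check.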
diff --git a/.changeset/gemini-omni-flash-video.md b/.changeset/gemini-omni-flash-video.md
new file mode 100644
index 000000000..dbd040bd3
--- /dev/null
+++ b/.changeset/gemini-omni-flash-video.md
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai-gemini': minor
+---
+
+Add Gemini Omni Flash (`gemini-omni-flash-preview`) video generation via the Interactions API. Omni is available only through the Interactions API (`generateContent` rejects it), so the video adapter now routes by model: Veo models keep the `:predictLongRunning` operations flow, while `geminiVideo('gemini-omni-flash-preview')` creates a background interaction with `response_modalities: ['video']`, polls it by id, and returns the inline base64 MP4 as a `data:` URL (a Files API URI is passed through unchanged when the server delivers by reference). Usage is mapped from the interaction's `output_tokens_by_modality`. Image and video prompt parts are sent as interaction content blocks, and `modelOptions.previous_interaction_id` chains a new prompt onto a prior Omni generation for conversational video editing. The top-level `size` option maps onto `response_format.aspect_ratio` (`'16:9' | '9:16'`); clips are a fixed 10 seconds today. Raises the `@google/genai` floor to `^2.10.0` for the Interactions API surface.
diff --git a/docs/config.json b/docs/config.json
index 0e8982869..7a52e405c 100644
--- a/docs/config.json
+++ b/docs/config.json
@@ -282,7 +282,7 @@
           "label": "Video Generation",
           "to": "media/video-generation",
           "addedAt": "2026-04-15",
-          "updatedAt": "2026-07-01"
+          "updatedAt": "2026-07-02"
         },
         {
           "label": "Generation Hooks",
diff --git a/docs/media/video-generation.md b/docs/media/video-generation.md
index 408bae527..2b4645986 100644
--- a/docs/media/video-generation.md
+++ b/docs/media/video-generation.md
@@ -2,12 +2,14 @@
 title: Video Generation
 id: video-generation
 order: 6
-description: "Generate video from text prompts with OpenAI Sora, Google Veo, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API."
+description: "Generate video from text prompts with OpenAI Sora, Google Veo, Gemini Omni Flash, xAI Grok Imagine, or fal.ai using TanStack AI's experimental generateVideo() jobs/polling API."
 keywords:
   - tanstack ai
   - video generation
   - sora
   - veo
+  - omni flash
+  - interactions api
   - gemini
   - grok imagine
   - fal
@@ -40,7 +42,7 @@ TanStack AI provides experimental support for video generation through dedicated
 
 Currently supported:
 - **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
-- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API)
+- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API), and Gemini Omni Flash (via the Interactions API)
 - **Grok (xAI)**: grok-imagine-video (text-to-video + image-to-video) and grok-imagine-video-1.5 (image-to-video only) models
 - **fal.ai**: MiniMax, Luma, Kling, Hunyuan, and other hosted video models
 
@@ -569,6 +571,78 @@ Adapters that haven't declared a per-model duration map keep the plain
 > Files API and requires your API key to download (send it as an
 > `x-goog-api-key` header or `key` query parameter).
 
+### Gemini Omni Flash (Interactions API) Model Options
+
+Gemini Omni Flash (`gemini-omni-flash-preview`) is Google's multimodal
+video-generation model with conversational editing. It is available only
+through the [Interactions API](https://ai.google.dev/gemini-api/docs/omni),
+and the same `geminiVideo()` adapter routes it automatically: `generateVideo`
+creates a background interaction, `getVideoJobStatus` polls it by id, and
+the finished clip comes back **inline as a `data:video/mp4;base64,…` URL**
+(when Google delivers by reference instead, the Files API URI passes
+through and needs your API key to download, like Veo).
+
+Clips are 720p at 24 FPS and a fixed **10 seconds** today (`duration` is
+typed as `10`; `snapDuration(n)` always returns `10`). The `size` option
+maps onto the interaction's output aspect ratio:
+
+```typescript ignore
+import { generateVideo, getVideoJobStatus } from '@tanstack/ai'
+import { geminiVideo } from '@tanstack/ai-gemini'
+
+const adapter = geminiVideo('gemini-omni-flash-preview')
+
+const { jobId } = await generateVideo({
+  adapter,
+  prompt: 'A woman playing violin outdoors at golden hour',
+  size: '9:16', // aspect ratio: '16:9' (default) or '9:16'
+})
+
+const status = await getVideoJobStatus({ adapter, jobId })
+// status.url → 'data:video/mp4;base64,…' once completed
+```
+
+Image and video prompt parts are sent to the interaction as content blocks
+(images first, then videos, then the text prompt; Omni doesn't use Veo's
+`metadata.role` routing), so you can condition the generation on stills or
+short reference clips. `data` sources are sent inline as base64; `url`
+sources pass through as-is. The adapter never downloads them, so use
+Gemini Files API URIs (upload large media via the Files API first).
+
+#### Conversational video editing
+
+Omni's headline capability is iterative refinement: pass the interaction id
+of a prior generation (its `jobId`) as
+`modelOptions.previous_interaction_id` and describe the change — the model
+edits the video while preserving everything you didn't mention:
+
+```typescript ignore
+import { generateVideo } from '@tanstack/ai'
+import { geminiVideo } from '@tanstack/ai-gemini'
+
+const adapter = geminiVideo('gemini-omni-flash-preview')
+
+// Turn 1: generate
+const first = await generateVideo({
+  adapter,
+  prompt: 'A woman playing violin outdoors at golden hour',
+})
+
+// …poll first.jobId to completion, then…
+
+// Turn 2: edit the result conversationally
+const second = await generateVideo({
+  adapter,
+  prompt: 'Make the violin invisible',
+  modelOptions: { previous_interaction_id: first.jobId },
+})
+```
+
+`modelOptions` also passes through the Interactions API's request fields
+(e.g. `generation_config.video_config.task` to pin
+`'text_to_video' | 'image_to_video' | 'reference_to_video' | 'edit'`
+instead of letting the model infer the task mode).
+
 ### Grok (xAI Imagine) Model Options
 
 Based on the [xAI video generation API](https://docs.x.ai/docs/guides/video-generations). Two models are available: `grok-imagine-video` (v1.0) supports **text-to-video and image-to-video**, while `grok-imagine-video-1.5` is **image-to-video only** (a text-only prompt is rejected by the API; the adapter throws a clear error pointing you at `grok-imagine-video`). Both are aspect-ratio sized — the generic `size` option takes an `aspectRatio_resolution` template (like the Grok Imagine image models), and clips can be 1–15 seconds long.
diff --git a/packages/ai-gemini/package.json b/packages/ai-gemini/package.json
index f1b18bdd7..456ac00c5 100644
--- a/packages/ai-gemini/package.json
+++ b/packages/ai-gemini/package.json
@@ -64,7 +64,7 @@
     "text-to-speech"
   ],
   "dependencies": {
-    "@google/genai": "^2.8.0",
+    "@google/genai": "^2.10.0",
     "@tanstack/ai-utils": "workspace:*",
     "partial-json": "^0.1.7"
   },
diff --git a/packages/ai-gemini/src/adapters/video.ts b/packages/ai-gemini/src/adapters/video.ts
index b6935e503..75c25a69f 100644
--- a/packages/ai-gemini/src/adapters/video.ts
+++ b/packages/ai-gemini/src/adapters/video.ts
@@ -6,13 +6,18 @@ import { resolveMediaPrompt } from '@tanstack/ai'
 import { BaseVideoAdapter, snapToDurationOption } from '@tanstack/ai/adapters'
 import { arrayBufferToBase64 } from '@tanstack/ai-utils'
 import { createGeminiClient, getGeminiApiKeyFromEnv } from '../utils'
-import { getGeminiVideoDurationOptions } from '../video/video-provider-options'
+import {
+  getGeminiVideoDurationOptions,
+  isInteractionsVideoModel,
+} from '../video/video-provider-options'
 import type { DurationOptions } from '@tanstack/ai/adapters'
 import type {
   ImagePart,
   MediaInputMetadata,
+  TokenUsage,
   VideoGenerationOptions,
   VideoJobResult,
+  VideoPart,
   VideoStatusResult,
   VideoUrlResult,
 } from '@tanstack/ai'
@@ -20,9 +25,11 @@ import type {
   GenerateVideosConfig,
   GoogleGenAI,
   Image,
+  Interactions,
   VideoGenerationReferenceImage,
 } from '@google/genai'
 import type {
+  GeminiOmniVideoProviderOptions,
   GeminiVideoModel,
   GeminiVideoModelDurationByName,
   GeminiVideoModelInputModalitiesByName,
@@ -33,6 +40,9 @@ import type {
 } from '../video/video-provider-options'
 import type { GeminiClientConfig } from '../utils'
 
+type Interaction = Interactions.Interaction
+type InteractionContent = Interactions.Content
+
 /**
  * Configuration for Gemini video adapter.
  *
@@ -99,31 +109,114 @@ async function imagePartToVeoImage(
 }
 
 /**
- * Gemini Veo Video Generation Adapter
+ * Convert an image or video prompt part into an Interactions API content
+ * block. Data sources become inline base64 `data`; URL sources pass through
+ * as `uri` (Files API URIs — mirrors the Interactions text adapter).
+ */
+function mediaPartToInteractionsContent(
+  part: ImagePart<MediaInputMetadata> | VideoPart<MediaInputMetadata>,
+): InteractionContent {
+  const mimeType = part.source.mimeType
+  if (part.type === 'image') {
+    return part.source.type === 'data'
+      ? { type: 'image', data: part.source.value, mime_type: mimeType }
+      : { type: 'image', uri: part.source.value, mime_type: mimeType }
+  }
+  return part.source.type === 'data'
+    ? { type: 'video', data: part.source.value, mime_type: mimeType }
+    : { type: 'video', uri: part.source.value, mime_type: mimeType }
+}
+
+/**
+ * Pull the generated video out of a completed interaction. Prefers the
+ * SDK's `output_video` sugar, then walks `steps` back-to-front for the last
+ * `model_output` step carrying a video content block (the wire shape the
+ * raw REST response uses).
+ */
+function extractInteractionVideo(
+  interaction: Interaction,
+): { data?: string; uri?: string; mimeType: string } | undefined {
+  const direct = interaction.output_video
+  if (direct && (direct.data || direct.uri)) {
+    return {
+      data: direct.data,
+      uri: direct.uri,
+      mimeType: direct.mime_type || 'video/mp4',
+    }
+  }
+  const steps = interaction.steps ?? []
+  for (let i = steps.length - 1; i >= 0; i--) {
+    const step = steps[i]
+    if (step?.type !== 'model_output') continue
+    for (const block of step.content ?? []) {
+      if (block.type === 'video' && (block.data || block.uri)) {
+        return {
+          data: block.data,
+          uri: block.uri,
+          mimeType: block.mime_type || 'video/mp4',
+        }
+      }
+    }
+  }
+  return undefined
+}
+
+/**
+ * Map Interactions usage onto the canonical TokenUsage shape. Omni reports
+ * video output via `output_tokens_by_modality`; fall back to the video
+ * modality entry when the total is absent.
+ */
+function interactionUsageToTokenUsage(
+  usage: Interaction['usage'],
+): TokenUsage | undefined {
+  if (!usage) return undefined
+  const videoTokens = usage.output_tokens_by_modality?.find(
+    (entry) => entry.modality === 'video',
+  )?.tokens
+  const promptTokens = usage.total_input_tokens ?? 0
+  const completionTokens = usage.total_output_tokens ?? videoTokens ?? 0
+  return {
+    promptTokens,
+    completionTokens,
+    totalTokens: usage.total_tokens ?? promptTokens + completionTokens,
+  }
+}
+
+/**
+ * Gemini Video Generation Adapter (Veo + Gemini Omni Flash)
  *
- * Tree-shakeable adapter for Google Veo video generation. Veo runs as a
- * long-running operation: `createVideoJob` starts the operation via the
- * `:predictLongRunning` endpoint, `getVideoStatus` polls it, and
- * `getVideoUrl` extracts the generated video's URI once it completes.
+ * Tree-shakeable adapter for Google video generation, routing by model:
  *
- * Image prompt parts are routed by `metadata.role`:
+ * **Veo models** run as a long-running operation: `createVideoJob` starts
+ * the operation via the `:predictLongRunning` endpoint, `getVideoStatus`
+ * polls it, and `getVideoUrl` extracts the generated video's URI once it
+ * completes. Image prompt parts are routed by `metadata.role`:
  * - `'start_frame'` (or the first un-roled image) → the input image the
  *   video starts from
  * - `'end_frame'` → `lastFrame` (the frame the video ends on)
  * - `'reference'` / `'character'` → `referenceImages` (asset references,
  *   Veo 3.1)
  *
- * Note: the returned video URI is served by the Gemini Files API and
+ * Note: the returned Veo video URI is served by the Gemini Files API and
  * requires the API key (`x-goog-api-key` header or `?key=` query
  * parameter) to download.
  *
+ * **Gemini Omni Flash** (`gemini-omni-flash-preview`) only serves the
+ * Interactions API: `createVideoJob` creates a background interaction with
+ * `response_modalities: ['video']`, `getVideoStatus` polls it by id, and
+ * `getVideoUrl` returns the inline base64 MP4 as a `data:` URL (or the
+ * Files API URI when the server delivers by reference). Image and video
+ * prompt parts are sent as interaction content blocks (images, then
+ * videos, then text); pass `modelOptions.previous_interaction_id` to
+ * conversationally edit a prior Omni generation.
+ *
  * @experimental Video generation is an experimental feature and may change.
  */
 export class GeminiVideoAdapter<
   TModel extends GeminiVideoModel,
 > extends BaseVideoAdapter<
   TModel,
-  GeminiVideoProviderOptions,
+  GeminiVideoModelProviderOptionsByName[TModel],
   GeminiVideoModelProviderOptionsByName,
   GeminiVideoModelSizeByName,
   GeminiVideoModelInputModalitiesByName,
@@ -140,18 +233,25 @@ export class GeminiVideoAdapter<
 
   async createVideoJob(
     options: VideoGenerationOptions<
-      GeminiVideoProviderOptions,
+      GeminiVideoModelProviderOptionsByName[TModel],
       GeminiVideoSize,
       GeminiVideoModelDurationByName[TModel]
     >,
   ): Promise<VideoJobResult> {
-    const { prompt, size, duration, modelOptions, logger } = options
+    const { prompt, size, duration, logger } = options
 
     logger.request(
       `activity=video.create provider=${this.name} model=${this.model} size=${size ?? 'default'} duration=${duration ?? 'default'}`,
       { provider: this.name, model: this.model },
     )
 
+    if (isInteractionsVideoModel(this.model)) {
+      return await this.createInteractionsVideoJob(options)
+    }
+    const modelOptions = options.modelOptions as
+      | GeminiVideoProviderOptions
+      | undefined
+
     try {
       const resolved = resolveMediaPrompt(prompt)
 
@@ -201,6 +301,75 @@ export class GeminiVideoAdapter<
     }
   }
 
+  /**
+   * Gemini Omni Flash job creation via the Interactions API. Creates a
+   * background interaction requesting video output; the interaction id is
+   * the job id polled by `getVideoStatus` / `getVideoUrl`.
+   */
+  private async createInteractionsVideoJob(
+    options: VideoGenerationOptions<
+      GeminiVideoModelProviderOptionsByName[TModel],
+      GeminiVideoSize,
+      GeminiVideoModelDurationByName[TModel]
+    >,
+  ): Promise<VideoJobResult> {
+    const { prompt, size, logger } = options
+    const modelOptions = options.modelOptions as
+      | GeminiOmniVideoProviderOptions
+      | undefined
+
+    try {
+      const resolved = resolveMediaPrompt(prompt)
+
+      if (resolved.audios.length > 0) {
+        throw new Error(
+          `${this.name}.createVideoJob does not support audio prompt parts (model: ${this.model}).`,
+        )
+      }
+
+      const content: Array<InteractionContent> = [
+        ...resolved.images.map(mediaPartToInteractionsContent),
+        ...resolved.videos.map(mediaPartToInteractionsContent),
+      ]
+      if (resolved.text) {
+        content.push({ type: 'text', text: resolved.text })
+      }
+      if (content.length === 0) {
+        throw new Error(
+          `${this.name}.createVideoJob: the prompt produced no content to send (model: ${this.model}).`,
+        )
+      }
+
+      const interaction = await this.client.interactions.create({
+        ...modelOptions,
+        model: this.model,
+        input: [{ type: 'user_input', content }],
+        response_modalities: ['video'],
+        background: true,
+        // Omni's clip length is fixed (10s) and not a request field, so the
+        // typed `duration` option is compile-time-only here. Aspect ratio is
+        // the one output knob the API exposes today.
+        ...(size !== undefined && {
+          response_format: { type: 'video' as const, aspect_ratio: size },
+        }),
+      })
+
+      if (!interaction.id) {
+        throw new Error(
+          'Gemini Omni did not return an interaction id for the video generation job.',
+        )
+      }
+
+      return { jobId: interaction.id, model: this.model }
+    } catch (error) {
+      logger.errors(`${this.name}.createVideoJob fatal`, {
+        error,
+        source: `${this.name}.createVideoJob`,
+      })
+      throw error
+    }
+  }
+
   /**
    * Route image prompt parts onto Veo's request fields by `metadata.role`.
    */
@@ -257,6 +426,9 @@ export class GeminiVideoAdapter<
   }
 
   async getVideoStatus(jobId: string): Promise<VideoStatusResult> {
+    if (isInteractionsVideoModel(this.model)) {
+      return await this.getInteractionsVideoStatus(jobId)
+    }
     const operation = await this.getOperation(jobId)
 
     if (!operation.done) {
@@ -289,7 +461,43 @@ export class GeminiVideoAdapter<
     return { jobId, status: 'completed' }
   }
 
+  /**
+   * Poll an Omni background interaction. `in_progress` and
+   * `requires_action` map to 'processing'; a `completed` interaction with
+   * no video content (e.g. filtered output) is surfaced as a failure so
+   * `getVideoUrl` doesn't throw on an empty response.
+   */
+  private async getInteractionsVideoStatus(
+    jobId: string,
+  ): Promise<VideoStatusResult> {
+    const interaction = await this.getInteraction(jobId)
+    const status = interaction.status
+
+    if (status === 'in_progress' || status === 'requires_action') {
+      return { jobId, status: 'processing' }
+    }
+    if (status === 'completed') {
+      if (!extractInteractionVideo(interaction)) {
+        return {
+          jobId,
+          status: 'failed',
+          error:
+            'Gemini Omni completed the interaction without returning a video (the output may have been filtered).',
+        }
+      }
+      return { jobId, status: 'completed' }
+    }
+    return {
+      jobId,
+      status: 'failed',
+      error: `Gemini Omni video generation ended with status "${status}".`,
+    }
+  }
+
   async getVideoUrl(jobId: string): Promise<VideoUrlResult> {
+    if (isInteractionsVideoModel(this.model)) {
+      return await this.getInteractionsVideoUrl(jobId)
+    }
     const operation = await this.getOperation(jobId)
 
     if (!operation.done) {
@@ -317,6 +525,42 @@ export class GeminiVideoAdapter<
     return { jobId, url: uri }
   }
 
+  /**
+   * Extract the finished Omni video. Inline base64 output (the API default)
+   * becomes a `data:` URL — matching the OpenAI Sora adapter's inline
+   * delivery — and URI delivery passes through (Files API URIs need the API
+   * key to download, like Veo). Usage carries the video-modality output
+   * tokens (Omni bills per second of video, reported as tokens).
+   */
+  private async getInteractionsVideoUrl(
+    jobId: string,
+  ): Promise<VideoUrlResult> {
+    const interaction = await this.getInteraction(jobId)
+    const status = interaction.status
+
+    if (status === 'in_progress' || status === 'requires_action') {
+      throw new Error(
+        `Video is not ready yet. Check status first. Job ID: ${jobId}`,
+      )
+    }
+    if (status !== 'completed') {
+      throw new Error(
+        `Video generation failed: Gemini Omni interaction ended with status "${status}". Job ID: ${jobId}`,
+      )
+    }
+
+    const video = extractInteractionVideo(interaction)
+    if (!video) {
+      throw new Error(
+        `Video not found in interaction response (the output may have been filtered). Job ID: ${jobId}`,
+      )
+    }
+
+    const usage = interactionUsageToTokenUsage(interaction.usage)
+    const url = video.uri ?? `data:${video.mimeType};base64,${video.data}`
+    return { jobId, url, ...(usage && { usage }) }
+  }
+
   override availableDurations(): DurationOptions<
     GeminiVideoModelDurationByName[TModel]
   > {
@@ -340,6 +584,13 @@ export class GeminiVideoAdapter<
     operation.name = jobId
     return await this.client.operations.getVideosOperation({ operation })
   }
+
+  /**
+   * Fetch an Omni background interaction by id.
+   */
+  private async getInteraction(jobId: string): Promise<Interaction> {
+    return await this.client.interactions.get(jobId)
+  }
 }
 
 /**
diff --git a/packages/ai-gemini/src/index.ts b/packages/ai-gemini/src/index.ts
index 462de4067..d8733709e 100644
--- a/packages/ai-gemini/src/index.ts
+++ b/packages/ai-gemini/src/index.ts
@@ -61,9 +61,9 @@ export {
   type GeminiAudioProviderOptions,
 } from './adapters/audio'
 
-// Video / Veo generation adapter (experimental)
+// Video generation adapter — Veo + Gemini Omni Flash (experimental)
 /**
- * @experimental Veo video generation is an experimental feature and may change.
+ * @experimental Video generation is an experimental feature and may change.
  */
 export {
   GeminiVideoAdapter,
@@ -74,8 +74,11 @@ export {
 export {
   GEMINI_VIDEO_DURATIONS,
   getGeminiVideoDurationOptions,
+  isInteractionsVideoModel,
 } from './video/video-provider-options'
 export type {
+  GeminiInteractionsVideoModel,
+  GeminiOmniVideoProviderOptions,
   GeminiVideoModel,
   GeminiVideoModelDurationByName,
   GeminiVideoModelInputModalitiesByName,
@@ -96,6 +99,7 @@ export { GEMINI_TTS_MODELS as GeminiTTSModels } from './model-meta'
 export { GEMINI_TTS_VOICES as GeminiTTSVoices } from './model-meta'
 export { GEMINI_AUDIO_MODELS as GeminiAudioModels } from './model-meta'
 export { GEMINI_VIDEO_MODELS as GeminiVideoModels } from './model-meta'
+export { GEMINI_INTERACTIONS_VIDEO_MODELS as GeminiInteractionsVideoModels } from './model-meta'
 export type { GeminiModels as GeminiTextModel } from './model-meta'
 export type { GeminiImageModels as GeminiImageModel } from './model-meta'
 export type { GeminiTTSVoice } from './model-meta'
diff --git a/packages/ai-gemini/src/model-meta.ts b/packages/ai-gemini/src/model-meta.ts
index 67a7fc574..7174d38a2 100644
--- a/packages/ai-gemini/src/model-meta.ts
+++ b/packages/ai-gemini/src/model-meta.ts
@@ -712,6 +712,37 @@ const VEO_3_1_LITE_PREVIEW = {
     GeminiCachedContentOptions
 >
 
+/**
+ * Gemini Omni Flash — multimodal video generation with conversational
+ * editing. Serves only the Interactions API (`generateContent` rejects it),
+ * so it routes through the interactions-based path of the video adapter,
+ * not Veo's `:predictLongRunning` flow. Pricing is per second of generated
+ * video ($0.10/sec). 720p / 24 FPS, 10-second clips.
+ * @experimental Omni video generation is an experimental feature and may change.
+ */
+const GEMINI_OMNI_FLASH_PREVIEW = {
+  name: 'gemini-omni-flash-preview',
+  max_input_tokens: 1_048_576,
+  max_output_tokens: 1,
+  supports: {
+    input: ['text', 'image', 'video'],
+    output: ['video', 'audio'],
+  },
+  pricing: {
+    input: {
+      normal: 0,
+    },
+    output: {
+      normal: 0.1,
+    },
+  },
+} as const satisfies ModelMeta<
+  GeminiToolConfigOptions &
+    GeminiSafetyOptions &
+    GeminiCommonConfigOptions &
+    GeminiCachedContentOptions
+>
+
 const GEMINI_3_5_FLASH = {
   name: 'gemini-3.5-flash',
   max_input_tokens: 1_048_576,
@@ -845,13 +876,25 @@ export const GEMINI_TTS_VOICES = [
 export type GeminiTTSVoice = (typeof GEMINI_TTS_VOICES)[number]
 
 /**
- * Veo video generation models.
- * @experimental Veo video generation is an experimental feature and may change.
+ * Video generation models. Veo models run on the long-running
+ * `:predictLongRunning` flow; Gemini Omni Flash runs on the Interactions
+ * API — the video adapter routes by model.
+ * @experimental Video generation is an experimental feature and may change.
  */
 export const GEMINI_VIDEO_MODELS = [
   VEO_3_1_PREVIEW.name,
   VEO_3_1_FAST_PREVIEW.name,
   VEO_3_1_LITE_PREVIEW.name,
+  GEMINI_OMNI_FLASH_PREVIEW.name,
+] as const
+
+/**
+ * Video models served by the Interactions API rather than Veo's
+ * `:predictLongRunning` operations flow.
+ * @experimental Omni video generation is an experimental feature and may change.
+ */
+export const GEMINI_INTERACTIONS_VIDEO_MODELS = [
+  GEMINI_OMNI_FLASH_PREVIEW.name,
 ] as const
 
 // Manual type map for per-model provider options
diff --git a/packages/ai-gemini/src/video/video-provider-options.ts b/packages/ai-gemini/src/video/video-provider-options.ts
index 1daee974b..a99ae4d6c 100644
--- a/packages/ai-gemini/src/video/video-provider-options.ts
+++ b/packages/ai-gemini/src/video/video-provider-options.ts
@@ -1,25 +1,50 @@
 /**
- * Gemini Veo Video Generation Provider Options
+ * Gemini Video Generation Provider Options
  *
- * Based on https://ai.google.dev/gemini-api/docs/video
+ * Covers two request paths behind the one video adapter:
+ * - Veo models — long-running operations via `:predictLongRunning`
+ *   (https://ai.google.dev/gemini-api/docs/video)
+ * - Gemini Omni Flash — background jobs via the Interactions API
+ *   (https://ai.google.dev/gemini-api/docs/omni)
  *
  * @experimental Video generation is an experimental feature and may change.
  */
+import { GEMINI_INTERACTIONS_VIDEO_MODELS } from '../model-meta'
 import type { DurationOptions } from '@tanstack/ai/adapters'
-import type { GenerateVideosConfig } from '@google/genai'
+import type { GenerateVideosConfig, Interactions } from '@google/genai'
 import type { GEMINI_VIDEO_MODELS } from '../model-meta'
 
 /**
- * Model type for Gemini Veo video generation.
+ * Model type for Gemini video generation (Veo + Omni Flash).
  * @experimental Video generation is an experimental feature and may change.
  */
 export type GeminiVideoModel = (typeof GEMINI_VIDEO_MODELS)[number]
 
 /**
- * Supported aspect ratios for Veo video generation. This is the `size` value
- * for the Gemini video adapter — Veo expresses output shape as an aspect
- * ratio (plus an optional `resolution` in `modelOptions`), not pixel
- * dimensions.
+ * Video models served by the Interactions API (Gemini Omni Flash) rather
+ * than Veo's `:predictLongRunning` operations flow.
+ * @experimental Omni video generation is an experimental feature and may change.
+ */
+export type GeminiInteractionsVideoModel =
+  (typeof GEMINI_INTERACTIONS_VIDEO_MODELS)[number]
+
+/**
+ * Runtime guard for the Interactions-served video models.
+ * @experimental Omni video generation is an experimental feature and may change.
+ */
+export function isInteractionsVideoModel(
+  model: GeminiVideoModel,
+): model is GeminiInteractionsVideoModel {
+  return (GEMINI_INTERACTIONS_VIDEO_MODELS as ReadonlyArray<string>).includes(
+    model,
+  )
+}
+
+/**
+ * Supported aspect ratios for Gemini video generation. This is the `size`
+ * value for the Gemini video adapter — both Veo and Omni Flash express
+ * output shape as an aspect ratio (plus an optional `resolution` in Veo's
+ * `modelOptions`), not pixel dimensions.
  *
  * @experimental Video generation is an experimental feature and may change.
  */
@@ -49,13 +74,50 @@ export type GeminiVideoProviderOptions = Omit<
   | 'abortSignal'
 >
 
+/**
+ * Provider-specific options for Gemini Omni Flash video generation on the
+ * Interactions API.
+ *
+ * Derived from the SDK's `Interactions.CreateModelInteractionParamsNonStreaming`,
+ * minus the fields the adapter manages itself:
+ * - `model` / `input` — set from the adapter's model and the `prompt`
+ * - `stream` / `background` — the adapter always creates a background job
+ *   and polls it through the `generateVideo` jobs API
+ * - `response_modalities` / `response_format` — the adapter requests video
+ *   output and maps the top-level `size` option onto
+ *   `response_format.aspect_ratio`
+ * - `tools` / `response_mime_type` — not applicable to video generation
+ *
+ * Notable passthroughs:
+ * - `previous_interaction_id` — conversational video editing: chain a new
+ *   prompt onto a prior Omni interaction to refine its video
+ * - `generation_config.video_config.task` — pin the task mode
+ *   (`'text_to_video' | 'image_to_video' | 'reference_to_video' | 'edit'`)
+ *   instead of letting the model infer it
+ *
+ * @experimental Omni video generation is an experimental feature and may change.
+ */
+export type GeminiOmniVideoProviderOptions = Omit<
+  Interactions.CreateModelInteractionParamsNonStreaming,
+  | 'model'
+  | 'input'
+  | 'stream'
+  | 'background'
+  | 'response_modalities'
+  | 'response_format'
+  | 'response_mime_type'
+  | 'tools'
+>
+
 /**
  * Model-specific provider options mapping.
  *
  * @experimental Video generation is an experimental feature and may change.
  */
 export type GeminiVideoModelProviderOptionsByName = {
-  [TModel in GeminiVideoModel]: GeminiVideoProviderOptions
+  [TModel in GeminiVideoModel]: TModel extends GeminiInteractionsVideoModel
+    ? GeminiOmniVideoProviderOptions
+    : GeminiVideoProviderOptions
 }
 
 /**
@@ -70,17 +132,21 @@ export type GeminiVideoModelSizeByName = {
 /**
  * Per-model prompt input modalities. Every Veo model accepts image
  * conditioning inputs (first frame, last frame, reference images) alongside
- * the text prompt.
+ * the text prompt. Omni Flash additionally accepts video inputs (short
+ * reference clips / videos to edit).
  *
  * @experimental Video generation is an experimental feature and may change.
  */
 export type GeminiVideoModelInputModalitiesByName = {
-  [TModel in GeminiVideoModel]: readonly ['image']
+  [TModel in GeminiVideoModel]: TModel extends GeminiInteractionsVideoModel
+    ? readonly ['image', 'video']
+    : readonly ['image']
 }
 
 /**
- * Per-model duration unions (seconds, as numbers — the API's
- * `parameters.durationSeconds` field is numeric).
+ * Per-model duration unions (seconds, as numbers — Veo's
+ * `parameters.durationSeconds` field is numeric; Omni Flash clips are a
+ * fixed 10 seconds today, with longer durations "coming soon" per Google).
  *
  * @experimental Video generation is an experimental feature and may change.
  */
@@ -88,15 +154,18 @@ export type GeminiVideoModelDurationByName = {
   'veo-3.1-generate-preview': 4 | 6 | 8
   'veo-3.1-fast-generate-preview': 4 | 6 | 8
   'veo-3.1-lite-generate-preview': 4 | 6 | 8
+  'gemini-omni-flash-preview': 10
 }
 
 /**
  * Runtime duration table backing `availableDurations()` / `snapDuration()`.
  *
- * Curated from the official Veo docs
- * (https://ai.google.dev/gemini-api/docs/video) — the Gemini OpenAPI spec
+ * Curated from the official docs
+ * (https://ai.google.dev/gemini-api/docs/video,
+ * https://ai.google.dev/gemini-api/docs/omni) — the Gemini OpenAPI spec
  * types the `:predictLongRunning` request's `parameters` as unconstrained,
  * so it carries no per-model duration information to derive these from.
+ * Omni Flash has no duration request field at all; clips are 10 seconds.
  *
  * @experimental Video generation is an experimental feature and may change.
  */
@@ -108,10 +177,11 @@ export const GEMINI_VIDEO_DURATIONS: {
   'veo-3.1-generate-preview': { kind: 'discrete', values: [4, 6, 8] },
   'veo-3.1-fast-generate-preview': { kind: 'discrete', values: [4, 6, 8] },
   'veo-3.1-lite-generate-preview': { kind: 'discrete', values: [4, 6, 8] },
+  'gemini-omni-flash-preview': { kind: 'discrete', values: [10] },
 }
 
 /**
- * Look up the duration options for a Veo model.
+ * Look up the duration options for a Gemini video model.
  *
  * @experimental Video generation is an experimental feature and may change.
  */
diff --git a/packages/ai-gemini/tests/video-adapter.test.ts b/packages/ai-gemini/tests/video-adapter.test.ts
index 5763d6737..41889ecc5 100644
--- a/packages/ai-gemini/tests/video-adapter.test.ts
+++ b/packages/ai-gemini/tests/video-adapter.test.ts
@@ -501,3 +501,371 @@ describe('Gemini Video Adapter', () => {
     })
   })
 })
+
+// ===========================
+// Gemini Omni Flash (Interactions API)
+// ===========================
+
+interface InteractionsClientStub {
+  interactions: {
+    create: ReturnType<typeof vi.fn>
+    get: ReturnType<typeof vi.fn>
+  }
+}
+
+const completedOmniInteraction = {
+  id: 'v1_omni-job-123',
+  status: 'completed',
+  usage: {
+    total_input_tokens: 12,
+    total_output_tokens: 57920,
+    total_tokens: 57932,
+    output_tokens_by_modality: [{ modality: 'video', tokens: 57920 }],
+  },
+  steps: [
+    { type: 'user_input', content: [{ type: 'text', text: 'a sunset' }] },
+    { type: 'thought', signature: 'sig' },
+    {
+      type: 'model_output',
+      content: [
+        { type: 'video', mime_type: 'video/mp4', data: 'AAAAIGZ0eXA=' },
+      ],
+    },
+  ],
+}
+
+function createInteractionsClientStub(
+  overrides: {
+    createResult?: Record<string, unknown>
+    getResult?: Record<string, unknown>
+  } = {},
+): InteractionsClientStub {
+  return {
+    interactions: {
+      create: vi.fn().mockResolvedValue(
+        overrides.createResult ?? {
+          id: 'v1_omni-job-123',
+          status: 'in_progress',
+          object: 'interaction',
+        },
+      ),
+      get: vi
+        .fn()
+        .mockResolvedValue(overrides.getResult ?? completedOmniInteraction),
+    },
+  }
+}
+
+class StubbedGeminiOmniVideoAdapter extends GeminiVideoAdapter<'gemini-omni-flash-preview'> {
+  constructor(stub: InteractionsClientStub) {
+    super({ apiKey: 'test-key' }, 'gemini-omni-flash-preview')
+    this.client = stub as unknown as GoogleGenAI
+  }
+}
+
+describe('Gemini Omni Flash Video Adapter (Interactions API)', () => {
+  describe('durations', () => {
+    it('reports the fixed 10-second clip length', () => {
+      const adapter = createGeminiVideo('gemini-omni-flash-preview', 'test-key')
+      expect(adapter.availableDurations()).toEqual({
+        kind: 'discrete',
+        values: [10],
+      })
+      expect(adapter.snapDuration(3)).toBe(10)
+      expect(adapter.snapDuration(60)).toBe(10)
+    })
+
+    it('types duration as the fixed 10-second literal at compile time', () => {
+      const omni = createGeminiVideo('gemini-omni-flash-preview', 'test-key')
+      expectTypeOf(omni.snapDuration).returns.toEqualTypeOf<10 | undefined>()
+      type OmniOptions = Parameters<typeof omni.createVideoJob>[0]
+      expectTypeOf<OmniOptions['duration']>().toEqualTypeOf<10 | undefined>()
+    })
+  })
+
+  describe('createVideoJob', () => {
+    it('creates a background interaction requesting video output', async () => {
+      const stub = createInteractionsClientStub()
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      const result = await adapter.createVideoJob({
+        model: 'gemini-omni-flash-preview',
+        prompt: 'a sunset over the ocean',
+        size: '9:16',
+        logger: testLogger,
+      })
+
+      expect(result).toEqual({
+        jobId: 'v1_omni-job-123',
+        model: 'gemini-omni-flash-preview',
+      })
+      expect(stub.interactions.create).toHaveBeenCalledWith({
+        model: 'gemini-omni-flash-preview',
+        input: [
+          {
+            type: 'user_input',
+            content: [{ type: 'text', text: 'a sunset over the ocean' }],
+          },
+        ],
+        response_modalities: ['video'],
+        background: true,
+        response_format: { type: 'video', aspect_ratio: '9:16' },
+      })
+    })
+
+    it('omits response_format when no size is given and passes modelOptions through', async () => {
+      const stub = createInteractionsClientStub()
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      await adapter.createVideoJob({
+        model: 'gemini-omni-flash-preview',
+        prompt: 'make the violin invisible',
+        modelOptions: { previous_interaction_id: 'v1_prior-turn' },
+        logger: testLogger,
+      })
+
+      expect(stub.interactions.create).toHaveBeenCalledWith({
+        model: 'gemini-omni-flash-preview',
+        previous_interaction_id: 'v1_prior-turn',
+        input: [
+          {
+            type: 'user_input',
+            content: [{ type: 'text', text: 'make the violin invisible' }],
+          },
+        ],
+        response_modalities: ['video'],
+        background: true,
+      })
+    })
+
+    it('sends image and video prompt parts as content blocks before the text', async () => {
+      const stub = createInteractionsClientStub()
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      await adapter.createVideoJob({
+        model: 'gemini-omni-flash-preview',
+        prompt: [
+          {
+            type: 'image',
+            source: { type: 'data', value: 'aGVsbG8=', mimeType: 'image/png' },
+          },
+          {
+            type: 'video',
+            source: {
+              type: 'url',
+              value:
+                'https://generativelanguage.googleapis.com/v1beta/files/abc',
+              mimeType: 'video/mp4',
+            },
+          },
+          { type: 'text', content: 'animate this' },
+        ],
+        logger: testLogger,
+      })
+
+      expect(stub.interactions.create).toHaveBeenCalledWith(
+        expect.objectContaining({
+          input: [
+            {
+              type: 'user_input',
+              content: [
+                { type: 'image', data: 'aGVsbG8=', mime_type: 'image/png' },
+                {
+                  type: 'video',
+                  uri: 'https://generativelanguage.googleapis.com/v1beta/files/abc',
+                  mime_type: 'video/mp4',
+                },
+                { type: 'text', text: 'animate this' },
+              ],
+            },
+          ],
+        }),
+      )
+    })
+
+    it('throws on audio prompt parts', async () => {
+      const stub = createInteractionsClientStub()
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      await expect(
+        adapter.createVideoJob({
+          model: 'gemini-omni-flash-preview',
+          prompt: [
+            { type: 'text', content: 'sync to this' },
+            {
+              type: 'audio',
+              source: {
+                type: 'data',
+                value: 'aGVsbG8=',
+                mimeType: 'audio/wav',
+              },
+            },
+          ],
+          logger: testLogger,
+        }),
+      ).rejects.toThrow(/audio prompt parts/)
+      expect(stub.interactions.create).not.toHaveBeenCalled()
+    })
+
+    it('throws when the interaction comes back without an id', async () => {
+      const stub = createInteractionsClientStub({
+        createResult: { status: 'in_progress' },
+      })
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      await expect(
+        adapter.createVideoJob({
+          model: 'gemini-omni-flash-preview',
+          prompt: 'a sunset',
+          logger: testLogger,
+        }),
+      ).rejects.toThrow(/interaction id/)
+    })
+  })
+
+  describe('getVideoStatus', () => {
+    const jobId = 'v1_omni-job-123'
+
+    it('maps in_progress to processing', async () => {
+      const stub = createInteractionsClientStub({
+        getResult: { id: jobId, status: 'in_progress' },
+      })
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      expect(await adapter.getVideoStatus(jobId)).toEqual({
+        jobId,
+        status: 'processing',
+      })
+      expect(stub.interactions.get).toHaveBeenCalledWith(jobId)
+    })
+
+    it('maps a completed interaction with a video to completed', async () => {
+      const stub = createInteractionsClientStub()
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      expect(await adapter.getVideoStatus(jobId)).toEqual({
+        jobId,
+        status: 'completed',
+      })
+    })
+
+    it('maps a completed interaction without video output to failed', async () => {
+      const stub = createInteractionsClientStub({
+        getResult: {
+          id: jobId,
+          status: 'completed',
+          steps: [
+            {
+              type: 'model_output',
+              content: [{ type: 'text', text: 'cannot do that' }],
+            },
+          ],
+        },
+      })
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      const status = await adapter.getVideoStatus(jobId)
+      expect(status.status).toBe('failed')
+      expect(status.error).toMatch(/without returning a video/)
+    })
+
+    it('maps terminal non-success statuses to failed', async () => {
+      for (const failure of ['failed', 'cancelled', 'incomplete']) {
+        const stub = createInteractionsClientStub({
+          getResult: { id: jobId, status: failure },
+        })
+        const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+        const status = await adapter.getVideoStatus(jobId)
+        expect(status.status).toBe('failed')
+        expect(status.error).toContain(failure)
+      }
+    })
+  })
+
+  describe('getVideoUrl', () => {
+    const jobId = 'v1_omni-job-123'
+
+    it('returns the inline base64 video as a data: URL with usage', async () => {
+      const stub = createInteractionsClientStub()
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      expect(await adapter.getVideoUrl(jobId)).toEqual({
+        jobId,
+        url: 'data:video/mp4;base64,AAAAIGZ0eXA=',
+        usage: {
+          promptTokens: 12,
+          completionTokens: 57920,
+          totalTokens: 57932,
+        },
+      })
+    })
+
+    it('falls back to the video-modality token count when totals are missing', async () => {
+      const stub = createInteractionsClientStub({
+        getResult: {
+          ...completedOmniInteraction,
+          usage: {
+            output_tokens_by_modality: [{ modality: 'video', tokens: 57920 }],
+          },
+        },
+      })
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      const result = await adapter.getVideoUrl(jobId)
+      expect(result.usage).toEqual({
+        promptTokens: 0,
+        completionTokens: 57920,
+        totalTokens: 57920,
+      })
+    })
+
+    it('passes a URI delivery through as the URL', async () => {
+      const stub = createInteractionsClientStub({
+        getResult: {
+          id: jobId,
+          status: 'completed',
+          output_video: {
+            type: 'video',
+            uri: 'https://generativelanguage.googleapis.com/v1beta/files/xyz:download',
+          },
+        },
+      })
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      const result = await adapter.getVideoUrl(jobId)
+      expect(result.url).toBe(
+        'https://generativelanguage.googleapis.com/v1beta/files/xyz:download',
+      )
+    })
+
+    it('throws when the interaction is still in progress', async () => {
+      const stub = createInteractionsClientStub({
+        getResult: { id: jobId, status: 'in_progress' },
+      })
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      await expect(adapter.getVideoUrl(jobId)).rejects.toThrow(/not ready/)
+    })
+
+    it('throws with the terminal status on failure', async () => {
+      const stub = createInteractionsClientStub({
+        getResult: { id: jobId, status: 'failed' },
+      })
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      await expect(adapter.getVideoUrl(jobId)).rejects.toThrow(/"failed"/)
+    })
+
+    it('throws when a completed interaction has no video content', async () => {
+      const stub = createInteractionsClientStub({
+        getResult: { id: jobId, status: 'completed', steps: [] },
+      })
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      await expect(adapter.getVideoUrl(jobId)).rejects.toThrow(
+        /Video not found/,
+      )
+    })
+  })
+})
diff --git a/packages/ai/skills/ai-core/media-generation/SKILL.md b/packages/ai/skills/ai-core/media-generation/SKILL.md
index 0c63f347e..8da02856c 100644
--- a/packages/ai/skills/ai-core/media-generation/SKILL.md
+++ b/packages/ai/skills/ai-core/media-generation/SKILL.md
@@ -467,6 +467,34 @@ const { jobId } = await generateVideo({
 // (x-goog-api-key header or ?key= query parameter).
 ```
 
+Gemini Omni Flash (`geminiVideo('gemini-omni-flash-preview')`) is served by
+the Interactions API instead of Veo's operations flow: same adapter, routed
+by model. Clips are a fixed 10s at 720p (`duration` is typed `10`), `size`
+is the aspect ratio (`'16:9' | '9:16'`), and the finished video arrives
+**inline** as a `data:video/mp4;base64,…` URL (no API key needed to read it).
+Image/video prompt parts are sent as interaction content in order (no
+`metadata.role` routing); `data` sources go inline, `url` sources pass
+through as-is (never downloaded; use Gemini Files API URIs for remote
+media). For conversational editing, pass a prior generation's `jobId` as
+`modelOptions.previous_interaction_id` with a prompt describing the change:
+
+```typescript
+import { generateVideo } from '@tanstack/ai'
+import { geminiVideo } from '@tanstack/ai-gemini'
+
+const omni = geminiVideo('gemini-omni-flash-preview')
+const first = await generateVideo({
+  adapter: omni,
+  prompt: 'A violinist outdoors',
+})
+// …poll first.jobId to completion, then edit it:
+const edited = await generateVideo({
+  adapter: omni,
+  prompt: 'Make the violin invisible',
+  modelOptions: { previous_interaction_id: first.jobId },
+})
+```
+
 Other video adapters: `openaiVideo('sora-2')` (pixel sizes like `'1280x720'`,
 durations 4/8/12s, single `input_reference` image prompt part), `grokVideo(...)`
 (`grok-imagine-video` does text-to-video + image-to-video; `grok-imagine-video-1.5` is
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index c2abbdf71..f9627db9a 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -1709,7 +1709,7 @@ importers:
   packages/ai-gemini:
     dependencies:
       '@google/genai':
-        specifier: ^2.8.0
+        specifier: ^2.10.0
         version: 2.10.0(@modelcontextprotocol/sdk@1.29.0(zod@4.3.6))
       '@tanstack/ai-utils':
         specifier: workspace:*
diff --git a/testing/e2e/global-setup.ts b/testing/e2e/global-setup.ts
index 42011f43f..2dbbace72 100644
--- a/testing/e2e/global-setup.ts
+++ b/testing/e2e/global-setup.ts
@@ -50,6 +50,15 @@ export default async function globalSetup() {
   // aimock's native Gemini handlers.
   mock.mount('/v1beta/models', geminiVeoMount())
 
+  // Gemini Omni Flash video generation (Interactions API). aimock handles
+  // synchronous text interactions natively, but not background video jobs
+  // (POST /v1beta/interactions with background:true → poll
+  // GET /v1beta/interactions/{id} → inline base64 mp4). The adapter under
+  // test points its baseUrl at this dedicated prefix so aimock's native
+  // interactions handling stays untouched for the stateful-interactions
+  // text tests.
+  mock.mount('/omni-video', geminiOmniVideoMount())
+
   // Anthropic server_tool_use bug reproduction (issue #604). aimock can't
   // natively synthesize `server_tool_use` / `web_fetch_tool_result` content
   // blocks, so this mount hand-crafts the raw SSE Claude would emit when a
@@ -345,6 +354,89 @@ function geminiVeoMount(): Mountable {
   }
 }
 
+/**
+ * Mounts Gemini Omni Flash's Interactions-API video generation flow under a
+ * dedicated `/omni-video` prefix (the adapter under test sets its baseUrl to
+ * it, so requests land on `/omni-video/v1beta/interactions`):
+ *
+ * - `POST /v1beta/interactions` — creates the background job and returns an
+ *   `in_progress` interaction with an id.
+ * - `GET /v1beta/interactions/{id}` — polls the job. The mock completes
+ *   immediately with the raw wire shape: a `model_output` step carrying an
+ *   inline base64 `video` content block plus `output_tokens_by_modality`
+ *   usage, which the adapter maps to a `data:video/mp4;base64,…` URL.
+ */
+function geminiOmniVideoMount(): Mountable {
+  const JOB_ID = 'v1_omni-video-e2e'
+  // Base64 of a truncated MP4 `ftyp` header, not a playable clip: the spec
+  // only asserts that the <video> element renders with the adapter's data: URL.
+  const VIDEO_BASE64 = 'AAAAIGZ0eXBpc29tAAACAGlzb21pc28y'
+  return {
+    async handleRequest(
+      req: http.IncomingMessage,
+      res: http.ServerResponse,
+      // aimock strips the mount prefix ('/omni-video') and any query
+      // string, so pathname looks like '/v1beta/interactions' or
+      // '/v1beta/interactions/{id}'.
+      pathname: string,
+    ): Promise<boolean> {
+      if (pathname === '/v1beta/interactions' && req.method === 'POST') {
+        await drainBody(req)
+        res.statusCode = 200
+        res.setHeader('Content-Type', 'application/json')
+        res.end(
+          JSON.stringify({
+            id: JOB_ID,
+            object: 'interaction',
+            status: 'in_progress',
+            model: 'gemini-omni-flash-preview',
+          }),
+        )
+        return true
+      }
+
+      const pollMatch = pathname.match(/^\/v1beta\/interactions\/([^/]+)$/)
+      if (pollMatch && req.method === 'GET') {
+        res.statusCode = 200
+        res.setHeader('Content-Type', 'application/json')
+        res.end(
+          JSON.stringify({
+            id: pollMatch[1],
+            object: 'interaction',
+            status: 'completed',
+            model: 'gemini-omni-flash-preview',
+            usage: {
+              total_input_tokens: 12,
+              total_output_tokens: 57920,
+              total_tokens: 57932,
+              output_tokens_by_modality: [{ modality: 'video', tokens: 57920 }],
+            },
+            steps: [
+              {
+                type: 'user_input',
+                content: [{ type: 'text', text: 'a guitar being played' }],
+              },
+              {
+                type: 'model_output',
+                content: [
+                  {
+                    type: 'video',
+                    mime_type: 'video/mp4',
+                    data: VIDEO_BASE64,
+                  },
+                ],
+              },
+            ],
+          }),
+        )
+        return true
+      }
+
+      return false
+    },
+  }
+}
+
 /**
  * Mounts a Claude-shaped SSE response that includes a client `tool_use` block
  * followed by a `web_fetch` `server_tool_use` block, plus its
diff --git a/testing/e2e/src/components/VideoGenUI.tsx b/testing/e2e/src/components/VideoGenUI.tsx
index 1b068d741..2136c3e34 100644
--- a/testing/e2e/src/components/VideoGenUI.tsx
+++ b/testing/e2e/src/components/VideoGenUI.tsx
@@ -6,7 +6,7 @@ import {
 } from '@tanstack/ai-react'
 import { generateVideoFn } from '@/lib/server-functions'
 import type { MediaPrompt } from '@tanstack/ai'
-import type { Mode, Provider } from '@/lib/types'
+import type { Feature, Mode, Provider } from '@/lib/types'
 import type { VideoGenerateResult } from '@tanstack/ai-client'
 
 interface VideoGenUIProps {
@@ -16,6 +16,8 @@ interface VideoGenUIProps {
   aimockPort?: number
   /** Show a file input and send the prompt as multimodal parts (image-to-video). */
   withImageInput?: boolean
+  /** Video feature variant — selects the adapter server-side (e.g. 'interactions-video' → Gemini Omni Flash). */
+  feature?: Feature
 }
 
 function fileToBase64(file: File): Promise<string> {
@@ -45,12 +47,13 @@ export function VideoGenUI({
   testId,
   aimockPort,
   withImageInput,
+  feature,
 }: VideoGenUIProps) {
   const [prompt, setPrompt] = useState('')
   const [imageFile, setImageFile] = useState<File | null>(null)
 
   const connectionOptions = () => {
-    const body = { provider, testId, aimockPort }
+    const body = { provider, testId, aimockPort, feature }
 
     if (mode === 'sse') {
       return { connection: fetchServerSentEvents('/api/video'), body }
@@ -61,7 +64,7 @@ export function VideoGenUI({
     return {
       fetcher: async (input: { prompt: MediaPrompt }) => {
         return generateVideoFn({
-          data: { prompt: input.prompt, provider, aimockPort, testId },
+          data: { prompt: input.prompt, provider, aimockPort, testId, feature },
         }) as Promise<VideoGenerateResult>
       },
     }
diff --git a/testing/e2e/src/lib/feature-support.ts b/testing/e2e/src/lib/feature-support.ts
index 01738f1db..f64f708ec 100644
--- a/testing/e2e/src/lib/feature-support.ts
+++ b/testing/e2e/src/lib/feature-support.ts
@@ -245,6 +245,11 @@ export const matrix: Record<Feature, Set<Provider>> = {
   // routing remain unit-test-only (the spec's journal assertion is tied to
   // aimock's /v1/videos pipeline, which custom mounts bypass).
   'image-to-video': new Set(['openai']),
+  // Gemini Omni Flash video generation over the Interactions API. Runs
+  // through a dedicated aimock mount (see geminiOmniVideoMount in
+  // global-setup.ts) — aimock handles synchronous text interactions natively
+  // but not background video jobs (create → poll → inline base64 mp4).
+  'interactions-video': new Set(['gemini']),
   // Only Gemini currently surfaces a first-class stateful conversation API via
   // the adapter (geminiTextInteractions, behind @tanstack/ai-gemini/experimental).
   'stateful-interactions': new Set(['gemini']),
diff --git a/testing/e2e/src/lib/features.ts b/testing/e2e/src/lib/features.ts
index 972eba0ae..2efa9427f 100644
--- a/testing/e2e/src/lib/features.ts
+++ b/testing/e2e/src/lib/features.ts
@@ -132,6 +132,10 @@ export const featureConfigs: Record<Feature, FeatureConfig> = {
     tools: [],
     modelOptions: {},
   },
+  'interactions-video': {
+    tools: [],
+    modelOptions: {},
+  },
   'stateful-interactions': {
     tools: [],
     modelOptions: {},
diff --git a/testing/e2e/src/lib/media-providers.ts b/testing/e2e/src/lib/media-providers.ts
index ad5c01815..c4a9e1d11 100644
--- a/testing/e2e/src/lib/media-providers.ts
+++ b/testing/e2e/src/lib/media-providers.ts
@@ -131,8 +131,21 @@ export function createVideoAdapter(
   provider: Provider,
   aimockPort?: number,
   testId?: string,
+  feature: Feature = 'video-gen',
 ) {
   const headers = testHeaders(testId)
+  // Gemini Omni Flash only serves the Interactions API; its background
+  // video jobs run through a dedicated aimock mount (see geminiOmniVideoMount
+  // in global-setup.ts) addressed via a distinct baseUrl prefix so aimock's
+  // native /v1beta/interactions text handling is untouched.
+  if (feature === 'interactions-video') {
+    if (provider !== 'gemini') {
+      throw new Error(`No interactions-video adapter for provider: ${provider}`)
+    }
+    return createGeminiVideo('gemini-omni-flash-preview', DUMMY_KEY, {
+      httpOptions: { baseUrl: `${llmockBase(aimockPort)}/omni-video`, headers },
+    })
+  }
   const factories: Record<string, () => any> = {
     openai: () =>
       createOpenaiVideo('sora-2', DUMMY_KEY, {
diff --git a/testing/e2e/src/lib/server-functions.ts b/testing/e2e/src/lib/server-functions.ts
index 20faeb7b4..f2cee72e5 100644
--- a/testing/e2e/src/lib/server-functions.ts
+++ b/testing/e2e/src/lib/server-functions.ts
@@ -142,6 +142,7 @@ export const generateVideoFn = createServerFn({ method: 'POST' })
       provider: Provider
       aimockPort?: number
       testId?: string
+      feature?: Feature
     }) => {
       const isEmpty =
         typeof data.prompt === 'string'
@@ -158,6 +159,7 @@ export const generateVideoFn = createServerFn({ method: 'POST' })
       data.provider,
       data.aimockPort,
       data.testId,
+      data.feature,
     )
     // Non-streaming: create job, poll until complete, return result with URL
     const { jobId } = await generateVideo({
diff --git a/testing/e2e/src/lib/types.ts b/testing/e2e/src/lib/types.ts
index e982da7e3..2249a9277 100644
--- a/testing/e2e/src/lib/types.ts
+++ b/testing/e2e/src/lib/types.ts
@@ -41,6 +41,7 @@ export type Feature =
   | 'transcription'
   | 'video-gen'
   | 'image-to-video'
+  | 'interactions-video'
   | 'stateful-interactions'
 
 export const ALL_PROVIDERS: Provider[] = [
@@ -85,5 +86,6 @@ export const ALL_FEATURES: Feature[] = [
   'transcription',
   'video-gen',
   'image-to-video',
+  'interactions-video',
   'stateful-interactions',
 ]
diff --git a/testing/e2e/src/routes/$provider/$feature.tsx b/testing/e2e/src/routes/$provider/$feature.tsx
index b1fe5b40f..3da15834e 100644
--- a/testing/e2e/src/routes/$provider/$feature.tsx
+++ b/testing/e2e/src/routes/$provider/$feature.tsx
@@ -47,6 +47,7 @@ const MEDIA_FEATURES = new Set<Feature>([
   'transcription',
   'video-gen',
   'image-to-video',
+  'interactions-video',
   'audio-gen',
   'sound-effects',
 ])
@@ -181,6 +182,16 @@ function MediaFeature({
           withImageInput
         />
       )
+    case 'interactions-video':
+      return (
+        <VideoGenUI
+          provider={provider}
+          mode={mode}
+          testId={testId}
+          aimockPort={aimockPort}
+          feature="interactions-video"
+        />
+      )
     case 'audio-gen':
     case 'sound-effects':
       return (
diff --git a/testing/e2e/src/routes/api.video.stream.ts b/testing/e2e/src/routes/api.video.stream.ts
index 88eb1a189..05c5b74f1 100644
--- a/testing/e2e/src/routes/api.video.stream.ts
+++ b/testing/e2e/src/routes/api.video.stream.ts
@@ -2,7 +2,7 @@ import { createFileRoute } from '@tanstack/react-router'
 import { generateVideo, toHttpResponse } from '@tanstack/ai'
 import { createVideoAdapter } from '@/lib/media-providers'
 import type { MediaPrompt } from '@tanstack/ai'
-import type { Provider } from '@/lib/types'
+import type { Feature, Provider } from '@/lib/types'
 
 export const Route = createFileRoute('/api/video/stream')({
   server: {
@@ -12,14 +12,20 @@ export const Route = createFileRoute('/api/video/stream')({
         const abortController = new AbortController()
         const body = await request.json()
         const data = body.forwardedProps ?? body.data ?? body
-        const { prompt, provider, testId, aimockPort } = data as {
+        const { prompt, provider, testId, aimockPort, feature } = data as {
           prompt: MediaPrompt
           provider: Provider
           testId?: string
           aimockPort?: number
+          feature?: Feature
         }
 
-        const adapter = createVideoAdapter(provider, aimockPort, testId)
+        const adapter = createVideoAdapter(
+          provider,
+          aimockPort,
+          testId,
+          feature,
+        )
 
         try {
           const stream = generateVideo({
diff --git a/testing/e2e/src/routes/api.video.ts b/testing/e2e/src/routes/api.video.ts
index a9b0903ec..83ceec707 100644
--- a/testing/e2e/src/routes/api.video.ts
+++ b/testing/e2e/src/routes/api.video.ts
@@ -2,7 +2,7 @@ import { createFileRoute } from '@tanstack/react-router'
 import { generateVideo, toServerSentEventsResponse } from '@tanstack/ai'
 import { createVideoAdapter } from '@/lib/media-providers'
 import type { MediaPrompt } from '@tanstack/ai'
-import type { Provider } from '@/lib/types'
+import type { Feature, Provider } from '@/lib/types'
 
 export const Route = createFileRoute('/api/video')({
   server: {
@@ -12,14 +12,20 @@ export const Route = createFileRoute('/api/video')({
         const abortController = new AbortController()
         const body = await request.json()
         const data = body.forwardedProps ?? body.data ?? body
-        const { prompt, provider, testId, aimockPort } = data as {
+        const { prompt, provider, testId, aimockPort, feature } = data as {
           prompt: MediaPrompt
           provider: Provider
           testId?: string
           aimockPort?: number
+          feature?: Feature
         }
 
-        const adapter = createVideoAdapter(provider, aimockPort, testId)
+        const adapter = createVideoAdapter(
+          provider,
+          aimockPort,
+          testId,
+          feature,
+        )
 
         try {
           const stream = generateVideo({
diff --git a/testing/e2e/tests/interactions-video.spec.ts b/testing/e2e/tests/interactions-video.spec.ts
new file mode 100644
index 000000000..35f302dbc
--- /dev/null
+++ b/testing/e2e/tests/interactions-video.spec.ts
@@ -0,0 +1,76 @@
+import { test, expect } from './fixtures'
+import {
+  fillPrompt,
+  clickGenerate,
+  waitForGenerationComplete,
+  featureUrl,
+} from './helpers'
+import { providersFor } from './test-matrix'
+
+// Gemini Omni Flash (gemini-omni-flash-preview) video generation over the
+// Interactions API: create a background interaction → poll it by id →
+// receive the finished clip as inline base64 the adapter surfaces as a
+// data:video/mp4 URL. Backed by the geminiOmniVideoMount in global-setup.ts.
+for (const provider of providersFor('interactions-video')) {
+  test.describe(`${provider} -- interactions-video`, () => {
+    test('sse -- generates video via SSE connection', async ({
+      page,
+      testId,
+      aimockPort,
+    }) => {
+      await page.goto(
+        featureUrl(provider, 'interactions-video', testId, aimockPort, 'sse'),
+      )
+      await fillPrompt(page, 'a guitar being played in a store')
+      await clickGenerate(page)
+      await waitForGenerationComplete(page, 60_000)
+      const video = page.getByTestId('generated-video')
+      await expect(video).toBeVisible()
+      await expect(video).toHaveAttribute('src', /^data:video\/mp4;base64,/)
+    })
+
+    test('http-stream -- generates video via HTTP stream', async ({
+      page,
+      testId,
+      aimockPort,
+    }) => {
+      await page.goto(
+        featureUrl(
+          provider,
+          'interactions-video',
+          testId,
+          aimockPort,
+          'http-stream',
+        ),
+      )
+      await fillPrompt(page, 'a guitar being played in a store')
+      await clickGenerate(page)
+      await waitForGenerationComplete(page, 60_000)
+      const video = page.getByTestId('generated-video')
+      await expect(video).toBeVisible()
+      await expect(video).toHaveAttribute('src', /^data:video\/mp4;base64,/)
+    })
+
+    test('fetcher -- generates video via server function', async ({
+      page,
+      testId,
+      aimockPort,
+    }) => {
+      await page.goto(
+        featureUrl(
+          provider,
+          'interactions-video',
+          testId,
+          aimockPort,
+          'fetcher',
+        ),
+      )
+      await fillPrompt(page, 'a guitar being played in a store')
+      await clickGenerate(page)
+      await waitForGenerationComplete(page, 60_000)
+      const video = page.getByTestId('generated-video')
+      await expect(video).toBeVisible()
+      await expect(video).toHaveAttribute('src', /^data:video\/mp4;base64,/)
+    })
+  })
+}

From b4321f4642e7351efa709119d07a8a8d2becf898 Mon Sep 17 00:00:00 2001
From: Tom Beckenham <34339192+tombeckenham@users.noreply.github.com>
Date: Thu, 2 Jul 2026 20:53:29 +1000
Subject: [PATCH 2/4] feat(examples): Gemini Omni Flash with all inputs in
 ts-react-media
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add gemini-omni-flash-preview (text-to-video + image-to-video) to the
ts-react-media example, exercising every Omni input: text prompts, a
start image, an attached reference/edit video clip (Omni-only — never
sent to other providers), and conversational editing that chains a new
prompt onto a completed generation via previous_interaction_id.

Also fixes a latent core type bug this surfaced. generateVideo and
getVideoJobStatus constrained adapters as VideoAdapter<string, any,
any, any>, which left the input-modality and duration generics at
their defaults (Record<string, number> for duration). Any adapter with
a narrowed per-model duration union (Omni's 10, Veo's 4|6|8) therefore
failed assignability under strict function-type contravariance. All
video-activity constraints now span all six VideoAdapter generics.
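
A minimal sketch of the constraint problem, under stated assumptions:
SketchVideoAdapter, VeoLikeAdapter, IsAssignable and startJob are
hypothetical names, and the sketch collapses the real six-generic
VideoAdapter into a single duration generic. It only illustrates why a
defaulted generic rejects a narrowed literal union while `any` accepts
it; it is not the library's actual type.

```typescript
// Simplified, hypothetical stand-in for VideoAdapter: one duration generic
// instead of the real per-model duration map and its sibling generics.
interface SketchVideoAdapter<TDuration extends number = number> {
  readonly model: string
  // Property syntax (not method syntax), so strictFunctionTypes checks the
  // parameter contravariantly.
  createJob: (options: { prompt: string; duration?: TDuration }) => string
}

// An adapter that narrows duration to a literal union, as Veo does with
// 4 | 6 | 8.
type VeoLikeAdapter = SketchVideoAdapter<4 | 6 | 8>

const veoLike: VeoLikeAdapter = {
  model: 'veo-like-sketch',
  createJob: ({ prompt, duration }) => `${prompt} (${duration ?? 8}s)`,
}

type IsAssignable<A, B> = [A] extends [B] ? true : false

// Old constraint shape: duration left at its default (`number`).
// With strictFunctionTypes enabled this resolves to `false`, because
// `duration?: number` is not assignable to `duration?: 4 | 6 | 8`.
type FitsDefaulted = IsAssignable<VeoLikeAdapter, SketchVideoAdapter>

// New constraint shape: the duration generic is spelled out as `any`.
type FitsWidened = IsAssignable<VeoLikeAdapter, SketchVideoAdapter<any>>
const widened: FitsWidened = true

// Activity-style helper using the widened constraint.
function startJob<TAdapter extends SketchVideoAdapter<any>>(
  adapter: TAdapter,
  prompt: string,
): string {
  return adapter.createJob({ prompt })
}

const jobLabel = startJob(veoLike, 'a guitar being played in a store')
```

Spelling out every generic as `any` keeps the constraint a pure shape
check and leaves per-model narrowing to the helper conditionals.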

Verified live: Omni edit chaining (previous_interaction_id) against the
real Gemini API returned an edited 10s MP4; example dev server boots
and type-checks.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 .../video-adapter-duration-constraint.md      |   5 +
 .../src/components/ImageGenerator.tsx         |   8 +-
 .../src/components/VideoGenerator.tsx         | 206 ++++++++++++++++--
 examples/ts-react-media/src/lib/media.ts      |  33 ++-
 examples/ts-react-media/src/lib/models.ts     |  16 ++
 .../src/lib/server-functions.ts               |  77 ++++++-
 .../ai/src/activities/generateVideo/index.ts  |  40 ++--
 7 files changed, 335 insertions(+), 50 deletions(-)
 create mode 100644 .changeset/video-adapter-duration-constraint.md

diff --git a/.changeset/video-adapter-duration-constraint.md b/.changeset/video-adapter-duration-constraint.md
new file mode 100644
index 000000000..093e86715
--- /dev/null
+++ b/.changeset/video-adapter-duration-constraint.md
@@ -0,0 +1,5 @@
+---
+'@tanstack/ai': patch
+---
+
+Fix `generateVideo` / `getVideoJobStatus` rejecting video adapters that declare a narrowed per-model duration union (e.g. Gemini's `4 | 6 | 8` for Veo or `10` for Omni Flash) at the type level. The activity's `TAdapter extends VideoAdapter<string, any, any, any>` constraints left the input-modality and duration generics at their defaults, so `duration?: number` failed contravariance against the adapter's literal union. All video-activity constraints and helper conditionals now span all six `VideoAdapter` generics.
diff --git a/examples/ts-react-media/src/components/ImageGenerator.tsx b/examples/ts-react-media/src/components/ImageGenerator.tsx
index 9b4d5fd29..09e2eb7d4 100644
--- a/examples/ts-react-media/src/components/ImageGenerator.tsx
+++ b/examples/ts-react-media/src/components/ImageGenerator.tsx
@@ -6,8 +6,8 @@ import type { MediaPrompt } from '@tanstack/ai/client'
 import { generateImageFn } from '@/lib/server-functions'
 import { getRandomImagePrompt } from '@/lib/prompts'
 import { IMAGE_MODELS } from '@/lib/models'
-import { readImageFile, toImagePart } from '@/lib/media'
-import type { AttachedImage } from '@/lib/media'
+import { readMediaFile, toImagePart } from '@/lib/media'
+import type { AttachedMedia } from '@/lib/media'
 
 interface ImageGeneratorProps {
   onImageGenerated?: (imageUrl: string) => void
@@ -36,7 +36,7 @@ export default function ImageGenerator({
   const [selectedModel, setSelectedModel] = useState<string>('all')
   const [isLoading, setIsLoading] = useState(false)
   const [results, setResults] = useState<Record<string, ModelResult>>({})
-  const [images, setImages] = useState<Array<AttachedImage>>([])
+  const [images, setImages] = useState<Array<AttachedMedia>>([])
   const fileInputRef = useRef<HTMLInputElement>(null)
 
   const currentModel = IMAGE_MODELS.find((m) => m.id === selectedModel)
@@ -56,7 +56,7 @@ export default function ImageGenerator({
     const files = Array.from(e.target.files ?? [])
     if (fileInputRef.current) fileInputRef.current.value = ''
     if (files.length === 0) return
-    const attached = await Promise.all(files.map((file) => readImageFile(file)))
+    const attached = await Promise.all(files.map((file) => readMediaFile(file)))
     setImages((prev) => [...prev, ...attached])
   }
 
diff --git a/examples/ts-react-media/src/components/VideoGenerator.tsx b/examples/ts-react-media/src/components/VideoGenerator.tsx
index f31a8078e..73696175f 100644
--- a/examples/ts-react-media/src/components/VideoGenerator.tsx
+++ b/examples/ts-react-media/src/components/VideoGenerator.tsx
@@ -1,6 +1,8 @@
 import { useEffect, useRef, useState } from 'react'
-import { Film, Loader2, Shuffle, Upload, X } from 'lucide-react'
+import { Film, Loader2, Shuffle, Upload, Wand2, X } from 'lucide-react'
 import type { VideoMode } from '@/lib/models'
+import type { AttachedMedia } from '@/lib/media'
+import type { MediaPromptPart } from '@tanstack/ai/client'
 
 import {
   createVideoJobFn,
@@ -9,7 +11,7 @@ import {
 } from '@/lib/server-functions'
 import { VIDEO_MODELS } from '@/lib/models'
 import { getRandomVideoPrompt } from '@/lib/prompts'
-import { imageUrlToPart, readImageFile } from '@/lib/media'
+import { imageUrlToPart, readMediaFile, toVideoPart } from '@/lib/media'
 
 type JobState =
   | { status: 'idle' }
@@ -21,7 +23,13 @@ type JobState =
       model: string
       progress?: number | undefined
     }
-  | { status: 'completed'; url: string; unitsBilled?: number; cost?: number }
+  | {
+      status: 'completed'
+      url: string
+      jobId: string
+      unitsBilled?: number
+      cost?: number
+    }
   | { status: 'error'; message: string }
 
 interface VideoGeneratorProps {
@@ -37,13 +45,25 @@ export default function VideoGenerator({
   const [imagePreview, setImagePreview] = useState<string | null>(
     initialImageUrl ?? null,
   )
+  const [attachedVideo, setAttachedVideo] = useState<AttachedMedia | null>(null)
+  const [editPrompts, setEditPrompts] = useState<Record<string, string>>({})
   const [jobStates, setJobStates] = useState<Record<string, JobState>>({})
   const fileInputRef = useRef<HTMLInputElement>(null)
+  const videoInputRef = useRef<HTMLInputElement>(null)
   const pollingRefs = useRef<Map<string, NodeJS.Timeout>>(new Map())
 
   const filteredModels = VIDEO_MODELS.filter((m) => m.mode === mode)
   const falModels = filteredModels.filter((m) => m.provider === 'fal')
   const xaiModels = filteredModels.filter((m) => m.provider === 'xai')
+  const geminiModels = filteredModels.filter((m) => m.provider === 'gemini')
+
+  // Gemini Omni Flash additionally accepts video prompt parts (a reference
+  // clip or a video to edit). Offer the upload whenever an Omni model is part
+  // of the run; other providers never receive the video part.
+  const omniInRun =
+    selectedModel === 'all'
+      ? geminiModels.length > 0
+      : selectedModel.startsWith('gemini-omni-flash-preview')
 
   useEffect(() => {
     if (initialImageUrl) {
@@ -68,7 +88,7 @@ export default function VideoGenerator({
     const file = e.target.files?.[0]
     if (fileInputRef.current) fileInputRef.current.value = ''
     if (!file) return
-    const attached = await readImageFile(file)
+    const attached = await readMediaFile(file)
     setImagePreview(attached.dataUrl)
   }
 
@@ -77,6 +97,18 @@ export default function VideoGenerator({
     if (fileInputRef.current) fileInputRef.current.value = ''
   }
 
+  const handleVideoSelect = async (e: React.ChangeEvent<HTMLInputElement>) => {
+    const file = e.target.files?.[0]
+    if (videoInputRef.current) videoInputRef.current.value = ''
+    if (!file) return
+    setAttachedVideo(await readMediaFile(file))
+  }
+
+  const clearVideo = () => {
+    setAttachedVideo(null)
+    if (videoInputRef.current) videoInputRef.current.value = ''
+  }
+
   const pollStatus = async (jobId: string, model: string) => {
     try {
       const status = await getVideoStatusFn({ data: { jobId, model } })
@@ -98,6 +130,7 @@ export default function VideoGenerator({
           [model]: {
             status: 'completed',
             url: url,
+            jobId,
             unitsBilled: urlResult.usage?.unitsBilled,
             cost: urlResult.usage?.cost,
           },
@@ -134,6 +167,16 @@ export default function VideoGenerator({
     }
   }
 
+  // Poll keyed by the UI model id, not result.model: the direct-xAI
+  // entries share one adapter model ('grok-imagine-video-1.5'),
+  // so result.model wouldn't identify the card (or the adapter) uniquely.
+  const beginPolling = (modelId: string, jobId: string) => {
+    const interval = setInterval(() => {
+      pollStatus(jobId, modelId)
+    }, 4000)
+    pollingRefs.current.set(modelId, interval)
+  }
+
   const startJobForModel = async (modelId: string) => {
     setJobStates((prev) => ({
       ...prev,
@@ -141,16 +184,21 @@ export default function VideoGenerator({
     }))
 
     try {
+      const model = VIDEO_MODELS.find((m) => m.id === modelId)
+      const parts: Array<MediaPromptPart> = [{ type: 'text', content: prompt }]
       // Image-to-video sends the start frame as a prompt part — the fal
       // adapter routes `role: 'start_frame'` to the endpoint's start-image
-      // field (e.g. `image_url` on Kling i2v).
-      const builtPrompt =
-        mode === 'image-to-video' && imagePreview
-          ? [
-              { type: 'text' as const, content: prompt },
-              imageUrlToPart(imagePreview, { role: 'start_frame' }),
-            ]
-          : prompt
+      // field (e.g. `image_url` on Kling i2v); Omni takes it as an
+      // interaction content block.
+      if (mode === 'image-to-video' && imagePreview) {
+        parts.push(imageUrlToPart(imagePreview, { role: 'start_frame' }))
+      }
+      // Video prompt parts (reference clip / video to edit) are an Omni
+      // capability only — never send them to the other providers.
+      if (attachedVideo && model?.provider === 'gemini') {
+        parts.push(toVideoPart(attachedVideo))
+      }
+      const builtPrompt = parts.length === 1 ? prompt : parts
       const result = await createVideoJobFn({
         data: {
           prompt: builtPrompt,
@@ -167,13 +215,7 @@ export default function VideoGenerator({
         },
       }))
 
-      // Poll keyed by the UI model id, not result.model: the direct-xAI
-      // entries share one adapter model ('grok-imagine-video-1.5'),
-      // so result.model wouldn't identify the card (or the adapter) uniquely.
-      const interval = setInterval(() => {
-        pollStatus(result.jobId, modelId)
-      }, 4000)
-      pollingRefs.current.set(modelId, interval)
+      beginPolling(modelId, result.jobId)
     } catch (err) {
       setJobStates((prev) => ({
         ...prev,
@@ -186,6 +228,51 @@ export default function VideoGenerator({
     }
   }
 
+  /**
+   * Gemini Omni Flash conversational editing: chain a new prompt onto a
+   * completed generation via its interaction id (the jobId). The model
+   * applies the change while preserving everything else in the video.
+   */
+  const handleEditVideo = async (modelId: string, previousJobId: string) => {
+    const editPrompt = editPrompts[modelId]?.trim()
+    if (!editPrompt) return
+
+    setJobStates((prev) => ({
+      ...prev,
+      [modelId]: { status: 'submitting' },
+    }))
+
+    try {
+      const result = await createVideoJobFn({
+        data: {
+          prompt: editPrompt,
+          model: modelId,
+          previousInteractionId: previousJobId,
+        },
+      })
+
+      setJobStates((prev) => ({
+        ...prev,
+        [modelId]: {
+          status: 'pending',
+          jobId: result.jobId,
+          model: result.model,
+        },
+      }))
+      setEditPrompts((prev) => ({ ...prev, [modelId]: '' }))
+
+      beginPolling(modelId, result.jobId)
+    } catch (err) {
+      setJobStates((prev) => ({
+        ...prev,
+        [modelId]: {
+          status: 'error',
+          message: err instanceof Error ? err.message : 'Failed to edit video',
+        },
+      }))
+    }
+  }
+
   const handleGenerate = async () => {
     if (!prompt.trim()) return
     if (mode === 'image-to-video' && !imagePreview) return
@@ -269,6 +356,13 @@ export default function VideoGenerator({
                 </option>
               ))}
             </optgroup>
+            <optgroup label="Google (direct)">
+              {geminiModels.map((model) => (
+                <option key={model.id} value={model.id}>
+                  {model.name}
+                </option>
+              ))}
+            </optgroup>
           </select>
         </div>
 
@@ -311,6 +405,49 @@ export default function VideoGenerator({
           </div>
         )}
 
+        {omniInRun && (
+          <div>
+            <label className="block text-sm font-medium text-gray-300 mb-2">
+              Reference video{' '}
+              <span className="text-gray-500 font-normal">
+                (optional — Gemini Omni Flash only, short clips)
+              </span>
+            </label>
+            {attachedVideo ? (
+              <div className="relative">
+                <video
+                  src={attachedVideo.dataUrl}
+                  controls
+                  muted
+                  className="w-full max-h-64 rounded-lg border border-gray-700"
+                />
+                <button
+                  onClick={clearVideo}
+                  disabled={isGenerating}
+                  className="absolute top-2 right-2 p-1 bg-gray-900/80 hover:bg-gray-800 rounded-full text-white disabled:opacity-50"
+                >
+                  <X className="w-4 h-4" />
+                </button>
+              </div>
+            ) : (
+              <button
+                onClick={() => videoInputRef.current?.click()}
+                className="w-full p-6 border-2 border-dashed border-gray-600 hover:border-gray-500 rounded-lg text-gray-400 hover:text-gray-300 transition-colors flex flex-col items-center gap-2"
+              >
+                <Upload className="w-6 h-6" />
+                <span>Click to attach a video clip</span>
+              </button>
+            )}
+            <input
+              ref={videoInputRef}
+              type="file"
+              accept="video/*"
+              onChange={handleVideoSelect}
+              className="hidden"
+            />
+          </div>
+        )}
+
         <div>
           <div className="flex items-center justify-between mb-2">
             <label className="text-sm font-medium text-gray-300">Prompt</label>
@@ -437,6 +574,37 @@ export default function VideoGenerator({
                         </p>
                       )
                     )}
+                    {model?.provider === 'gemini' && (
+                      <div className="flex gap-2">
+                        <input
+                          type="text"
+                          value={editPrompts[modelId] ?? ''}
+                          onChange={(e) =>
+                            setEditPrompts((prev) => ({
+                              ...prev,
+                              [modelId]: e.target.value,
+                            }))
+                          }
+                          onKeyDown={(e) => {
+                            if (e.key === 'Enter')
+                              handleEditVideo(modelId, state.jobId)
+                          }}
+                          placeholder="Describe an edit — e.g. 'make it nighttime'..."
+                          disabled={isGenerating}
+                          className="flex-1 px-3 py-2 bg-gray-800 border border-gray-700 rounded-lg text-white text-sm placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent disabled:opacity-50"
+                        />
+                        <button
+                          onClick={() => handleEditVideo(modelId, state.jobId)}
+                          disabled={
+                            isGenerating || !editPrompts[modelId]?.trim()
+                          }
+                          className="px-4 py-2 bg-purple-600 hover:bg-purple-700 disabled:bg-gray-700 disabled:cursor-not-allowed text-white text-sm font-medium rounded-lg transition-colors flex items-center gap-1.5"
+                        >
+                          <Wand2 className="w-4 h-4" />
+                          Edit
+                        </button>
+                      </div>
+                    )}
                   </>
                 )}
               </div>
diff --git a/examples/ts-react-media/src/lib/media.ts b/examples/ts-react-media/src/lib/media.ts
index 40d82c039..65bec82b9 100644
--- a/examples/ts-react-media/src/lib/media.ts
+++ b/examples/ts-react-media/src/lib/media.ts
@@ -1,22 +1,23 @@
 import type { MediaInputMetadata, MediaPromptPart } from '@tanstack/ai/client'
 
 /**
- * An image the user attached as conditioning input. `dataUrl` is the full
- * `data:<mime>;base64,...` string used directly for the thumbnail preview;
- * `base64` is the same payload with the prefix stripped for the prompt part.
+ * A media file (image or video) the user attached as conditioning input.
+ * `dataUrl` is the full `data:<mime>;base64,...` string used directly for
+ * the thumbnail preview; `base64` is the same payload with the prefix
+ * stripped for the prompt part.
  */
-export interface AttachedImage {
+export interface AttachedMedia {
   id: string
   name: string
   mimeType: string
-  /** Full data URL, used for the <img> preview. */
+  /** Full data URL, used for the <img> / <video> preview. */
   dataUrl: string
   /** Base64 payload without the `data:` prefix, used for the prompt part. */
   base64: string
 }
 
-/** Reads a File into an AttachedImage (data URL preview + raw base64 payload). */
-export function readImageFile(file: File): Promise<AttachedImage> {
+/** Reads a File into an AttachedMedia (data URL preview + raw base64 payload). */
+export function readMediaFile(file: File): Promise<AttachedMedia> {
   return new Promise((resolve, reject) => {
     const reader = new FileReader()
     reader.onerror = () =>
@@ -42,7 +43,7 @@ export function readImageFile(file: File): Promise<AttachedImage> {
 
 /** Builds an image prompt part from an attached image, with optional role hint. */
 export function toImagePart(
-  image: AttachedImage,
+  image: AttachedMedia,
   metadata?: MediaInputMetadata,
 ): MediaPromptPart {
   return {
@@ -52,6 +53,22 @@ export function toImagePart(
   }
 }
 
+/**
+ * Builds a video prompt part from an attached video clip — e.g. a reference
+ * clip or a video to edit for Gemini Omni Flash, which accepts video inputs
+ * alongside text and images.
+ */
+export function toVideoPart(
+  video: AttachedMedia,
+  metadata?: MediaInputMetadata,
+): MediaPromptPart {
+  return {
+    type: 'video',
+    source: { type: 'data', value: video.base64, mimeType: video.mimeType },
+    ...(metadata ? { metadata } : {}),
+  }
+}
+
 /**
  * Builds an image prompt part from a URL string — either a remote URL
  * (passed through as a `url` source) or a `data:` URL (decomposed into a
diff --git a/examples/ts-react-media/src/lib/models.ts b/examples/ts-react-media/src/lib/models.ts
index 5947febe5..a272f23dd 100644
--- a/examples/ts-react-media/src/lib/models.ts
+++ b/examples/ts-react-media/src/lib/models.ts
@@ -162,6 +162,22 @@ export const VIDEO_MODELS = [
     mode: 'image-to-video' as const,
     provider: 'fal' as const,
   },
+  {
+    id: 'gemini-omni-flash-preview',
+    name: 'Gemini Omni Flash (Text-to-Video)',
+    description:
+      'Google multimodal video generation with conversational editing, via the Interactions API (10s, 720p)',
+    mode: 'text-to-video' as const,
+    provider: 'gemini' as const,
+  },
+  {
+    id: 'gemini-omni-flash-preview/image-to-video',
+    name: 'Gemini Omni Flash (Image-to-Video)',
+    description:
+      'Animate an image with Gemini Omni Flash via the Interactions API',
+    mode: 'image-to-video' as const,
+    provider: 'gemini' as const,
+  },
 ] as const
 
 export type ImageModel = (typeof IMAGE_MODELS)[number]
diff --git a/examples/ts-react-media/src/lib/server-functions.ts b/examples/ts-react-media/src/lib/server-functions.ts
index 1b3b52639..8f45f0834 100644
--- a/examples/ts-react-media/src/lib/server-functions.ts
+++ b/examples/ts-react-media/src/lib/server-functions.ts
@@ -1,6 +1,6 @@
 import { createServerFn } from '@tanstack/react-start'
 import { falImage, falVideo } from '@tanstack/ai-fal'
-import { geminiImage } from '@tanstack/ai-gemini'
+import { geminiImage, geminiVideo } from '@tanstack/ai-gemini'
 import { grokImage, grokVideo } from '@tanstack/ai-grok'
 import { generateImage, generateVideo, getVideoJobStatus } from '@tanstack/ai'
 
@@ -9,12 +9,19 @@ import type {
   MediaInputMetadata,
   MediaPrompt,
   TextPart,
+  VideoPart,
 } from '@tanstack/ai/client'
 
 /** A prompt restricted to text — accepted by every (incl. text-only) model. */
 type TextPrompt = string | Array<TextPart>
 /** A prompt of text + image parts — accepted by image-conditioned models. */
 type ImagePrompt = string | Array<TextPart | ImagePart<MediaInputMetadata>>
+/** A prompt of text + image + video parts — Gemini Omni Flash accepts all three. */
+type OmniPrompt =
+  | string
+  | Array<
+      TextPart | ImagePart<MediaInputMetadata> | VideoPart<MediaInputMetadata>
+    >
 
 /** True when the prompt carries text — a non-empty string or any prompt part. */
 function hasPromptContent(prompt: MediaPrompt): boolean {
@@ -50,6 +57,19 @@ function asTextPrompt(prompt: MediaPrompt): TextPrompt {
   })
 }
 
+/**
+ * Narrows a wire `MediaPrompt` for Gemini Omni Flash, which accepts text,
+ * image, and video prompt parts (audio would be the only rejected kind).
+ */
+function asOmniPrompt(prompt: MediaPrompt): OmniPrompt {
+  if (typeof prompt === 'string') return prompt
+  return prompt.map((part) => {
+    if (part.type === 'text' || part.type === 'image' || part.type === 'video')
+      return part
+    throw new Error(`Unsupported prompt part for Omni Flash: ${part.type}`)
+  })
+}
+
 /**
  * Like `asImagePrompt`, but additionally requires at least one image part —
  * image-to-video endpoints need a start frame.
@@ -79,6 +99,11 @@ function videoAdapterForModel(model: string) {
   if (model === 'grok-imagine-video-1.5/image-to-video') {
     return grokVideo('grok-imagine-video-1.5')
   }
+  if (model.startsWith('gemini-omni-flash-preview')) {
+    // Both UI entries (text-to-video and image-to-video) run on the one
+    // Omni model over the Interactions API (GEMINI_API_KEY).
+    return geminiVideo('gemini-omni-flash-preview')
+  }
   return falVideo(model)
 }
 
@@ -206,11 +231,21 @@ export const generateImageFn = createServerFn({ method: 'POST' })
   })
 
 export const createVideoJobFn = createServerFn({ method: 'POST' })
-  .inputValidator((data: { prompt: MediaPrompt; model: string }) => {
-    if (!hasPromptContent(data.prompt)) throw new Error('Prompt is required')
-    if (!data.model) throw new Error('Model is required')
-    return data
-  })
+  .inputValidator(
+    (data: {
+      prompt: MediaPrompt
+      model: string
+      /**
+       * Gemini Omni Flash conversational editing: the jobId (interaction id)
+       * of a prior Omni generation to refine. Ignored by other models.
+       */
+      previousInteractionId?: string
+    }) => {
+      if (!hasPromptContent(data.prompt)) throw new Error('Prompt is required')
+      if (!data.model) throw new Error('Model is required')
+      return data
+    },
+  )
   .handler(async ({ data }) => {
     // Image-to-video models receive the start frame as a prompt part
     // (role: 'start_frame') — the fal adapter routes it to the endpoint's
@@ -317,6 +352,36 @@ export const createVideoJobFn = createServerFn({ method: 'POST' })
           size: '16:9_2160p',
         })
       }
+      // Gemini Omni Flash (Interactions API, GEMINI_API_KEY). One model
+      // serves both UI entries; it accepts text, image, AND video prompt
+      // parts (sent as interaction content blocks in order). Clips are a
+      // fixed 10s at 720p; `size` is the output aspect ratio. Passing
+      // `previous_interaction_id` chains a prompt onto a prior generation
+      // for conversational editing.
+      case 'gemini-omni-flash-preview':
+      case 'gemini-omni-flash-preview/image-to-video': {
+        const prompt = asOmniPrompt(data.prompt)
+        if (
+          data.model.endsWith('/image-to-video') &&
+          !data.previousInteractionId &&
+          (typeof prompt === 'string' ||
+            !prompt.some((part) => part.type === 'image'))
+        ) {
+          throw new Error('Start image is required for image-to-video')
+        }
+        return generateVideo({
+          adapter: geminiVideo('gemini-omni-flash-preview'),
+          prompt,
+          size: '16:9',
+          ...(data.previousInteractionId
+            ? {
+                modelOptions: {
+                  previous_interaction_id: data.previousInteractionId,
+                },
+              }
+            : {}),
+        })
+      }
       default:
         throw new Error(`Unknown video model: ${data.model}`)
     }
diff --git a/packages/ai/src/activities/generateVideo/index.ts b/packages/ai/src/activities/generateVideo/index.ts
index 3c30803c9..3386a88b1 100644
--- a/packages/ai/src/activities/generateVideo/index.ts
+++ b/packages/ai/src/activities/generateVideo/index.ts
@@ -47,7 +47,7 @@ export const kind = 'video' as const
  * Extract provider options from a VideoAdapter via ~types.
  */
 export type VideoProviderOptions<TAdapter> =
-  TAdapter extends VideoAdapter<any, any, any, any>
+  TAdapter extends VideoAdapter<any, any, any, any, any, any>
     ? TAdapter['~types']['providerOptions']
     : object
 
@@ -55,7 +55,14 @@ export type VideoProviderOptions<TAdapter> =
  * Extract the size type for a VideoAdapter's model via ~types.
  */
 export type VideoSizeForAdapter<TAdapter> =
-  TAdapter extends VideoAdapter<infer TModel, any, any, infer TSizeMap>
+  TAdapter extends VideoAdapter<
+    infer TModel,
+    any,
+    any,
+    infer TSizeMap,
+    any,
+    any
+  >
     ? TModel extends keyof TSizeMap
       ? TSizeMap[TModel]
       : string
@@ -68,7 +75,14 @@ export type VideoSizeForAdapter<TAdapter> =
  * without a map fall back to the full MediaPrompt.
  */
 export type VideoPromptForAdapter<TAdapter> =
-  TAdapter extends VideoAdapter<infer TModel, any, any, any, infer ModsByName>
+  TAdapter extends VideoAdapter<
+    infer TModel,
+    any,
+    any,
+    any,
+    infer ModsByName,
+    any
+  >
     ? string extends keyof ModsByName
       ? MediaPrompt
       : TModel extends keyof ModsByName
@@ -108,7 +122,7 @@ function createId(prefix: string): string {
  * The model is extracted from the adapter's model property.
  */
 interface VideoActivityBaseOptions<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
 > {
   /** The video adapter to use (must be created with a model) */
   adapter: TAdapter & { kind: typeof kind }
@@ -124,7 +138,7 @@ interface VideoActivityBaseOptions<
  * @experimental Video generation is an experimental feature and may change.
  */
 export type VideoCreateOptions<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
   TStream extends boolean = false,
 > = VideoActivityBaseOptions<TAdapter> & {
   /** Request type - create a new job (default if not specified) */
@@ -191,7 +205,7 @@ export type VideoCreateOptions<
  * @experimental Video generation is an experimental feature and may change.
  */
 export interface VideoStatusOptions<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
 > extends VideoActivityBaseOptions<TAdapter> {
   /** Request type - get job status */
   request: 'status'
@@ -205,7 +219,7 @@ export interface VideoStatusOptions<
  * @experimental Video generation is an experimental feature and may change.
  */
 export interface VideoUrlOptions<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
 > extends VideoActivityBaseOptions<TAdapter> {
   /** Request type - get video URL */
   request: 'url'
@@ -220,7 +234,7 @@ export interface VideoUrlOptions<
  * @experimental Video generation is an experimental feature and may change.
  */
 export type VideoActivityOptions<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
   TRequest extends 'create' | 'status' | 'url' = 'create',
   TStream extends boolean = false,
 > = TRequest extends 'status'
@@ -296,7 +310,7 @@ export type VideoActivityResult<
  * ```
  */
 export function generateVideo<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
   TStream extends boolean = false,
 >(
   options: VideoCreateOptions<TAdapter, TStream>,
@@ -314,7 +328,7 @@ export function generateVideo<
  * Internal implementation of non-streaming video job creation.
  */
 async function runCreateVideoJob<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
 >(options: VideoCreateOptions<TAdapter, boolean>): Promise<VideoJobResult> {
   const { adapter, prompt, size, duration, modelOptions, middleware } = options
   const model = adapter.model
@@ -383,7 +397,7 @@ function sleep(ms: number): Promise<void> {
  * Handles the full job lifecycle: create job → poll for status → stream updates → yield final result.
  */
 async function* runStreamingVideoGeneration<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
 >(options: VideoCreateOptions<TAdapter, true>): AsyncIterable<StreamChunk> {
   const { adapter, prompt, size, duration, modelOptions, middleware } = options
   const model = adapter.model
@@ -582,7 +596,7 @@ async function* runStreamingVideoGeneration<
  * ```
  */
 export async function getVideoJobStatus<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
 >(options: {
   adapter: TAdapter & { kind: typeof kind }
   jobId: string
@@ -692,7 +706,7 @@ export async function getVideoJobStatus<
  * Create typed options for the generateVideo() function without executing.
  */
 export function createVideoOptions<
-  TAdapter extends VideoAdapter<string, any, any, any>,
+  TAdapter extends VideoAdapter<string, any, any, any, any, any>,
   TStream extends boolean = false,
 >(
   options: VideoCreateOptions<TAdapter, TStream>,

From 68b1340a6458e71459d299c4b55962073fc1ad64 Mon Sep 17 00:00:00 2001
From: Tom Beckenham <34339192+tombeckenham@users.noreply.github.com>
Date: Thu, 2 Jul 2026 21:03:08 +1000
Subject: [PATCH 3/4] feat(ai-gemini): support Omni Flash 3-10s clip durations
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The issue's live verification concluded Omni clips were a fixed 10
seconds, but response_format.duration is a real request field — just
undocumented. Verified against the live API: it takes a "<seconds>s"
string, accepts any value in the 3-10s range including fractional
seconds (a 3s request returns a 3.008s MP4 per ffprobe), rejects
out-of-range values with explicit minimum/maximum errors, and defaults
to 10s when omitted.

Omni's duration is now typed as number: availableDurations() returns
{ kind: 'range', min: 3, max: 10, unit: 'seconds' }, snapDuration clamps
raw seconds into that range, and the adapter maps the generateVideo
duration option onto response_format.duration.
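
A minimal sketch of the mapping described above, for reviewers who want
it outside the diff. The helper name buildResponseFormat and the
OmniResponseFormat type are hypothetical; the adapter inlines this logic
in createVideoJob rather than exposing a function like this.

```typescript
// Hypothetical standalone version of the response_format mapping.
// Only the field names (type, aspect_ratio, duration) come from the patch.
type OmniAspectRatio = '16:9' | '9:16'

type OmniResponseFormat = {
  type: 'video'
  aspect_ratio?: OmniAspectRatio
  duration?: string // "<seconds>s", e.g. "8s" or "4.5s"
}

function buildResponseFormat(
  size?: OmniAspectRatio,
  duration?: number,
): { response_format?: OmniResponseFormat } {
  // Omit response_format entirely when neither knob is set, so the API
  // applies its own defaults (a 10s clip, per the live verification).
  if (size === undefined && duration === undefined) return {}
  return {
    response_format: {
      type: 'video',
      ...(size !== undefined && { aspect_ratio: size }),
      ...(duration !== undefined && { duration: `${duration}s` }),
    },
  }
}
```

The result is meant to be spread into the interactions.create request,
so an empty object leaves the request untouched.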

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 .changeset/gemini-omni-flash-video.md         |  2 +-
 docs/media/video-generation.md                | 10 ++++--
 packages/ai-gemini/src/adapters/video.ts      | 24 ++++++++++----
 .../src/video/video-provider-options.ts       | 25 ++++++++++-----
 .../ai-gemini/tests/video-adapter.test.ts     | 32 +++++++++++++------
 .../skills/ai-core/media-generation/SKILL.md  |  5 +--
 6 files changed, 68 insertions(+), 30 deletions(-)

diff --git a/.changeset/gemini-omni-flash-video.md b/.changeset/gemini-omni-flash-video.md
index dbd040bd3..74d04a3da 100644
--- a/.changeset/gemini-omni-flash-video.md
+++ b/.changeset/gemini-omni-flash-video.md
@@ -2,4 +2,4 @@
 '@tanstack/ai-gemini': minor
 ---
 
-Add Gemini Omni Flash (`gemini-omni-flash-preview`) video generation via the Interactions API. Omni only serves the Interactions API (`generateContent` rejects it), so the video adapter now routes by model: Veo models keep the `:predictLongRunning` operations flow, while `geminiVideo('gemini-omni-flash-preview')` creates a background interaction with `response_modalities: ['video']`, polls it by id, and returns the inline base64 MP4 as a `data:` URL (Files-API URI delivery passes through). Usage is mapped from the interaction's `output_tokens_by_modality`. Image and video prompt parts are sent as interaction content blocks, and `modelOptions.previous_interaction_id` chains a new prompt onto a prior Omni generation for conversational video editing. The top-level `size` option maps onto `response_format.aspect_ratio` (`'16:9' | '9:16'`); clips are a fixed 10 seconds today. Raises the `@google/genai` floor to `^2.10.0` for the Interactions API surface.
+Add Gemini Omni Flash (`gemini-omni-flash-preview`) video generation via the Interactions API. Omni only serves the Interactions API (`generateContent` rejects it), so the video adapter now routes by model: Veo models keep the `:predictLongRunning` operations flow, while `geminiVideo('gemini-omni-flash-preview')` creates a background interaction with `response_modalities: ['video']`, polls it by id, and returns the inline base64 MP4 as a `data:` URL (Files-API URI delivery passes through). Usage is mapped from the interaction's `output_tokens_by_modality`. Image and video prompt parts are sent as interaction content blocks, and `modelOptions.previous_interaction_id` chains a new prompt onto a prior Omni generation for conversational video editing. The top-level `size` option maps onto `response_format.aspect_ratio` (`'16:9' | '9:16'`) and `duration` onto `response_format.duration` — any value in the 3–10 second range (fractional seconds included, verified against the live API), defaulting to a 10-second clip when omitted. Raises the `@google/genai` floor to `^2.10.0` for the Interactions API surface.
diff --git a/docs/media/video-generation.md b/docs/media/video-generation.md
index 2b4645986..8b6e8418a 100644
--- a/docs/media/video-generation.md
+++ b/docs/media/video-generation.md
@@ -582,9 +582,12 @@ finished clip comes back **inline as a `data:video/mp4;base64,…` URL**
 (when Google delivers by reference instead, the Files API URI passes
 through and needs your API key to download, like Veo).
 
-Clips are 720p at 24 FPS and a fixed **10 seconds** today (`duration` is
-typed as `10`; `snapDuration(n)` always returns `10`). The `size` option
-maps onto the interaction's output aspect ratio:
+Clips are 720p at 24 FPS, and `duration` accepts any value in the **3–10
+second** range (fractional seconds included), defaulting to 10 seconds when
+omitted. `availableDurations()` reports
+`{ kind: 'range', min: 3, max: 10, unit: 'seconds' }` and `snapDuration(n)`
+clamps raw seconds into it. The `size` option maps onto the interaction's
+output aspect ratio:
 
 ```typescript ignore
 import { generateVideo, getVideoJobStatus } from '@tanstack/ai'
@@ -596,6 +599,7 @@ const { jobId } = await generateVideo({
   adapter,
   prompt: 'A woman playing violin outdoors at golden hour',
   size: '9:16', // aspect ratio: '16:9' (default) or '9:16'
+  duration: 6, // 3-10 seconds; omit for the 10s default
 })
 
 const status = await getVideoJobStatus({ adapter, jobId })
diff --git a/packages/ai-gemini/src/adapters/video.ts b/packages/ai-gemini/src/adapters/video.ts
index 75c25a69f..03946a6e5 100644
--- a/packages/ai-gemini/src/adapters/video.ts
+++ b/packages/ai-gemini/src/adapters/video.ts
@@ -313,7 +313,7 @@ export class GeminiVideoAdapter<
       GeminiVideoModelDurationByName[TModel]
     >,
   ): Promise<VideoJobResult> {
-    const { prompt, size, logger } = options
+    const { prompt, size, duration, logger } = options
     const modelOptions = options.modelOptions as
       | GeminiOmniVideoProviderOptions
       | undefined
@@ -340,18 +340,28 @@ export class GeminiVideoAdapter<
         )
       }
 
+      // Aspect ratio and clip length ride on `response_format`. Duration is
+      // a `"<seconds>s"` string, accepted anywhere in the 3–10s range
+      // (fractional included) and defaulting to 10s when omitted — verified
+      // against the live API; the docs don't publish the field.
+      const responseFormat =
+        size !== undefined || duration !== undefined
+          ? {
+              response_format: {
+                type: 'video' as const,
+                ...(size !== undefined && { aspect_ratio: size }),
+                ...(duration !== undefined && { duration: `${duration}s` }),
+              },
+            }
+          : {}
+
       const interaction = await this.client.interactions.create({
         ...modelOptions,
         model: this.model,
         input: [{ type: 'user_input', content }],
         response_modalities: ['video'],
         background: true,
-        // Omni's clip length is fixed (10s) and not a request field, so the
-        // typed `duration` option is compile-time-only here. Aspect ratio is
-        // the one output knob the API exposes today.
-        ...(size !== undefined && {
-          response_format: { type: 'video' as const, aspect_ratio: size },
-        }),
+        ...responseFormat,
       })
 
       if (!interaction.id) {
diff --git a/packages/ai-gemini/src/video/video-provider-options.ts b/packages/ai-gemini/src/video/video-provider-options.ts
index a99ae4d6c..b03e8652d 100644
--- a/packages/ai-gemini/src/video/video-provider-options.ts
+++ b/packages/ai-gemini/src/video/video-provider-options.ts
@@ -145,8 +145,9 @@ export type GeminiVideoModelInputModalitiesByName = {
 
 /**
  * Per-model duration unions (seconds, as numbers — Veo's
- * `parameters.durationSeconds` field is numeric; Omni Flash clips are a
- * fixed 10 seconds today, with longer durations "coming soon" per Google).
+ * `parameters.durationSeconds` field is numeric; Omni Flash accepts a
+ * continuous 3–10 second range, fractional seconds included, so it stays
+ * `number` and is clamped by the range entry below).
  *
  * @experimental Video generation is an experimental feature and may change.
  */
@@ -154,18 +155,21 @@ export type GeminiVideoModelDurationByName = {
   'veo-3.1-generate-preview': 4 | 6 | 8
   'veo-3.1-fast-generate-preview': 4 | 6 | 8
   'veo-3.1-lite-generate-preview': 4 | 6 | 8
-  'gemini-omni-flash-preview': 10
+  'gemini-omni-flash-preview': number
 }
 
 /**
  * Runtime duration table backing `availableDurations()` / `snapDuration()`.
  *
- * Curated from the official docs
- * (https://ai.google.dev/gemini-api/docs/video,
- * https://ai.google.dev/gemini-api/docs/omni) — the Gemini OpenAPI spec
+ * Veo values are curated from the official docs
+ * (https://ai.google.dev/gemini-api/docs/video) — the Gemini OpenAPI spec
  * types the `:predictLongRunning` request's `parameters` as unconstrained,
  * so it carries no per-model duration information to derive these from.
- * Omni Flash has no duration request field at all; clips are 10 seconds.
+ * Omni Flash's 3–10s range was verified against the live API
+ * (2026-07-02): `response_format.duration` takes a `"<seconds>s"` string,
+ * fractional values are accepted, out-of-range values are rejected with
+ * "minimum allowed 3s" / "maximum allowed 10s", and omitting it defaults
+ * to a 10-second clip.
  *
  * @experimental Video generation is an experimental feature and may change.
  */
@@ -177,7 +181,12 @@ export const GEMINI_VIDEO_DURATIONS: {
   'veo-3.1-generate-preview': { kind: 'discrete', values: [4, 6, 8] },
   'veo-3.1-fast-generate-preview': { kind: 'discrete', values: [4, 6, 8] },
   'veo-3.1-lite-generate-preview': { kind: 'discrete', values: [4, 6, 8] },
-  'gemini-omni-flash-preview': { kind: 'discrete', values: [10] },
+  'gemini-omni-flash-preview': {
+    kind: 'range',
+    min: 3,
+    max: 10,
+    unit: 'seconds',
+  },
 }
 
 /**
diff --git a/packages/ai-gemini/tests/video-adapter.test.ts b/packages/ai-gemini/tests/video-adapter.test.ts
index 41889ecc5..17541f709 100644
--- a/packages/ai-gemini/tests/video-adapter.test.ts
+++ b/packages/ai-gemini/tests/video-adapter.test.ts
@@ -104,7 +104,9 @@ describe('Gemini Video Adapter', () => {
       for (const model of Object.keys(
         GEMINI_VIDEO_DURATIONS,
       ) as Array<GeminiVideoModel>) {
-        expect(getGeminiVideoDurationOptions(model).kind).toBe('discrete')
+        expect(['discrete', 'range']).toContain(
+          getGeminiVideoDurationOptions(model).kind,
+        )
       }
     })
   })
@@ -565,21 +567,28 @@ class StubbedGeminiOmniVideoAdapter extends GeminiVideoAdapter<'gemini-omni-flas
 
 describe('Gemini Omni Flash Video Adapter (Interactions API)', () => {
   describe('durations', () => {
-    it('reports the fixed 10-second clip length', () => {
+    it('reports the 3-10 second range and clamps raw seconds into it', () => {
       const adapter = createGeminiVideo('gemini-omni-flash-preview', 'test-key')
       expect(adapter.availableDurations()).toEqual({
-        kind: 'discrete',
-        values: [10],
+        kind: 'range',
+        min: 3,
+        max: 10,
+        unit: 'seconds',
       })
-      expect(adapter.snapDuration(3)).toBe(10)
+      expect(adapter.snapDuration(1)).toBe(3)
+      expect(adapter.snapDuration(7)).toBe(7)
       expect(adapter.snapDuration(60)).toBe(10)
     })
 
-    it('types duration as the fixed 10-second literal at compile time', () => {
+    it('types duration as number (continuous range) at compile time', () => {
       const omni = createGeminiVideo('gemini-omni-flash-preview', 'test-key')
-      expectTypeOf(omni.snapDuration).returns.toEqualTypeOf<10 | undefined>()
+      expectTypeOf(omni.snapDuration).returns.toEqualTypeOf<
+        number | undefined
+      >()
       type OmniOptions = Parameters<typeof omni.createVideoJob>[0]
-      expectTypeOf<OmniOptions['duration']>().toEqualTypeOf<10 | undefined>()
+      expectTypeOf<OmniOptions['duration']>().toEqualTypeOf<
+        number | undefined
+      >()
     })
   })
 
@@ -592,6 +601,7 @@ describe('Gemini Omni Flash Video Adapter (Interactions API)', () => {
         model: 'gemini-omni-flash-preview',
         prompt: 'a sunset over the ocean',
         size: '9:16',
+        duration: 8,
         logger: testLogger,
       })
 
@@ -609,7 +619,11 @@ describe('Gemini Omni Flash Video Adapter (Interactions API)', () => {
         ],
         response_modalities: ['video'],
         background: true,
-        response_format: { type: 'video', aspect_ratio: '9:16' },
+        response_format: {
+          type: 'video',
+          aspect_ratio: '9:16',
+          duration: '8s',
+        },
       })
     })
 
diff --git a/packages/ai/skills/ai-core/media-generation/SKILL.md b/packages/ai/skills/ai-core/media-generation/SKILL.md
index 8da02856c..ebede1d1b 100644
--- a/packages/ai/skills/ai-core/media-generation/SKILL.md
+++ b/packages/ai/skills/ai-core/media-generation/SKILL.md
@@ -469,8 +469,9 @@ const { jobId } = await generateVideo({
 
 Gemini Omni Flash (`geminiVideo('gemini-omni-flash-preview')`) is served by
 the Interactions API instead of Veo's operations flow — same adapter, routed
-by model. Clips are a fixed 10s at 720p (`duration` is typed `10`), `size`
-is the aspect ratio (`'16:9' | '9:16'`), and the finished video arrives
+by model. Clips are 720p; `duration` is any number of seconds in the 3–10
+range (fractional ok, default 10 — availableDurations() reports the range),
+`size` is the aspect ratio (`'16:9' | '9:16'`), and the finished video arrives
 **inline** as a `data:video/mp4;base64,…` URL (no key needed to use it).
 Image/video prompt parts are sent as interaction content in order (no
 `metadata.role` routing); `data` sources go inline, `url` sources pass

From b427cc4195ee82d62e6a8546393c3b89df464b81 Mon Sep 17 00:00:00 2001
From: Tom Beckenham <34339192+tombeckenham@users.noreply.github.com>
Date: Fri, 3 Jul 2026 12:19:30 +1000
Subject: [PATCH 4/4] fix(ai-gemini): address PR review findings for Omni Flash
 video

- Reject out-of-range Omni durations at job creation with a clear local
  error instead of silently passing them to the live API
- Map requires_action interactions to a failed status so polling can't
  spin until timeout (reachable via previous_interaction_id chaining)
- Surface failed job statuses in the ts-react-media example instead of
  polling forever on a pending spinner
- Add a compile-time regression test guarding the generateVideo
  VideoAdapter generic-arity fix, plus unit tests for duration
  rejection, fractional pass-through, and requires_action mapping
- Fix stale doc/comment claims: Veo 2/3 model lists, "fixed 10s" clips,
  "clamped" duration wording, and content-block ordering (images, then
  videos, then text)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 docs/media/video-generation.md                | 15 +++--
 .../src/components/VideoGenerator.tsx         | 13 ++++
 .../src/lib/server-functions.ts               |  5 +-
 packages/ai-gemini/src/adapters/video.ts      | 37 +++++++++--
 packages/ai-gemini/src/model-meta.ts          |  2 +-
 .../src/video/video-provider-options.ts       |  3 +-
 .../ai-gemini/tests/video-adapter.test.ts     | 66 +++++++++++++++++++
 .../skills/ai-core/media-generation/SKILL.md  |  7 +-
 8 files changed, 130 insertions(+), 18 deletions(-)

diff --git a/docs/media/video-generation.md b/docs/media/video-generation.md
index 8b6e8418a..2386de6fa 100644
--- a/docs/media/video-generation.md
+++ b/docs/media/video-generation.md
@@ -42,7 +42,7 @@ TanStack AI provides experimental support for video generation through dedicated
 
 Currently supported:
 - **OpenAI**: Sora-2 and Sora-2-Pro models (when available)
-- **Google Gemini**: Veo 3.1, Veo 3, and Veo 2 models (via the long-running operations API), and Gemini Omni Flash (via the Interactions API)
+- **Google Gemini**: Veo 3.1 models (via the long-running operations API), and Gemini Omni Flash (via the Interactions API)
 - **Grok (xAI)**: grok-imagine-video (text-to-video + image-to-video) and grok-imagine-video-1.5 (image-to-video only) models
 - **fal.ai**: MiniMax, Luma, Kling, Hunyuan, and other hosted video models
 
@@ -585,9 +585,11 @@ through and needs your API key to download, like Veo).
 Clips are 720p at 24 FPS, and `duration` accepts any value in the **3–10
 second** range (fractional seconds included), defaulting to 10 seconds when
 omitted. `availableDurations()` reports
-`{ kind: 'range', min: 3, max: 10, unit: 'seconds' }` and `snapDuration(n)`
-clamps raw seconds into it. The `size` option maps onto the interaction's
-output aspect ratio:
+`{ kind: 'range', min: 3, max: 10, unit: 'seconds' }`; out-of-range
+`duration` values are rejected at job creation, and `snapDuration(n)` snaps
+raw seconds into the range (clamping to its bounds and rounding to whole
+seconds). The `size` option maps onto the interaction's output aspect
+ratio:
 
 ```typescript ignore
 import { generateVideo, getVideoJobStatus } from '@tanstack/ai'
@@ -607,8 +609,9 @@ const status = await getVideoJobStatus({ adapter, jobId })
 ```
 
 Image and video prompt parts are sent to the interaction as content blocks
-in order (Omni doesn't use Veo's `metadata.role` routing), so you can
-condition the generation on stills or short reference clips. `data` sources
+— grouped as images, then videos, then the text prompt (Omni doesn't use
+Veo's `metadata.role` routing) — so you can condition the generation on
+stills or short reference clips. `data` sources
 are sent inline as base64; `url` sources pass through as-is — the adapter
 never downloads them, so use Gemini Files API URIs (upload large media via
 the Files API first).
diff --git a/examples/ts-react-media/src/components/VideoGenerator.tsx b/examples/ts-react-media/src/components/VideoGenerator.tsx
index 73696175f..f59063fd7 100644
--- a/examples/ts-react-media/src/components/VideoGenerator.tsx
+++ b/examples/ts-react-media/src/components/VideoGenerator.tsx
@@ -145,6 +145,19 @@ export default function VideoGenerator({
             progress: status.progress,
           },
         }))
+      } else if (status.status === 'failed') {
+        const interval = pollingRefs.current.get(model)
+        if (interval) {
+          clearInterval(interval)
+          pollingRefs.current.delete(model)
+        }
+        setJobStates((prev) => ({
+          ...prev,
+          [model]: {
+            status: 'error',
+            message: status.error ?? 'Video generation failed',
+          },
+        }))
       } else {
         setJobStates((prev) => ({
           ...prev,
diff --git a/examples/ts-react-media/src/lib/server-functions.ts b/examples/ts-react-media/src/lib/server-functions.ts
index 8f45f0834..91ef53ed8 100644
--- a/examples/ts-react-media/src/lib/server-functions.ts
+++ b/examples/ts-react-media/src/lib/server-functions.ts
@@ -354,8 +354,9 @@ export const createVideoJobFn = createServerFn({ method: 'POST' })
       }
       // Gemini Omni Flash (Interactions API, GEMINI_API_KEY). One model
       // serves both UI entries; it accepts text, image, AND video prompt
-      // parts (sent as interaction content blocks in order). Clips are a
-      // fixed 10s at 720p; `size` is the output aspect ratio. Passing
+      // parts (sent as interaction content blocks: images, then videos,
+      // then text). Clips are 3–10s at 720p (default 10s when `duration`
+      // is omitted); `size` is the output aspect ratio. Passing
       // `previous_interaction_id` chains a prompt onto a prior generation
       // for conversational editing.
       case 'gemini-omni-flash-preview':
diff --git a/packages/ai-gemini/src/adapters/video.ts b/packages/ai-gemini/src/adapters/video.ts
index 03946a6e5..9b2661bd6 100644
--- a/packages/ai-gemini/src/adapters/video.ts
+++ b/packages/ai-gemini/src/adapters/video.ts
@@ -206,7 +206,8 @@ function interactionUsageToTokenUsage(
  * `response_modalities: ['video']`, `getVideoStatus` polls it by id, and
  * `getVideoUrl` returns the inline base64 MP4 as a `data:` URL (or the
  * Files API URI when the server delivers by reference). Image and video
- * prompt parts are sent as interaction content blocks in order; pass
+ * prompt parts are sent as interaction content blocks, grouped as images,
+ * then videos, then the text prompt (interleaving is not preserved); pass
  * `modelOptions.previous_interaction_id` to conversationally edit a prior
  * Omni generation.
  *
@@ -340,10 +341,24 @@ export class GeminiVideoAdapter<
         )
       }
 
+      // Reject out-of-range durations locally rather than snapping (which
+      // would silently change the clip length the caller asked for) or
+      // letting the live API reject them after the round trip.
+      const durations = this.availableDurations()
+      if (
+        duration !== undefined &&
+        durations.kind === 'range' &&
+        (duration < durations.min || duration > durations.max)
+      ) {
+        throw new Error(
+          `${this.name}.createVideoJob: duration ${duration}s is outside the ${durations.min}–${durations.max}s range supported by ${this.model}. Use snapDuration() to snap arbitrary values into range.`,
+        )
+      }
+
       // Aspect ratio and clip length ride on `response_format`. Duration is
       // a `"<seconds>s"` string, accepted anywhere in the 3–10s range
       // (fractional included) and defaulting to 10s when omitted — verified
-      // against the live API; the docs don't publish the field.
+      // against the live API; the docs don't publish the range constraints.
       const responseFormat =
         size !== undefined || duration !== undefined
           ? {
@@ -475,7 +490,11 @@ export class GeminiVideoAdapter<
    * Poll an Omni background interaction. `in_progress` maps to
    * 'processing'; a `completed` interaction with no video content (e.g.
    * filtered output) is surfaced as a failure so `getVideoUrl` doesn't
-   * throw on an empty response.
+   * throw on an empty response. `requires_action` also fails: the adapter
+   * never sends tools, so it can only arise via
+   * `previous_interaction_id` chaining onto a tool-bearing interaction —
+   * and such an interaction never progresses without a client response,
+   * so polling it would spin until timeout.
    */
   private async getInteractionsVideoStatus(
     jobId: string,
@@ -483,9 +502,17 @@ export class GeminiVideoAdapter<
     const interaction = await this.getInteraction(jobId)
     const status = interaction.status
 
-    if (status === 'in_progress' || status === 'requires_action') {
+    if (status === 'in_progress') {
       return { jobId, status: 'processing' }
     }
+    if (status === 'requires_action') {
+      return {
+        jobId,
+        status: 'failed',
+        error:
+          'Gemini Omni interaction is waiting on a client action (tool response), which the video jobs flow does not support.',
+      }
+    }
     if (status === 'completed') {
       if (!extractInteractionVideo(interaction)) {
         return {
@@ -548,7 +575,7 @@ export class GeminiVideoAdapter<
     const interaction = await this.getInteraction(jobId)
     const status = interaction.status
 
-    if (status === 'in_progress' || status === 'requires_action') {
+    if (status === 'in_progress') {
       throw new Error(
         `Video is not ready yet. Check status first. Job ID: ${jobId}`,
       )
diff --git a/packages/ai-gemini/src/model-meta.ts b/packages/ai-gemini/src/model-meta.ts
index 7174d38a2..d69af9609 100644
--- a/packages/ai-gemini/src/model-meta.ts
+++ b/packages/ai-gemini/src/model-meta.ts
@@ -717,7 +717,7 @@ const VEO_3_1_LITE_PREVIEW = {
  * editing. Serves only the Interactions API (`generateContent` rejects it),
  * so it routes through the interactions-based path of the video adapter,
  * not Veo's `:predictLongRunning` flow. Pricing is per second of generated
- * video ($0.10/sec). 720p / 24 FPS, 10-second clips.
+ * video ($0.10/sec). 720p / 24 FPS, 3–10 second clips (default 10s).
  * @experimental Omni video generation is an experimental feature and may change.
  */
 const GEMINI_OMNI_FLASH_PREVIEW = {
diff --git a/packages/ai-gemini/src/video/video-provider-options.ts b/packages/ai-gemini/src/video/video-provider-options.ts
index b03e8652d..487e5d2df 100644
--- a/packages/ai-gemini/src/video/video-provider-options.ts
+++ b/packages/ai-gemini/src/video/video-provider-options.ts
@@ -147,7 +147,8 @@ export type GeminiVideoModelInputModalitiesByName = {
  * Per-model duration unions (seconds, as numbers — Veo's
  * `parameters.durationSeconds` field is numeric; Omni Flash accepts a
  * continuous 3–10 second range, fractional seconds included, so it stays
- * `number` and is clamped by the range entry below).
+ * `number` — the adapter rejects out-of-range values at job creation,
+ * against the range entry below).
  *
  * @experimental Video generation is an experimental feature and may change.
  */
diff --git a/packages/ai-gemini/tests/video-adapter.test.ts b/packages/ai-gemini/tests/video-adapter.test.ts
index 17541f709..7c1ca3832 100644
--- a/packages/ai-gemini/tests/video-adapter.test.ts
+++ b/packages/ai-gemini/tests/video-adapter.test.ts
@@ -1,4 +1,5 @@
 import { describe, expect, expectTypeOf, it, vi } from 'vitest'
+import { generateVideo } from '@tanstack/ai'
 import { resolveDebugOption } from '@tanstack/ai/adapter-internals'
 import {
   GeminiVideoAdapter,
@@ -138,6 +139,25 @@ describe('Gemini Video Adapter', () => {
         4 | 6 | 8 | undefined
       >()
     })
+
+    it('satisfies generateVideo constraints despite narrowed per-model maps', () => {
+      // Regression guard for the VideoAdapter generic-arity fix:
+      // generateVideo's constraints once spelled `VideoAdapter<_, _, _, _>`
+      // (four generics), whose duration-map parameter defaulted to
+      // Record<string, number> — the closed-key
+      // GeminiVideoModelDurationByName is not assignable to that, so these
+      // instantiations failed to compile. They must keep compiling, with
+      // `duration` narrowed per model.
+      const veo3 = createGeminiVideo('veo-3.1-generate-preview', 'test-key')
+      type VeoCreate = Parameters<typeof generateVideo<typeof veo3>>[0]
+      expectTypeOf<VeoCreate['duration']>().toEqualTypeOf<
+        4 | 6 | 8 | undefined
+      >()
+
+      const omni = createGeminiVideo('gemini-omni-flash-preview', 'test-key')
+      type OmniCreate = Parameters<typeof generateVideo<typeof omni>>[0]
+      expectTypeOf<OmniCreate['duration']>().toEqualTypeOf<number | undefined>()
+    })
   })
 
   describe('createVideoJob', () => {
@@ -735,6 +755,41 @@ describe('Gemini Omni Flash Video Adapter (Interactions API)', () => {
         }),
       ).rejects.toThrow(/interaction id/)
     })
+
+    it('rejects out-of-range durations without calling the API', async () => {
+      const stub = createInteractionsClientStub()
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      for (const duration of [2, 15]) {
+        await expect(
+          adapter.createVideoJob({
+            model: 'gemini-omni-flash-preview',
+            prompt: 'a sunset',
+            duration,
+            logger: testLogger,
+          }),
+        ).rejects.toThrow(/outside the 3–10s range/)
+      }
+      expect(stub.interactions.create).not.toHaveBeenCalled()
+    })
+
+    it('passes fractional in-range durations through verbatim', async () => {
+      const stub = createInteractionsClientStub()
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      await adapter.createVideoJob({
+        model: 'gemini-omni-flash-preview',
+        prompt: 'a sunset',
+        duration: 4.5,
+        logger: testLogger,
+      })
+
+      expect(stub.interactions.create).toHaveBeenCalledWith(
+        expect.objectContaining({
+          response_format: { type: 'video', duration: '4.5s' },
+        }),
+      )
+    })
   })
 
   describe('getVideoStatus', () => {
@@ -763,6 +818,17 @@ describe('Gemini Omni Flash Video Adapter (Interactions API)', () => {
       })
     })
 
+    it('maps requires_action to failed instead of polling forever', async () => {
+      const stub = createInteractionsClientStub({
+        getResult: { id: jobId, status: 'requires_action' },
+      })
+      const adapter = new StubbedGeminiOmniVideoAdapter(stub)
+
+      const status = await adapter.getVideoStatus(jobId)
+      expect(status.status).toBe('failed')
+      expect(status.error).toMatch(/client action/)
+    })
+
     it('maps a completed interaction without video output to failed', async () => {
       const stub = createInteractionsClientStub({
         getResult: {
diff --git a/packages/ai/skills/ai-core/media-generation/SKILL.md b/packages/ai/skills/ai-core/media-generation/SKILL.md
index ebede1d1b..3c94bd9e2 100644
--- a/packages/ai/skills/ai-core/media-generation/SKILL.md
+++ b/packages/ai/skills/ai-core/media-generation/SKILL.md
@@ -443,8 +443,8 @@ return toServerSentEventsResponse(stream)
 ```
 
 Google Veo (`@tanstack/ai-gemini`) uses the same jobs/polling flow. Its
-`duration` option is typed per model (e.g. `4 | 6 | 8` for Veo 3.x,
-`5 | 6 | 8` for Veo 2); use `adapter.snapDuration(seconds)` to coerce raw
+`duration` option is typed per model (`4 | 6 | 8` for the Veo 3.1 models);
+use `adapter.snapDuration(seconds)` to coerce raw
 seconds and `adapter.availableDurations()` to enumerate the valid set.
 Image prompt parts route by `metadata.role`: first un-roled /
 `'start_frame'` image → input image, `'end_frame'` → `lastFrame`,
@@ -473,7 +473,8 @@ by model. Clips are 720p; `duration` is any number of seconds in the 3–10
 range (fractional ok, default 10 — availableDurations() reports the range),
 `size` is the aspect ratio (`'16:9' | '9:16'`), and the finished video arrives
 **inline** as a `data:video/mp4;base64,…` URL (no key needed to use it).
-Image/video prompt parts are sent as interaction content in order (no
+Image/video prompt parts are sent as interaction content blocks, grouped
+as images, then videos, then text (no
 `metadata.role` routing); `data` sources go inline, `url` sources pass
 through as-is (never downloaded — use Gemini Files API URIs for remote
 media). For conversational editing, pass a prior generation's `jobId` as