Add Landmarks app: A multimodal trip planner using Gemini Live#250

Open
peterfriese wants to merge 4 commits into main from peterfriese/landmarks

@peterfriese peterfriese commented Jul 2, 2026

Collaborator

This PR introduces the new Landmarks app, a functional prototype demonstrating a live conversation feature using video and audio powered by Gemini.

Key Features:

  • Multimodal Live Conversation: Talk to an AI travel guide with real-time audio and video context.
  • Optimized for Performance:
    • Zero Choke Ups: Implemented an automatic cache purge on startup and silenced high-frequency logging to prevent MDB_MAP_FULL storage errors.
    • Low Latency: Optimized for European users by routing through europe-west1 (Belgium).
    • Immediate Visuals: The model receives visual context immediately upon camera activation.
    • Balanced Throughput: Adjusted audio buffer sizes (16k) and video resolution (VGA, 0.5 FPS, 0.4 quality) for a stable, high-performance demo.
  • Robust UI Controls: Fully functional Mute and Pause controls that stay in sync with background media loops.
  • Intelligent Speech Handling: Disabled barge-in and restored transcription continuity for a more natural conversational flow.

- Disabled AI barge-in by forcing silence when AI is speaking.
- Fixed UI buttons (Mute, Pause) by using escaping closures for state.
- Reduced audio packet frequency (4k -> 16k buffer) to lower data pressure.
- Reduced video overhead: VGA resolution, 0.5 FPS, 0.4 JPEG quality.
- Implemented immediate first-frame delivery for visual context.
- Optimized for Europe by switching backend to europe-west1.
- Silenced high-frequency console logs to prevent lag.
- Implemented auto-cache purge on startup to prevent MDB_MAP_FULL errors.
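
The combination of immediate first-frame delivery and a 0.5 FPS cap can be sketched as a small throttle. This is only an illustration under stated assumptions: `FrameThrottle` and `shouldSend(at:)` are hypothetical names that do not appear in the PR, and timestamps are plain seconds rather than capture-session time values.

```swift
import Foundation

/// Hypothetical helper (not from the PR): admits the first frame immediately,
/// then at most one frame per `minInterval` seconds.
struct FrameThrottle {
    let minInterval: TimeInterval
    private var lastSent: TimeInterval?

    init(framesPerSecond: Double) {
        minInterval = 1.0 / framesPerSecond
    }

    /// Returns true if a frame captured at `timestamp` (in seconds) should be sent.
    mutating func shouldSend(at timestamp: TimeInterval) -> Bool {
        if let last = lastSent, timestamp - last < minInterval {
            return false  // too soon after the previously sent frame
        }
        lastSent = timestamp
        return true
    }
}

var throttle = FrameThrottle(framesPerSecond: 0.5)  // one frame every 2 seconds
let decisions = [0.0, 0.5, 1.9, 2.0, 3.0, 4.1].map { throttle.shouldSend(at: $0) }
print(decisions)  // [true, false, false, true, false, true]
```

Because `lastSent` starts as `nil`, the very first frame always passes, which is what gives the model visual context as soon as the camera starts.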

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a comprehensive Trip Planner iOS application showcasing guided generation and tool calling with Google's Gemini models via Firebase AI Logic. It includes features like live itinerary planning, real-time audio/video streaming, and Google Maps/Search integration. The code review feedback identifies several critical improvements, including fixing barge-in silence logic, offloading synchronous disk operations to a background thread during app launch, marking ModelData as @MainActor for Swift 6 concurrency safety, and ensuring proper connection state updates when the live session ends. Additionally, optimizations were suggested to remove dead code, redundant delays, and SwiftUI view hierarchy anti-patterns.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +58 to +63
    if isUserSpeaking {
        await session.sendAudioRealtime(pcmData)
    } else if !isAISpeaking {
        let silence = Data(count: pcmData.count)
        await session.sendAudioRealtime(silence)
    }

high

The current barge-in prevention logic does not send silence when isAISpeaking is true because of the else if !isAISpeaking guard on line 60. Instead, it sends nothing. To actually force silence transmission when the AI is speaking (as described in the PR description), you should send silence whenever the user is not speaking.

Suggested change
    - if isUserSpeaking {
    -     await session.sendAudioRealtime(pcmData)
    - } else if !isAISpeaking {
    -     let silence = Data(count: pcmData.count)
    -     await session.sendAudioRealtime(silence)
    - }
    + if isUserSpeaking {
    +     await session.sendAudioRealtime(pcmData)
    + } else {
    +     let silence = Data(count: pcmData.count)
    +     await session.sendAudioRealtime(silence)
    + }
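
One way to keep this branch easy to verify is to move the payload choice into a pure function. The sketch below assumes linear PCM, where zero bytes represent silence. `outgoingAudio(for:isUserSpeaking:)` is a hypothetical helper, not part of the PR.

```swift
import Foundation

/// Hypothetical pure helper: picks the bytes to transmit for one audio chunk.
/// When the user is not speaking, it returns zero-filled data of the same length,
/// so the stream keeps its cadence while carrying silence.
func outgoingAudio(for pcmData: Data, isUserSpeaking: Bool) -> Data {
    isUserSpeaking ? pcmData : Data(count: pcmData.count)
}

let chunk = Data([1, 2, 3, 4])
let muted = outgoingAudio(for: chunk, isUserSpeaking: false)
let live = outgoingAudio(for: chunk, isUserSpeaking: true)
print(muted.count, live == chunk)
```

The call site would then reduce to a single `await session.sendAudioRealtime(...)` with the helper's result, regardless of whether the AI is speaking.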

Comment on lines +41 to +54
    private func clearFirebaseCache() {
        let fileManager = FileManager.default
        guard let cacheDir = fileManager.urls(for: .cachesDirectory, in: .userDomainMask).first else { return }

        let foldersToClear = ["google-sdks-events", "google-app-measurement"]

        for folder in foldersToClear {
            let folderUrl = cacheDir.appendingPathComponent(folder)
            if fileManager.fileExists(atPath: folderUrl.path) {
                try? fileManager.removeItem(at: folderUrl)
                print("DEBUG: Cleared internal Firebase storage at: \(folder)")
            }
        }
    }

high

Executing synchronous file system operations (clearFirebaseCache()) on the main thread during app launch can block the main thread and potentially cause watchdog crashes if the directory is large or the disk is slow. It is highly recommended to run these operations asynchronously on a background queue.

    private func clearFirebaseCache() {
        DispatchQueue.global(qos: .background).async {
            let fileManager = FileManager.default
            guard let cacheDir = fileManager.urls(for: .cachesDirectory, in: .userDomainMask).first else { return }
            
            let foldersToClear = ["google-sdks-events", "google-app-measurement"]
            
            for folder in foldersToClear {
                let folderUrl = cacheDir.appendingPathComponent(folder)
                if fileManager.fileExists(atPath: folderUrl.path) {
                    try? fileManager.removeItem(at: folderUrl)
                    print("DEBUG: Cleared internal Firebase storage at: \(folder)")
                }
            }
        }
    }
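
If the purge logic should also be testable, the directory can be injected rather than looked up inside the function. The following is a rough sketch of that variation, not the PR's code: `clearFolders(_:in:fileManager:)` is a hypothetical name, and unlike the `try?` version it reports removal failures. In the app it would still be called off the main thread, for example from the background queue shown above.

```swift
import Foundation

/// Hypothetical refactor: removes the named subfolders of `directory`
/// and returns the names that were actually removed.
func clearFolders(_ names: [String], in directory: URL,
                  fileManager: FileManager = .default) -> [String] {
    var removed: [String] = []
    for name in names {
        let url = directory.appendingPathComponent(name)
        guard fileManager.fileExists(atPath: url.path) else { continue }
        do {
            try fileManager.removeItem(at: url)
            removed.append(name)
        } catch {
            print("Could not remove \(name): \(error)")
        }
    }
    return removed
}

// Exercise the helper against a scratch directory instead of the real cache.
let scratch = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(
    at: scratch.appendingPathComponent("google-sdks-events"),
    withIntermediateDirectories: true)
let removed = clearFolders(["google-sdks-events", "google-app-measurement"], in: scratch)
print(removed)
```

Only the folder that exists is reported as removed. The missing one is skipped silently, matching the behavior of the original loop.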

Comment on lines +13 to +14
@Observable
class ModelData {

high

The ModelData class is not isolated to @MainActor, but its properties (such as itineraryPlanners) are accessed and mutated from the main actor (UI). This is a data race hazard in Swift 6. Marking the entire class as @MainActor ensures thread safety and conforms to Swift 6 concurrency requirements.

Suggested change
    - @Observable
    - class ModelData {
    + @Observable
    + @MainActor
    + class ModelData {

// 3. Handshake and Processing
state = .connected
await sendInitialGreeting(session: session)
try await startProcessingResponses()

high

When startProcessingResponses() returns normally (e.g., when the connection is closed by the server), the connect() method exits without updating the state to .disconnected. This causes the UI to still show "Live" or "Connected" even though the session is dead. You should call await disconnect() when the processing loop finishes.

Suggested change
    - try await startProcessingResponses()
    + try await startProcessingResponses()
    + await disconnect()
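
The underlying invariant is that the state returns to disconnected on every exit path, whether the processing loop returns normally or throws. The synchronous sketch below illustrates that invariant with `defer`. `SessionStateSketch`, `ConnectionState`, and `StreamClosed` are hypothetical names, and the PR's actual `connect()` is async, where an explicit `await disconnect()` as suggested above may be the more practical form.

```swift
enum ConnectionState { case disconnected, connecting, connected }

/// Hypothetical, simplified stand-in for the live session manager.
final class SessionStateSketch {
    private(set) var state: ConnectionState = .disconnected

    /// Marks the session connected while `process` runs.
    /// `defer` resets the state on both normal return and thrown error.
    func run(_ process: () throws -> Void) rethrows {
        state = .connected
        defer { state = .disconnected }
        try process()
    }
}

struct StreamClosed: Error {}

let session = SessionStateSketch()

var seenDuringRun: ConnectionState?
session.run { seenDuringRun = session.state }  // loop ends normally
let afterNormal = session.state

try? session.run { throw StreamClosed() }      // loop ends with an error
let afterError = session.state
```

In both cases the UI-facing state ends up as `.disconnected`, which is the behavior the review comment asks for.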

let startTime = Date()
Logging.general.log("FindPointsOfInterestMapsTool: call called with query: \(arguments.naturalLanguageQuery), category: \(arguments.pointOfInterest.rawValue)")

let model = FirebaseAI.firebaseAI(backend: .vertexAI(location: "global"))

medium

The PR description states that the backend was switched to europe-west1 (Belgium) for lower latency in Europe. However, FindPointsOfInterestMapsTool still uses "global" as the location for Vertex AI. This should be updated to "europe-west1" to maintain consistency and optimize latency.

Suggested change
    - let model = FirebaseAI.firebaseAI(backend: .vertexAI(location: "global"))
    + let model = FirebaseAI.firebaseAI(backend: .vertexAI(location: "europe-west1"))
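
A mismatch like this is easier to avoid when the region lives in one place. As a minimal sketch, assuming a shared constant is acceptable in this codebase (`BackendRegion` is a hypothetical name, not from the PR):

```swift
/// Hypothetical single source of truth for the Vertex AI region.
enum BackendRegion {
    static let vertexAILocation = "europe-west1"
}

// Call sites would then read, for example:
// FirebaseAI.firebaseAI(backend: .vertexAI(location: BackendRegion.vertexAILocation))
print(BackendRegion.vertexAILocation)
```

With this in place, a later region change touches one line instead of every tool that constructs a model.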

Comment on lines +29 to +32
@ObservationIgnored
nonisolated(unsafe) private var isExtractingFrames = false
@ObservationIgnored
nonisolated(unsafe) private var lastExtractedFrame: UIImage?

medium

The private variable isExtractingFrames is written to but never read, making it dead code. Additionally, since CameraManager is @MainActor, lastExtractedFrame is already isolated to the main actor and does not need the nonisolated(unsafe) annotation. Please remove isExtractingFrames and its writes (on lines 129 and 136), and remove the unsafe annotation from lastExtractedFrame.

Suggested change
    - @ObservationIgnored
    - nonisolated(unsafe) private var isExtractingFrames = false
    - @ObservationIgnored
    - nonisolated(unsafe) private var lastExtractedFrame: UIImage?
    + @ObservationIgnored
    + private var lastExtractedFrame: UIImage?

Comment on lines +157 to +174
    Rectangle()
        .fill(Color.clear)
        .frame(height: 200)
        .overlay(
            AsyncImage(url: url) { image in
                image
                    .resizable()
                    .scaledToFill()
            } placeholder: {
                ProgressView("Loading image...")
                    .frame(maxWidth: .infinity, maxHeight: .infinity)
                    .background(Color.gray.opacity(0.1))
            }
        )
        .clipped()
        .clipShape(RoundedRectangle(cornerRadius: 12))
        .padding([.horizontal, .bottom], 4)
}

medium

Wrapping AsyncImage in an .overlay of a clear Rectangle with a fixed height is an anti-pattern in SwiftUI that unnecessarily complicates the view hierarchy. You can apply the frame, clipping, and padding directly to the AsyncImage component.

    AsyncImage(url: url) { image in
        image
            .resizable()
            .scaledToFill()
    } placeholder: {
        ProgressView("Loading image...")
            .frame(maxWidth: .infinity, maxHeight: .infinity)
            .background(Color.gray.opacity(0.1))
    }
    .frame(height: 200)
    .clipped()
    .clipShape(RoundedRectangle(cornerRadius: 12))
    .padding([.horizontal, .bottom], 4)
}

Comment on lines +130 to +134
    if FirebaseApp.app() == nil {
        try await Task.sleep(nanoseconds: 1_000_000_000)
        guard self.connectionId == currentId else { throw ApplicationError("Cancelled") }
        throw ApplicationError("Firebase missing GoogleService-Info.plist")
    }

medium

The 1-second sleep in checkFirebaseConfiguration is redundant and unnecessarily delays error reporting when Firebase is not configured. The error should be thrown immediately.

        if FirebaseApp.app() == nil {
            throw ApplicationError("Firebase missing GoogleService-Info.plist")
        }

@peterfriese peterfriese changed the title from "Optimize Live Itinerary for performance and stability" to "Add Landmarks app: A multimodal trip planner using Gemini Live" on Jul 2, 2026