On-demand model deployment + model display/selector redesign #116

Open
AdamBelfki3 wants to merge 26 commits into main from on-demand-models

@AdamBelfki3
Member

Summary

Adds on-demand deployment of cold models and a redesigned, status-aware model browsing/selection experience, end to end.

Backend

  • Track NDIF's currently-deployed models in a catalog refreshed on each /models poll, backed by a disk-cached HuggingFace metadata layer (metadata.py) and LRU eviction of non-pinned model wrappers.
  • Surface a per-model heat (hot / warm / deploying / cold) to the frontend, with deploying derived from NDIF's application_state. Gated access is driven by a parameter-count threshold.
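The heat derivation described above could look roughly like the following. This is a minimal sketch, not the code in state.py: the `application_state` strings, the `pinned` flag as the hot/warm distinction, and the function names are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Heat(str, Enum):
    HOT = "hot"
    WARM = "warm"
    DEPLOYING = "deploying"
    COLD = "cold"


# Hypothetical application_state values; the real NDIF strings may differ.
DEPLOYING_STATES = {"DEPLOYING", "PENDING"}
RUNNING_STATES = {"RUNNING"}


def derive_heat(application_state: Optional[str], pinned: bool) -> Heat:
    """Map a catalog entry to a heat value.

    Assumption: a running pinned model reads as hot, a running
    non-pinned model as warm. A model missing from the catalog is cold.
    """
    if application_state is None:
        return Heat.COLD
    state = application_state.upper()
    if state in DEPLOYING_STATES:
        return Heat.DEPLOYING
    if state in RUNNING_STATES:
        return Heat.HOT if pinned else Heat.WARM
    return Heat.COLD


def is_gated(param_count: int, threshold: int) -> bool:
    """Gate models whose parameter count exceeds the threshold (assumed strict >)."""
    return param_count > threshold
```

Keeping the derivation as a pure function of the catalog entry makes it cheap to recompute on every /models poll.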

Frontend

  • Model display: redesigned filterable model grid with status-aware cards; cold models show their heat and a deploy affordance, and signed-out visitors see cold models as gated.
  • Selectors: unified the model selector + backend status into ModelControl, with a shared ModelPopover/pill design between the landing page and workspace, and a runnable-only picker.
  • Cold-model deployment: clicking a model opens a tool + workspace picker; cold models warm up via a throwaway generation tracked in a navigation-surviving store that polls until the model has served a request, then forces it to read as hot until reload. Already-deployed models open straight into a chart.
  • Chart deploying state: while a chart's model is deploying (or cold) the chart shows a deploying panel instead of its controls/visualization; a saved chart with data stays visible read-only.

Notable fixes

  • A newly created chart now invalidates the sidebar query so its card appears immediately (it was unreachable after navigating away).
  • The warmup request sends the oauth2-proxy session cookie (credentials: "include") and treats a 200 with no job id as a failure rather than a false "deployed".
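The stricter success check for the warmup response can be illustrated as follows. This is a sketch of the rule only (the actual check is in the frontend store); `job_id` is taken from the commit message for this fix, while the `result` key for a local result and the function name are assumptions.

```python
from typing import Any, Mapping, Optional


def warmup_succeeded(status: int, body: Optional[Mapping[str, Any]]) -> bool:
    """Decide whether a warmup response really started (or ran) a job.

    A 2xx with neither a job id nor a local result is treated as a
    failure, e.g. an auth gateway answering 200 with an unrelated body.
    """
    if not 200 <= status < 300:
        return False
    if not body:
        return False
    has_job = bool(body.get("job_id"))
    has_local_result = body.get("result") is not None  # key name is assumed
    return has_job or has_local_result
```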

Merge

  • Includes a merge of main (the Playwright E2E suite from #113, "E2E CI Testing with Playwright and Argos", plus preview-deploy CI infra). All conflicts came from PR #113's lint/formatting changes touching files this branch refactored, and were resolved in favor of this branch's logic.

Verification

  • tsc --noEmit and eslint clean on changed files (remaining tsc errors are pre-existing, hidden by ignoreBuildErrors).

Track NDIF's currently-deployed models in a catalog refreshed on each /models poll, backed by a disk-cached HuggingFace metadata layer and LRU eviction of non-pinned model wrappers. Surface a per-model heat (hot/warm/deploying/cold) to the frontend, including a deploying state derived from NDIF's application_state.
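As an illustration of LRU eviction that skips pinned entries, here is a minimal sketch. It is not the implementation in this commit: the class name, the capacity semantics (a cap on non-pinned wrappers only), and the method names are assumptions.

```python
from collections import OrderedDict
from typing import Any, Iterable, List, Optional


class ModelWrapperCache:
    """LRU cache for model wrappers; pinned models are never evicted."""

    def __init__(self, capacity: int, pinned: Iterable[str] = ()):
        self.capacity = capacity          # max number of non-pinned wrappers
        self.pinned = set(pinned)
        self._pinned_items: dict = {}
        self._lru: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Optional[Any]:
        if name in self._pinned_items:
            return self._pinned_items[name]
        if name in self._lru:
            self._lru.move_to_end(name)   # mark as most recently used
            return self._lru[name]
        return None

    def put(self, name: str, wrapper: Any) -> List[str]:
        """Insert a wrapper and return the names evicted to make room."""
        if name in self.pinned:
            self._pinned_items[name] = wrapper
            return []
        self._lru[name] = wrapper
        self._lru.move_to_end(name)
        evicted = []
        while len(self._lru) > self.capacity:
            old_name, _ = self._lru.popitem(last=False)  # least recently used
            evicted.append(old_name)
        return evicted
```

Returning the evicted names lets the caller release whatever resources the wrappers hold.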
Centralize deployment heat (hot/warm/deploying/cold/gated/...) with runnable/cold/deploying checks. The workspace model picker now offers only models that are ready to run.
Add a navigation-surviving store that warms a cold model with a throwaway generation and polls until it has served a request. A deployed model is forced to read as hot in the models query, since neither the backend nor NDIF can be made to bust their heat caches on demand.
Clicking a model card opens a tool + workspace picker; cold models deploy first, already-deployed models open straight into a chart. Cards carry their deployment heat, and signed-out visitors see cold models as gated. Share the tool/workspace selectors with the landing page.
While a chart's model is deploying (or cold), the chart shows a deploying panel instead of its controls and visualization; a saved chart with data stays visible read-only. Opening a model creates an empty chart of the chosen tool.
Brings in the Playwright E2E suite (#113) and preview-deploy CI infra. All conflicts came from PR 113's lint/formatting touching files this branch refactored; resolved in favor of this branch's logic, then re-ran prettier:

- state.py: metadata fetching moved to metadata.py with param-threshold gating, so main's fetch_model_metadata tweaks no longer apply.
- LandingPage, activation-patching/lens2 areas + controls: kept the useToolArea refactor and shared selectors.
- AutoWorkspaceCreator, workbench page: kept the deploy + sign-in wiring.
- modelsApi: kept the hot-status override alongside main's credentials:include.
- AutoWorkspaceCreator created charts via the raw server actions without invalidating the sidebar query, so a newly created chart had no sidebar card and was unreachable after navigating away. Invalidate charts.sidebar after creation.
- The warmup POST was missing credentials:include (added to the other API calls in #113), so in the cross-origin preview env an auth gateway could answer with a 200 and no job_id, which the store treated as "deployed". Send the session cookie, and treat a 200 with neither a job id nor a local result as a failure rather than a false success.
@vercel
Contributor

vercel Bot commented Jun 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| workbench | Ready | Preview, Comment | Jun 6, 2026 8:53pm |

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Warning

Review limit reached

@AdamBelfki3, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 51 minutes and 18 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c4ac69dc-0a59-4fc7-966a-629b4562147d

📥 Commits

Reviewing files that changed from the base of the PR and between 3f7c2c8 and 64eba8a.

⛔ Files ignored due to path filters (1)
  • workbench/_web/bun.lock is excluded by !**/*.lock
📒 Files selected for processing (59)
  • .gitignore
  • scripts/api.sh
  • workbench/_api/_metadata_cache.json
  • workbench/_api/_model_configs/dev.toml
  • workbench/_api/_model_configs/local.toml
  • workbench/_api/_model_configs/prod.toml
  • workbench/_api/_model_configs/template.toml
  • workbench/_api/data_models.py
  • workbench/_api/metadata.py
  • workbench/_api/routes/models.py
  • workbench/_api/state.py
  • workbench/_web/src/app/globals.css
  • workbench/_web/src/app/workbench/[workspaceId]/[chartId]/components/lens/LensArea.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/activation-patching/[chartId]/components/ActivationPatchingArea.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/activation-patching/[chartId]/components/ActivationPatchingControls.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/activation-patching/[chartId]/components/ActivationPatchingDisplay.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/activation-patching/[chartId]/page.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/components/ModelDeployingPanel.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/components/ToolPanelHeader.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/layout.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/lens2/[chartId]/components/Lens2Area.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/lens2/[chartId]/components/Lens2Controls.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/lens2/[chartId]/components/Lens2Display.tsx
  • workbench/_web/src/app/workbench/[workspaceId]/lens2/[chartId]/page.tsx
  • workbench/_web/src/app/workbench/components/AutoWorkspaceCreator.tsx
  • workbench/_web/src/app/workbench/components/ModelsDisplay.tsx
  • workbench/_web/src/app/workbench/components/ModelsSectionStateController.tsx
  • workbench/_web/src/app/workbench/components/WorkspaceList.tsx
  • workbench/_web/src/app/workbench/page.tsx
  • workbench/_web/src/components/LandingPage.tsx
  • workbench/_web/src/components/ModelControl.tsx
  • workbench/_web/src/components/ModelSelector.tsx
  • workbench/_web/src/components/WorkbenchStatus.tsx
  • workbench/_web/src/components/charts/ChartModelPill.tsx
  • workbench/_web/src/components/model-selector/ModelPopover.tsx
  • workbench/_web/src/components/model-selector/status.ts
  • workbench/_web/src/components/models/ModelCard.tsx
  • workbench/_web/src/components/models/ModelLaunchDialog.tsx
  • workbench/_web/src/components/models/ModelRowCarousel.tsx
  • workbench/_web/src/components/models/ModelsFetchErrorBanner.tsx
  • workbench/_web/src/components/models/ModelsSection.tsx
  • workbench/_web/src/components/models/ModelsSectionHeader.tsx
  • workbench/_web/src/components/selectors/LaunchSelectors.tsx
  • workbench/_web/src/components/ui/pill-popover.tsx
  • workbench/_web/src/hooks/useBackgroundTokenPair.ts
  • workbench/_web/src/hooks/useBlurTokenizeScheduler.ts
  • workbench/_web/src/hooks/useChartModelReady.ts
  • workbench/_web/src/hooks/useDraftModel.ts
  • workbench/_web/src/hooks/useToolArea.ts
  • workbench/_web/src/lib/api/deployApi.ts
  • workbench/_web/src/lib/api/modelsApi.ts
  • workbench/_web/src/lib/configModelDiff.ts
  • workbench/_web/src/lib/queryKeys.ts
  • workbench/_web/src/stores/useModelDeployment.ts
  • workbench/_web/src/stores/useModelsSection.ts
  • workbench/_web/src/types/deployment.ts
  • workbench/_web/src/types/lens2.ts
  • workbench/_web/src/types/models.ts
  • workbench/_web/tailwind.config.ts

These files predated the repo's prettier enforcement (added in #113) and were failing the format:check gate. No logic changes.
@github-actions
Contributor

github-actions Bot commented Jun 6, 2026

🚀 Preview deployed

@argos-ci

argos-ci Bot commented Jun 6, 2026

The latest updates on your projects. Learn more about Argos notifications ↗︎

| Build | Status | Details | Updated (UTC) |
| --- | --- | --- | --- |
| default (Inspect) | ⚠️ Changes detected (Review) | 6 removed, 6 failures | Jun 6, 2026, 9:22 PM |

Labels

None yet

Projects

None yet

1 participant