On-demand model deployment + model display/selector redesign #116
AdamBelfki3 wants to merge 26 commits into
…e UI design + add model status to api query
… on the landing page and the workspace page
… high level ui component implementation
Track NDIF's currently deployed models in a catalog that is refreshed on each /models poll, backed by a disk-cached HuggingFace metadata layer and LRU eviction of non-pinned model wrappers. Surface a per-model deployment heat (hot/warm/deploying/cold) to the frontend, including a deploying state derived from NDIF's application_state.
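A minimal sketch of the "LRU eviction of non-pinned model wrappers" described above, assuming an in-memory map keyed by model id. It is written in TypeScript purely as an illustration; the actual backend code (state.py) is not shown in this PR, and `ModelWrapperCache`, `capacity`, and the pinned-id set are hypothetical names.

```typescript
// Hypothetical LRU cache of loaded model wrappers keyed by model id.
// Pinned ids are never evicted; everything else is dropped least-recently-used first.
class ModelWrapperCache<W> {
  private readonly capacity: number;
  private readonly pinned: ReadonlySet<string>;
  // A Map iterates in insertion order, so its first key is the least recently used.
  private readonly entries = new Map<string, W>();

  constructor(capacity: number, pinned: ReadonlySet<string>) {
    this.capacity = capacity;
    this.pinned = pinned;
  }

  get(modelId: string): W | undefined {
    const wrapper = this.entries.get(modelId);
    if (wrapper !== undefined) {
      // Re-insert so this id becomes the most recently used.
      this.entries.delete(modelId);
      this.entries.set(modelId, wrapper);
    }
    return wrapper;
  }

  set(modelId: string, wrapper: W): void {
    this.entries.delete(modelId);
    this.entries.set(modelId, wrapper);
    this.evict();
  }

  keys(): string[] {
    return Array.from(this.entries.keys());
  }

  private evict(): void {
    // Walk oldest first; pinned entries are skipped, so the cache can stay
    // over capacity when only pinned wrappers remain.
    for (const key of Array.from(this.entries.keys())) {
      if (this.entries.size <= this.capacity) break;
      if (!this.pinned.has(key)) this.entries.delete(key);
    }
  }
}
```

Because pinned wrappers are skipped rather than counted against a separate budget, the cache may exceed its capacity when only pinned entries remain; that trade-off is an assumption of this sketch.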
Centralize deployment heat (hot/warm/deploying/cold/gated/...) with runnable/cold/deploying checks. The workspace model picker now offers only models that are ready to run.
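One way to picture centralizing heat and its checks: a single union type plus small predicates that every component calls, so the picker's "ready to run" rule lives in one place. The exact `Heat` members, the `ModelEntry` shape, and the choice to treat `warm` as runnable are assumptions for illustration.

```typescript
// Assumed subset of the heat union; the real type (hot/warm/deploying/cold/gated/...)
// has further members not listed in this PR.
type Heat = "hot" | "warm" | "deploying" | "cold" | "gated";

interface ModelEntry {
  id: string;
  heat: Heat;
}

// Assumption: "hot" and "warm" both mean the model is deployed and can take a request.
function isRunnable(heat: Heat): boolean {
  return heat === "hot" || heat === "warm";
}

function isCold(heat: Heat): boolean {
  return heat === "cold";
}

function isDeploying(heat: Heat): boolean {
  return heat === "deploying";
}

// The workspace picker keeps only models that are ready to run right now.
function runnableModels(models: readonly ModelEntry[]): ModelEntry[] {
  return models.filter((m) => isRunnable(m.heat));
}
```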
Add a navigation-surviving store that warms a cold model with a throwaway generation and polls until it has served a request. A deployed model is forced to read as hot in the models query, since neither the backend nor NDIF can be made to bust their heat caches on demand.
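A hedged sketch of the warmup flow: send one throwaway generation, poll until the model reports that it has served a request, and record it in module-level state that outlives page navigation; each models-query result is then post-processed so recorded models read as hot. `warmModel`, `applyHotOverride`, `WarmupDeps`, the poll interval, and the poll limit are all hypothetical; the real store's API is not shown in this PR.

```typescript
type Heat = "hot" | "warm" | "deploying" | "cold" | "gated";

interface ModelEntry {
  id: string;
  heat: Heat;
}

// Hypothetical injected I/O so the sketch does not assume any particular API.
interface WarmupDeps {
  sendThrowawayGeneration(modelId: string): Promise<void>;
  hasServedRequest(modelId: string): Promise<boolean>;
  sleep(ms: number): Promise<void>;
}

// Module-level state lives outside any page component, so it survives client-side navigation.
const deployedModels = new Set<string>();

async function warmModel(
  modelId: string,
  deps: WarmupDeps,
  intervalMs = 5000,
  maxPolls = 120,
): Promise<boolean> {
  await deps.sendThrowawayGeneration(modelId);
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    if (await deps.hasServedRequest(modelId)) {
      deployedModels.add(modelId);
      return true;
    }
    await deps.sleep(intervalMs);
  }
  return false; // gave up; the caller can surface a timeout
}

// Applied to each models-query result: a model this client has deployed reads as hot,
// even while the backend's and NDIF's cached heat still say otherwise.
function applyHotOverride(models: readonly ModelEntry[]): ModelEntry[] {
  return models.map((m): ModelEntry => (deployedModels.has(m.id) ? { ...m, heat: "hot" } : m));
}
```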
Clicking a model card opens a tool + workspace picker; cold models deploy first, already-deployed models open straight into a chart. Cards carry their deployment heat, and signed-out visitors see cold models as gated. Share the tool/workspace selectors with the landing page.
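The card-click routing can be summarized as a pure function from heat and sign-in state to the step that follows the tool + workspace picker. The step names are hypothetical, and the handling of `deploying` and `gated` cards is an assumption, since the commit message only specifies the cold and already-deployed cases.

```typescript
type Heat = "hot" | "warm" | "deploying" | "cold" | "gated";

// Hypothetical names for what happens after the user picks a tool and workspace.
type NextStep = "open-chart" | "deploy-then-open" | "wait-for-deploy" | "request-access";

// Signed-out visitors cannot start a deployment, so a cold model is shown as gated.
function cardHeat(heat: Heat, signedIn: boolean): Heat {
  return !signedIn && heat === "cold" ? "gated" : heat;
}

function nextStep(heat: Heat, signedIn: boolean): NextStep {
  switch (cardHeat(heat, signedIn)) {
    case "hot":
    case "warm":
      return "open-chart"; // already deployed: straight into a chart
    case "cold":
      return "deploy-then-open"; // deploy first, then open the chart
    case "deploying":
      return "wait-for-deploy"; // assumption: join the in-flight deployment
    case "gated":
      return "request-access"; // assumption: sign-in or access approval needed
  }
}
```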
While a chart's model is deploying (or cold), the chart shows a deploying panel instead of its controls and visualization; a saved chart with data stays visible read-only. Opening a model creates an empty chart of the chosen tool.
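The chart's rendering rule reduces to a small decision on the model's heat and whether the chart already has saved data. `chartMode` and its return values are hypothetical names, and the four-state `Heat` union here covers only the states this rule mentions.

```typescript
// Only the states relevant to this rule; the real heat union has more members.
type Heat = "hot" | "warm" | "deploying" | "cold";

// How a chart renders for a given model heat (names are hypothetical).
type ChartMode = "interactive" | "deploying-panel" | "read-only";

function chartMode(heat: Heat, hasSavedData: boolean): ChartMode {
  const modelReady = heat === "hot" || heat === "warm";
  if (modelReady) return "interactive";
  // Deploying or cold: controls and visualization give way to a deploying panel,
  // except that a saved chart with data stays visible, read-only.
  return hasSavedData ? "read-only" : "deploying-panel";
}
```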
Brings in the Playwright E2E suite (#113) and preview-deploy CI infra. All conflicts came from PR 113's lint/formatting touching files this branch refactored; resolved in favor of this branch's logic, then re-ran prettier:
- state.py: metadata fetching moved to metadata.py with param-threshold gating, so main's fetch_model_metadata tweaks no longer apply.
- LandingPage, activation-patching/lens2 areas + controls: kept the useToolArea refactor and shared selectors.
- AutoWorkspaceCreator, workbench page: kept the deploy + sign-in wiring.
- modelsApi: kept the hot-status override alongside main's credentials:include.
- AutoWorkspaceCreator created charts via the raw server actions without invalidating the sidebar query, so a newly created chart had no sidebar card and was unreachable after navigating away. Invalidate charts.sidebar after creation.
- The warmup POST was missing credentials:include (added to the other API calls in #113), so in the cross-origin preview env an auth gateway could answer with a 200 and no job_id, which the store treated as "deployed". Send the session cookie, and treat a 200 with neither a job id nor a local result as a failure rather than a false success.
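A sketch of the stricter warmup-response check, assuming the response body may carry a `job_id` (named in the commit message) or a local `result` (field name assumed). `interpretWarmup`, `postWarmup`, and the outcome type are hypothetical; only `fetch` and its `credentials: "include"` option are standard.

```typescript
// Assumed response shape; only job_id is named in the commit message.
interface WarmupBody {
  job_id?: string;
  result?: unknown;
}

type WarmupOutcome =
  | { kind: "queued"; jobId: string }
  | { kind: "completed" }
  | { kind: "failed"; reason: string };

function interpretWarmup(status: number, body: WarmupBody | null): WarmupOutcome {
  if (status < 200 || status >= 300) return { kind: "failed", reason: `HTTP ${status}` };
  if (body?.job_id) return { kind: "queued", jobId: body.job_id };
  if (body?.result !== undefined) return { kind: "completed" };
  // A 2xx with neither a job id nor a local result (e.g. an auth gateway's page)
  // is not evidence of a deployment, so it must not read as success.
  return { kind: "failed", reason: "2xx response without job id or result" };
}

async function postWarmup(url: string, payload: unknown): Promise<WarmupOutcome> {
  const res = await fetch(url, {
    method: "POST",
    credentials: "include", // send the session cookie on cross-origin requests
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  // Non-JSON bodies (such as an HTML login page) become null and fall through to "failed".
  const body = (await res.json().catch(() => null)) as WarmupBody | null;
  return interpretWarmup(res.status, body);
}
```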
These files predated the repo's prettier enforcement (added in #113) and were failing the format:check gate. No logic changes.
🚀 Preview deployed
Summary
Adds on-demand deployment of cold models and a redesigned, status-aware model browsing/selection experience, end to end.
Backend
- Tracks NDIF's currently deployed models in a catalog refreshed on each `/models` poll, backed by a disk-cached HuggingFace metadata layer (`metadata.py`) and LRU eviction of non-pinned model wrappers.
- Surfaces per-model deployment heat (`hot`/`warm`/`deploying`/`cold`) to the frontend, with `deploying` derived from NDIF's `application_state`. Gated access is driven by a parameter-count threshold.
Frontend
- … `ModelControl`, with a shared `ModelPopover`/pill design between the landing page and workspace, and a runnable-only picker.
Notable fixes
- The warmup POST now sends the session cookie (`credentials: "include"`) and treats a `200` with no job id as a failure rather than a false "deployed".
Merge
- Merges `main` (the Playwright E2E suite from #113, "E2E CI Testing with Playwright and Argos", plus preview-deploy CI infra). Conflicts were all PR-113 lint/formatting over files this branch refactored, resolved in favor of this branch's logic.
Verification
- `tsc --noEmit` and `eslint` clean on changed files (remaining `tsc` errors are pre-existing, hidden by `ignoreBuildErrors`).