Fix TUI bugs and UX issues from real hardware testing by eshork · Pull Request #15 · Rightbracket/NeuralDrive

eshork · 2026-04-24T02:39:28Z

Summary

Addresses 27 user-reported bugs and UX issues discovered during live testing on an RTX 3080 system booting NeuralDrive from USB, plus a critical GPU acceleration bug where Ollama was silently falling back to CPU-only inference. Includes comprehensive documentation updates across 17 files to reflect all implementation changes. All code changes have been deployed to the live system and verified on hardware.

Commits

| Commit | Description |
| --- | --- |
| `6ecd510` | Fix TUI bugs and UX issues from real hardware testing (15 files, +1476/-174) |
| `334ef93` | Show model metadata and fix button visibility in model list |
| `927df1a` | Save API key to persistent disk alongside overlay |
| `a1e82c4` | Add live clock to dashboard top-right corner |
| `bdfd694` | Fix chat screen layout and text wrapping |
| `d0f1ca8` | Add model delete and fix chat model persistence |
| `ff34cab` | Harden wizard finalization, add --wizard flag, and Enter-to-pull |
| `fbac7b6` | Harden partition detection, wizard source of truth, and subprocess error checking |
| `6efe83a` | Move partition snapshot before mkpart to prevent race condition |
| `5534f54` | Harden partition creation safety and boot device detection |
| `c0e802c` | Guard pull button and Enter against concurrent submissions |
| `b0d8a88` | Remove dual wizard marker, check all subprocess returns, normalize live-media path, guard _pull_next |
| `64a9514` | Fall through to findmnt when live-media PKNAME fails |
| `b1003b1` | Fix Header crash on screen transitions and simplify --wizard flag |
| `493fe4e` | Fix GPU acceleration: load nvidia-uvm at boot and remove cgroup device filter |
| `5e1d376` | Escape Rich markup in [GPU]/[CPU] tags so they render visibly |
| `5f8908e` | Add arrow-key navigation with scroll-follow to installed models list |
| `b555d1f` | Unify models screen focus: zone-based Tab, arrow-key list+button nav |
| `3c12f7d` | Models screen: skip disabled buttons, Loading... feedback, column legend |
| `8064d81` | Restore _unload_from_vram and add legend column separators |
| `fe0a28f` | Fix unload race condition and keep manually loaded models in VRAM |
| `4bc2f32` | Fix keep_alive: pass integer -1 instead of string |
| `78fbc0d` | Redesign services screen to match models screen UX |
| `c8e3a71` | Remap screen hotkeys to F1-F5: Dash, Models, Svc, Logs, Chat |
| `ea9fcfc` | Guard service poll timer against widget rebuild race |
| `2a704c6` | Allow concurrent model loading and persist Ollama config |
| `37d0330` | Widen services Restart button to fit label |
| `2160fa8` | Create webui data directory on persistence partition |
| `5dec79c` | Update documentation to reflect TUI redesign, GPU fixes, and config changes |
| `c621827` | Mark VRAM-loaded models with * in chat selector and retain input focus |

Issues Addressed

Closes #6, #8, #9, #10, #11, #13. References #4, #5, #7, #12, #14.

TUI Changes

Navigation Overhaul

Replaced single-letter hotkeys with F1-F5 function keys (Dashboard, Models, Services, Logs, Chat)
Implemented zone-based Tab navigation within screens
Arrow key navigation for lists and per-item action buttons
Enter key activates focused elements
Removed command palette and all hidden hotkeys

Models Screen (complete redesign)

Three-zone layout: installed models list, browse catalog, pull-by-name
Inline Load/Unload/Delete buttons per model with Left/Right arrow navigation
Column legend with metadata: Params / Quant / Disk / VRAM / Status
VRAM usage cache persisted to /var/lib/neuraldrive/config/
Download progress bar with cancel support
Loading... feedback and disabled button skip logic
Unload race condition fix (poll /api/ps until confirmed)
keep_alive: -1 (integer) for infinite retention on manual loads

Services Screen (complete redesign)

ServiceItem widget with inline Start/Stop/Restart buttons (colored: green/red/amber)
Arrow key navigation matching models screen behavior
Auto-poll every 5 seconds with _loading guard against widget rebuild race

Chat Screen

Model selector dropdown with persistence across screen switches
VRAM-loaded models marked with * prefix in selector, refreshed every 10 seconds
Input focus retained after sending messages (no re-click needed)
Streaming responses via @work(exclusive=True)

Dashboard

Live system clock in upper-right corner
GPU/CPU tags ([GPU]/[CPU]) rendered correctly (escaped Rich markup)

Wizard & First Boot

Correct 6-step flow: Welcome → Storage → Security → Network → Models → Done
Creates persistence directories including /var/lib/neuraldrive/webui/
--wizard CLI flag to force re-run
Sentinel file: /etc/neuraldrive/first-boot-complete

Reliability

SafeHeader widget catches Textual Header NoMatches bug (#4258)
Crash dumps written to /var/lib/neuraldrive/logs/tui-crash-*.log
Screenshots saved to /var/lib/neuraldrive/screenshots/

GPU / System Changes

Critical: GPU Acceleration Fix

nvidia-uvm: Added modprobe nvidia-current-uvm + nvidia-modprobe -u as ExecStartPre in Ollama service and /etc/modules-load.d/nvidia-uvm.conf for boot-time loading
DeviceAllow removed: cgroup v2 eBPF device filters blocked CUDA even with explicit allow rules; removed all DeviceAllow from Ollama service, kept PrivateDevices=no
Result: Ollama now uses GPU (was silently CPU-only before)

Ollama Configuration

OLLAMA_MAX_LOADED_MODELS=0 (auto, was 1) — concurrent model loading with LRU eviction
Persistent config override via EnvironmentFile=-/var/lib/neuraldrive/config/ollama.conf
API key synced to persistent disk alongside overlay

Documentation Updates (17 files)

User Guide

TUI pages (5 files): Complete rewrite of models, services, chat, dashboard, and main TUI docs with F1-F5 hotkeys, zone-based navigation, and accurate interface descriptions
First boot: Corrected wizard steps, sentinel file path, --wizard flag
Config/Performance/Recommendations (3 files): Updated OLLAMA_MAX_LOADED_MODELS to 0 (auto), added persistent config override docs, expanded config inventory
Services reference: Added GPU access note for Ollama
Troubleshooting (2 files): Added nvidia-uvm and cgroup v2 GPU troubleshooting, updated concurrent model support

Developer Guide

TUI component: Added chat screen, custom widgets (SafeHeader, ServiceItem, ModelItem), crash dumps, F1-F5 nav
Ollama component: DeviceAllow removal, persistent EnvironmentFile, nvidia-uvm ExecStartPre, API usage details
First-boot wizard: Corrected trigger mechanism (TUI, not systemd service), sentinel path, wizard steps
GPU detection: nvidia-current-uvm Debian naming, device node creation, cgroup v2 note
Security architecture: DeviceAllow removal explanation with cgroup v2 eBPF context

Testing

All TUI changes deployed and verified on live hardware (RTX 3080, 500GB USB, Debian 12, kernel 6.1, Ollama 0.21.1). GPU acceleration confirmed: `inference compute: CUDA compute=8.6, NVIDIA GeForce RTX 3080, 10.0 GiB available`.

Addresses 27 user-reported issues from live testing on an RTX 3080 system booting from USB. All changes deployed and verified on hardware.

Crash handling:
- Override App._handle_exception() to capture Textual runtime crashes
- Write crash dumps to persistent disk (/var/lib/neuraldrive/logs/)
- Screenshots routed to persistent disk via TEXTUAL_SCREENSHOT_LOCATION
- Outer try/except in __main__ catches startup crashes

Chat screen:
- Fix TypeError from RichLog.write(end='') (removed invalid param)
- Move streaming response to @work(exclusive=True) to unblock UI
- Add on_screen_resume to refresh model list on every screen visit
- Add model selector (Select widget) on dedicated row with amber border

Models screen:
- Rewrite catalog with two-zone keyboard navigation (list + buttons)
- Arrow keys navigate, Enter/Space toggle, PgUp/PgDn page jump
- Add download cancel button with worker cancellation
- Handle asyncio.CancelledError in _start_pull
- Add model load/unload via Ollama generate API (keep_alive)
- Show both Load and Unload buttons per model (disable irrelevant one)
- Fix ModelItem._size/_name collision with Textual Widget internals

Services screen:
- Fix DuplicateIds crash: await remove_children() before mounting
- Use sudo systemctl for service start/stop/restart
- Arrow-key service selection with yellow highlight
- Use Binding() objects for show/priority params (not 4-element tuples)

Dashboard:
- Expand GPU StatsBox to show Device, VRAM, Temp, Utilization
- Rename 'Loaded Models' to 'Active Models (VRAM)'

Wizard:
- Rewrite _create_persistence_partition(): fix parted start position, detect actual free space, immediate mount, correct Ollama dirs, proper ownership, restart Ollama after partition creation
- Add YAML config persistence (persistent disk with overlay fallback)

Navigation:
- Replace single-letter hotkeys with F2-F6 function keys (priority=True)
- Remove old silent hotkeys entirely
- Disable command palette via ENABLE_COMMAND_PALETTE=False (COMMAND_PALETTE_BINDING=None crashes Textual 8.2.4)

Security:
- Add scoped NOPASSWD sudoers (/etc/sudoers.d/neuraldrive-tui) that survives wizard _finalize() stripping NOPASSWD from neuraldrive-admin
- Covers systemctl, parted, mkfs, mount, chpasswd, and file ops

New files:
- utils/config.py: YAML config read/write with persistent/overlay fallback
- utils/hardware.py: Boot device detection, partition enumeration
- etc/sudoers.d/neuraldrive-tui: Scoped NOPASSWD rules for TUI ops
- dev-reset.sh: Development reset script (password, NOPASSWD, sentinel)

Build:
- Add pyyaml to TUI venv dependencies
- Set neuraldrive-tui sudoers permissions in build hook
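The crash-handling items in the commit message above describe writing crash dumps to persistent disk. A minimal sketch of that idea follows. The helper name `write_crash_dump` and the timestamp format are assumptions for illustration; only the `tui-crash-*.log` naming pattern and the log directory come from the PR description, and the wiring into `App._handle_exception()` is not shown.

```python
import time
import traceback
from pathlib import Path


def write_crash_dump(exc: BaseException, log_dir: Path) -> Path:
    """Write a formatted traceback to log_dir/tui-crash-<timestamp>.log.

    Hypothetical helper: the real handler and its file naming may differ.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = log_dir / f"tui-crash-{stamp}.log"
    # format_exception returns a list of lines that already end in newlines.
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    path.write_text("".join(lines), encoding="utf-8")
    return path
```

An outer `try/except` around application startup could call the same helper, which would match the "startup crashes" bullet.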

Display parameter count, quantization level, disk size, and VRAM usage for each installed model. VRAM is cached to persistent config on first load so it remains visible after unloading. Fix model-item height (3->5) so Load/Unload buttons render inside the bordered container instead of being clipped. Show both buttons per model with the irrelevant one disabled. Add disabled button styles.

Write api.key and credentials.conf to both /etc/neuraldrive/ (overlay) and /var/lib/neuraldrive/config/ (persistent disk) when available. Update wizard completion text to show where the key is stored instead of telling the user to save it manually.

Updates every 2 seconds alongside the system stats refresh. Shows HH:MM:SS so the user can tell at a glance the dashboard is live.

- Compact model selector into horizontal row with inline label
- Remove clipping on Select widget (border removed, height auto)
- Enable text wrapping in chat log (wrap=True on RichLog)
- Remove dock:bottom on input row to prevent footer collision
- Center Send button label vertically

Save Select value before refreshing options list, restore it if the model is still available. Falls back to first model only when previous selection is no longer present.
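The save-and-restore rule described above can be sketched as a pure function. The name `restore_selection` is hypothetical; the actual code operates on a Textual `Select` widget, which is not reproduced here.

```python
from typing import Optional, Sequence


def restore_selection(previous: Optional[str], available: Sequence[str]) -> Optional[str]:
    """Pick the model to select after the options list has been refreshed.

    Keeps the previous choice when it is still available and falls back to
    the first model otherwise. Returns None when no models are available.
    """
    if previous is not None and previous in available:
        return previous
    return available[0] if available else None
```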

- Add red Delete button to each installed model item
- Auto-unload from VRAM before deleting if model is loaded
- Fix httpx DELETE with json body (use client.request instead)
- Preserve selected chat model when returning to chat screen

- Gate sentinel write behind errors check: sentinel is only written after config.save() and all prior writes succeed, preventing the wizard from being silently skipped after partial failures
- Guard partition detection: reject if lsblk returns base device instead of new partition, preventing accidental whole-disk format
- Add --wizard CLI flag to force wizard rerun on demand
- Add on_input_submitted to ModelsScreen so Enter in the pull-input field triggers model download
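The sentinel gating in the first item above can be illustrated with a small sketch. The `finalize` signature and the idea of passing steps as callables are assumptions made for illustration; the real wizard's `_finalize()` is not shown in the source.

```python
from pathlib import Path
from typing import Callable, List


def finalize(steps: List[Callable[[], None]], sentinel: Path) -> List[str]:
    """Run every finalization step and write the sentinel only if all succeed."""
    errors: List[str] = []
    for step in steps:
        try:
            step()
        except Exception as exc:
            # Collect the failure instead of aborting so every error is reported.
            errors.append(f"{getattr(step, '__name__', 'step')}: {exc}")
    if not errors:
        # Only a fully clean run marks the wizard as complete.
        sentinel.parent.mkdir(parents=True, exist_ok=True)
        sentinel.touch()
    return errors
```

Because the sentinel is missing after any partial failure, the next launch would show the wizard again rather than silently skipping it.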

…ror checking
- Launcher now forwards "$@" so neuraldrive-tui --wizard works
- Partition detection uses before/after diff instead of fragile last-line
- Wizard completion uses sentinel file as single source of truth
- config.save() and wizard._sudo_write() check all subprocess return codes

lsblk before-snapshot was taken after mkpart, which could show the new partition if the kernel auto-detected the table change. Snapshot now taken before mkpart so the diff is always reliable.
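A minimal sketch of the before/after diff, assuming the two snapshots have already been parsed into sets of device names. The function name and the exact rejection rules are hypothetical and combine the diff with the base-device guard mentioned in an earlier commit.

```python
from typing import Set


def find_new_partition(before: Set[str], after: Set[str], base_device: str) -> str:
    """Return the single partition that appeared between two lsblk snapshots."""
    created = after - before
    if len(created) != 1:
        raise RuntimeError(
            f"expected exactly one new partition, found {sorted(created)}"
        )
    (name,) = created
    if name == base_device:
        # Never hand the whole disk to the formatting step.
        raise RuntimeError("refusing to treat the base device as a partition")
    return name
```

If the "before" snapshot were taken after `mkpart`, the new partition could already be in `before`, the diff would be empty, and this function would raise. That is the failure mode the commit avoids by moving the snapshot earlier.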

- Abort before mkpart if pre-lsblk snapshot fails (no disk mutation without a valid baseline)
- Check partprobe return code; poll lsblk with bounded retry loop instead of fixed sleep(2)
- Replace fragile regex in get_boot_device() with lsblk PKNAME (supports NVMe, MMC, and sd devices)
- Guard Enter-to-pull against re-submission during active download

Set _pulling=True immediately in both user-facing entry points before scheduling the @work worker, closing the race window. Pull button handler now mirrors the Enter-to-pull guard.
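The guard described above can be sketched with plain asyncio. The class and method names are hypothetical, and `asyncio.create_task` stands in for Textual's `@work` worker scheduling; the point is only that the flag is set synchronously before any worker is scheduled.

```python
import asyncio
from typing import List, Optional


class PullController:
    """Toy stand-in for the screen that owns the pull button and input."""

    def __init__(self) -> None:
        self._pulling = False
        self._task: Optional[asyncio.Task] = None
        self.started: List[str] = []

    def request_pull(self, model: str) -> bool:
        """Shared entry point for the button handler and the Enter handler."""
        if self._pulling:
            return False  # a download is already active or about to start
        # Set the flag now, not inside the worker, so a second event that
        # arrives before the worker runs is still rejected.
        self._pulling = True
        self._task = asyncio.get_running_loop().create_task(self._pull(model))
        return True

    async def _pull(self, model: str) -> None:
        try:
            self.started.append(model)
            await asyncio.sleep(0)  # placeholder for the real download
        finally:
            self._pulling = False
```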

…ve-media path, guard _pull_next
- Remove wizard_complete config key write from wizard finalize; sentinel file is now the single source of truth for wizard completion
- Remove unused wizard_complete() function from config.py
- Check return codes for all subprocess calls in partition creation: mkdir, chown, umount, systemctl (warning-only for restart)
- Normalize live-media= cmdline path through lsblk PKNAME for NVMe/MMC
- Set _pulling=True in _pull_next() before _start_pull() to prevent concurrent pull submissions from all entry points

Instead of returning the raw live-media= partition path when lsblk PKNAME resolution fails, fall through to the findmnt detection path. This prevents handing an unvalidated partition/symlink path to the storage wizard for partition creation.
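The fall-through behaviour can be sketched as a chain of detectors tried in order. The function name and the detector callables are hypothetical; in the real code the detectors would wrap the `live-media=` plus `lsblk PKNAME` lookup and the `findmnt` lookup.

```python
from typing import Callable, Optional, Sequence


def detect_boot_device(detectors: Sequence[Callable[[], Optional[str]]]) -> Optional[str]:
    """Return the first validated device; a failed detector yields None."""
    for detect in detectors:
        device = detect()
        if device:
            return device
    # No detector produced a validated device. Callers should treat this as
    # "unknown" rather than guessing from an unvalidated path.
    return None
```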

Replace Textual's Header with SafeHeader subclass that catches NoMatches during title watcher updates. Textual 8.2.4 only catches NoScreen in the set_title watcher but not NoMatches, causing crashes when screens are pushed/popped and HeaderTitle hasn't recomposed yet. This is a known upstream bug (Textualize/textual#4258, PR #4817). Simplify --wizard: instead of a separate force_wizard constructor flag, --wizard now removes the sentinel file before launch so the existing on_mount check triggers the wizard naturally.

…e filter
- Add ExecStartPre to load nvidia-current-uvm module and create /dev/nvidia-uvm device nodes before Ollama starts (with - prefix for non-fatal failure on non-NVIDIA systems)
- Remove DeviceAllow lines that blocked CUDA access under cgroup v2
- Add nvidia-modprobe to NVIDIA package list for device node creation
- Add /etc/modules-load.d/nvidia-uvm.conf for early boot module load
- Show [GPU]/[CPU] tags with VRAM usage per model on dashboard

Rich interprets [GPU] and [CPU] as style tags and silently drops them. Escape with backslash-bracket on dashboard. Also change model_item status from 'VRAM' to 'GPU' for consistency.
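A minimal sketch of the backslash-bracket escape, using a hypothetical `device_tag` helper. Rich also provides `rich.markup.escape` for escaping arbitrary text; the sketch below avoids the dependency so it stays self-contained.

```python
def device_tag(on_gpu: bool) -> str:
    """Return a tag that Rich renders literally as [GPU] or [CPU].

    The backslash before the opening bracket stops Rich from parsing the
    tag as markup.
    """
    label = "GPU" if on_gpu else "CPU"
    return f"\\[{label}]"
```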

Up/Down/PgUp/PgDn navigate between model items with a yellow highlight border. The scroll container follows the highlighted item via scroll_visible(), matching the catalog popup behavior.

Tab cycles between zones: model list, Browse button, Pull input, Pull button. Within the model list zone, Up/Down navigates models with scroll-follow, Left/Right selects Load/Unload/Delete per model, Enter activates the selected button. All ModelItem buttons are non-focusable — navigation is fully managed by the screen.

Left/Right nav now skips disabled buttons (Unload when not loaded, Load when already loaded). Load button shows 'Loading...' and disables during VRAM load. Added column header row (Params, Quant, Disk, VRAM, Status) aligned with model item columns.

Poll /api/ps after unload until model is actually evicted (Ollama returns 200 before eviction completes). Await remove_children() to prevent stale widgets. Use keep_alive=-1 for manual loads so models stay loaded until explicitly unloaded.
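The polling step can be sketched as a bounded wait loop. `list_loaded` is a hypothetical callable that would wrap a `GET /api/ps` request and return the names of currently loaded models; the timeout and interval defaults are illustrative, not values from the PR.

```python
import time
from typing import Callable, Iterable


def wait_until_unloaded(
    list_loaded: Callable[[], Iterable[str]],
    model: str,
    timeout: float = 10.0,
    interval: float = 0.25,
) -> bool:
    """Poll until `model` is no longer reported as loaded, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        if model not in set(list_loaded()):
            return True  # eviction confirmed
        if time.monotonic() >= deadline:
            return False  # still loaded; let the caller decide what to show
        time.sleep(interval)
```

Returning a boolean instead of raising lets the UI keep the Unload button in a pending state when the wait times out.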

Ollama rejects "-1" with 'missing unit in duration', but accepts the integer -1 for infinite keep-alive.
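The difference comes down to the JSON type of the field. A small sketch, assuming a hypothetical `load_request_body` helper that builds the body for the generate-API load request mentioned in the first commit:

```python
import json


def load_request_body(model: str) -> str:
    """Build a JSON body that asks Ollama to keep `model` loaded indefinitely."""
    payload = {
        "model": model,
        # Integer -1, not the string "-1": a string is parsed as a duration
        # and fails without a unit suffix.
        "keep_alive": -1,
    }
    return json.dumps(payload)
```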

Each service gets its own row with inline Start/Stop/Restart buttons. Arrow keys navigate services (Up/Down) and buttons (Left/Right). Disabled buttons are skipped. Enter activates the highlighted button. Service status auto-polls every 5 seconds and updates in place.

Poll fires every 5s but _load_services clears and remounts items. Skip poll while _loading flag is set to avoid NoMatches on .svc-state.

Set OLLAMA_MAX_LOADED_MODELS=0 (auto) so Ollama manages concurrency based on available VRAM. Add persistent EnvironmentFile override so config on /var/lib/neuraldrive/config/ollama.conf survives reboots, falling back to baked-in defaults when persistent disk is unavailable.

Wizard was missing /var/lib/neuraldrive/webui from the directory list, causing systemd NAMESPACE failure (status=226) when ReadWritePaths referenced the missing path.

…hanges
Rewrite 17 docs files across user-guide and dev-guide to match the current implementation after the TUI UX overhaul, GPU/VRAM fixes, and Ollama configuration changes.

Key updates:
- Replace old single-letter hotkeys with F1-F5 function key nav
- Rewrite models and services screen docs for zone-based navigation
- Correct first-boot wizard steps, sentinel file path, and --wizard flag
- Update OLLAMA_MAX_LOADED_MODELS from 1 to 0 (auto/LRU eviction)
- Document DeviceAllow removal (cgroup v2 eBPF incompatibility)
- Document nvidia-current-uvm module naming and boot-time loading
- Add nvidia-uvm and cgroup v2 GPU troubleshooting sections
- Add persistent config override (EnvironmentFile) documentation
- Document crash dump logging, VRAM cache, and chat model selector

Chat model dropdown now prefixes loaded models with * so users can see which models are ready without loading delay. A 10-second poll timer keeps the indicators current as models load/unload. Input focus is restored after each response completes so users can type follow-up messages without re-clicking the input box.
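The labelling rule can be sketched as a function that builds `(label, value)` pairs, the option shape Textual's `Select` widget accepts. The function name and the exact `"* "` spacing are assumptions for illustration.

```python
from typing import Iterable, List, Set, Tuple


def selector_options(installed: Iterable[str], loaded: Set[str]) -> List[Tuple[str, str]]:
    """Build (label, value) pairs, prefixing VRAM-loaded models with '* '.

    The value stays the plain model name so the selection survives a label
    change when a model is loaded or unloaded between polls.
    """
    return [(f"* {name}" if name in loaded else name, name) for name in installed]
```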

eshork added 30 commits April 23, 2026 22:38

Add live clock to dashboard top-right corner

a1e82c4

Updates every 2 seconds alongside the system stats refresh. Shows HH:MM:SS so the user can tell at a glance the dashboard is live.

Preserve selected model when returning to chat screen

b9edad1

Save Select value before refreshing options list, restore it if the model is still available. Falls back to first model only when previous selection is no longer present.

Add model delete and fix chat model persistence

d0f1ca8

- Add red Delete button to each installed model item
- Auto-unload from VRAM before deleting if model is loaded
- Fix httpx DELETE with json body (use client.request instead)
- Preserve selected chat model when returning to chat screen

Move partition snapshot before mkpart to prevent race condition

6efe83a

lsblk before-snapshot was taken after mkpart, which could show the new partition if the kernel auto-detected the table change. Snapshot now taken before mkpart so the diff is always reliable.

Guard pull button and Enter against concurrent submissions

c0e802c

Set _pulling=True immediately in both user-facing entry points before scheduling the @work worker, closing the race window. Pull button handler now mirrors the Enter-to-pull guard.

Escape Rich markup in [GPU]/[CPU] tags so they render visibly

5e1d376

Rich interprets [GPU] and [CPU] as style tags and silently drops them. Escape with backslash-bracket on dashboard. Also change model_item status from 'VRAM' to 'GPU' for consistency.

Add arrow-key navigation with scroll-follow to installed models list

5f8908e

Up/Down/PgUp/PgDn navigate between model items with a yellow highlight border. The scroll container follows the highlighted item via scroll_visible(), matching the catalog popup behavior.

Restore _unload_from_vram and add legend column separators

8064d81

Fix keep_alive: pass integer -1 instead of string

4bc2f32

Ollama rejects "-1" with 'missing unit in duration', but accepts the integer -1 for infinite keep-alive.

Remap screen hotkeys to F1-F5: Dash, Models, Svc, Logs, Chat

c8e3a71

Guard service poll timer against widget rebuild race

ea9fcfc

Poll fires every 5s but _load_services clears and remounts items. Skip poll while _loading flag is set to avoid NoMatches on .svc-state.

Widen services Restart button to fit label

37d0330

Create webui data directory on persistence partition

2160fa8

Wizard was missing /var/lib/neuraldrive/webui from the directory list, causing systemd NAMESPACE failure (status=226) when ReadWritePaths referenced the missing path.

fix git urls

7692b40

eshork added 2 commits April 24, 2026 11:18

eshork merged commit da792ef into main Apr 24, 2026
2 checks passed

eshork deleted the tui-fixes branch April 24, 2026 15:59

eshork merged 32 commits into main from tui-fixes
