Fix/android anr advert throttle #146

Open: 1technophile wants to merge 3 commits into development from fix/android-anr-advert-throttle

@1technophile 1technophile commented Jun 22, 2026

Description:

Summary

Two Android stability fixes for crash clusters seen in 1.5.0, plus diagnostics to measure them. All three commits relate to BLE-advert processing or Android rendering, and were validated on two physical phones.

  1. IME ANR — per-device BLE advertisement throttle (8b8c440)
  2. EGL makeCurrent abort — force the basic render loop on Android (460cca8)
  3. Instrumentation — make the throttle's effect + GUI-thread responsiveness readable on any device (9b0d10f)

Reviewable commit-by-commit.


1. fix(android): throttle per-device BLE adverts to prevent IME ANR

Symptom: ANR with the main (Android UI) thread stack
handleWindowFocusChanged → IME startInput → QtEditText.onCreateInputConnection → QtInputConnection.getExtractedText → QMetaObject::invokeMethodImpl → QMetaMethodInvoker::invokeImpl → futex.

Root cause: getExtractedText() is a Qt::BlockingQueuedConnection round-trip from the Android UI thread into the Qt GUI thread (qtMainLoopThread). bleDevice_updated() decodes and updates the model synchronously on that GUI thread for every advertisement the OS delivers. In a dense BLE field the flood of queued deviceUpdated events keeps the GUI thread out of its event loop; when a focused text field then triggers the IME round-trip and the GUI thread can't service it within Android's 5 s input-dispatch deadline, the app ANRs.

Fix: coalesce adverts to at most one decode per device per second, so the per-device processing rate — and thus the GUI-thread backlog — is bounded at the source. The first advert from a device always passes (new devices appear immediately); the
nearby/RSSI-finder path (updateNearbyBleDevice) is separate and unaffected; a small prune keeps the map bounded against MAC-rotating devices. Also hoists the per-iteration TheengsDecoder construction to one instance per call.
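
The coalescing rule described above can be sketched roughly as follows. This is a minimal illustration in standard C++ rather than the PR's actual code: the class name `AdvertThrottle`, the prune threshold, and the 60 s staleness window are assumptions made for the example, and the real implementation presumably uses Qt containers and timers.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not the PR's class): admits at most one advert per
// device, keyed by MAC string, per `interval`.
class AdvertThrottle {
public:
    using Clock = std::chrono::steady_clock;

    explicit AdvertThrottle(std::chrono::milliseconds interval = std::chrono::seconds(1),
                            std::size_t pruneThreshold = 512)  // assumed value
        : m_interval(interval), m_pruneThreshold(pruneThreshold) {}

    // Returns true if the advert should be decoded, false if it is coalesced.
    bool shouldProcess(const std::string &mac, Clock::time_point now) {
        auto it = m_lastProcessed.find(mac);
        if (it != m_lastProcessed.end()) {
            if (now - it->second < m_interval)
                return false;      // this device was decoded too recently
            it->second = now;      // interval elapsed: decode and restart the window
            return true;
        }
        // First advert from this device always passes, so new devices
        // appear immediately. Prune first if the map has grown large.
        if (m_lastProcessed.size() >= m_pruneThreshold)
            prune(now);
        m_lastProcessed.emplace(mac, now);
        return true;
    }

    std::size_t trackedDevices() const { return m_lastProcessed.size(); }

private:
    // Drops devices that have been silent longer than m_staleAfter,
    // e.g. addresses abandoned by MAC-rotating devices.
    void prune(Clock::time_point now) {
        for (auto it = m_lastProcessed.begin(); it != m_lastProcessed.end();) {
            if (now - it->second > m_staleAfter)
                it = m_lastProcessed.erase(it);
            else
                ++it;
        }
    }

    std::chrono::milliseconds m_interval;
    std::size_t m_pruneThreshold;
    std::chrono::seconds m_staleAfter{60};  // assumed staleness window
    std::unordered_map<std::string, Clock::time_point> m_lastProcessed;
};
```

Passing the timestamp in explicitly keeps the rule easy to unit-test; a caller would invoke `shouldProcess(mac, AdvertThrottle::Clock::now())` at the top of the advert handler and return early when it yields false.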

2. fix(android): force basic render loop to avoid EGL makeCurrent abort

Symptom: hard abort() with QRhi::beginFrame → QOpenGLContext::makeCurrent → libqtforandroid → QMessageLogger::fatal → abort on a render thread (1.5.0 field crashes, Android 16).

Root cause: the threaded Qt Quick render loop races the Android SurfaceView lifecycle across background/foreground transitions. When eglCreateWindowSurface fails mid-transition, QAndroidPlatformOpenGLWindow::ensureEglSurfaceCreated() calls
qFatal(). Still unfixed upstream (present in qtbase v6.10.3 / 6.10 / dev; tracked loosely by QTBUG-142195), so a Qt bump doesn't help. Theengs is unusually exposed because its foreground BLE-scan service keeps the process alive across many surface
destroy/recreate cycles.

Fix: qputenv("QSG_RENDER_LOOP", "basic") on Android (only if unset) before QGuiApplication, moving rendering onto the GUI thread serialized with surface events — removing the race. An explicit override is still respected.

Trade-off: the basic loop renders on the GUI thread, so heavy scenes can show more frame jank than the threaded loop. Accepted in exchange for eliminating the abort; no ANR observed in testing.

3. feat(android): instrument BLE advert throttle + GUI-thread stall

Lightweight, always-on diagnostics (the throttle's benefit only manifests under field BLE density, which a single-radio test bench can't synthesize):

  • m_advert_recv / m_advert_throttled → drop rate (how often the throttle actually binds).
  • A 250 ms watchdog QTimer on the GUI thread → max event-loop stall (ms), the ANR proxy (a blocked loop can't service the IME round-trip).
  • Logged as [ble-perf] adverts recv=… throttled=… (…%) | tracked MACs=… | max GUI-loop stall=… ms ~every 60 s (captured by logcat / debug-log-share), plus bleThrottleStats() / resetBleStats() Q_INVOKABLEs for a debug screen.
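
As a rough sketch of how the counters turn into the logged line: the struct and function names below are hypothetical, and only the line format is taken from the description above. The drop rate is simply throttled adverts divided by received adverts.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical mirror of the two counters named in the PR.
struct BleThrottleStats {
    std::uint64_t received = 0;   // adverts reaching the throttle
    std::uint64_t throttled = 0;  // adverts dropped by it

    // Share of received adverts that the throttle dropped, in percent.
    double dropRatePercent() const {
        if (received == 0)
            return 0.0;  // avoid dividing by zero before any advert arrives
        return 100.0 * static_cast<double>(throttled) / static_cast<double>(received);
    }
};

// Builds a line in the "[ble-perf] ..." format quoted in the description.
inline std::string formatBlePerfLine(const BleThrottleStats &s,
                                     std::size_t trackedMacs,
                                     long long maxStallMs) {
    char buf[192];
    std::snprintf(buf, sizeof buf,
                  "[ble-perf] adverts recv=%llu throttled=%llu (%.1f%%)"
                  " | tracked MACs=%zu | max GUI-loop stall=%lld ms",
                  static_cast<unsigned long long>(s.received),
                  static_cast<unsigned long long>(s.throttled),
                  s.dropRatePercent(), trackedMacs, maxStallMs);
    return std::string(buf);
}
```

With the LG H930 figures reported in this PR (recv=317, throttled=99) the drop rate works out to 99 / 317 ≈ 31.2%.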

Validation (physical devices)

| | LG H930 (Android 9) | Galaxy S20 FE (Android 13) |
|---|---|---|
| Throttle binds (ambient BLE) | recv=317 throttled=99 (31%), 29 MACs | recv=1149 throttled=586 (51%), 39 MACs |
| Max GUI-loop stall | 1211 ms | 1095 ms |
| Decode → display | ✅ multiple device types, correct values | ✅ multiple device types, correct values |
| EGL surface-cycle stress (20×) | n/a | ✅ PID stable, no qFatal/abort; race caught non-fatally (makeCurrent(): no EGLSurface warnings) |

Render under basic loop: ✅ no blank screen

The throttle sheds ~⅓–½ of advert-processing load in normal ambient BLE (dozens of devices) — confirming it's active in everyday conditions, not just pathological density. The decode matrix is unaffected (the throttle changes processing rate, not
what decodes).

Checklist:

  • The pull request is done against the latest development branch
  • Only one feature/fix was added per PR and the code change compiles without warnings
  • I accept the DCO.

1technophile and others added 3 commits June 21, 2026 10:29

bleDevice_updated() decodes and updates the model synchronously on the Qt
GUI thread for every advertisement the OS delivers. In a dense BLE
environment the flood of queued deviceUpdated events keeps the GUI thread
out of its event loop past Android's 5 s input-dispatch deadline, so the
IME's blocking getExtractedText() round-trip (QtInputConnection) can't be
serviced and the app ANRs -- the field signature is
[libQt6Core] QMetaMethodInvoker::invokeImpl on the main thread, with the
GUI thread (qtMainLoopThread) stuck in TheengsDecoder::decodeBLEJson.

Coalesce adverts to at most one decode per device per second so the
per-device processing rate -- and thus the GUI-thread backlog -- is bounded
at the source. The first advert from a device always passes (new devices
still appear immediately); the nearby/RSSI-finder feature uses a separate
path (updateNearbyBleDevice) and is unaffected. A small opportunistic prune
keeps the throttle map bounded when devices rotate random MACs.

Also hoist the per-iteration TheengsDecoder construction to one instance
per call -- it holds only stable config, so reuse is safe and avoids
reconstructing it on every loop iteration.

Validated on an LG H930: per-MAC processing dropped from ~2-3/s to <=1/s
while decode -> display stayed intact (thermometers/hygrometers/curtains
all decoded with correct values).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

On Android the threaded Qt Quick render loop races the SurfaceView
lifecycle across background/foreground transitions. When
eglCreateWindowSurface fails mid-transition,
QAndroidPlatformOpenGLWindow::ensureEglSurfaceCreated() calls qFatal()
and the app aborts. Observed as 1.5.0 field crashes on Android 16, stack:
QRhi::beginFrame -> QOpenGLContext::makeCurrent -> qtforandroid ->
QMessageLogger::fatal -> abort.

The qFatal is still present upstream in qtbase v6.10.3, the 6.10 branch
and dev (tracked loosely by the unresolved QTBUG-142195), so bumping Qt
does not help. Forcing QSG_RENDER_LOOP=basic moves rendering onto the GUI
thread, serialized with Android surface events, removing the race. An
explicit QSG_RENDER_LOOP from the environment is still respected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add lightweight, always-on diagnostics so the per-device throttle's effect
and the GUI-thread (qtMainLoopThread) responsiveness are readable on any
device -- the only way to observe the throttle's benefit, which manifests
under field BLE density a single-radio bench can't synthesize.

- m_advert_recv / m_advert_throttled: adverts reaching the throttle vs
  dropped by it -> the drop rate (how often the throttle actually binds).
- A 250 ms watchdog QTimer on the GUI thread: its tick lateness == how long
  the event loop was blocked; the max is the ANR-relevant stall (a blocked
  loop can't service the IME's BlockingQueued getExtractedText round-trip).
- Logged as "[ble-perf] ..." ~every 60 s (captured by logcat / the
  debug-log-share feature), plus bleThrottleStats()/resetBleStats()
  Q_INVOKABLEs for a debug screen.
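
The watchdog idea, tick lateness as a proxy for event-loop stall, can be sketched as below. The class name `StallWatchdog` is hypothetical, and the sketch uses standard C++ clocks with explicit timestamps so it runs without Qt; in the app the tick would presumably come from the 250 ms QTimer's timeout on the GUI thread.

```cpp
#include <chrono>

// Hypothetical helper: records how late each periodic tick arrives.
// A tick can only be late if the event loop was busy, so the maximum
// lateness approximates the longest GUI-thread stall.
class StallWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    explicit StallWatchdog(Clock::time_point start,
                           std::chrono::milliseconds period = std::chrono::milliseconds(250))
        : m_period(period), m_lastTick(start) {}

    // Call from the timer callback with the current time.
    void onTick(Clock::time_point now) {
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::milliseconds>(now - m_lastTick);
        const auto lateness = elapsed - m_period;  // 0 when the tick is on time
        if (lateness > m_maxStall)
            m_maxStall = lateness;
        m_lastTick = now;
    }

    std::chrono::milliseconds maxStall() const { return m_maxStall; }

    // Clears the recorded maximum, e.g. after it has been logged.
    void reset() { m_maxStall = std::chrono::milliseconds(0); }

private:
    std::chrono::milliseconds m_period;
    Clock::time_point m_lastTick;
    std::chrono::milliseconds m_maxStall{0};
};
```

A tick that arrives 1461 ms after the previous one, against a 250 ms period, would be recorded as a 1211 ms stall. Early ticks produce negative lateness and never raise the maximum.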

Validated on an LG H930 in ambient BLE: recv=317 throttled=99 (31.2%),
29 tracked MACs, max GUI-loop stall 1211 ms -- i.e. the throttle shed ~1/3
of advert-processing load, the first on-device evidence of it binding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>