Fix/android anr advert throttle #146
Open
1technophile wants to merge 3 commits into
bleDevice_updated() decodes and updates the model synchronously on the Qt GUI thread for every advertisement the OS delivers. In a dense BLE environment the flood of queued deviceUpdated events keeps the GUI thread out of its event loop past Android's 5 s input-dispatch deadline, so the IME's blocking getExtractedText() round-trip (QtInputConnection) can't be serviced and the app ANRs -- the field signature is [libQt6Core] QMetaMethodInvoker::invokeImpl on the main thread, with the GUI thread (qtMainLoopThread) stuck in TheengsDecoder::decodeBLEJson.

Coalesce adverts to at most one decode per device per second so the per-device processing rate -- and thus the GUI-thread backlog -- is bounded at the source. The first advert from a device always passes (new devices still appear immediately); the nearby/RSSI-finder feature uses a separate path (updateNearbyBleDevice) and is unaffected. A small opportunistic prune keeps the throttle map bounded when devices rotate random MACs.

Also hoist the per-iteration TheengsDecoder construction to one instance per call -- it holds only stable config, so reuse is safe and avoids reconstructing it on every loop iteration.

Validated on an LG H930: per-MAC processing dropped from ~2-3/s to <=1/s while decode -> display stayed intact (thermometers/hygrometers/curtains all decoded with correct values).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
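For illustration, the throttle rule described above (the first advert from a MAC always passes, then at most one decode per MAC per second, plus an opportunistic prune) can be sketched in plain C++. This is a minimal sketch under assumptions, not the PR's code: `AdvertThrottle`, `shouldProcess()`, the prune threshold and the 10-window idle cutoff are hypothetical, and timestamps are passed in as milliseconds so the logic can be exercised without Qt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical per-device advert throttle: at most one decode per MAC per
// window; the first advert from an unseen MAC always passes.
class AdvertThrottle {
public:
    explicit AdvertThrottle(std::int64_t windowMs = 1000,
                            std::size_t pruneThreshold = 256)
        : m_windowMs(windowMs), m_pruneThreshold(pruneThreshold) {}

    // Returns true if this advert should be decoded, false if it is coalesced.
    // nowMs is a monotonic timestamp in milliseconds (injected for testability).
    bool shouldProcess(const std::string &mac, std::int64_t nowMs) {
        auto it = m_lastProcessed.find(mac);
        if (it != m_lastProcessed.end() && nowMs - it->second < m_windowMs) {
            return false;  // same device already decoded within the window
        }
        m_lastProcessed[mac] = nowMs;
        // Opportunistic prune: only runs once the map has grown past the threshold.
        if (m_lastProcessed.size() > m_pruneThreshold) prune(nowMs);
        return true;
    }

    std::size_t trackedDevices() const { return m_lastProcessed.size(); }

private:
    // Forget MACs idle for more than 10 windows (assumed cutoff), so devices
    // rotating random addresses do not grow the map without bound.
    void prune(std::int64_t nowMs) {
        for (auto it = m_lastProcessed.begin(); it != m_lastProcessed.end();) {
            if (nowMs - it->second > 10 * m_windowMs) {
                it = m_lastProcessed.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::int64_t m_windowMs;
    std::size_t m_pruneThreshold;
    std::unordered_map<std::string, std::int64_t> m_lastProcessed;
};
```

A call site would presumably check `shouldProcess(mac, now)` at the top of `bleDevice_updated()` and return early on `false`. Because that function runs on the GUI thread, the sketch assumes single-threaded access and shows no locking.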
On Android the threaded Qt Quick render loop races the SurfaceView lifecycle across background/foreground transitions. When eglCreateWindowSurface fails mid-transition, QAndroidPlatformOpenGLWindow::ensureEglSurfaceCreated() calls qFatal() and the app aborts. Observed as 1.5.0 field crashes on Android 16, stack: QRhi::beginFrame -> QOpenGLContext::makeCurrent -> qtforandroid -> QMessageLogger::fatal -> abort.

The qFatal is still present upstream in qtbase v6.10.3, the 6.10 branch and dev (tracked loosely by the unresolved QTBUG-142195), so bumping Qt does not help.

Forcing QSG_RENDER_LOOP=basic moves rendering onto the GUI thread, serialized with Android surface events, removing the race. An explicit QSG_RENDER_LOOP from the environment is still respected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
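A Qt-free sketch of the "default to basic unless already set" behaviour, assuming a POSIX platform: `setenv()` with `overwrite = 0` leaves an existing value untouched, which matches guarding `qputenv()` with `qEnvironmentVariableIsSet()`. The helper names are hypothetical, and the Qt form appears only as a comment.

```cpp
#include <cassert>
#include <cstring>
#include <stdlib.h>  // POSIX setenv / getenv / unsetenv

// Default QSG_RENDER_LOOP to "basic" unless the environment already sets it.
// Qt equivalent (per the commit message):
//   if (!qEnvironmentVariableIsSet("QSG_RENDER_LOOP"))
//       qputenv("QSG_RENDER_LOOP", "basic");
void defaultRenderLoopToBasic() {
    // overwrite = 0: an explicit QSG_RENDER_LOOP from the environment wins.
    setenv("QSG_RENDER_LOOP", "basic", 0);
}

// Returns true if QSG_RENDER_LOOP is currently set to `value`.
bool renderLoopIs(const char *value) {
    const char *current = getenv("QSG_RENDER_LOOP");
    return current != nullptr && std::strcmp(current, value) == 0;
}
```

In the app this would presumably sit in `main()` behind an Android guard and run before `QGuiApplication` is constructed, as the commit describes.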
Add lightweight, always-on diagnostics so the per-device throttle's effect and the GUI-thread (qtMainLoopThread) responsiveness are readable on any device -- the only way to observe the throttle's benefit, which manifests under field BLE density a single-radio bench can't synthesize.

- m_advert_recv / m_advert_throttled: adverts reaching the throttle vs dropped by it -> the drop rate (how often the throttle actually binds).
- A 250 ms watchdog QTimer on the GUI thread: its tick lateness == how long the event loop was blocked; the max is the ANR-relevant stall (a blocked loop can't service the IME's BlockingQueued getExtractedText round-trip).
- Logged as "[ble-perf] ..." ~every 60 s (captured by logcat / the debug-log-share feature), plus bleThrottleStats()/resetBleStats() Q_INVOKABLEs for a debug screen.

Validated on an LG H930 in ambient BLE: recv=317 throttled=99 (31.2%), 29 tracked MACs, max GUI-loop stall 1211 ms -- i.e. the throttle shed ~1/3 of advert-processing load, the first on-device evidence of it binding.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
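The watchdog's "tick lateness" metric can be modelled as follows. This is a sketch under assumptions, not the PR's implementation: `StallWatchdog`, `onTick()` and `reset()` are hypothetical names, and tick timestamps are injected in milliseconds. In Qt they would likely come from a `QElapsedTimer` read inside the slot connected to `QTimer::timeout`. Early ticks are clamped to zero lateness.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical GUI-loop stall tracker: a timer is armed to fire every
// intervalMs, so any extra gap between consecutive ticks is time the event
// loop spent blocked.
class StallWatchdog {
public:
    explicit StallWatchdog(std::int64_t intervalMs = 250) : m_intervalMs(intervalMs) {}

    // Call on every timer tick with a monotonic timestamp in milliseconds.
    void onTick(std::int64_t nowMs) {
        if (m_hasLast) {
            const std::int64_t lateness = (nowMs - m_lastTickMs) - m_intervalMs;
            // Clamp early ticks to 0 and keep the worst stall seen so far.
            m_maxStallMs = std::max(m_maxStallMs, std::max<std::int64_t>(lateness, 0));
        }
        m_lastTickMs = nowMs;
        m_hasLast = true;
    }

    std::int64_t maxStallMs() const { return m_maxStallMs; }
    void reset() { m_maxStallMs = 0; }  // e.g. after a periodic summary log

private:
    std::int64_t m_intervalMs;
    std::int64_t m_lastTickMs = 0;
    bool m_hasLast = false;
    std::int64_t m_maxStallMs = 0;
};
```

With a 250 ms interval, ticks at 0, 250, 510 and 1800 ms are late by 0, 10 and 1040 ms, so the reported maximum stall is 1040 ms. Resetting after each periodic log line would turn the figure into a per-window maximum.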
Description:
Summary
Two Android stability fixes for crash clusters seen in 1.5.0, plus diagnostics to measure them. All changes concern BLE-advert processing and Android rendering; validated on two physical phones.
- … (8b8c440)
- makeCurrent abort: force the basic render loop on Android (460cca8)
- … (9b0d10f)

Reviewable commit-by-commit.
1. fix(android): throttle per-device BLE adverts to prevent IME ANR

Symptom: ANR with main (Android UI) thread stack `handleWindowFocusChanged → IME startInput → QtEditText.onCreateInputConnection → QtInputConnection.getExtractedText → QMetaObject::invokeMethodImpl → QMetaMethodInvoker::invokeImpl → futex`.

Root cause: `getExtractedText()` is a `Qt::BlockingQueuedConnection` round-trip from the Android UI thread into the Qt GUI thread (`qtMainLoopThread`). `bleDevice_updated()` decodes + updates the model synchronously on that GUI thread for every advertisement the OS delivers. In a dense BLE field the flood of queued `deviceUpdated` events keeps the GUI thread out of its event loop; when a focused text field then triggers the IME round-trip and the GUI thread can't service it within Android's 5 s input-dispatch deadline, the app ANRs.
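The failure mode can be reproduced in miniature with standard C++ threads. This is a toy model, not Qt's event loop: `GuiLoopModel` and `blockingRoundTrip()` are hypothetical stand-ins for the GUI thread and a `Qt::BlockingQueuedConnection` call, and the deadline plays the role of Android's 5 s input-dispatch timeout, scaled down so the example runs quickly. It shows that a request posted behind a backlog of queued events waits for all of them (FIFO) before it is serviced.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Toy "GUI thread": drains a FIFO queue of events on its own thread.
class GuiLoopModel {
public:
    GuiLoopModel() : m_thread([this] { run(); }) {}

    ~GuiLoopModel() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_quit = true;
        }
        m_cv.notify_one();
        m_thread.join();  // drains any queued events first, then exits
    }

    // Queue an event (like a queued deviceUpdated signal).
    void post(std::function<void()> event) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push_back(std::move(event));
        }
        m_cv.notify_one();
    }

    // Post a request and block until the loop runs it or the deadline passes.
    // Returns true if it was serviced in time.
    bool blockingRoundTrip(std::chrono::milliseconds deadline) {
        auto done = std::make_shared<std::promise<void>>();
        std::future<void> serviced = done->get_future();
        post([done] { done->set_value(); });
        return serviced.wait_for(deadline) == std::future_status::ready;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> event;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_quit || !m_queue.empty(); });
                if (m_queue.empty()) return;  // quit requested, nothing left
                event = std::move(m_queue.front());
                m_queue.pop_front();
            }
            event();  // run outside the lock, like a slot invocation
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::function<void()>> m_queue;
    bool m_quit = false;
    std::thread m_thread;  // declared last: the members above exist before run() starts
};
```

With ten queued 20 ms "decodes" ahead of it, the request is serviced only after roughly 200 ms, well past a 50 ms deadline. On an idle loop the same round-trip completes almost immediately. Bounding the per-device event rate bounds that backlog.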
Fix: coalesce adverts to at most one decode per device per second, so the per-device processing rate (and thus the GUI-thread backlog) is bounded at the source. The first advert from a device always passes (new devices appear immediately); the nearby/RSSI-finder path (`updateNearbyBleDevice`) is separate and unaffected; a small prune keeps the map bounded against MAC-rotating devices. Also hoists the per-iteration `TheengsDecoder` construction to one instance per call.

2. fix(android): force basic render loop to avoid EGL makeCurrent abort

Symptom: hard `abort()` with `QRhi::beginFrame → QOpenGLContext::makeCurrent → libqtforandroid → QMessageLogger::fatal → abort` on a render thread (1.5.0 field crashes, Android 16).

Root cause: the threaded Qt Quick render loop races the Android `SurfaceView` lifecycle across background/foreground transitions. When `eglCreateWindowSurface` fails mid-transition, `QAndroidPlatformOpenGLWindow::ensureEglSurfaceCreated()` calls `qFatal()`. Still unfixed upstream (present in qtbase v6.10.3 / 6.10 / dev; tracked loosely by QTBUG-142195), so a Qt bump doesn't help. Theengs is unusually exposed because its foreground BLE-scan service keeps the process alive across many surface destroy/recreate cycles.

Fix: `qputenv("QSG_RENDER_LOOP", "basic")` on Android (only if unset) before `QGuiApplication`, moving rendering onto the GUI thread, serialized with surface events; this removes the race. An explicit override is still respected.

Trade-off: the basic loop renders on the GUI thread, so heavy scenes can show more frame jank than the threaded loop. Accepted in exchange for eliminating the abort; no ANR observed in testing.
3. feat(android): instrument BLE advert throttle + GUI-thread stall

Lightweight, always-on diagnostics (the throttle's benefit only manifests under field BLE density, which a single-radio test bench can't synthesize):

- `m_advert_recv` / `m_advert_throttled` → drop rate (how often the throttle actually binds).
- A 250 ms watchdog `QTimer` on the GUI thread → max event-loop stall (ms), the ANR proxy (a blocked loop can't service the IME round-trip).
- `[ble-perf] adverts recv=… throttled=… (…%) | tracked MACs=… | max GUI-loop stall=… ms` logged ~every 60 s (captured by logcat / debug-log-share), plus `bleThrottleStats()` / `resetBleStats()` `Q_INVOKABLE`s for a debug screen.

Validation (physical devices)

`qFatal`/`abort`; race caught non-fatally (`makeCurrent(): no EGLSurface` warnings)

The throttle sheds ~⅓–½ of advert-processing load in normal ambient BLE (dozens of devices), confirming it's active in everyday conditions, not just pathological density. The decode matrix is unaffected (the throttle changes processing rate, not what decodes).
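The drop rate quoted in these results is simply throttled / recv. As an illustration, the hypothetical formatter below builds a line in the `[ble-perf]` shape listed under item 3. Given the LG H930 figures from the third commit message (recv=317, throttled=99, 29 MACs, 1211 ms stall), it reproduces the 31.2% reported there. The function name and the use of `snprintf` are assumptions; the app's actual logging code is not part of this description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical "[ble-perf]" summary formatter. Drop rate = share of adverts
// that reached the throttle and were dropped by it.
std::string blePerfLine(std::uint64_t recv, std::uint64_t throttled,
                        std::size_t trackedMacs, std::int64_t maxStallMs) {
    const double dropPct =
        recv ? 100.0 * static_cast<double>(throttled) / static_cast<double>(recv) : 0.0;
    char buf[160];
    std::snprintf(buf, sizeof buf,
                  "[ble-perf] adverts recv=%llu throttled=%llu (%.1f%%) | "
                  "tracked MACs=%zu | max GUI-loop stall=%lld ms",
                  static_cast<unsigned long long>(recv),
                  static_cast<unsigned long long>(throttled), dropPct, trackedMacs,
                  static_cast<long long>(maxStallMs));
    return buf;
}
```

Guarding the division keeps the line well-formed before any advert has arrived (it then reports 0.0%).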
Checklist: