Fix: level CANN dlog before rtSetDevice so device logs honor log_level #763

Merged
ChaoWao merged 1 commit into hw-native-sys:main from hw-native-sys-bot:fix/level-dlog-before-rtsetdevice
May 13, 2026

@hw-native-sys-bot
Collaborator

Summary

PR #723 (collapsed ChipWorker::init / set_device into a single
simpler_init) silently flipped the order of attach_current_thread
and the dlog_setlevel block inside simpler_init on both a2a3
and a5 onboard. CANN snapshots the device-side log session's
level at device-context open time (rtSetDevice inside
attach_current_thread), so a dlog_setlevel issued after that
is a no-op for the device side.

Net effect: when ASCEND_GLOBAL_LOG_LEVEL is not set in the
environment, the log_level the user passed to Worker(...) /
configure_logging(...) silently fails to reach the device-side
filter — ~/ascend/log/{debug,run}/device-N/*.log files are either
missing or pinned at CANN's default (logLevel=3 / ERROR).

Pre-#715/#723 the order was correct because init and set_device
were two separate C entries called in the right sequence; #723
merged them and the dlog ordering was silent collateral. Sim has
no CANN dlog and is unaffected.

The fix

Hoist the existing dlog_setlevel block above
attach_current_thread in both onboard simpler_inits. HostLogger
is already seeded by libsimpler_log.so's simpler_log_init()
(runs earlier in ChipWorker::init), so
HostLogger::get_instance().level() is already the user's choice at
this point — no new plumbing needed.

The comment on the hoisted block now explains the rtSetDevice
ordering constraint so this doesn't silently regress again.
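
The ordering constraint can be illustrated with a small self-contained mock. This is only a sketch under stated assumptions: `MockCann`, `set_level`, `open_device_context`, and the two `init_*` functions are hypothetical stand-ins invented for illustration, not CANN or simpler APIs. The mock assumes the behavior described above, namely that the device-side session copies the level once when the context opens, and that the default level is 3 (ERROR).

```cpp
// Hypothetical mock of the snapshot-at-open behavior described in this PR.
// None of these names are real CANN APIs.
struct MockCann {
    int host_level = 3;     // assumed default: 3 == ERROR, per the PR text
    int device_level = -1;  // -1 means no device context has been opened yet

    // Stand-in for dlog_setlevel: updates only the host-side value.
    void set_level(int level) { host_level = level; }

    // Stand-in for rtSetDevice: the device session copies the level once.
    void open_device_context() { device_level = host_level; }
};

// Order after #723: the context opens first, so the level arrives too late.
inline int init_attach_then_level(int user_level) {
    MockCann cann;
    cann.open_device_context();
    cann.set_level(user_level);
    return cann.device_level;
}

// Order restored by this PR: level first, then the context snapshot.
inline int init_level_then_attach(int user_level) {
    MockCann cann;
    cann.set_level(user_level);
    cann.open_device_context();
    return cann.device_level;
}
```

In this mock, requesting level 0 (DEBUG) with the first order leaves the device at the default, while the second order propagates the requested level.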

Files

  • src/a2a3/platform/onboard/host/pto_runtime_c_api.cpp — hoist
  • src/a5/platform/onboard/host/pto_runtime_c_api.cpp — same hoist
  • src/common/worker/pto_runtime_c_api.h (simpler_init doc
    comment): responsibilities reordered (dlog first), wording explains
    the constraint
  • docs/logging.md, docs/dynamic-linking.md,
    docs/chip-level-arch.md — call-flow diagrams updated to match
    the new order (per .claude/rules/doc-consistency.md)

Sim variants (src/{a2a3,a5}/platform/sim/host/pto_runtime_c_api.cpp)
are untouched — they have no CANN dlog.

Hardware verification

Ascend910 / a2a3 onboard, device 2, tiny driver script
(Worker.init then close) with configure_logging("debug") and
ASCEND_GLOBAL_LOG_LEVEL unset:

| | First line of ~/ascend/log/run/device-2/device-{pid}_*.log | ~/ascend/log/debug/device-2/device-{pid}_*.log |
| --- | --- | --- |
| Before (PID 845511) | logLevel=3, ccecpulogLevel=-1, aicpulogLevel=-1 | not created |
| After (PID 856602) | logLevel=0, ccecpulogLevel=-1, aicpulogLevel=-1 | 76 KB of DEBUG entries |
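
To repeat this check without reading the log by eye, a helper along the following lines could extract the level from the first line of a device log. It is a sketch only: `parse_log_level` is a hypothetical name, and the line format is assumed to match the excerpts in the table above. The boundary check matters because `ccecpulogLevel=` and `aicpulogLevel=` both contain `logLevel=` as a substring.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: finds a standalone "logLevel=<int>" token in `line`.
// Returns true and stores the value in `out` on success, false otherwise.
inline bool parse_log_level(const std::string& line, int& out) {
    const std::string key = "logLevel=";
    std::size_t pos = line.find(key);
    while (pos != std::string::npos) {
        // Skip matches embedded in longer keys such as "ccecpulogLevel=".
        const bool at_boundary =
            pos == 0 || !std::isalnum(static_cast<unsigned char>(line[pos - 1]));
        if (at_boundary) {
            const char* start = line.c_str() + pos + key.size();
            char* end = nullptr;
            const long value = std::strtol(start, &end, 10);
            if (end == start) return false;  // key present but no number
            out = static_cast<int>(value);
            return true;
        }
        pos = line.find(key, pos + 1);
    }
    return false;
}
```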

With ASCEND_GLOBAL_LOG_LEVEL=1 exported (PID 860263), the device
shows logLevel=1 regardless of configure_logging("debug"): the
getenv guard correctly defers to the env var, so that path
is unchanged.
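
The precedence rule behind that guard might be sketched as follows. The helper name `should_apply_user_level` is hypothetical, and the sketch assumes that any value of the variable, including an empty string, counts as "set"; the actual guard in pto_runtime_c_api.cpp may differ in detail.

```cpp
#include <cstdlib>

// Hypothetical helper: the user's log_level is pushed to the device-side
// logger only when the environment does not already specify a level.
// Assumption: any non-null value counts as "set".
inline bool should_apply_user_level(const char* env_value) {
    return env_value == nullptr;
}

// Convenience overload that reads the real environment.
inline bool should_apply_user_level() {
    return should_apply_user_level(std::getenv("ASCEND_GLOBAL_LOG_LEVEL"));
}
```

Splitting the decision from the `std::getenv` call keeps the rule testable without mutating the process environment.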

Existing onboard ST (tests/st/aicore_op_timeout, #762) still
passes in ~8 s after the rebuild.

Test plan

  • Pre-commit hooks (clang-format, cpplint, markdownlint,
    check-headers) pass on all 6 files without SKIP=.
  • Hardware before/after captures recorded above.
  • ASCEND_GLOBAL_LOG_LEVEL-set path unchanged.
  • Existing onboard a2a3 ST still passes (no regression).
  • CI runs the full hardware pipeline.

Fixes the regression introduced by #723.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request reorders the initialization sequence in simpler_init to ensure that dlog_setlevel is called before the device context is opened via rtSetDevice. This change is necessary because CANN snapshots the log level at context-open time, making subsequent level changes ineffective for the device-side session. Documentation across several files has been updated to reflect this new order. The review feedback suggests explicitly including the header that declares std::getenv, <cstdlib>, in the platform-specific implementation files that use it, to ensure portability.

Comment thread src/a2a3/platform/onboard/host/pto_runtime_c_api.cpp
Comment thread src/a5/platform/onboard/host/pto_runtime_c_api.cpp
PR hw-native-sys#723 (collapsed ChipWorker init/set_device into a single
simpler_init) flipped the order of attach_current_thread and the
dlog_setlevel block inside simpler_init on both a2a3 and a5
onboard. CANN snapshots the device-side log session's level at
device-context open time (rtSetDevice inside attach_current_thread),
so a dlog_setlevel issued after that is a no-op for the device
side. Net effect: when ASCEND_GLOBAL_LOG_LEVEL is not set in the
environment, the log_level the user passed to Worker(...) /
configure_logging(...) silently fails to reach the device-side
filter, and ~/ascend/log/{debug,run}/device-N/*.log files are
either missing or pinned at CANN's default (level 3 / ERROR).

Pre-hw-native-sys#715/hw-native-sys#723 the order was correct because init and set_device
were two separate C entries called in the right sequence; hw-native-sys#723
merged them and the dlog ordering was silent collateral. Sim has
no CANN dlog and is unaffected.

The fix: hoist the existing dlog_setlevel block above
attach_current_thread in both onboard simpler_init's. HostLogger
is already seeded by libsimpler_log.so's simpler_log_init() (runs
earlier in ChipWorker::init), so HostLogger::get_instance().level()
is already the user's choice at this point — no new plumbing.

Comment on the hoisted block explains the rtSetDevice ordering
constraint so this doesn't silently regress again. Header doc
(pto_runtime_c_api.h) reorders the three responsibilities and
docs (logging.md, dynamic-linking.md, chip-level-arch.md) update
their call-flow diagrams to match the new order.

Hardware verification on Ascend910 / a2a3 onboard
(ASCEND_GLOBAL_LOG_LEVEL unset, configure_logging("debug")):

  before  ~/ascend/log/run/device-2/device-845511_*.log:
              logLevel=3        (no DEBUG entries, debug/ dir empty)
  after   ~/ascend/log/run/device-2/device-856602_*.log:
              logLevel=0        (76 KB of DEBUG entries)

With ASCEND_GLOBAL_LOG_LEVEL=1 set, device shows logLevel=1
regardless of configure_logging — env-var path unchanged.
Existing onboard ST (tests/st/aicore_op_timeout, PR hw-native-sys#762) still
passes after rebuild.
@hw-native-sys-bot force-pushed the fix/level-dlog-before-rtsetdevice branch from 8131abf to 82a7da2 on May 13, 2026 01:53
@ChaoWao merged commit 479519d into hw-native-sys:main on May 13, 2026
14 checks passed
@ChaoWao deleted the fix/level-dlog-before-rtsetdevice branch on May 13, 2026 02:14