
Device printf invisible in Jupyter / piped stdout (libc block-buffering of runtime stdout) #653

@jhinpan

Summary

Device fx.printf (lowered to gpu.printf) output is delayed in Jupyter cells and in any piped/redirected stdout, even after torch.cuda.synchronize(). The output only appears at process teardown or when something explicitly flushes host stdout.

The root cause appears to be host-side libc block-buffering of the ROCm runtime's stdout, not a device-side flush problem: the bytes are delivered to the host on synchronize, but they remain in the C stdio buffer because libc fully buffers stdout whenever it is not a terminal, which is the case under Jupyter and under pipes.

This makes interactive debugging awkward because notebooks currently need a file-descriptor capture or flush workaround around launches just to show GPU prints inline.

Repro (MI350X / gfx950, ROCm 7.2)

import torch
import flydsl.compiler as flyc
import flydsl.expr as fx

@flyc.kernel
def hello_kernel():
    tid = fx.thread_idx.x
    fx.printf("hello from thread {}", tid)  # lowered to gpu.printf

@flyc.jit
def hello(stream: fx.Stream = fx.Stream(None)):
    # one block of 4 threads -> 4 printf lines expected
    hello_kernel().launch(grid=(1, 1, 1), block=(4, 1, 1), stream=stream)

hello(); torch.cuda.synchronize()   # notebook / piped stdout: prints nothing immediately

Evidence

Running the same program with stdout piped reproduces the notebook behavior:

  • hello(); torch.cuda.synchronize() produces no immediate output.
  • Calling ctypes.CDLL("libc.so.6").fflush(None) right afterwards makes all 4 lines appear at once.
  • Running the unchanged program under stdbuf -oL makes the lines appear right after synchronize(), in order.

So synchronize() appears to make the bytes available on the host; they stay in the C stdio buffer until something on the host flushes it.
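The same ordering effect can be reproduced without a GPU, which supports the host-side diagnosis. The sketch below assumes Linux, glibc, and the coreutils `stdbuf` binary on `PATH`; the names `CHILD` and `child_output` are hypothetical. It starts a child Python process with stdout piped; the child writes one line through C `printf` and another directly to fd 1 with `os.write`. Under full buffering the raw write overtakes the `printf` line, while an explicit `fflush(NULL)` or running under `stdbuf -oL` restores program order, mirroring the three observations above.

```python
import os
import shutil
import subprocess
import sys

# Child program: one line via C stdio, one line straight to fd 1.
# With argument "fflush", the child calls fflush(NULL) between the two writes.
CHILD = r"""
import ctypes, os, sys
libc = ctypes.CDLL(None)                 # assumption: Linux/glibc
libc.printf(b"from C printf\n")          # lands in the C stdio buffer
if sys.argv[1] == "fflush":
    libc.fflush(None)                    # explicit host-side flush
os.write(1, b"from os.write\n")          # bypasses stdio, reaches the pipe immediately
"""  # anything still buffered in C stdio is flushed by libc at process exit


def child_output(mode: str) -> list[str]:
    """Run CHILD with stdout piped; mode is "plain", "fflush" or "stdbuf"."""
    cmd = [sys.executable, "-c", CHILD, "fflush" if mode == "fflush" else "plain"]
    if mode == "stdbuf":
        cmd = ["stdbuf", "-oL", *cmd]    # stdbuf preloads a library that line-buffers C stdout
    # PYTHONUNBUFFERED makes CPython unbuffer C stdout at startup, which would hide the effect.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(cmd, stdout=subprocess.PIPE, env=env, check=True, text=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    for mode in ("plain", "fflush", "stdbuf"):
        if mode == "stdbuf" and shutil.which("stdbuf") is None:
            continue                     # stdbuf not installed; skip that variant
        print(mode, child_output(mode))
```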

Suggested fix

Have the FlyDSL runtime make device printf output reach host-visible stdout promptly in notebooks and piped logs, for example:

  • configure runtime stdout as line-buffered during runtime initialization, or
  • flush runtime stdout after kernel launch / synchronization points that expose gpu.printf output.

As a stopgap, launching the kernel process under stdbuf -oL reproduces the desired behavior without code changes.
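For notebooks, applying the stdbuf stopgap means starting the Jupyter kernel itself under `stdbuf -oL`. A minimal sketch, assuming a Linux host with coreutils `stdbuf` and a standard `ipykernel` install: it writes a kernelspec whose `argv` is the usual `python -m ipykernel_launcher -f {connection_file}` prefixed with `stdbuf -oL`. The function name `write_stdbuf_kernelspec`, the directory name `python3-stdbuf`, and the display name are illustrative choices, not something FlyDSL provides.

```python
import json
import sys
from pathlib import Path


def write_stdbuf_kernelspec(spec_dir: Path, display_name: str = "Python 3 (stdbuf -oL)") -> Path:
    """Hypothetical helper: write a kernel.json that launches ipykernel with line-buffered C stdout."""
    spec = {
        # stdbuf preloads a small library that calls setvbuf on C stdout before main() runs
        "argv": ["stdbuf", "-oL", sys.executable, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    spec_dir.mkdir(parents=True, exist_ok=True)
    path = spec_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


if __name__ == "__main__":
    print("wrote", write_stdbuf_kernelspec(Path("python3-stdbuf")))
```

After registering the directory with `jupyter kernelspec install ./python3-stdbuf --user` and selecting that kernel, C stdout in the kernel process should be line-buffered, as in the piped `stdbuf -oL` experiment.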


cc @sjfeng1999 — this is the printf-in-notebook buffering issue we discussed.