[pull] master from axboe:master#314

Open
pull[bot] wants to merge 1433 commits into kubestone:master from axboe:master

pull[bot] commented Dec 10, 2021

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

axboe and others added 27 commits August 21, 2025 16:23
This was more of a thought experiment back in the day, but even for
an old interface like libaio on Linux, it does not support canceling
IOs at all. Neither does posixaio. And while cancel support could
get plumbed up to io_uring, since Linux doesn't support canceling
normal IO, then it will never do anything.

Hence it's utterly pointless to have a cancel op in the IO engine, and
the backend's attempt at first reaping done IO and then canceling the
rest is equally pointless.

Just replace the at-exit cancelation with waiting on pending IO.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
* 'sprandom' of https://github.com/tomas-winkler-sndk/fio:
  sprandom: integrate sprandom_get_next_offset() into io_u path
  sprandom: initialize sprandom for file
  sprandom: implement sprandom_get_next_offset()
  sprandom: initialize random state
  unittests: add pcbuf simple unit test
  sprandom: pcbuf.h add two-phase circular buffer header-only library
  unittests: add bytes2str_simple()
  num2str: add bytes2str_simple()
  sprandom: set up LFSR random generator and disable randommap
  sprandom: implement region computation and invalidation percentage
  sprandom: examples: add sprandom example file
  sprandom: add debug facility
  sprandom: add command line options
Since fio is often used in scripts, refuse to continue and give the user
an opportunity to correct invalid settings when they appear.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
When thread=1 multiple jobs share the same buffer used to expand the %o
format specifier. This can cause verify failures when one thread tries
to verify a buffer using the %o expansion from another thread. This
patch makes sure threads use different buffers.

This is a stop-gap measure to resolve verify failures for now. A better
solution would be to at init time allocate a set of buffers for each job
thread (and verify_async thread since they are vulnerable to the same
issue) and use those buffers instead of doing a malloc/free for each
verify operation.

Fixes: #1845
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Fix the most egregious warnings produced by running
mandoc -W warning,stop ./fio.1 > /dev/null

- Fix usage of .RE when .RS block isn't open
- Stop escaping = as it's not needed
- Fix incorrect usage of .TP before .RE: the line following .TP isn't a
  leading tag, so the .TP has no effect
- Fix broken macro sequence and incorrect usage of \fR...\fP which
  should have been \fB...\fP because it's outside of .BI

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Silence mandoc "WARNING: skipping paragraph macro: PP empty" complaints.

This also fixes the over-indentation of the final paragraph in "Trace file
format v3".

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Update the man page's date to match the date of the last fio release
(fio-3.40) so things look less crufty.

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
* 'fix_mandoc_warnings' of https://github.com/sitsofe/fio:
  man: update date
  man: fix mandoc "PP empty" warnings
  man: fix mandoc lint errors
__SANE_USERSPACE_TYPES__ needs to be defined to get consistent formats on all platforms.

It mostly affects 64-bit architectures (no op on 32 bit) with long long
vs long.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Move the sprandom_init() call so that it occurs before total_io_size is
computed, to ensure that statistics are computed correctly.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
Fix a leftover bug from earlier code changes: use 'offset' instead of the *b pointer.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
Free invalid_pct, fix a memory leak.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
The validity_dist buffer is only needed to compute the invalid_pct
array. Once that is done, we can free it instead of keeping it
around unnecessarily.

Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
This (finally) provides macOS cache invalidation and is heavily based on
code originally provided by DeveloperEcosystemEngineering@apple.

Because posix_fadvise() isn't implemented on macOS,
DeveloperEcosystemEngineering demonstrated that creating a shared
mapping of a file and using msync([...], MS_INVALIDATE) on it can
be used to discard covered page cache pages instead - ingenious! This
commit uses that technique to create a macOS posix_fadvise([...],
POSIX_FADV_DONTNEED) shim.

To paraphrase commit 8300eba ("windowsaio: add best effort cache
invalidation") that was done for similar reasons:

This change may make default bandwidth speeds on macOS look lower
compared to older versions of fio but this matches the behaviour of fio
on other platforms with invalidation (such as Linux) because we are
trying to avoid measuring cache reuse (unless invalidate=0 is set).

The impact of invalidation is demonstrated by the bandwidths achieved by
the following jobs running on an SSD of an otherwise idle Intel Mac
laptop with 16GBytes of RAM:

./fio --stonewall --size=128M --ioengine=posixaio --filename=fio.tmp \
  --iodepth=64 --bs=4k --direct=0 \
  --name=create --rw=write \
  --name=cached --rw=randread --loops=2 --invalidate=0 \
  --name=invalidated --rw=randread --loops=2 --invalidate=1

[...]
cached: (groupid=1, jobs=1): err= 0: pid=7795: Tue Sep  2 22:34:12 2025
  read: IOPS=228k, BW=889MiB/s (932MB/s)(256MiB/288msec)
[...]
invalidated: (groupid=2, jobs=1): err= 0: pid=7796: Tue Sep  2 22:34:12 2025
  read: IOPS=46.8k, BW=183MiB/s (192MB/s)(256MiB/1399msec)

v2:
- Move platform specific code into its own file under os/mac/
- Don't do prior fsync() because msync([...], MS_INVALIDATE) doesn't
  imply the dropping of dirty pages and will have the same effect

v3:
- Up the mmap chunk size to 16 GBytes to reduce the number of times we
  mmap()/msync()/munmap() on large files
- Align offset and len to the system page size to prevent errors on jobs
  like ./fio --name=n --offset=2k --size=30k
- Try and munmap() if msync() fails
- Make Rosetta comment clearer
- Drop some variables and rename some others
- Don't bother trying to restore errno after displaying an error message
  because posix_fadvise() isn't defined as setting errno

Fixes: #48
Suggested-by: DeveloperEcosystemEngineering <DeveloperEcosystemEngineering@apple.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
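
The technique described above (a shared mapping plus msync([...], MS_INVALIDATE), applied chunk by chunk to a page-aligned range) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names align_to_pages() and invalidate_file_range() and the caller-supplied chunk size are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: a page-aligned byte range. */
struct page_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Widen [offset, offset + len) so that both ends sit on page boundaries.
 * page_size must be a non-zero power of two.
 */
static struct page_range align_to_pages(uint64_t offset, uint64_t len,
					uint64_t page_size)
{
	struct page_range r;
	uint64_t end;

	assert(page_size && (page_size & (page_size - 1)) == 0);
	r.start = offset & ~(page_size - 1);
	end = (offset + len + page_size - 1) & ~(page_size - 1);
	r.len = end - r.start;
	return r;
}

/*
 * Hypothetical sketch: map the range in chunks, ask the kernel to
 * invalidate the mapped pages, then unmap. chunk should be a multiple
 * of the page size. Returns 0 on success, -1 on the first failure.
 */
static int invalidate_file_range(int fd, uint64_t offset, uint64_t len,
				 uint64_t chunk)
{
	uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
	struct page_range r = align_to_pages(offset, len, page);

	while (r.len) {
		uint64_t this_len = r.len < chunk ? r.len : chunk;
		void *p = mmap(NULL, this_len, PROT_READ, MAP_SHARED, fd,
			       (off_t)r.start);
		int ret;

		if (p == MAP_FAILED)
			return -1;
		ret = msync(p, this_len, MS_INVALIDATE);
		/* Always try to unmap, even if msync() failed. */
		if (munmap(p, this_len) < 0 || ret < 0)
			return -1;
		r.start += this_len;
		r.len -= this_len;
	}
	return 0;
}
```

With a 4 KiB page size, the job from the v3 notes (--offset=2k --size=30k) would be widened by this sketch to the range starting at 0 with a length of 32 KiB.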
- Add support for POSIX_FADV_NORMAL in the posix_fadvise() shim by just
  ignoring it
- Add support for POSIX_FADV_SEQUENTIAL/POSIX_FADV_RANDOM by mapping
  them to enable/disable of readahead via fcntl(..., F_RDAHEAD, ...).
  Because macOS only lets you control readahead at the descriptor level
  the offset and len values passed will be ignored and range control is
  not done.
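
The mapping in the list above might look roughly like the sketch below. It is illustrative only: the enum, the function names, and the fallback POSIX_FADV_* values (copied from Linux for platforms that do not define them) are assumptions for this example, and fcntl(..., F_RDAHEAD, ...) is only compiled in where the platform provides it.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Assumption: where the platform headers lack POSIX_FADV_*, use the
 * Linux values so the sketch still builds.
 */
#ifndef POSIX_FADV_NORMAL
#define POSIX_FADV_NORMAL     0
#define POSIX_FADV_RANDOM     1
#define POSIX_FADV_SEQUENTIAL 2
#endif

/* Hypothetical result of translating an advice value. */
enum ra_action {
	RA_IGNORE,	/* accepted, nothing to do */
	RA_ENABLE,	/* turn descriptor-level readahead on */
	RA_DISABLE,	/* turn descriptor-level readahead off */
	RA_OTHER	/* not a readahead hint; handled elsewhere */
};

static enum ra_action advice_to_readahead(int advice)
{
	switch (advice) {
	case POSIX_FADV_NORMAL:
		return RA_IGNORE;
	case POSIX_FADV_SEQUENTIAL:
		return RA_ENABLE;
	case POSIX_FADV_RANDOM:
		return RA_DISABLE;
	default:
		return RA_OTHER;
	}
}

/*
 * Apply a readahead hint to the whole descriptor. There are no offset
 * or len parameters because readahead is per descriptor, not per range.
 * Returns 0 on success or when nothing needs doing, -1 otherwise.
 */
static int apply_readahead(int fd, int advice)
{
	enum ra_action a = advice_to_readahead(advice);

	if (a == RA_IGNORE)
		return 0;
	if (a == RA_OTHER)
		return -1;
#ifdef F_RDAHEAD
	return fcntl(fd, F_RDAHEAD, a == RA_ENABLE ? 1 : 0) < 0 ? -1 : 0;
#else
	(void)fd;
	return 0;	/* no F_RDAHEAD knob on this platform */
#endif
}
```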

The impact of being able to tune readahead is demonstrated by the
bandwidths achieved by the following jobs running on an SSD of an
otherwise idle Intel Mac laptop with 16GBytes of RAM:

./fio --stonewall --size=128M --filename=fio.tmp --bs=4k --rw=read \
  --name=sequential-readahead --fadvise=sequential \
  --name=sequential-no-readahead --fadvise=random

[...]
sequential-readahead: (groupid=0, jobs=1): err= 0: pid=6250: Tue Sep  2 22:10:45 2025
  read: IOPS=331k, BW=1293MiB/s (1356MB/s)(128MiB/99msec)
[...]
sequential-no-readahead: (groupid=1, jobs=1): err= 0: pid=6251: Tue Sep  2 22:10:45 2025
  read: IOPS=25.9k, BW=101MiB/s (106MB/s)(128MiB/1263msec)

rm -f fio-huge.tmp
truncate -s 1T fio-huge.tmp
./fio --stonewall --filename=fio-huge.tmp --bs=32k --runtime=10s --rw=randread:3 \
  --name=partial-random-no-readahead --fadvise=random \
  --name=absorb-cache-invalidation --number_ios=1 --bs=4k \
  --name=partial-random-readahead --fadvise=sequential

[...]
partial-random-no-readahead: (groupid=0, jobs=1): err= 0: pid=6259: Tue Sep  2 22:12:35 2025
  read: IOPS=92.4k, BW=2888MiB/s (3029MB/s)(28.2GiB/10001msec)
[...]
partial-random-readahead: (groupid=2, jobs=1): err= 0: pid=6261: Tue Sep  2 22:12:35 2025
  read: IOPS=61.8k, BW=1931MiB/s (2024MB/s)(18.9GiB/10001msec)

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
…k/fio

* 'sprandom-fixes' of https://github.com/tomas-winkler-sndk/fio:
  sprandom: drop validity_dist after use
  sprandom: free invalid_pct buffer
  sprandom: fix debug printout for offset
  sprandom: setup SPRandom before total_io_size is computed
This array is actually used to calculate invalid_capacity. So wait to
free it until the very end.

Fixes: 8c8e705 ("sprandom: drop validity_dist after use")
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Stop hard coding the supported state version number in t/verify-state.c
and just use VSTATE_HDR_VERSION so we stay in sync with the rest of fio.

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
- Move INVALID_NUMBERIO to the verify-state.h to make it accessible to
  t/verify-state.c
- Make unused inflight I/O slots more obvious

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
* 'fix_verify-state' of https://github.com/sitsofe/fio:
  t/verify-state: improve verify state inflight output
  t/verify-state: synchronise verify state version
Signed-off-by: Jens Axboe <axboe@kernel.dk>
…r-Ecosystem-Engineering/fio

* 'improve_flushing_darwin' of https://github.com/Developer-Ecosystem-Engineering/fio:
  mac: add readahead control to the posix_fadvise() shim
  mac: implement (file) cache invalidation
This PR fixes an issue in the Makefile. Specifically, previously, any
modifications of files like lib/types.h would not trigger a rebuild of
t/verify-state.o. The PR fixes this by including them as additional
dependencies. Mainly, T_OBJS and UT_OBJS do not use .d files to record
dependencies correctly, unlike OBJS.

Signed-off-by: Jun Lyu <lvjun_dnt@outlook.com>
* 'master' of https://github.com/Meiye-lj/fio:
  Makefile: fix missing test tool and unit test dependencies
Like commit:

21628ec ("fio_sem, diskutil: introduce fio_shared_sem and use it for diskutil lock")

the stats sem is also potentially shared between processes, and hence
should be allocated and freed as a shared sem.

See the referenced commit, which has more details. Switch the stats sem
to be allocated in such a way that it's propagated properly between
processes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current kernel NVMe passthrough path already supports vectored IO
when using fixed buffers, but fio has not yet adapted it. This patch
aims to add a corresponding test interface in fio.

Test results:

taskset -c 1 t/io_uring -b512 -d64 -c2 -s2 -p1 -F1 -B1 -O0 -n1 -V1 -u1 -r4 /dev/ng1n1
submitter=0, tid=6179, file=/dev/ng1n1, nfiles=1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=289.78K, BW=141MiB/s, IOS/call=1/1
IOPS=294.68K, BW=143MiB/s, IOS/call=1/1
IOPS=295.26K, BW=144MiB/s, IOS/call=1/1
Exiting on timeout
Maximum IOPS=295.26K

taskset -c 1 t/io_uring -b512 -d64 -c2 -s2 -p1 -F1 -B1 -O0 -n1 -V0 -u1 -r4 /dev/ng1n1
submitter=0, tid=6183, file=/dev/ng1n1, nfiles=1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=292.31K, BW=142MiB/s, IOS/call=1/1
IOPS=295.79K, BW=144MiB/s, IOS/call=1/1
IOPS=290.78K, BW=141MiB/s, IOS/call=1/1
Exiting on timeout
Maximum IOPS=295.79K

Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
vincentkfu and others added 30 commits June 10, 2026 11:54
…nwooim/fio

* 'io_uring/multiple-write-modes' of https://github.com/minwooim/fio:
  t/nvmept_write_mode: add multiple write_mode tests
  io_u: add zeroed, errored flags to @io_u for verify
  io_uring_cmd: support mixed write_mode with ratio
To make testing easier, add a debug print displaying which opcode was
chosen when write mode is randomly selected.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
To facilitate testing, add a debug print when verifying an offset that
has received a write zeroes command.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Count the commands actually submitted to make sure that the actual
distribution of different write commands is close to what was specified.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Run verify jobs with write zeroes and write uncorrectable commands.
Check to make sure that the number of write zeroes verify and read
errors matches the number of write zeroes and write uncorrectable
commands issued.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
pylint complained about formatting. Fix two issues.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
We run nightly tests on QEMU with an emulated NVMe device. QEMU NVMe
devices do not support the write uncorrectable command. Skip test cases
submitting write uncorrectable commands when running nightly GitHub
Actions QEMU tests.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
…im/fio

* 'fix/trim-with-do-verify-0' of https://github.com/minwooim/fio:
  backend: log trim @io_u to io_hist even `do_verify=0`
fio_ioring_cmd_queue_init() is identical to fio_ioring_queue_init()
except for the additional IORING_SETUP_SQE128 and IORING_SETUP_CQE32
flags. Check for is_uring_cmd_eng in fio_ioring_queue_init() and set
these flags accordingly. Replace fio_ioring_cmd_queue_init() with
fio_ioring_queue_init().

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
fio_ioring_cmd_post_init() is identical to fio_ioring_post_init() except
for the additional handling of 128-byte SQEs. Check for is_uring_cmd_eng
in fio_ioring_post_init() and adjust the SQE size accordingly. Replace
fio_ioring_cmd_post_init() with fio_ioring_post_init().

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
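
The consolidation described in the two commits above comes down to one init path that branches on whether the engine is io_uring_cmd. A rough sketch of that branching is shown below; ring_setup_flags() and sqe_stride() are hypothetical names for this example, and the flag values are defined locally (mirroring linux/io_uring.h) so the snippet builds without recent kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values as in linux/io_uring.h, defined here as a fallback. */
#ifndef IORING_SETUP_SQE128
#define IORING_SETUP_SQE128 (1U << 10)
#endif
#ifndef IORING_SETUP_CQE32
#define IORING_SETUP_CQE32  (1U << 11)
#endif

/*
 * Hypothetical helper: the shared queue-init path adds the big SQE/CQE
 * flags only when the engine is io_uring_cmd.
 */
static unsigned int ring_setup_flags(unsigned int base, bool is_uring_cmd_eng)
{
	if (is_uring_cmd_eng)
		base |= IORING_SETUP_SQE128 | IORING_SETUP_CQE32;
	return base;
}

/*
 * Hypothetical helper: the shared post-init path sizes the SQ array
 * with 128-byte entries for io_uring_cmd and 64-byte entries otherwise.
 */
static size_t sqe_stride(bool is_uring_cmd_eng)
{
	return is_uring_cmd_eng ? 128 : 64;
}
```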
Try to issue IORING_REGISTER_RING_FDS in io_uring queue initialization
to register the io_uring's own file descriptor. Use the registered ring
fd with IORING_ENTER_REGISTERED_RING for all io_uring_enter() syscalls.
This improves performance by avoiding the io_uring file descriptor
reference-counting overhead on each io_uring_enter() syscall.
IORING_REGISTER_RING_FDS isn't supported on older kernels, so fall back
to the existing behavior of passing io_uring_enter() a raw fd if the
registration fails.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
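
The fallback described above can be pictured as choosing, once at init time, which descriptor and flags every later io_uring_enter() call will use. The sketch below covers only that decision and does not issue any syscalls; struct enter_target and pick_enter_target() are hypothetical names, and the flag value is defined locally to mirror linux/io_uring.h.

```c
#include <assert.h>

/* Value as in linux/io_uring.h, defined here as a fallback. */
#ifndef IORING_ENTER_REGISTERED_RING
#define IORING_ENTER_REGISTERED_RING (1U << 4)
#endif

/* Hypothetical: what to pass to every io_uring_enter() call. */
struct enter_target {
	int fd;			/* raw ring fd, or registered ring index */
	unsigned int flags;	/* OR'ed into the enter flags */
};

/*
 * register_ret is the result of trying IORING_REGISTER_RING_FDS with a
 * single entry: 1 means the ring was registered and registered_index
 * holds the index the kernel handed back. Anything else (for example
 * an error from an older kernel) means we keep using the raw fd.
 */
static struct enter_target pick_enter_target(int ring_fd, int register_ret,
					     int registered_index)
{
	struct enter_target t = { ring_fd, 0 };

	if (register_ret == 1) {
		t.fd = registered_index;
		t.flags = IORING_ENTER_REGISTERED_RING;
	}
	return t;
}
```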
…/fio

* 'opt/register-ring-fd' of https://github.com/calebsander/fio:
  io_uring: try to register ring fd
  io_uring: consolidate fio_ioring{,_cmd}_post_init()
  io_uring: consolidate fio_ioring{,_cmd}_queue_init()
No need for braces for a single line 'if' statement.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
…malikoyv/fio

* 'pr-speed-up-norandommap-read-bw' of https://github.com/malikoyv/fio:
  iolog: fix io_piece leak on overlapping in-flight writes
  verify: fix verify starvation with norandommap
Use the location of this script to find the Fio executable when none is
supplied.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Some of the QEMU tests run for a long time (2hrs for zbd, 1hr for 16-bit
Guard PI), so we have only been running them nightly on the tip of the
master branch.

This patch changes the workflow to run these tests on every push and for
every pull request. Even though it will take longer to get a clean bill
of health I think it's worth it to automate these tests and detect any
issues sooner.

An issue with the io_uring_cmd bsg pull request would have been found
earlier if we automatically ran these tests.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Previously, io_uring_cmd engine was tightly coupled with NVMe command
structures, making it difficult to use with other device types.

This patch separates the NVMe-specific logic from the io_uring_cmd
engine so that support for other device types, such as bsg, can be
added in the next patch.

Note that this is a prep patch for the patches that follow, with no
functional changes.

Signed-off-by: Jungwon Lee <jjung1.lee@samsung.com>
Signed-off-by: Kyoungrul Kim <k831.kim@samsung.com>
The six static inline byte-order helpers (sgio_get/set_be16/32/64)
were defined privately inside sg.c, making them inaccessible to other
engines that also need to construct or parse SCSI CDBs.

Move them into a new engines/sg.h so that any engine speaking SCSI can
include the header directly rather than reimplementing the same
byte-swap wrappers independently.  sg.c gains an include of sg.h in
place of the now-removed definitions; no behaviour is changed.

Signed-off-by: Kyoungrul Kim <k831.kim@samsung.com>
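
For readers unfamiliar with this kind of helper, big-endian accessors like the six mentioned above can be sketched as below. These are not the definitions from engines/sg.h: the names are hypothetical, and the sketch uses explicit shifts rather than byte-swap wrappers so that it stays self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical big-endian accessors; p points at raw CDB/response bytes. */
static inline uint16_t be16_get(const uint8_t *p)
{
	return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static inline void be16_put(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static inline uint32_t be32_get(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static inline void be32_put(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static inline uint64_t be64_get(const uint8_t *p)
{
	return ((uint64_t)be32_get(p) << 32) | be32_get(p + 4);
}

static inline void be64_put(uint8_t *p, uint64_t v)
{
	be32_put(p, (uint32_t)(v >> 32));
	be32_put(p + 4, (uint32_t)v);
}
```

Because the helpers work byte by byte, they behave the same on little-endian and big-endian hosts and do not require the buffer to be aligned.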
Extend io_uring_cmd engine to support SCSI commands via bsg (Block SCSI
Generic) interface. Previously, io_uring_cmd engine only supported NVMe
devices through cmd_type=nvme.

This patch introduces:
- NEW cmd_type option "bsg" to select bsg_uring_cmd interface
- BSG-specific command preparation and completion handling
- SCSI CDB (Command Descriptor Block) construction
- Read Capacity support for device size detection
- Proper error reporting with SCSI status and host status separation

Currently, the following basic fio --rw= operations are supported:
- read: READ(10)
- write: WRITE(10)
- trim: UNMAP
- fsync: SYNC CACHE(10)

Example:
--filename=/dev/bsg/0\:0\:0\:0 --ioengine=io_uring_cmd --cmd_type=bsg

Signed-off-by: Jungwon Lee <jjung1.lee@samsung.com>
Signed-off-by: Kyoungrul Kim <k831.kim@samsung.com>
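
As an illustration of the CDB construction mentioned above, the sketch below fills in a 10-byte READ(10) or WRITE(10) command descriptor block following the standard SCSI layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte big-endian transfer length, control). The function name and macro names are hypothetical and this is not the engine's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCSI_READ_10	0x28
#define SCSI_WRITE_10	0x2a
#define SCSI_RW10_FUA	0x08	/* FUA bit in byte 1 */

/*
 * Hypothetical helper: build a READ(10)/WRITE(10) CDB.
 * lba and nr_blocks are in logical blocks, not bytes.
 */
static void build_rw10_cdb(uint8_t cdb[10], uint8_t opcode, uint32_t lba,
			   uint16_t nr_blocks, int fua)
{
	memset(cdb, 0, 10);
	cdb[0] = opcode;
	cdb[1] = fua ? SCSI_RW10_FUA : 0;
	/* Bytes 2..5: logical block address, big-endian. */
	cdb[2] = (uint8_t)(lba >> 24);
	cdb[3] = (uint8_t)(lba >> 16);
	cdb[4] = (uint8_t)(lba >> 8);
	cdb[5] = (uint8_t)lba;
	/* Byte 6 (group number) stays zero. */
	/* Bytes 7..8: transfer length in blocks, big-endian. */
	cdb[7] = (uint8_t)(nr_blocks >> 8);
	cdb[8] = (uint8_t)nr_blocks;
	/* Byte 9 (control) stays zero. */
}
```

The 32-bit LBA and 16-bit transfer length are limits of the 10-byte command format itself, which is one reason larger devices need the 16-byte variants.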
* 'upstream-io_uring-bsg' of https://github.com/ljw8161/fio:
  engines/io_uring: Add bsg support for io_uring_cmd engine
  engines/sg: extract BE accessor helpers into shared sg.h
  engines/io_uring: Separate NVMe-specific logic
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Rename t/nvmept.py to t/io_uring_cmd.py since we can use this script to
test ioengine=io_uring_cmd and cmd_type={nvme,bsg}.

Also change some of the strings inside the script to reflect the new
name.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Add `cmd_type` as a parameter so that we can run with `cmd_type=bsg`.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Add a test case that sets the writefua and readfua flags.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Use the location of this script to find the Fio executable to use for
testing if none is supplied.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
When using uring_cmd use fio_ioring_cmd_finish_zone instead of the
finish command from blkzone, which is supposed to be used with block
devices.

Signed-off-by: Krijn Doekemeijer <krijn.doekemeijer@wdc.com>
Add a new write mode 'zone_append' for io_uring_cmd that uses zone_appends instead of
writes when ZBD is enabled. Zone appends are issued to the start of
zones.

Signed-off-by: Krijn Doekemeijer <krijn.doekemeijer@wdc.com>
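
Issuing a zone append "to the start of zones" means the submitted offset is the start of the zone that contains the intended write position, and the device then chooses where in the zone the data lands. Assuming a fixed zone size, that rounding can be sketched as below; zone_start() is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round a byte offset down to the start of the
 * zone containing it. Assumes all zones have the same size.
 */
static uint64_t zone_start(uint64_t offset, uint64_t zone_size)
{
	assert(zone_size != 0);
	return offset - (offset % zone_size);
}
```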
Instead of running one long configuration with all the 16-bit Guard PI
LBA formats, run two sets of tests in parallel: one set with LBAFs that
have a 512B data size and the other with LBAFs that have a 4K data size.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Instead of running a single configuration for all of the sections run by
t/zbd/run-tests-against-nullb, split the 25 sections across five
different configurations. This should shorten the test time from nearly
2.5 hours to 30min or less.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
* 'master' of https://github.com/Krien/fio:
  io_uring: add support for zone appends when using ZBD
  io_uring: add fio_ioring_cmd_finish_zone when using uring_cmd