[pull] master from axboe:master #314
Open
pull[bot] wants to merge 1433 commits into
This was more of a thought experiment back in the day, but even an old interface like libaio on Linux does not support canceling IOs at all. Neither does posixaio. And while cancel support could get plumbed up to io_uring, Linux doesn't support canceling normal IO, so it would never do anything. Hence it's utterly pointless to have a cancel op in the IO engine, and the backend's attempt at first reaping done IO and then canceling the rest is equally pointless. Just replace the at-exit cancelation with waiting on pending IO. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* 'sprandom' of https://github.com/tomas-winkler-sndk/fio:
  sprandom: integrate sprandom_get_next_offset() into io_u path
  sprandom: initialize sprandom for file
  sprandom: implement sprandom_get_next_offset()
  sprandom: initialize random state
  unittests: add pcbuf simple unit test
  sprandom: pcbuf.h add two-phase circular buffer header-only library
  unittests: add bytes2str_simple()
  num2str: add bytes2str_simple()
  sprandom: set up LFSR random generator and disable randommap
  sprandom: implement region computation and invalidation percentage
  sprandom: examples: add sprandom example file
  sprandom: add debug facility
  sprandom: add command line options
Since fio is often used in scripts, refuse to continue and give the user an opportunity to correct invalid settings when they appear. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
When thread=1 multiple jobs share the same buffer used to expand the %o format specifier. This can cause verify failures when one thread tries to verify a buffer using the %o expansion from another thread. This patch makes sure threads use different buffers. This is a stop-gap measure to resolve verify failures for now. A better solution would be to at init time allocate a set of buffers for each job thread (and verify_async thread since they are vulnerable to the same issue) and use those buffers instead of doing a malloc/free for each verify operation. Fixes: #1845 Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Fix the most egregious warnings produced by running
  mandoc -W warning,stop ./fio.1 > /dev/null
- Fix usage of .RE when .RS block isn't open
- Stop escaping = as it's not needed
- Fix incorrect usage of .TP before .RE: the line following .TP isn't a leading tag, so the macro has no effect
- Fix broken macro sequence and incorrect usage of \fR...\fP which should have been \fB...\fP because it's outside of .BI
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Silence mandoc "WARNING: skipping paragraph macro: PP empty" complaints. This also fixes the over-indentation of the final paragraph in "Trace file format v3". Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Update the man page's date to match the date of the last fio release (fio-3.40) so things look less crufty. Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
* 'fix_mandoc_warnings' of https://github.com/sitsofe/fio:
  man: update date
  man: fix mandoc "PP empty" warnings
  man: fix mandoc lint errors
__SANE_USERSPACE_TYPES__ needs to be defined to get consistent formats on all platforms. It mostly affects 64-bit architectures (no op on 32 bit) with long long vs long. Signed-off-by: Rosen Penev <rosenp@gmail.com>
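The effect described above can be sketched as follows. On Linux, defining __SANE_USERSPACE_TYPES__ before the first include of <linux/types.h> makes __u64 an unsigned long long on every architecture, so a single "%llu" format matches everywhere. The helper name format_u64() is hypothetical and only illustrates the idea; it is not taken from fio.

```c
/* Must be defined before the first include of <linux/types.h>. */
#define __SANE_USERSPACE_TYPES__
#include <linux/types.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: print a kernel-style 64-bit value.
 * With the define above, __u64 is unsigned long long on all
 * architectures, so "%llu" is the right format everywhere.
 */
static int format_u64(char *buf, size_t len, __u64 value)
{
	return snprintf(buf, len, "%llu", value);
}
```

Without the define, 64-bit MIPS and PowerPC builds would see __u64 as unsigned long, and the same format string would trigger a format warning there.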
* 'patch-1' of https://github.com/neheb/fio: fio: fix formats under MIPS64/PPC
Move the sprandom_init() call to occur before total_io_size is computed, to ensure that statistics are computed correctly. Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
Fix a leftover bug from earlier code changes. Use 'offset' instead of the *b pointer. Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
Free invalid_pct, fix a memory leak. Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
The validity_dist buffer is only needed to compute the invalid_pct array. Once that is done, we can free it instead of keeping it around unnecessarily. Signed-off-by: Tomas Winkler <tomas.winkler@sandisk.com>
This (finally) provides macOS cache invalidation and is heavily based on code originally provided by DeveloperEcosystemEngineering@apple.

Because posix_fadvise() isn't implemented on macOS, DeveloperEcosystemEngineering demonstrated that creating a shared mapping of a file and using msync([...], MS_INVALIDATE) on it can be used to discard covered page cache pages instead - ingenious! This commit uses that technique to create a macOS posix_fadvise([...], POSIX_FADV_DONTNEED) shim.

To paraphrase commit 8300eba ("windowsaio: add best effort cache invalidation") that was done for similar reasons: This change may make default bandwidth speeds on macOS look lower compared to older versions of fio but this matches the behaviour of fio on other platforms with invalidation (such as Linux) because we are trying to avoid measuring cache reuse (unless invalidate=0 is set).

The impact of invalidation is demonstrated by the bandwidths achieved by the following jobs running on an SSD of an otherwise idle Intel Mac laptop with 16GBytes of RAM:

  ./fio --stonewall --size=128M --ioengine=posixaio --filename=fio.tmp \
    --iodepth=64 --bs=4k --direct=0 \
    --name=create --rw=write \
    --name=cached --rw=randread --loops=2 --invalidate=0 \
    --name=invalidated --rw=randread --loops=2 --invalidate=1
  [...]
  cached: (groupid=1, jobs=1): err= 0: pid=7795: Tue Sep 2 22:34:12 2025
    read: IOPS=228k, BW=889MiB/s (932MB/s)(256MiB/288msec)
  [...]
  invalidated: (groupid=2, jobs=1): err= 0: pid=7796: Tue Sep 2 22:34:12 2025
    read: IOPS=46.8k, BW=183MiB/s (192MB/s)(256MiB/1399msec)

v2:
- Move platform specific code into its own file under os/mac/
- Don't do prior fsync() because msync([...], MS_INVALIDATE) doesn't imply the dropping of dirty pages and will have the same effect

v3:
- Up the mmap chunk size to 16 GBytes to reduce the number of times we mmap()/msync()/munmap() on large files
- Align offset and len to the system page size to prevent errors on jobs like ./fio --name=n --offset=2k --size=30k
- Try and munmap() if msync() fails
- Make Rosetta comment clearer
- Drop some variables and rename some others
- Don't bother trying to restore errno after displaying an error message because posix_fadvise() isn't defined as setting errno

Fixes: #48
Suggested-by: DeveloperEcosystemEngineering <DeveloperEcosystemEngineering@apple.com>
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
- Add support for POSIX_FADV_NORMAL in the posix_fadvise() shim by just ignoring it
- Add support for POSIX_FADV_SEQUENTIAL/POSIX_FADV_RANDOM by mapping them to enable/disable of readahead via fcntl(..., F_RDAHEAD, ...). Because macOS only lets you control readahead at the descriptor level the offset and len values passed will be ignored and range control is not done.

The impact of being able to tune readahead is demonstrated by the bandwidths achieved by the following jobs running on an SSD of an otherwise idle Intel Mac laptop with 16GBytes of RAM:

  ./fio --stonewall --size=128M --filename=fio.tmp --bs=4k --rw=read \
    --name=sequential-readahead --fadvise=sequential \
    --name=sequential-no-readahead --fadvise=random
  [...]
  sequential-readahead: (groupid=0, jobs=1): err= 0: pid=6250: Tue Sep 2 22:10:45 2025
    read: IOPS=331k, BW=1293MiB/s (1356MB/s)(128MiB/99msec)
  [...]
  sequential-no-readahead: (groupid=1, jobs=1): err= 0: pid=6251: Tue Sep 2 22:10:45 2025
    read: IOPS=25.9k, BW=101MiB/s (106MB/s)(128MiB/1263msec)

  rm -f fio-huge.tmp
  truncate -s 1T fio-huge.tmp
  ./fio --stonewall --filename=fio-huge.tmp --bs=32k --runtime=10s --rw=randread:3 \
    --name=partial-random-no-readahead --fadvise=random \
    --name=absorb-cache-invalidation --number_ios=1 --bs=4k \
    --name=partial-random-readahead --fadvise=sequential
  [...]
  partial-random-no-readahead: (groupid=0, jobs=1): err= 0: pid=6259: Tue Sep 2 22:12:35 2025
    read: IOPS=92.4k, BW=2888MiB/s (3029MB/s)(28.2GiB/10001msec)
  [...]
  partial-random-readahead: (groupid=2, jobs=1): err= 0: pid=6261: Tue Sep 2 22:12:35 2025
    read: IOPS=61.8k, BW=1931MiB/s (2024MB/s)(18.9GiB/10001msec)

Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
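The advice mapping described in this message could look roughly like the sketch below. The function names are hypothetical, and the fallback POSIX_FADV_* values are placeholders for platforms whose headers do not define them; they are not fio's definitions. The fcntl(fd, F_RDAHEAD, ...) call is only compiled where F_RDAHEAD exists (macOS), so elsewhere the sketch is a no-op.

```c
#include <fcntl.h>

/*
 * Placeholder values for platforms that lack the POSIX_FADV_*
 * constants (an assumption made for this sketch only).
 */
#ifndef POSIX_FADV_NORMAL
#define POSIX_FADV_NORMAL	0
#define POSIX_FADV_RANDOM	1
#define POSIX_FADV_SEQUENTIAL	2
#endif

/*
 * Hypothetical helper: translate an advice value into a readahead
 * setting. Returns 1 to enable readahead, 0 to disable it, and -1
 * when the advice should simply be ignored.
 */
static int advice_to_readahead(int advice)
{
	switch (advice) {
	case POSIX_FADV_SEQUENTIAL:
		return 1;
	case POSIX_FADV_RANDOM:
		return 0;
	default:
		return -1;
	}
}

/*
 * Hypothetical helper: apply the advice to a whole descriptor.
 * There is no offset/len parameter because readahead can only be
 * toggled per descriptor with this mechanism.
 */
static int apply_readahead_advice(int fd, int advice)
{
	int readahead = advice_to_readahead(advice);

	if (readahead < 0)
		return 0;
#ifdef F_RDAHEAD
	return fcntl(fd, F_RDAHEAD, readahead) < 0 ? -1 : 0;
#else
	(void)fd;
	return 0; /* no descriptor-level readahead control on this platform */
#endif
}
```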
…k/fio
* 'sprandom-fixes' of https://github.com/tomas-winkler-sndk/fio:
  sprandom: drop validity_dist after use
  sprandom: free invalid_pct buffer
  sprandom: fix debug printout for offset
  sprandom: setup SPRandom before total_io_size is computed
This array is actually used to calculate invalid_capacity. So wait to free it until the very end. Fixes: 8c8e705 ("sprandom: drop validity_dist after use") Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Stop hard coding the supported state version number in t/verify-state.c and just use VSTATE_HDR_VERSION so we stay in sync with the rest of fio. Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
- Move INVALID_NUMBERIO to verify-state.h to make it accessible to t/verify-state.c
- Make unused inflight I/O slots more obvious
Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
* 'fix_verify-state' of https://github.com/sitsofe/fio:
  t/verify-state: improve verify state inflight output
  t/verify-state: synchronise verify state version
…r-Ecosystem-Engineering/fio
* 'improve_flushing_darwin' of https://github.com/Developer-Ecosystem-Engineering/fio:
  mac: add readahead control to the posix_fadvise() shim
  mac: implement (file) cache invalidation
This PR fixes an issue in the Makefile: previously, modifying files like lib/types.h would not trigger a rebuild of t/verify-state.o. The PR fixes this by including them as additional dependencies. The root cause is that T_OBJS and UT_OBJS, unlike OBJS, do not use .d files to record dependencies correctly. Signed-off-by: Jun Lyu <lvjun_dnt@outlook.com>
* 'master' of https://github.com/Meiye-lj/fio: Makefile: fix missing test tool and unit test dependencies
Like commit: 21628ec ("fio_sem, diskutil: introduce fio_shared_sem and use it for diskutil lock") the stats sem is also potentially shared between processes, and hence should be allocated and freed as a shared sem. See the referenced commit, which has more details. Switch the stats sem to be allocated in such a way that it's propagated properly between processes. Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current kernel NVMe passthrough path already supports vectored IO when using fixed buffers, but fio has not yet adapted it. This patch aims to add a corresponding test interface in fio.

Test results:

  taskset -c 1 t/io_uring -b512 -d64 -c2 -s2 -p1 -F1 -B1 -O0 -n1 -V1 -u1 -r4 /dev/ng1n1
  submitter=0, tid=6179, file=/dev/ng1n1, nfiles=1, node=-1
  polled=1, fixedbufs=1, register_files=1, buffered=1, QD=64
  Engine=io_uring, sq_ring=64, cq_ring=64
  IOPS=289.78K, BW=141MiB/s, IOS/call=1/1
  IOPS=294.68K, BW=143MiB/s, IOS/call=1/1
  IOPS=295.26K, BW=144MiB/s, IOS/call=1/1
  Exiting on timeout
  Maximum IOPS=295.26K

  taskset -c 1 t/io_uring -b512 -d64 -c2 -s2 -p1 -F1 -B1 -O0 -n1 -V0 -u1 -r4 /dev/ng1n1
  submitter=0, tid=6183, file=/dev/ng1n1, nfiles=1, node=-1
  polled=1, fixedbufs=1, register_files=1, buffered=1, QD=64
  Engine=io_uring, sq_ring=64, cq_ring=64
  IOPS=292.31K, BW=142MiB/s, IOS/call=1/1
  IOPS=295.79K, BW=144MiB/s, IOS/call=1/1
  IOPS=290.78K, BW=141MiB/s, IOS/call=1/1
  Exiting on timeout
  Maximum IOPS=295.79K

Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
…nwooim/fio
* 'io_uring/multiple-write-modes' of https://github.com/minwooim/fio:
  t/nvmept_write_mode: add multiple write_mode tests
  io_u: add zeroed, errored flags to @io_u for verify
  io_uring_cmd: support mixed write_mode with ratio
To make testing easier, add a debug print displaying which opcode was chosen when write mode is randomly selected. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
To facilitate testing, add a debug print when verifying an offset that has received a write zeroes command. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Count the commands actually submitted to make sure that the actual distribution of different write commands is close to what was specified. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Run verify jobs with write zeroes and write uncorrectable commands. Check to make sure that the number of write zeroes verify and read errors matches the number of write zeroes and write uncorrectable commands issued. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
pylint complained about formatting. Fix two issues. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
We run nightly tests on QEMU with an emulated NVMe device. QEMU NVMe devices do not support the write uncorrectable command. Skip test cases submitting write uncorrectable commands when running nightly GitHub Actions QEMU tests. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
…im/fio * 'fix/trim-with-do-verify-0' of https://github.com/minwooim/fio: backend: log trim @io_u to io_hist even `do_verify=0`
fio_ioring_cmd_queue_init() is identical to fio_ioring_queue_init() except for the additional IORING_SETUP_SQE128 and IORING_SETUP_CQE32 flags. Check for is_uring_cmd_eng in fio_ioring_queue_init() and set these flags accordingly. Replace fio_ioring_cmd_queue_init() with fio_ioring_queue_init(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
fio_ioring_cmd_post_init() is identical to fio_ioring_post_init() except for the additional handling of 128-byte SQEs. Check for is_uring_cmd_eng in fio_ioring_post_init() and adjust the SQE size accordingly. Replace fio_ioring_cmd_post_init() with fio_ioring_post_init(). Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Try to issue IORING_REGISTER_RING_FDS in io_uring queue initialization to register the io_uring's own file descriptor. Use the registered ring fd with IORING_ENTER_REGISTERED_RING for all io_uring_enter() syscalls. This improves performance by avoiding the io_uring file descriptor reference-counting overhead on each io_uring_enter() syscall. IORING_REGISTER_RING_FDS isn't supported on older kernels, so fall back to the existing behavior of passing io_uring_enter() a raw fd if the registration fails. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
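The register-then-fall-back flow described above can be sketched with raw syscalls. Everything named sketch_* or ring_* here is hypothetical. The two constants and the struct layout mirror IORING_REGISTER_RING_FDS, IORING_ENTER_REGISTERED_RING, and struct io_uring_rsrc_update from <linux/io_uring.h>; they are copied locally only so the sketch builds against older headers, and real code should include that header instead.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local mirrors of <linux/io_uring.h> values (see the note above). */
#define SKETCH_REGISTER_RING_FDS	20		/* IORING_REGISTER_RING_FDS */
#define SKETCH_ENTER_REGISTERED_RING	(1U << 4)	/* IORING_ENTER_REGISTERED_RING */

/* Same layout as struct io_uring_rsrc_update. */
struct sketch_rsrc_update {
	uint32_t offset;
	uint32_t resv;
	uint64_t data;
};

/* Hypothetical per-ring state: what to pass to io_uring_enter(). */
struct ring_handle {
	int ring_fd;			/* raw io_uring file descriptor */
	int enter_fd;			/* raw fd or registered index */
	unsigned int enter_flags;	/* extra flags for every enter call */
};

/*
 * Try to register the ring's own fd. On any failure (old kernel,
 * bad fd, sandbox) keep using the raw fd with no extra flags.
 */
static void ring_try_register(struct ring_handle *r)
{
	struct sketch_rsrc_update up = {
		.offset = UINT32_MAX,	/* let the kernel pick a free slot */
		.resv = 0,
		.data = (uint64_t)(uint32_t)r->ring_fd,
	};

	r->enter_fd = r->ring_fd;
	r->enter_flags = 0;
#ifdef SYS_io_uring_register
	if (syscall(SYS_io_uring_register, r->ring_fd,
		    SKETCH_REGISTER_RING_FDS, &up, 1) == 1) {
		r->enter_fd = (int)up.offset;	/* index chosen by the kernel */
		r->enter_flags = SKETCH_ENTER_REGISTERED_RING;
	}
#endif
}

/* Enter the ring using whichever fd/flags combination was selected. */
static int ring_enter(const struct ring_handle *r, unsigned int to_submit,
		      unsigned int min_complete, unsigned int flags)
{
#ifdef SYS_io_uring_enter
	return (int)syscall(SYS_io_uring_enter, r->enter_fd, to_submit,
			    min_complete, flags | r->enter_flags, NULL, 0);
#else
	(void)r; (void)to_submit; (void)min_complete; (void)flags;
	return -1;
#endif
}
```

Keeping the chosen fd and flags in one struct means the submit and reap paths do not need to know whether registration succeeded.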
…/fio
* 'opt/register-ring-fd' of https://github.com/calebsander/fio:
  io_uring: try to register ring fd
  io_uring: consolidate fio_ioring{,_cmd}_post_init()
  io_uring: consolidate fio_ioring{,_cmd}_queue_init()
No need for braces for a single line 'if' statement. Signed-off-by: Jens Axboe <axboe@kernel.dk>
…malikoyv/fio
* 'pr-speed-up-norandommap-read-bw' of https://github.com/malikoyv/fio:
  iolog: fix io_piece leak on overlapping in-flight writes
  verify: fix verify starvation with norandommap
Use the location of this script to find the Fio executable when none is supplied. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Some of the QEMU tests run for a long time (2hrs for zbd, 1hr for 16-bit Guard PI), so we have only been running them nightly on the tip of the master branch. This patch changes the workflow to run these tests on every push and for every pull request. Even though it will take longer to get a clean bill of health, I think it's worth it to automate these tests and detect any issues sooner. An issue with the io_uring_cmd bsg pull request would have been found earlier if we automatically ran these tests. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Previously, the io_uring_cmd engine was tightly coupled with NVMe command structures, making it difficult to use with other device types. This patch separates the NVMe-specific logic from the io_uring_cmd engine so that follow-up patches can add support for other device types such as bsg. Note that this is a preparatory patch with no functional changes. Signed-off-by: Jungwon Lee <jjung1.lee@samsung.com> Signed-off-by: Kyoungrul Kim <k831.kim@samsung.com>
The six static inline byte-order helpers (sgio_get/set_be16/32/64) were defined privately inside sg.c, making them inaccessible to other engines that also need to construct or parse SCSI CDBs. Move them into a new engines/sg.h so that any engine speaking SCSI can include the header directly rather than reimplementing the same byte-swap wrappers independently. sg.c gains an include of sg.h in place of the now-removed definitions; no behaviour is changed. Signed-off-by: Kyoungrul Kim <k831.kim@samsung.com>
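For readers unfamiliar with this kind of helper, here is a minimal sketch of big-endian accessors of the sort the message describes. The names be16_load(), be32_store() and so on are hypothetical and are not the sgio_get/set_be* definitions in engines/sg.h. They read and write byte by byte, so they behave the same regardless of host byte order or buffer alignment.

```c
#include <stdint.h>

/* Load a big-endian 16-bit value from an unaligned byte buffer. */
static inline uint16_t be16_load(const uint8_t *p)
{
	return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Store a 16-bit value in big-endian order. */
static inline void be16_store(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

/* Load a big-endian 32-bit value. */
static inline uint32_t be32_load(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store a 32-bit value in big-endian order. */
static inline void be32_store(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a big-endian 64-bit value, most significant byte first. */
static inline uint64_t be64_load(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Store a 64-bit value in big-endian order. */
static inline void be64_store(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)v;
		v >>= 8;
	}
}
```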
Extend the io_uring_cmd engine to support SCSI commands via the bsg (Block SCSI Generic) interface. Previously, the io_uring_cmd engine only supported NVMe devices through cmd_type=nvme.

This patch introduces:
- New cmd_type option "bsg" to select the bsg_uring_cmd interface
- BSG-specific command preparation and completion handling
- SCSI CDB (Command Descriptor Block) construction
- Read Capacity support for device size detection
- Proper error reporting with SCSI status and host status separation

Currently, the following basic fio --rw= modes are supported:
- read: READ(10)
- write: WRITE(10)
- trim: UNMAP
- fsync: SYNC CACHE(10)

Example:
  --filename=/dev/bsg/0\:0\:0\:0 --ioengine=io_uring_cmd --cmd_type=bsg

Signed-off-by: Jungwon Lee <jjung1.lee@samsung.com>
Signed-off-by: Kyoungrul Kim <k831.kim@samsung.com>
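To illustrate the CDB construction step mentioned above, the sketch below fills in a 10-byte READ(10) or WRITE(10) command following the standard SCSI layout: opcode in byte 0, flags in byte 1, a big-endian 32-bit LBA in bytes 2 to 5, and a big-endian 16-bit transfer length in bytes 7 and 8. The function and macro names are hypothetical and this is not the engine's code; the FUA flag is included only to show where such a bit would go.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_READ_10		0x28	/* SCSI READ(10) opcode */
#define SKETCH_WRITE_10		0x2a	/* SCSI WRITE(10) opcode */
#define SKETCH_CDB10_FUA	0x08	/* FUA bit in byte 1 */

/*
 * Hypothetical helper: build a READ(10)/WRITE(10) CDB.
 * lba and nr_blocks are in units of logical blocks.
 */
static void build_rw10_cdb(uint8_t cdb[10], bool is_write, uint32_t lba,
			   uint16_t nr_blocks, bool fua)
{
	memset(cdb, 0, 10);

	cdb[0] = is_write ? SKETCH_WRITE_10 : SKETCH_READ_10;
	cdb[1] = fua ? SKETCH_CDB10_FUA : 0;

	/* Bytes 2-5: logical block address, big-endian. */
	cdb[2] = (uint8_t)(lba >> 24);
	cdb[3] = (uint8_t)(lba >> 16);
	cdb[4] = (uint8_t)(lba >> 8);
	cdb[5] = (uint8_t)lba;

	/* Byte 6 (group number) stays zero. */

	/* Bytes 7-8: transfer length in blocks, big-endian. */
	cdb[7] = (uint8_t)(nr_blocks >> 8);
	cdb[8] = (uint8_t)nr_blocks;

	/* Byte 9 (control) stays zero. */
}
```

Because the LBA field is 32 bits and the length field is 16 bits, a 10-byte CDB cannot address blocks beyond 2^32 - 1 or transfer more than 65535 blocks per command; larger requests would need a different opcode, which is outside this sketch.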
* 'upstream-io_uring-bsg' of https://github.com/ljw8161/fio:
  engines/io_uring: Add bsg support for io_uring_cmd engine
  engines/sg: extract BE accessor helpers into shared sg.h
  engines/io_uring: Separate NVMe-specific logic
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Rename t/nvmept.py to t/io_uring_cmd.py since we can use this script to
test ioengine=io_uring_cmd and cmd_type={nvme,bsg}.
Also change some of the strings inside the script to reflect the new
name.
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Add `cmd_type` as a parameter so that we can run with `cmd_type=bsg`. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Add a test case that sets the writefua and readfua flags. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Use the location of this script to find the Fio executable to use for testing if none is supplied. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
When using uring_cmd use fio_ioring_cmd_finish_zone instead of the finish command from blkzone, which is supposed to be used with block devices. Signed-off-by: Krijn Doekemeijer <krijn.doekemeijer@wdc.com>
Add a new write mode 'zone_append' for io_uring_cmd that uses zone_appends instead of writes when ZBD is enabled. Zone appends are issued to the start of zones. Signed-off-by: Krijn Doekemeijer <krijn.doekemeijer@wdc.com>
Instead of running one long configuration with all the 16-bit Guard PI LBA formats, run two sets of tests in parallel. One set will have LBAFs with 512B data size and the other set will have LBAFs with 4K data size. These will run in parallel. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Instead of running a single configuration for all of the sections run by t/zbd/run-tests-against-nullb, split the 25 sections across five different configurations. This should shorten the test time from nearly 2.5 hours to 30min or less. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
* 'master' of https://github.com/Krien/fio:
  io_uring: add support for zone appends when using ZBD
  io_uring: add fio_ioring_cmd_finish_zone when using uring_cmd
See Commits and Changes for more details.
Created by
pull[bot]