Benchmark S3ThreadPoolExecutor vs S3AioExecutor before defaulting to async #685

@laughingman7743

Summary

Compare performance of S3ThreadPoolExecutor (sync, current default) vs S3AioExecutor (async, new) to validate the switch to AioS3FileSystem as the default in v3.30.0.

Background

PR #684 introduced the S3Executor strategy pattern, replacing hardcoded ThreadPoolExecutor usage with a pluggable interface. This eliminates thread-in-thread nesting when aio cursors use S3FileSystem. Before making AioS3FileSystem the default for async paths, we need empirical performance data.

Related:

Benchmark Scope

Scenarios

Scenario            Description
------------------  ----------------------------------------------------------------
Query result fetch  AioS3FSCursor fetch performance (small/medium/large result sets)
Large file read     Multipart range read via _fetch_range
Large file write    Multipart upload via commit
Parallel copy       _copy_object_with_multipart_upload
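
For the large file read scenario, the benchmark needs a repeatable way to decide how many ranged requests a given file size produces. The sketch below shows one way to split a byte range into fixed-size parts. The helper name `split_ranges` and its half-open range convention are assumptions for illustration; this is not the logic inside `_fetch_range`.

```python
def split_ranges(start: int, end: int, block_size: int) -> list[tuple[int, int]]:
    """Split the half-open byte range [start, end) into parts of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Each part starts on a block boundary; the last part is clipped to `end`.
    return [(pos, min(pos + block_size, end)) for pos in range(start, end, block_size)]


if __name__ == "__main__":
    # A 25-byte object read in 10-byte blocks yields three ranged requests.
    print(split_ranges(0, 25, 10))
```

Keeping the part count explicit makes it easier to report throughput per part size as well as per file size.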

Metrics

  • Wall-clock time (latency)
  • Throughput (MB/s)
  • Concurrency behavior under varying max_workers
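
The three metrics above could be collected with a small harness along the following lines. This is a minimal sketch that uses only the standard library: `fake_range_read` is a hypothetical stand-in that sleeps instead of calling S3, and the part sizes and worker counts are arbitrary assumptions. A real benchmark would replace the stand-in with the actual read or write call under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_range_read(num_bytes: int, delay_s: float = 0.01) -> int:
    """Hypothetical stand-in for one ranged S3 read: sleeps, then reports the bytes 'read'."""
    time.sleep(delay_s)
    return num_bytes


def measure(max_workers: int, parts: int, part_bytes: int) -> dict:
    """Run `parts` simulated reads on a thread pool and report latency and throughput."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        total_bytes = sum(pool.map(fake_range_read, [part_bytes] * parts))
    elapsed = time.perf_counter() - started
    return {
        "max_workers": max_workers,
        "seconds": elapsed,  # wall-clock time
        "mb_per_s": total_bytes / (1024 * 1024) / elapsed,  # throughput
    }


if __name__ == "__main__":
    # Sweep max_workers to observe how concurrency changes latency and throughput.
    for workers in (1, 4, 16):
        print(measure(workers, parts=32, part_bytes=8 * 1024 * 1024))
```

Using `time.perf_counter()` rather than `time.time()` avoids distortion from system clock adjustments during a run.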

Comparison

  • S3FileSystem + S3ThreadPoolExecutor (sync baseline)
  • AioS3FileSystem + S3AioExecutor (async candidate)
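
To keep the comparison fair, both paths should run the same number of tasks under the same concurrency limit. The sketch below illustrates that shape with simulated I/O: a standard `ThreadPoolExecutor` stands in for the sync baseline and plain `asyncio` with a semaphore stands in for the async candidate. It does not call `S3ThreadPoolExecutor` or `S3AioExecutor`, whose signatures are not shown in this issue; `sync_task`, `async_task`, and `DELAY_S` are hypothetical.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY_S = 0.01  # simulated per-request latency (assumption, not a measured value)


def sync_task(i: int) -> int:
    """Blocking stand-in for one S3 request."""
    time.sleep(DELAY_S)
    return i


async def async_task(i: int, limit: asyncio.Semaphore) -> int:
    """Non-blocking stand-in for one S3 request, bounded by a semaphore."""
    async with limit:
        await asyncio.sleep(DELAY_S)
    return i


def run_threaded(n: int, max_workers: int) -> tuple[list[int], float]:
    """Run n blocking tasks on a thread pool; return results and elapsed seconds."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(sync_task, range(n)))
    return results, time.perf_counter() - started


async def _gather(n: int, max_concurrency: int) -> list[int]:
    # The semaphore plays the role that max_workers plays for the thread pool.
    limit = asyncio.Semaphore(max_concurrency)
    return list(await asyncio.gather(*(async_task(i, limit) for i in range(n))))


def run_async(n: int, max_concurrency: int) -> tuple[list[int], float]:
    """Run n coroutine tasks on a fresh event loop; return results and elapsed seconds."""
    started = time.perf_counter()
    results = asyncio.run(_gather(n, max_concurrency))
    return results, time.perf_counter() - started


if __name__ == "__main__":
    for name, runner in (("threaded", run_threaded), ("async", run_async)):
        results, seconds = runner(100, 10)
        print(f"{name}: {len(results)} tasks in {seconds:.3f}s")
```

Note that `run_async` includes event loop startup in its timing. Whether that cost belongs in the measurement depends on whether the real async path reuses a running loop.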

Acceptance Criteria

  • Benchmark script(s) covering the scenarios above
  • Results showing no significant regression in wall-clock time or throughput for the async path relative to the sync baseline
  • Summary with recommendation for v3.30.0 default switch
