gh-100239: specialize left and right shift ops on ints #129431

Open
eendebakpt wants to merge 14 commits into python:main from eendebakpt:binaryops_shift

@eendebakpt
Contributor

@eendebakpt eendebakpt commented Jan 29, 2025

We add a specialization for << and >> on compact integers.

  • For both lshift and rshift we have to avoid undefined behavior for shifts larger than the number of bits in a C long.
  • The lshift operator << can overflow, so we restrict shifts to values smaller than 17. Under this assumption the value of a << b (with b <= 16) fits into a 32-bit or 64-bit integer, depending on the platform.
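
The two constraints above can be modelled in a few lines of Python. This is only a sketch of the reasoning, not the guard in the PR: the helper names (`is_compact`, `lshift_fast_path_ok`, `rshift_fast_path_ok`, `fits_signed`) are hypothetical, a compact int is modelled as a value whose magnitude fits in one digit, and the digit and C long widths are assumptions passed in as parameters.

```python
# Assumption: 30-bit digits (typical 64-bit builds); some builds use 15-bit digits.
DIGIT_BITS = 30
# "Shifts smaller than 17", as stated in the PR description.
MAX_LSHIFT = 16


def is_compact(n, digit_bits=DIGIT_BITS):
    """Model of a compact int: the magnitude fits in a single digit."""
    return abs(n).bit_length() <= digit_bits


def lshift_fast_path_ok(a, b, digit_bits=DIGIT_BITS):
    """Hypothetical guard for a << b: compact operands and a small shift."""
    return (is_compact(a, digit_bits) and is_compact(b, digit_bits)
            and 0 <= b <= MAX_LSHIFT)


def rshift_fast_path_ok(a, b, long_bits=64, digit_bits=DIGIT_BITS):
    """Hypothetical guard for a >> b.

    In C, a shift count that is negative or >= the operand width is
    undefined, so the count must stay below long_bits (an assumption:
    the C long is 32 bits on some platforms, 64 on others).
    """
    return (is_compact(a, digit_bits) and is_compact(b, digit_bits)
            and 0 <= b < long_bits)


def fits_signed(value, width):
    """True if value is representable as a signed integer of this width."""
    return -(1 << (width - 1)) <= value < (1 << (width - 1))


# Worst case: the largest one-digit magnitude shifted by the maximum amount.
for digit_bits, width in ((30, 64), (15, 32)):
    biggest = (1 << digit_bits) - 1
    assert fits_signed(biggest << MAX_LSHIFT, width)
    assert fits_signed(-biggest << MAX_LSHIFT, width)
```

The loop at the end checks the bound from the description: a one-digit value shifted left by at most 16 stays inside a 64-bit signed integer with 30-bit digits, and inside a 32-bit one with 15-bit digits.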

Benchmark:

bench_lshift_compactint: Mean +- std dev: [mainx2] 381 us +- 33 us -> [prx2] 334 us +- 46 us: 1.14x faster
bench_rshift_compactint: Mean +- std dev: [mainx2] 398 us +- 34 us -> [prx2] 281 us +- 46 us: 1.41x faster

Benchmark hidden because not significant (1): bench_rshift_largeint

Geometric mean: 1.17x faster

(non-PGO, Windows)

The specialization gives a measurable gain, but it is unclear how often shifts on compact ints occur in practice. For example, the uuid module uses shifts, but always on large ints.
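
To illustrate the uuid case, here is a small sketch showing that the left operand of such a shift is far wider than one digit. The 30-bit digit width and the helper name `is_single_digit` are assumptions for illustration; `UUID.int` and `UUID.time_low` are the standard library attributes.

```python
import uuid

# Assumption: 30-bit digits, as on typical 64-bit builds.
DIGIT_BITS = 30


def is_single_digit(n, digit_bits=DIGIT_BITS):
    """Rough model of a compact int: the magnitude fits in one digit."""
    return abs(n).bit_length() <= digit_bits


u = uuid.UUID("12345678-1234-5678-1234-567812345678")

# The 128-bit integer value of the UUID is the left operand of the shift.
print(is_single_digit(u.int))

# Extracting the top 32 bits is a right shift on that large int.
time_low = u.int >> 96
print(time_low == u.time_low)
```

Under this model the shift would not hit a compact-int fast path, because the guard looks at the operands and not at the (small) result.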

@eendebakpt eendebakpt added this to the hhhhhhhhhhhhhhhhhhhhhhhhhbbbbbbbb milestone Jan 29, 2025
@johnslavik
Member

johnslavik commented Jan 29, 2025

@eendebakpt, something happened with the milestone.
image

@skirpichev skirpichev removed this from the hhhhhhhhhhhhhhhhhhhhhhhhhbbbbbbbb milestone Jan 29, 2025
@eendebakpt eendebakpt marked this pull request as draft January 29, 2025 19:51
@eendebakpt eendebakpt changed the title gh-100239: specialize right shift ops on ints gh-100239: specialize left and right shift ops on ints Jan 29, 2025
@eendebakpt eendebakpt marked this pull request as ready for review January 30, 2025 20:16
@github-actions

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions Bot added the stale Stale PR or inactive for long period of time. label Apr 21, 2026
# Conflicts:
#	Python/specialize.c
@eendebakpt
Contributor Author

Updated benchmarks:

| Benchmark | main | PR | Δ |
|---|---|---|---|
| lshift_compactint (literal rhs 1–5) | 12.9 ns | 9.09 ns | 1.42x faster |
| rshift_compactint (literal rhs 1–5) | 19.3 ns | 14.5 ns | 1.33x faster |
| lshift_loop_var (rhs from range(16)) | 400 ns | 322 ns | 1.25x faster |
| rshift_loop_var (rhs from range(16)) | 384 ns | 327 ns | 1.17x faster |
| inplace_shift_mix (a <<= k; a >>= k) | 15.0 ns | 9.31 ns | 1.61x faster |
| rshift_largeint (lhs = 1 << 200) | 29.3 ns | 29.3 ns | not significant |
| Geometric mean | | | 1.28x faster |
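
As a sanity check on the summary row, the geometric mean can be recomputed from the reported timings. This sketch assumes the "not significant" entry is counted with a ratio of 1.0, which is an inference from the numbers and not something stated in the thread.

```python
from statistics import geometric_mean

# (main, PR) timings copied from the results above; the units cancel in the ratio.
timings = {
    "lshift_compactint": (12.9, 9.09),
    "rshift_compactint": (19.3, 14.5),
    "lshift_loop_var": (400, 322),
    "rshift_loop_var": (384, 327),
    "inplace_shift_mix": (15.0, 9.31),
    "rshift_largeint": (29.3, 29.3),  # assumption: counted as a 1.0x ratio
}

# Speedup of the PR over main for each benchmark.
speedups = {name: main / pr for name, (main, pr) in timings.items()}

overall = geometric_mean(speedups.values())
print(f"geometric mean: {overall:.2f}x")
```

Leaving `rshift_largeint` out of the mean gives a higher figure, so the reported value is consistent with all six benchmarks being included.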

rshift_largeint is the negative control: the guard rejects non-compact operands, so large-int code shows no measurable overhead.
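
One way to see which instruction a call site ends up with is to warm a function up and inspect its adaptive bytecode with `dis`. This is a sketch under stated assumptions: the name of the specialized instruction is not given in this thread, so the code only prints what the running interpreter reports; `binary_opnames` and the warm-up count are hypothetical, and the `adaptive` flag exists only on Python 3.11 and later.

```python
import dis
import sys


def shift_right(a, k):
    return a >> k


def binary_opnames(func, args, warmup=1000):
    """Call func repeatedly, then list the names of its binary-op instructions."""
    for _ in range(warmup):
        func(*args)
    # The adaptive view of the bytecode is only available on 3.11+.
    kwargs = {"adaptive": True} if sys.version_info >= (3, 11) else {}
    return [ins.opname
            for ins in dis.get_instructions(func, **kwargs)
            if ins.opname.startswith("BINARY_")]


# Compact left operand, as in rshift_compactint.
print(binary_opnames(shift_right, (1 << 20, 3)))
```

Running the same check in a fresh interpreter with `(1 << 200, 3)` as the arguments shows what the negative-control case looks like on a given build.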

Benchmark script
"""Microbenchmarks for compact-int shift specialization (gh-100239)."""

import pyperf


def bench_lshift_compactint(loops):
    range_it = range(loops)
    a = 3
    t0 = pyperf.perf_counter()
    for _ in range_it:
        a << 1; a << 2; a << 3; a << 4; a << 5
        a << 1; a << 2; a << 3; a << 4; a << 5
        a << 1; a << 2; a << 3; a << 4; a << 5
        a << 1; a << 2; a << 3; a << 4; a << 5
    return pyperf.perf_counter() - t0


def bench_rshift_compactint(loops):
    range_it = range(loops)
    a = 1 << 20
    t0 = pyperf.perf_counter()
    for _ in range_it:
        a >> 1; a >> 2; a >> 3; a >> 4; a >> 5
        a >> 1; a >> 2; a >> 3; a >> 4; a >> 5
        a >> 1; a >> 2; a >> 3; a >> 4; a >> 5
        a >> 1; a >> 2; a >> 3; a >> 4; a >> 5
    return pyperf.perf_counter() - t0


def bench_rshift_largeint(loops):
    range_it = range(loops)
    a = 1 << 200
    t0 = pyperf.perf_counter()
    for _ in range_it:
        a >> 1; a >> 2; a >> 3; a >> 4; a >> 5
        a >> 1; a >> 2; a >> 3; a >> 4; a >> 5
        a >> 1; a >> 2; a >> 3; a >> 4; a >> 5
        a >> 1; a >> 2; a >> 3; a >> 4; a >> 5
    return pyperf.perf_counter() - t0


def bench_lshift_loop_var(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    for _ in range_it:
        a = 5
        for i in range(16):
            a << i
    return pyperf.perf_counter() - t0


def bench_rshift_loop_var(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    for _ in range_it:
        a = 1 << 16
        for i in range(16):
            a >> i
    return pyperf.perf_counter() - t0


def bench_inplace_lshift(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    for _ in range_it:
        a = 1
        a <<= 1; a >>= 1
        a <<= 2; a >>= 2
        a <<= 3; a >>= 3
        a <<= 4; a >>= 4
        a <<= 5; a >>= 5
    return pyperf.perf_counter() - t0


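if __name__ == "__main__":
    runner = pyperf.Runner()
    # inner_loops is the number of shift operations per iteration of the timed
    # loop, so pyperf reports the time per single shift for these benchmarks.
    runner.bench_time_func("lshift_compactint", bench_lshift_compactint, inner_loops=20)
    runner.bench_time_func("rshift_compactint", bench_rshift_compactint, inner_loops=20)
    runner.bench_time_func("rshift_largeint",   bench_rshift_largeint,   inner_loops=20)
    # No inner_loops here: the reported time covers one full pass over
    # range(16), including the overhead of the inner for loop.
    runner.bench_time_func("lshift_loop_var",   bench_lshift_loop_var)
    runner.bench_time_func("rshift_loop_var",   bench_rshift_loop_var)
    # Five <<= and five >>= operations per iteration.
    runner.bench_time_func("inplace_shift_mix", bench_inplace_lshift,    inner_loops=10)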

Labels

awaiting review, stale (Stale PR or inactive for long period of time.)
