Skip to content

<random>: Implement modified ziggurat algorithm for normal_distribution#6104

Open
statementreply wants to merge 16 commits into
microsoft:mainfrom
statementreply:random-ziggurat
Open

<random>: Implement modified ziggurat algorithm for normal_distribution#6104
statementreply wants to merge 16 commits into
microsoft:mainfrom
statementreply:random-ziggurat

Conversation

@statementreply

@statementreply statementreply commented Feb 23, 2026

Copy link
Copy Markdown
Contributor

This PR implements modified ziggurat algorithm for normal_distribution, based on McFarland, C. D. (2015). A modified ziggurat algorithm for generating exponentially and normally distributed pseudorandom numbers.

The main differences between this PR and the paper:

  • The layers are moved upwards to reduce the area of non-rectangular regions, creating a bottommost layer of which the size of the rectangular region is smaller than 1/N.
  • Different criteria are used for the always sampled and always rejected regions of ziggurat overhangs.

Benchmark results for x64 on Intel Core i5-13600KF:

Benchmark Time Speed Up
BM_Generator<std::mt19937_64> 3.39 ns
BM_Generator<b_r::mt19937_64> 2.52 ns
BM_Distribution<std::mt19937_64, std_old::normal_distribution<double>> 8.91 ns 1.00
BM_Distribution<std::mt19937_64, std_new::normal_distribution<double>> 4.52 ns 1.97
BM_Distribution<std::mt19937_64, boost_r::normal_distribution<double>> 5.48 ns 1.63
BM_Distribution<b_r::mt19937_64, boost_r::normal_distribution<double>> 4.50 ns 1.98
BM_Generator<std::mt19937> 3.00 ns
BM_Generator<b_r::mt19937> 2.12 ns
BM_Distribution<std::mt19937, std_old::normal_distribution<float>> 8.12 ns 1.00
BM_Distribution<std::mt19937, std_new::normal_distribution<float>> 4.41 ns 1.84
BM_Distribution<std::mt19937, boost_r::normal_distribution<float>> 5.28 ns 1.54
BM_Distribution<b_r::mt19937, boost_r::normal_distribution<float>> 4.33 ns 1.88

Removed tests\GH_004618_normal_distribution_avoids_resets which became moot.

Fixes #1003.

Comment thread stl/inc/__msvc_random_ziggurat_tables.hpp Outdated
@github-project-automation github-project-automation Bot moved this from Initial Review to Work In Progress in STL Code Reviews Feb 23, 2026
@StephanTLavavej StephanTLavavej added the performance Must go faster label Feb 23, 2026
Comment thread stl/inc/__msvc_random_ziggurat_tables.hpp Outdated
Comment thread stl/inc/__msvc_random_ziggurat_tables.hpp
Comment thread stl/inc/__msvc_random_ziggurat_tables.hpp
Comment thread tools/scripts/random_ziggurat_tables_generate.cpp Outdated
Comment thread tools/scripts/random_ziggurat_tables_generate.cpp

_STD_BEGIN

template <class _Ty, class _Uty, bool _Signed, int _Lx>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove _Signed is it's always true?

Suggested change
template <class _Ty, class _Uty, bool _Signed, int _Lx>
template <class _Ty, class _Uty, int _Lx>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's reserved for exponential_distribution, where _Signed would be false.

static_assert(_Lx <= 254, "invalid table size");

using _Uint_type = _Uty;
using _Xtype = conditional_t<_Signed, make_signed_t<_Uty>, _Uty>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
using _Xtype = conditional_t<_Signed, make_signed_t<_Uty>, _Uty>;
using _Xtype = make_signed_t<_Uty>;

Also, would it be better to rename it to _Sint_type?

using _Xtype = conditional_t<_Signed, make_signed_t<_Uty>, _Uty>;

static constexpr int _Layer_num = _Lx;
static constexpr _Ty _Width_scale = (_Signed ? _Ty{1} : _Ty{2}) * _Ty{(numeric_limits<_Uty>::max)() / 2u + 1u};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid numeric_limits.

Suggested change
static constexpr _Ty _Width_scale = (_Signed ? _Ty{1} : _Ty{2}) * _Ty{(numeric_limits<_Uty>::max)() / 2u + 1u};
static constexpr _Ty _Width_scale = _Ty{static_cast<_Uty>(-1) / 2u + 1u};

@StephanTLavavej StephanTLavavej moved this from Work In Progress to Initial Review in STL Code Reviews Feb 24, 2026
Comment thread stl/inc/random Outdated
Comment thread stl/inc/random
const _Fty _Ydiff = _Yy1 - _Yy0;

for (;;) {
const _Fty _Dx = _Xdiff * _STD _Nrand_impl<_Fty>(_Eng);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upper bits of _Xbits could be used on the first iteration. Also for the tail, below.

Comment thread stl/inc/random
Comment on lines +2709 to +2714
const _Fty _Xval = -_STD log(_Fty{1.0} - _STD _Nrand_impl<_Fty>(_Eng)) * _Tail_reciprocal;
const _Fty _Yval = _STD _Nrand_impl<_Fty>(_Eng);
if (_Yval < _STD exp(_Xval * _Xval * _Fty{-0.5})) {
const _Fty _Xorig = _Tail + _Xval;
return static_cast<_Ty>(_Xbits < 0 ? -_Xorig : _Xorig);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marsaglia's tail method might be faster if we don't have a fast exponential generator, as it only needs sqrt when returning, instead of an additional log or exp. It can be found in Luc Devroye's book, section IX.1.2, or this DTIC report.

@StephanTLavavej StephanTLavavej moved this from Initial Review to Work In Progress in STL Code Reviews Mar 5, 2026
Copilot AI review requested due to automatic review settings June 13, 2026 15:39
After using 1 bit for the sign and 24 bits for the magnitude of the random
`float` from the 32-bit random value, we have only 7 unused bits.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster

Projects

Status: Work In Progress

Development

Successfully merging this pull request may close these issues.

<random>: normal_distribution is slower than Boost

5 participants