From 557e20d4587481393dd63f40c545aef170a6ea8b Mon Sep 17 00:00:00 2001 From: Richard Ball Date: Wed, 25 Mar 2026 16:11:07 +0000 Subject: [PATCH 1/2] Add FEAT_CMH support --- main/acle.md | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/main/acle.md b/main/acle.md index f6d4878f..adc8a77c 100644 --- a/main/acle.md +++ b/main/acle.md @@ -1848,6 +1848,11 @@ execution state. Intrinsics for the use of these instructions are specified in data placement hints (FEAT_PCDPHINT) instructions and their associated intrinsics are available on the target. +### Contention Management hints + +`__ARM_FEATURE_CMH` is defined to `1` if the Contention Management hints +(FEAT_CMH) instructions and their associated intrinsics are available on the target. + ## Floating-point and vector hardware ### Hardware floating point @@ -2654,6 +2659,7 @@ be found in [[BA]](#BA). | [`__ARM_FEATURE_CDE`](#custom-datapath-extension) | Custom Datapath Extension | 0x01 | | [`__ARM_FEATURE_CDE_COPROC`](#custom-datapath-extension) | Custom Datapath Extension | 0xf | | [`__ARM_FEATURE_CLZ`](#clz) | CLZ instruction | 1 | +| [`__ARM_FEATURE_CMH`](#contention-management-hints) | Contention management hints | 1 | | [`__ARM_FEATURE_COMPLEX`](#complex-number-intrinsics) | Armv8.3-A extension | 1 | | [`__ARM_FEATURE_COPROC`](#coprocessor-intrinsics) | Coprocessor Intrinsics | 1 | | [`__ARM_FEATURE_CRC32`](#crc32-extension) | CRC32 extension | 1 | @@ -4980,6 +4986,58 @@ The fourth argument can contain the following values: | KEEP | 0 | Signals to retain the updated location in the local cache of the updating PE. | | STRM | 1 | Signals to not retain the updated location in the local cache of the updating PE. | +## Atomic store with CMH intrinsics + +These intrinsics provide an atomic store, which will +make use of the `STCPH` or `SHUH` hint instructions immediately followed by the +associated store instruction. 
These intrinsics are type generic and +support scalar types from 8-64 bits and are available when +`__ARM_FEATURE_CMH` is defined. + +To access these intrinsics, `` should be included. + +``` c + void __arm_atomic_store_with_stcph(type *ptr, type data, int memory_order); + void __arm_atomic_store_with_shuh(type *ptr, type data, int memory_order, int priority_hint); +``` + +The first argument in these intrinsics is a pointer `ptr` which is the location to store to. +The second argument `data` is the data which is to be stored. +The third argument `mem` can be one of 3 memory ordering variables supported by atomic_store: +__ATOMIC_RELAXED, __ATOMIC_SEQ_CST, and __ATOMIC_RELEASE. +The fourth argument `priority_hint` can be either 0 or 1. If set to 1 then if the next instruction in program order generates +an Explicit Memory Write Effect, then there is a performance benefit if that Explicit Memory Write Effect +is sequenced before Memory Effects from other threads of execution in the coherence order to the same +location. + +## Atomic fetch with CMH intrinsics + +These intrinsics provide some atomic fetch operations, which will +make use of the `SHUH` hint instruction immediately followed by the +associated fetch instructions. These intrinsics are type generic and +support scalar types from 8-64 bits and are available when +`__ARM_FEATURE_CMH` is defined. + +To access these intrinsics, `` should be included. 
+ +``` c + type __arm_atomic_fetch_add_with_shuh(type *ptr, type data, int memory_order, int priority_hint); + type __arm_atomic_fetch_sub_with_shuh(type *ptr, type data, int memory_order, int priority_hint); + type __arm_atomic_fetch_and_with_shuh(type *ptr, type data, int memory_order, int priority_hint); + type __arm_atomic_fetch_xor_with_shuh(type *ptr, type data, int memory_order, int priority_hint); + type __arm_atomic_fetch_or_with_shuh(type *ptr, type data, int memory_order, int priority_hint); + type __arm_atomic_fetch_nand_with_shuh(type *ptr, type data, int memory_order, int priority_hint); +``` + +The first argument in these intrinsic is a pointer `ptr` which is the location to store to. +The second argument `data` is the data which is to be stored. +The third argument `mem` can be one of 6 memory ordering variables supported by atomic_fetch: +__ATOMIC_RELAXED, __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE, __ATOMIC_CONSUME, __ATOMIC_ACQ_REL and __ATOMIC_RELEASE. +The fourth argument `priority_hint` can be either 0 or 1. If set to 1 then if the next instruction in program order generates +an Explicit Memory Write Effect, then there is a performance benefit if that Explicit Memory Write Effect +is sequenced before Memory Effects from other threads of execution in the coherence order to the same +location. + # Custom Datapath Extension The intrinsics in this section provide access to instructions in the From 9e4c6c568e86f598609f9e297fd66b1d5f43f2f7 Mon Sep 17 00:00:00 2001 From: Richard Ball Date: Thu, 7 May 2026 16:37:00 +0100 Subject: [PATCH 2/2] Update FEAT_CMH support Following changes to PCDPHINT update ACLE description --- main/acle.md | 62 +++++++++++++++++++--------------------------------- 1 file changed, 22 insertions(+), 40 deletions(-) diff --git a/main/acle.md b/main/acle.md index 0084845d..e5301faa 100644 --- a/main/acle.md +++ b/main/acle.md @@ -488,7 +488,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). 
Support is added for the Dot Product intrin * Added [**Alpha**](#current-status-and-anticipated-changes) support for Brain 16-bit floating-point vector multiplication intrinsics. * Redesigned atomic store with hints intrinsics. - +* Added support for producer-consumer data placement hints. [**Alpha** state] ### References This document refers to the following documents. @@ -1849,7 +1849,7 @@ execution state. Intrinsics for the use of these instructions are specified in data placement hints (FEAT_PCDPHINT) instructions and their associated intrinsics are available on the target. -### Contention Management hints +### Contention Management hints [**Alpha** state] `__ARM_FEATURE_CMH` is defined to `1` if the Contention Management hints (FEAT_CMH) instructions and their associated intrinsics are available on the target. @@ -4991,58 +4991,40 @@ target. The following hint values are defined: | ---------------- | --------- | -------------------------- | --------------------------------------------------------------------------------- | | HINT_STSHH_KEEP | 0 | `__ARM_FEATURE_PCDPHINT` | Requests retention of the updated location in the local cache of the updating PE. | | HINT_STSHH_STRM | 1 | `__ARM_FEATURE_PCDPHINT` | Requests that the updated location not be retained in the local cache of the updating PE. | +| HINT_STCPH | 2 | `__ARM_FEATURE_CMH` | Indicates that there is a performance benefit if the memory write effect of the next instruction is sequenced before memory effects from other threads of execution to the same location. | +| HINT_SHUH | 3 | `__ARM_FEATURE_CMH` | Informs that the next instruction generates an effect in a location that one or more other threads of execution are likely to subsequently update. | +| HINT_SHUH_PH | 4 | `__ARM_FEATURE_CMH` | PH adds the effects of STCPH to SHUH. | -## Atomic store with CMH intrinsics - -These intrinsics provide an atomic store, which will -make use of the `STCPH` or `SHUH` hint instructions immediately followed by the -associated store instruction. 
These intrinsics are type generic and -support scalar types from 8-64 bits and are available when -`__ARM_FEATURE_CMH` is defined. - -To access these intrinsics, `` should be included. - -``` c - void __arm_atomic_store_with_stcph(type *ptr, type data, int memory_order); - void __arm_atomic_store_with_shuh(type *ptr, type data, int memory_order, int priority_hint); -``` - -The first argument in these intrinsics is a pointer `ptr` which is the location to store to. -The second argument `data` is the data which is to be stored. -The third argument `mem` can be one of 3 memory ordering variables supported by atomic_store: -__ATOMIC_RELAXED, __ATOMIC_SEQ_CST, and __ATOMIC_RELEASE. -The fourth argument `priority_hint` can be either 0 or 1. If set to 1 then if the next instruction in program order generates -an Explicit Memory Write Effect, then there is a performance benefit if that Explicit Memory Write Effect -is sequenced before Memory Effects from other threads of execution in the coherence order to the same -location. - -## Atomic fetch with CMH intrinsics +## Atomic fetch with hints intrinsics [**Alpha** state] These intrinsics provide some atomic fetch operations, which will -make use of the `SHUH` hint instruction immediately followed by the +make use of hint instructions immediately followed by the associated fetch instructions. These intrinsics are type generic and -support scalar types from 8-64 bits and are available when -`__ARM_FEATURE_CMH` is defined. +support scalar integral and floating-point types of 8, 16, 32, and 64 bits. To access these intrinsics, `` should be included. 
``` c - type __arm_atomic_fetch_add_with_shuh(type *ptr, type data, int memory_order, int priority_hint); - type __arm_atomic_fetch_sub_with_shuh(type *ptr, type data, int memory_order, int priority_hint); - type __arm_atomic_fetch_and_with_shuh(type *ptr, type data, int memory_order, int priority_hint); - type __arm_atomic_fetch_xor_with_shuh(type *ptr, type data, int memory_order, int priority_hint); - type __arm_atomic_fetch_or_with_shuh(type *ptr, type data, int memory_order, int priority_hint); - type __arm_atomic_fetch_nand_with_shuh(type *ptr, type data, int memory_order, int priority_hint); + type __arm_atomic_fetch_add_with_hint(type *ptr, type data, int memory_order, int hint); + type __arm_atomic_fetch_sub_with_hint(type *ptr, type data, int memory_order, int hint); + type __arm_atomic_fetch_and_with_hint(type *ptr, type data, int memory_order, int hint); + type __arm_atomic_fetch_xor_with_hint(type *ptr, type data, int memory_order, int hint); + type __arm_atomic_fetch_or_with_hint(type *ptr, type data, int memory_order, int hint); ``` The first argument in these intrinsic is a pointer `ptr` which is the location to store to. The second argument `data` is the data which is to be stored. The third argument `mem` can be one of 6 memory ordering variables supported by atomic_fetch: __ATOMIC_RELAXED, __ATOMIC_SEQ_CST, __ATOMIC_ACQUIRE, __ATOMIC_CONSUME, __ATOMIC_ACQ_REL and __ATOMIC_RELEASE. -The fourth argument `priority_hint` can be either 0 or 1. If set to 1 then if the next instruction in program order generates -an Explicit Memory Write Effect, then there is a performance benefit if that Explicit Memory Write Effect -is sequenced before Memory Effects from other threads of execution in the coherence order to the same -location. + +The fourth argument `hint` selects the requested hint. The set of valid +hint values depends on the architectural features supported by the +target. 
The following hint values are defined: + +| **Hint** | **Value** | **Feature** | **Summary** | +| ---------------- | --------- | -------------------------- | --------------------------------------------------------------------------------- | +| HINT_SHUH | 0 | `__ARM_FEATURE_CMH` | Informs that the next instruction generates an effect in a location that one or more other threads of execution are likely to subsequently update. | +| HINT_SHUH_PH | 1 | `__ARM_FEATURE_CMH` | PH adds the effects of STCPH to SHUH. | # Custom Datapath Extension