Problem
The FP16 casting pass in casting.py correctly guards against static tensor overflow (weights/constants > FP16_MAX = 65504), but does not account for activation-level overflow — where operations like exp(x) produce intermediate values > 65504 at runtime even though the inputs are within fp16 range.
Example: softplus
A model with a Softplus() activation layer:
- cast_fp32_to_fp16() successfully converts all weights and inputs to fp16 ✅
- The converter decomposes softplus(x) = log(1 + exp(x))
- At runtime on Apple Neural Engine, exp(x) overflows once x exceeds ln(65504) ≈ 11.09:
  - exp(10.4) ≈ 32,860 (fits fp16)
  - exp(11.0) ≈ 59,874 (barely fits fp16)
  - exp(11.1) ≈ 66,171 → OVERFLOW → output collapses to 0
The current check_tensor_overflow_fp16() and handle_overflow_op() logic only checks scalar/tensor values, not whether the computation graph will produce intermediate overflows.
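The overflow can be reproduced without any converter in the loop. The following is a minimal sketch that emulates fp16 rounding with Python's `struct` half-precision format; the helper names (`to_fp16`, `exp_fp16`, `naive_softplus_fp16`, `stable_softplus_fp16`) are hypothetical and not part of casting.py. Note that this IEEE-style emulation surfaces the overflow as `inf`, whereas the behavior reported above on the Neural Engine is an output of 0.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite fp16 value


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest fp16 value; overflow becomes +/-inf."""
    if math.isnan(value) or math.isinf(value):
        return value
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range.
        return math.copysign(math.inf, value)


def exp_fp16(x: float) -> float:
    """exp(x) with the result rounded to fp16."""
    try:
        return to_fp16(math.exp(x))
    except OverflowError:  # math.exp itself overflows for very large x
        return math.inf


def naive_softplus_fp16(x: float) -> float:
    """log(1 + exp(x)) with every intermediate rounded to fp16."""
    x = to_fp16(x)
    total = to_fp16(1.0 + exp_fp16(x))
    return to_fp16(math.log(total))


def stable_softplus_fp16(x: float) -> float:
    """max(x, 0) + log1p(exp(-|x|)): the exponent argument is never positive."""
    x = to_fp16(x)
    tail = to_fp16(math.log1p(exp_fp16(-abs(x))))
    return to_fp16(max(x, 0.0) + tail)


if __name__ == "__main__":
    for x in (10.4, 11.0, 11.1):
        print(x, naive_softplus_fp16(x), stable_softplus_fp16(x))
```

The stable form is a standard identity for softplus, shown here only to illustrate that the overflow comes from the decomposition rather than from the function itself.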
Affected Operations
| Operation | Naive Form | fp16 Overflow Threshold |
|---|---|---|
| softplus | exp(x) | x ≈ 11.09 |
| logsumexp | sum(exp(x_i)) | x ≈ 7.63 (depends on the number of summed terms) |
| logcumsumexp | cumsum(exp(x_i)) | x ≈ 11.09 (first term; lower as terms accumulate) |
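The thresholds in the table follow from solving n · exp(x) ≤ FP16_MAX for x. A small sketch, assuming all n summed terms share the same value x (the function name `exp_overflow_threshold` is hypothetical):

```python
import math

FP16_MAX = 65504.0  # largest finite fp16 value


def exp_overflow_threshold(num_terms: int = 1) -> float:
    """Largest x for which num_terms * exp(x) still fits in fp16."""
    if num_terms < 1:
        raise ValueError("num_terms must be at least 1")
    return math.log(FP16_MAX / num_terms)


if __name__ == "__main__":
    print(round(exp_overflow_threshold(1), 2))   # 11.09, a single exp(x)
    print(round(exp_overflow_threshold(32), 2))  # 7.62, a sum over 32 equal terms
```

Under this assumption, a reduction over roughly 32 elements gives a threshold close to the logsumexp figure in the table; wider reductions lower it further.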
The Compound Effect
When coreai-optimization applies weight compression (palettization, quantization) AND fp16 casting together:
- Quantization introduces rounding errors in weights
- These errors can shift activation distributions
- Values that were safely below the overflow threshold may now exceed it
- The casting pass has no mechanism to detect or prevent this
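A toy illustration of the steps above, not a reproduction of coreai-optimization's behavior: weights are snapped to a uniform grid (a stand-in for real quantization or palettization), and the resulting pre-activation moves from just below the exp overflow threshold to just above it. All names and values here are illustrative assumptions.

```python
import math

FP16_MAX = 65504.0
EXP_LIMIT = math.log(FP16_MAX)  # about 11.09: exp(x) overflows fp16 above this


def quantize_uniform(weights, step):
    """Snap each weight to the nearest multiple of `step` (toy quantizer)."""
    return [round(w / step) * step for w in weights]


def pre_activation(weights, inputs):
    """Dot product feeding the exp inside the activation."""
    return sum(w * x for w, x in zip(weights, inputs))


weights = [0.92, 0.92, 0.92]
inputs = [4.0, 4.0, 4.0]

before = pre_activation(weights, inputs)                            # about 11.04
after = pre_activation(quantize_uniform(weights, 0.0625), inputs)   # 11.25

if __name__ == "__main__":
    print(f"threshold={EXP_LIMIT:.4f} before={before:.4f} after={after:.4f}")
    print("overflow before quantization:", before > EXP_LIMIT)
    print("overflow after quantization: ", after > EXP_LIMIT)
```

A weight error under 2% is enough here because the original pre-activation already sat within about 0.05 of the threshold.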
Proposed Fix
Add an activation_overflow_audit pass that:
- Identifies ops in the graph whose intermediates can overflow fp16 (exp, log(1+exp(...)))
- Flags them for stable decomposition or fp32 accumulation
- Integrates with the existing handle_overflow_op / handle_non_overflow_op classification
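One possible shape for such a pass, sketched against a toy graph representation rather than the real one used by casting.py. The `Op` dataclass, the op-type strings, and the action labels are all hypothetical; how the result would hook into handle_overflow_op / handle_non_overflow_op is left open.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Toy graph node: `inputs` holds the names of producer ops or graph inputs."""
    name: str
    op_type: str
    inputs: list = field(default_factory=list)


RISKY_OP_TYPES = {"exp"}                       # outputs can exceed fp16 range
ACCUMULATORS = {"add", "reduce_sum", "cumsum"}


def activation_overflow_audit(ops):
    """Map each risky op name to a suggested action.

    exp -> (add | reduce_sum | cumsum) -> log is treated as a softplus- or
    logsumexp-like pattern that has a stable decomposition; any other exp is
    flagged for fp32 computation.
    """
    consumers = {}
    for op in ops:
        for src in op.inputs:
            consumers.setdefault(src, []).append(op)

    findings = {}
    for op in ops:
        if op.op_type not in RISKY_OP_TYPES:
            continue
        feeds_log = any(
            mid.op_type in ACCUMULATORS
            and any(c.op_type == "log" for c in consumers.get(mid.name, []))
            for mid in consumers.get(op.name, [])
        )
        findings[op.name] = "stable_decomposition" if feeds_log else "fp32_accumulation"
    return findings


# softplus(x) = log(1 + exp(x)) as the converter would decompose it
softplus_graph = [
    Op("e", "exp", ["x"]),
    Op("s", "add", ["one", "e"]),
    Op("y", "log", ["s"]),
]

if __name__ == "__main__":
    print(activation_overflow_audit(softplus_graph))
```

The audit is purely structural: it flags any matching pattern regardless of the actual input range, so it may over-report on models whose activations never approach the threshold.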
Prior Art
- apple/coremltools PRs #2725, #2726, #2727 fix the converter-level decomposition
- apple/coreai-torch PR #22 adds stable converters for softplus/mish/logsumexp
- This issue addresses the optimization layer, where the interaction between quantization and fp16 creates compound failures
Environment