FP16 casting pass does not guard against activation-level overflow (softplus, exp, logsumexp) #7

## Problem

The FP16 casting pass in `casting.py` correctly guards against **static tensor overflow** (weights/constants > FP16_MAX = 65504), but does not account for **activation-level overflow** — where operations like `exp(x)` produce intermediate values > 65504 at runtime even though the inputs are within fp16 range.

### Example: softplus

A model with a `Softplus()` activation layer:
1. `cast_fp32_to_fp16()` successfully converts all weights and inputs to fp16 ✅
2. The converter decomposes `softplus(x) = log(1 + exp(x))`
3. At runtime on the Apple Neural Engine, `exp(x)` exceeds FP16_MAX once `x > ln(65504) ≈ 11.09`:
   - `exp(10.4)` ≈ 32,860 (fits fp16)
   - `exp(11.0)` ≈ 59,874 (barely fits fp16)
   - `exp(11.1)` ≈ 66,171 → **OVERFLOW** → output collapses to 0
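A minimal CPU-side sketch of the failure, assuming NumPy's `float16` arithmetic is a reasonable stand-in for an fp16 graph. It is not ANE execution: NumPy produces IEEE `inf` on overflow, whereas the issue reports the ANE output collapsing to 0. The function names are illustrative only. The stable form `max(x, 0) + log1p(exp(-|x|))` is a standard rewrite of softplus whose `exp` argument is never positive.

```python
import math
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0
EXP_OVERFLOW_X = math.log(FP16_MAX)         # ≈ 11.09: exp(x) > FP16_MAX beyond this

def naive_softplus_fp16(x):
    """log(1 + exp(x)) evaluated in float16, as a direct decomposition would."""
    x = np.asarray(x, dtype=np.float16)
    with np.errstate(over="ignore"):
        return np.log1p(np.exp(x))  # exp(x) becomes inf once x > ~11.09

def stable_softplus_fp16(x):
    """max(x, 0) + log1p(exp(-|x|)): the exp argument is <= 0, so it cannot overflow."""
    x = np.asarray(x, dtype=np.float16)
    return np.maximum(x, np.float16(0)) + np.log1p(np.exp(-np.abs(x)))

for v in (10.4, 11.0, 11.1, 20.0):
    print(v, naive_softplus_fp16(v), stable_softplus_fp16(v))
```

The naive version stays finite at 11.0 and becomes `inf` at 11.1. The stable version returns approximately `x` for large inputs.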

The current `check_tensor_overflow_fp16()` and `handle_overflow_op()` logic only checks scalar/tensor *values*, not whether the *computation graph* will produce intermediate overflows.

### Affected Operations

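| Operation | Overflowing intermediate | fp16 overflow threshold |
|-----------|--------------------------|-------------------------|
| softplus | `exp(x)` | x > ln(65504) ≈ 11.09 |
| logsumexp | `sum(exp(x_i))` | x ≈ 7.63 for ~32 equal terms; in general x > ln(65504 / N) for N equal terms |
| logcumsumexp | `cumsum(exp(x_i))` | x ≈ 11.09 for a single term; the running sum crosses 65504 earlier as terms accumulate |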
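A sketch of why the reduction thresholds are lower than the single-`exp` one, under the simplifying assumption that all N summed terms equal the maximum input (the worst case for a given max). It also shows standard max-subtraction and `logaddexp` rewrites that keep every `exp` argument non-positive. All function names are hypothetical, and NumPy `float16` again only emulates an fp16 graph.

```python
import math
import numpy as np

FP16_MAX = 65504.0  # largest finite float16

def sum_exp_overflow_threshold(n_terms: int) -> float:
    """x above which n_terms equal copies of exp(x) sum past FP16_MAX."""
    return math.log(FP16_MAX / n_terms)

def naive_logsumexp_fp16(x):
    """log(sum(exp(x))) in float16: the sum overflows before the log is taken."""
    x = np.asarray(x, dtype=np.float16)
    with np.errstate(over="ignore"):
        return np.log(np.sum(np.exp(x)))

def stable_logsumexp_fp16(x):
    """m + log(sum(exp(x - m))) with m = max(x): each exp term is at most 1."""
    x = np.asarray(x, dtype=np.float16)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def stable_logcumsumexp_fp16(x):
    """Running log-sum-exp via logaddexp, which only exponentiates non-positive differences."""
    return np.logaddexp.accumulate(np.asarray(x, dtype=np.float16))

for n in (1, 2, 32):
    print(n, round(sum_exp_overflow_threshold(n), 2))
```

Under this worst-case model, N = 2 gives x ≈ 10.40 and N = 32 gives x ≈ 7.62, so the safe range for a reduction depends on its length as well as on the input values.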

### The Compound Effect

When `coreai-optimization` applies weight compression (palettization, quantization) AND fp16 casting together:
1. Quantization introduces rounding errors in weights
2. These errors can shift activation distributions
3. Values that were safely below the overflow threshold may now exceed it
4. The casting pass has no mechanism to detect or prevent this
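A toy illustration of steps 1 to 3, assuming a single linear pre-activation feeding `exp` and a hypothetical round-to-nearest quantizer with step 0.1. This is not the palettization or quantization scheme `coreai-optimization` actually uses. The weights are deliberately placed just under the threshold, so one rounding step moves `w · x` from ≈ 11.08 (finite `exp` in fp16) to ≈ 11.2 (`inf`).

```python
import numpy as np

def fp16_exp_of_linear(w, x):
    """Compute exp(w . x) with float16 operands, as an fp16-cast graph would."""
    y = np.dot(np.asarray(w, np.float16), np.asarray(x, np.float16))
    with np.errstate(over="ignore"):
        return np.exp(y)

def quantize_uniform(w, scale):
    """Hypothetical round-to-nearest uniform quantizer, for illustration only."""
    return np.round(np.asarray(w, np.float64) / scale) * scale

w = np.full(4, 2.77)  # w . x ≈ 11.08, just below ln(65504) ≈ 11.09
x = np.ones(4)
w_q = quantize_uniform(w, 0.1)  # each weight rounds up to 2.8, so w . x ≈ 11.2

print(fp16_exp_of_linear(w, x), fp16_exp_of_linear(w_q, x))
```

Real quantization error can push activations in either direction. The point is only that a value-based check on the inputs cannot tell which side of the threshold a given activation ends up on.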

### Proposed Fix

Add an `activation_overflow_audit` pass that:
1. Identifies ops in the graph whose intermediates can overflow fp16 (`exp`, `log(1+exp(...))`)
2. Flags them for stable decomposition or fp32 accumulation
3. Integrates with the existing `handle_overflow_op` / `handle_non_overflow_op` classification
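A minimal sketch of steps 1 and 2 over a hypothetical flat op list. The `Op` dataclass, the op-type strings (loosely modeled on MIL-style names), and the remedy labels are all assumptions, not the `coreai-optimization` graph API. Wiring the result into `handle_overflow_op` / `handle_non_overflow_op` is left out because their signatures are not shown in this issue.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    op_type: str
    inputs: list[str] = field(default_factory=list)

# Hypothetical op types whose output can exceed FP16_MAX even with in-range inputs.
OVERFLOW_PRONE = {"exp", "softplus", "reduce_log_sum_exp", "logcumsumexp"}
# Hypothetical reductions that overflow when they accumulate exp outputs.
EXP_ACCUMULATORS = {"reduce_sum", "cumsum"}

def activation_overflow_audit(ops):
    """Return {op_name: remedy} for ops whose intermediates may overflow fp16."""
    producers = {op.name: op for op in ops}
    flagged = {}
    for op in ops:
        if op.op_type in OVERFLOW_PRONE:
            # Candidate for a stable rewrite (max subtraction, log1p(exp(-|x|)), ...).
            flagged[op.name] = "stable_decomposition"
        elif op.op_type in EXP_ACCUMULATORS and any(
            i in producers and producers[i].op_type == "exp" for i in op.inputs
        ):
            # A sum of exp terms can overflow even when each term fits: keep it in fp32.
            flagged[op.name] = "fp32_accumulation"
    return flagged

graph = [
    Op("x", "input"),
    Op("e", "exp", ["x"]),
    Op("s", "reduce_sum", ["e"]),
    Op("l", "log", ["s"]),
    Op("r", "relu", ["x"]),
]
print(activation_overflow_audit(graph))
```

Matching on op type alone is deliberately conservative. A real pass could use value-range information to skip ops whose inputs are provably below the thresholds in the table.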

### Prior Art

- `apple/coremltools` PRs #2725, #2726, #2727 fix the converter-level decomposition
- `apple/coreai-torch` PR #22 adds stable converters for softplus/mish/logsumexp
- This issue addresses the *optimization* layer, where the interaction between quantization and fp16 creates compound failures

### Environment

- coreai-optimization: latest (cloned June 21, 2026)
- Related: apple/coreai-torch#21, apple/coremltools#2687
