FP16 casting pass does not guard against activation-level overflow (softplus, exp, logsumexp) #7

Description

@Ashutosh0x

Problem

The FP16 casting pass in casting.py correctly guards against static tensor overflow (weights or constants whose magnitude exceeds FP16_MAX = 65504), but it does not account for activation-level overflow, where an operation such as exp(x) produces intermediate values above 65504 at runtime even though its inputs are within fp16 range.

Example: softplus

A model with a Softplus() activation layer:

  1. cast_fp32_to_fp16() successfully converts all weights and inputs to fp16 ✅
  2. The converter decomposes softplus(x) into log(1 + exp(x))
  3. At runtime on Apple Neural Engine, exp(x) overflows once x exceeds ln(65504) ≈ 11.09:
    • exp(10.4) ≈ 32,860 (fits fp16)
    • exp(11.0) ≈ 59,874 (barely fits fp16)
    • exp(11.1) ≈ 66,171 → OVERFLOW → output collapses to 0
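
The failure can be reproduced off-device with a minimal sketch that rounds every intermediate to fp16 using only the Python standard library. The helper names (`to_fp16`, `softplus_naive_fp16`, `softplus_stable_fp16`) are hypothetical and are not part of casting.py. The sketch follows IEEE semantics, so the naive form returns inf at the point of overflow; the collapse to 0 reported above is device behavior that this simulation does not attempt to model.

```python
import math
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest fp16 value; overflow becomes +/-inf."""
    if math.isnan(value) or math.isinf(value):
        return value
    try:
        # struct's "e" format is IEEE 754 binary16 and raises OverflowError
        # when the rounded value does not fit.
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


def softplus_naive_fp16(x: float) -> float:
    # log(1 + exp(x)), with each intermediate rounded to fp16.
    e = to_fp16(math.exp(x))
    s = to_fp16(1.0 + e)
    return to_fp16(math.log(s)) if math.isfinite(s) else s


def softplus_stable_fp16(x: float) -> float:
    # max(x, 0) + log(1 + exp(-|x|)): the exp argument is never positive,
    # so the intermediate stays in (0, 1].
    e = to_fp16(math.exp(-abs(x)))
    return to_fp16(max(x, 0.0) + to_fp16(math.log1p(e)))


for x in (10.4, 11.0, 11.1):
    print(x, softplus_naive_fp16(x), softplus_stable_fp16(x))
```

The stable form is the standard rewrite of softplus; whether the converter or the optimization layer should apply it is a separate question.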

The current check_tensor_overflow_fp16() and handle_overflow_op() logic only checks scalar/tensor values, not whether the computation graph will produce intermediate overflows.

Affected Operations

| Operation | Naive form | fp16 overflow threshold |
|---|---|---|
| softplus | exp(x) | x ≈ 11.09 |
| logsumexp | sum(exp(x_i)) | x ≈ 7.63 (depends on the number of summed terms) |
| logcumsumexp | cumsum(exp(x_i)) | x ≈ 11.09 |
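
As a rough check on these thresholds, the sketch below computes the largest x for which a sum of n equal exp(x) terms still fits in fp16, which is ln(FP16_MAX / n). The equal-terms assumption and the function name `exp_sum_overflow_threshold` are illustrative only; real inputs are not uniform, so treat the numbers as indicative rather than as exact bounds.

```python
import math

FP16_MAX = 65504.0


def exp_sum_overflow_threshold(num_terms: int) -> float:
    """Largest x such that num_terms * exp(x) <= FP16_MAX (equal terms assumed)."""
    return math.log(FP16_MAX / num_terms)


for n in (1, 32, 1024):
    print(f"n={n:5d}  x ≈ {exp_sum_overflow_threshold(n):.2f}")
```

A single exp term gives about 11.09, and the threshold drops by ln(n) as more terms are accumulated, so wide reductions overflow at much smaller inputs.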

The Compound Effect

When coreai-optimization applies weight compression (palettization, quantization) AND fp16 casting together:

  1. Quantization introduces rounding errors in weights
  2. These errors can shift activation distributions
  3. Values that were safely below the overflow threshold may now exceed it
  4. The casting pass has no mechanism to detect or prevent this
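
A toy illustration of this interaction, with made-up numbers: a pre-activation that sits just below the exp overflow threshold moves above it after the weights are rounded to a coarse grid. The `quantize` function here is a plain uniform-rounding stand-in, not the palettization or quantization scheme used by coreai-optimization.

```python
import math

FP16_EXP_LIMIT = math.log(65504.0)  # exp(x) overflows fp16 above roughly 11.09


def pre_activation(weights, inputs):
    """Dot product feeding the activation."""
    return sum(w * x for w, x in zip(weights, inputs))


def quantize(weights, step):
    """Round each weight to a uniform grid of width `step` (illustrative only)."""
    return [round(w / step) * step for w in weights]


weights = [1.38, 1.38, 1.38, 1.38]
inputs = [2.0, 2.0, 2.0, 2.0]

before = pre_activation(weights, inputs)                 # about 11.04
after = pre_activation(quantize(weights, 0.25), inputs)  # 12.0

print(before < FP16_EXP_LIMIT, after > FP16_EXP_LIMIT)
```

Neither the original weights nor the quantized weights are anywhere near FP16_MAX, so a static check on tensor values passes in both cases.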

Proposed Fix

Add an activation_overflow_audit pass that:

  1. Identifies ops in the graph whose intermediates can overflow fp16 (exp, log(1+exp(...)))
  2. Flags them for stable decomposition or fp32 accumulation
  3. Integrates with the existing handle_overflow_op / handle_non_overflow_op classification
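
One possible shape for such a pass, sketched over a toy op list rather than the real IR. Everything here is hypothetical: the `Op` dataclass, the `RISKY_OP_TYPES` set, and the return format are assumptions for illustration, and the integration with handle_overflow_op / handle_non_overflow_op is not shown because their signatures are not given in this issue.

```python
from dataclasses import dataclass, field

# Op types whose intermediates can exceed FP16_MAX for in-range inputs.
# Hypothetical list; real op names depend on the IR in use.
RISKY_OP_TYPES = {"exp", "softplus", "logsumexp", "logcumsumexp"}


@dataclass
class Op:
    name: str
    op_type: str
    inputs: list = field(default_factory=list)


def activation_overflow_audit(ops):
    """Split op names into (flagged, safe) based on op type alone."""
    flagged, safe = [], []
    for op in ops:
        (flagged if op.op_type in RISKY_OP_TYPES else safe).append(op.name)
    return flagged, safe


# A decomposed softplus: conv -> exp -> add -> log
graph = [
    Op("conv_0", "conv"),
    Op("exp_0", "exp", ["conv_0"]),
    Op("add_0", "add", ["exp_0"]),
    Op("log_0", "log", ["add_0"]),
]
flagged, safe = activation_overflow_audit(graph)
print(flagged)  # ['exp_0']
```

Flagging by op type is deliberately conservative: it catches the exp inside a decomposed log(1 + exp(x)) without pattern matching, at the cost of also flagging exp ops whose inputs are provably bounded. A range analysis could narrow the set later.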

Prior Art

  • apple/coremltools PRs #2725, #2726, #2727 fix the converter-level decomposition
  • apple/coreai-torch PR #22 adds stable converters for softplus/mish/logsumexp
  • This issue addresses the optimization layer, where the interaction between quantization and fp16 creates compound failures
