DAG-DSL StableHLO export declares result types from stale operand shapes — reshape/matmul/concat/reduce_window produce IREE-invalid modules (post-0.28.0) #673

@michalharakal

Summary

0.28.0 fixed #663/#666/#667/#668 for the cases their unit tests cover, but the sk.ainet.lang.dag DSL export path still produces IREE-invalid StableHLO for shape-changing ops. The skainet-iree-conformance harness, which builds every op module via the real dag { … }.toComputeGraph() path and then runs iree-compile, reproduces the failures listed below on 0.28.0 from Maven Central.

The common root cause: shape-changing ops declare their result/return type from a stale operand-derived spec instead of the op's inferred output shape. reshape additionally loses its target shape entirely on the DAG path. This is upstream of the converter — ShapeOperationsConverter (fixed in 0.28.0 / PR #670) never receives a usable output spec or shape parameter, so its fix can't engage.

Failing test (committed)

Branch test/dag-shape-export-conformance, file skainet-compile-hlo/src/jvmTest/.../DagShapeExportConformanceTest.kt. 3 tests, all RED on develop/0.28.0, exercising the real DSL path exactly as the harness does:

reshape (1,4)->(2,2)        : target shape lost -> empty module (no stablehlo.reshape)
matmul  (1,4)x(4,3)         : declares -> tensor<1x4xf32>, true result is 1x3
concat  [(1,4),(1,4)] dim 1 : op types 1x8 but function return stays 1x4

Note: the existing ReshapeConcatShapeFixTest (added in #670) passes because it builds synthetic GraphNodes with a populated outputShape. The real dag{} DSL never populates it — which is exactly the gap this test catches. The synthetic test gave false confidence.

Concrete evidence (emitted MLIR on 0.28.0)

op_reshape.mlir — empty:

func.func @op_reshape(%arg0: tensor<1x4xf32>) -> (tensor<?xf32>) {
  // Conversion failed for node n0_reshape: Reshape operation requires a target shape specification
  // Missing shape parameter for reshape node n0_reshape
  return
}
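
The lookup the converter needs for reshape can be sketched as follows. This is a minimal illustration under stated assumptions, not SKaiNET code: `SketchNode`, `targetShape`, and `emitReshape` are hypothetical names, the element type is fixed to f32, and the parameter keys are the ones mentioned in this issue ("shape", "newShape", "outputShape").

```kotlin
// Hypothetical stand-in for a graph node; NOT the SKaiNET GraphNode type.
data class SketchNode(
    val id: String,
    val op: String,
    val params: Map<String, Any?> = emptyMap(),
)

/** Returns the first usable target shape found under the known parameter keys. */
fun targetShape(node: SketchNode): List<Int>? {
    for (key in listOf("shape", "newShape", "outputShape")) {
        val raw = node.params[key] as? List<*> ?: continue
        val dims = raw.filterIsInstance<Int>()
        if (dims.size == raw.size) return dims // every entry was an Int
    }
    return null
}

/** Emits a stablehlo.reshape line, or a diagnostic comment when the target is missing. */
fun emitReshape(node: SketchNode, operand: String, inputType: String): String {
    val shape = targetShape(node)
        ?: return "// Missing shape parameter for reshape node ${node.id}"
    // Assumption: f32 elements only, to keep the sketch short.
    val resultType = "tensor<" + (shape.map { it.toString() } + "f32").joinToString("x") + ">"
    return "%v0 = stablehlo.reshape $operand : ($inputType) -> $resultType"
}

fun main() {
    val withShape = SketchNode("n0_reshape", "reshape", mapOf("shape" to listOf(2, 2)))
    val withoutShape = SketchNode("n0_reshape", "reshape")
    println(emitReshape(withShape, "%arg0", "tensor<1x4xf32>"))
    println(emitReshape(withoutShape, "%arg0", "tensor<1x4xf32>"))
}
```

The second call models what the DAG path produces today: with no shape parameter on the node, the emitter can only fall back to the diagnostic comment.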

op_matmul.mlir, where the dot_general result is 1x3 but the declared type is 1x4:

%v0 = stablehlo.dot_general %arg0, %arg1, contracting_dims = [1] x [0]
    : (tensor<1x4xf32>, tensor<4x3xf32>) -> tensor<1x4xf32>   // <-- should be 1x3
return %v0 : tensor<1x4xf32>
error: inferred shape '[1, 3]' is incompatible with return type of operation 'tensor<1x4xf32>'

op_concat.mlir — op types 1x8, return stays 1x4:

%v0 = stablehlo.concatenate %arg0, %arg1, dim = 1 : (tensor<1x4xf32>, tensor<1x4xf32>) -> tensor<1x8xf32>
return %v0 : tensor<1x4xf32>   // <-- mismatch
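
A mismatch like this can be caught before iree-compile with a plain text check on the emitted lines. The sketch below is only an illustration: `opResultType`, `returnType`, and `returnMatchesOp` are hypothetical helpers (not part of SKaiNET or IREE), and they assume a single-result op line and a single-value return written in the form shown above.

```kotlin
/** Result type of an op line: the text after the last "->", with any trailing comment removed. */
fun opResultType(opLine: String): String? =
    opLine.substringBefore("//").substringAfterLast("->", "").trim().ifEmpty { null }

/** Declared type of a single-value `return %v : type` line, with any trailing comment removed. */
fun returnType(returnLine: String): String? {
    val code = returnLine.substringBefore("//")
    val match = Regex("""^\s*return\s+%\w+\s*:\s*(.+)""").find(code) ?: return null
    return match.groupValues[1].trim()
}

/** True only when both types could be extracted and are textually identical. */
fun returnMatchesOp(opLine: String, returnLine: String): Boolean {
    val op = opResultType(opLine)
    val ret = returnType(returnLine)
    return op != null && op == ret
}

fun main() {
    val concat = "%v0 = stablehlo.concatenate %arg0, %arg1, dim = 1 : " +
        "(tensor<1x4xf32>, tensor<1x4xf32>) -> tensor<1x8xf32>"
    println(returnMatchesOp(concat, "return %v0 : tensor<1x4xf32>   // <-- mismatch")) // false
    println(returnMatchesOp(concat, "return %v0 : tensor<1x8xf32>"))                   // true
}
```

A textual comparison is enough here because both types are printed by the same exporter; it is not a substitute for MLIR verification.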

Conformance matrix on 0.28.0 (llvm-cpu, IREE 3.7.0)

model        export  compile  first error
grayscale    ok      ok       none (self-contained vmfb)
tiny-mlp     ok      fail     inferred shape '[1, 8]' is incompatible with return type
whisper      ok      fail     inferred shape '[4, 1536]' is incompatible with return type
mnist-cnn    ok      fail     stablehlo.reduce_window has no custom assembly form
yolo         ok      fail     %v26 expects different type than prior uses (concat)
leaf-embed   ok      fail     inferred shape '[1, 1, 4, 256]' is incompatible with return type
gemma3-260m  ok      fail     inferred shape '[1, 1, 4, 256]' is incompatible with return type

Op micro-suite: 20/27 compile; failing: reshape, matmul, concat, conv1d, gather, maxpool2d, avgpool2d.

Root cause (direction)

  1. Result-type inference: reshape, dot_general (matmul), concatenate (and the function return) declare their type from a stale output TensorSpec that echoes operand-0, not from the op's computed output. dag{}.toComputeGraph() should attach the inferred output TensorSpec to each shape-changing node (and the graph's output node), so the converter and the func.func return type agree with the value.
  2. reshape target lost: the DAG reshape(x, Shape) node reaches the converter with no output spec and no shape/newShape/outputShape parameter — the target Shape argument isn't recorded anywhere the converter (or ComputeGraphExecutor, which reads "shape"/"newShape") can see it.
  3. reduce_window emission: pooling lowers to stablehlo.reduce_window in a form IREE's parser rejects (has no custom assembly form) — needs the generic/region MLIR form, like the gather/slice emission fixes in 0.27.0.
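
The shape inference that items 1 and 2 ask for can be sketched for the three failing cases. This is a minimal sketch, not the SKaiNET implementation: all function names are hypothetical, shapes are plain `List<Int>`, matmul is restricted to rank-2 operands, and only static dimensions are handled.

```kotlin
/** Output shape of a rank-2 matmul: (m, k) x (k, n) -> (m, n). */
fun inferMatmulShape(lhs: List<Int>, rhs: List<Int>): List<Int> {
    require(lhs.size == 2 && rhs.size == 2) { "sketch handles rank-2 operands only" }
    require(lhs[1] == rhs[0]) { "contracting dims differ: ${lhs[1]} vs ${rhs[0]}" }
    return listOf(lhs[0], rhs[1])
}

/** Output shape of concatenating `inputs` along `dim`: sizes add up along `dim` only. */
fun inferConcatShape(inputs: List<List<Int>>, dim: Int): List<Int> {
    require(inputs.isNotEmpty()) { "concat needs at least one input" }
    val first = inputs.first()
    require(dim in first.indices) { "dim $dim out of range for rank ${first.size}" }
    for (shape in inputs) {
        require(shape.size == first.size) { "rank mismatch" }
        for (i in shape.indices) {
            require(i == dim || shape[i] == first[i]) { "non-concat dim $i differs" }
        }
    }
    val concatSize = inputs.map { it[dim] }.sum()
    return first.mapIndexed { i, d -> if (i == dim) concatSize else d }
}

/** Reshape yields the target shape, provided the element count is preserved. */
fun inferReshapeShape(input: List<Int>, target: List<Int>): List<Int> {
    val inCount = input.fold(1L) { acc, d -> acc * d }
    val outCount = target.fold(1L) { acc, d -> acc * d }
    require(inCount == outCount) { "element count changes: $inCount vs $outCount" }
    return target
}

/** Renders a static shape as an MLIR ranked tensor type, e.g. tensor<1x3xf32>. */
fun tensorType(shape: List<Int>, elementType: String = "f32"): String =
    "tensor<" + (shape.map { it.toString() } + elementType).joinToString("x") + ">"

fun main() {
    println(tensorType(inferMatmulShape(listOf(1, 4), listOf(4, 3))))                  // tensor<1x3xf32>
    println(tensorType(inferConcatShape(listOf(listOf(1, 4), listOf(1, 4)), dim = 1))) // tensor<1x8xf32>
    println(tensorType(inferReshapeShape(listOf(1, 4), listOf(2, 2))))                 // tensor<2x2xf32>
}
```

If the inferred shape is attached to each node and to the graph output, the op result type and the func.func return type are both derived from the same value and cannot drift apart.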

Acceptance

  • DagShapeExportConformanceTest (3 tests) goes green.
  • The harness op micro-suite reaches 27/27 compile, and tiny-mlp/whisper/leaf-embed/mnist-cnn/yolo/gemma3-260m compile to a vmfb on llvm-cpu (or fail only on a genuinely-missing op like RoPE/upsample, not a shape/type mismatch).

Found via SKaiNET-developers/skainet-iree-conformance on SKaiNET 0.28.0.
