Deduplicate UCS splits using LetSplit and UseSplit #486

Open
chengluyu wants to merge 14 commits into hkust-taco:hkmc2 from chengluyu:nu-let-split

Conversation

@chengluyu
Member

No description provided.

chengluyu and others added 12 commits April 12, 2026 18:48
# Conflicts:
#	hkmc2/shared/src/test/mlscript/block-staging/Functions.mls
#	hkmc2/shared/src/test/mlscript/codegen/MergeMatchArms.mls
#	hkmc2/shared/src/test/mlscript/ucs/general/LogicalConnectives.mls
#	hkmc2/shared/src/test/mlscript/ucs/normalization/Deduplication.mls
In `normalizeImpl`, when the alternative references the same scrutinee
as the current branch, the alternative is duplicated and specialized
separately in `+` and `-` modes. When both specializations agree, the
two duplicates could in principle be shared via a single `LetSplit`.
That detection is not yet implemented; mark the bail-out site with a
TODO and add `SpecializedSplitSharing.mls` covering both:

- cases where the duplication is wasteful (disjoint sibling classes
  like `Cat()`/`Dog()` — both specializations agree, sharing would
  suffice);
- cases where the duplication is genuine (refining patterns like
  `S(0)`/`S(_)` — positive and negative specialization yield
  different splits, sharing would be incorrect).

The latter group guards against a future over-eager sharing rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
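
The sharing idea described in the commit message above can be illustrated with a toy model. This is only a sketch: `ToySplit`, the shapes given to `LetSplit` and `UseSplit`, the `shareIfEqual` helper, and the fixed name `s0` are assumptions made for illustration, not the PR's data structures, and the PR itself leaves this detection as a TODO. Structural equality stands in for "both specializations agree".

```scala
// Toy model of a split; the real hkmc2 `Split` type is richer and shaped differently.
enum ToySplit:
  case Branch(scrutinee: String, pattern: String, body: ToySplit, rest: ToySplit)
  case Else(result: String)
  case LetSplit(name: String, shared: ToySplit, body: ToySplit) // bind a split once
  case UseSplit(name: String)                                   // refer to a bound split

import ToySplit.*

// Hypothetical helper: `pos` and `neg` are the alternative specialized in `+` and `-` modes.
// If they coincide, bind one copy and reference it twice; otherwise keep both copies.
def shareIfEqual(pos: ToySplit, neg: ToySplit)(rebuild: (ToySplit, ToySplit) => ToySplit): ToySplit =
  if pos == neg then LetSplit("s0", pos, rebuild(UseSplit("s0"), UseSplit("s0")))
  else rebuild(pos, neg)

// Both specializations agree, so one shared copy suffices.
val agreeing: ToySplit =
  shareIfEqual(Else("c"), Else("c"))((p, n) => Branch("x", "Cat", p, n))

// The specializations differ, so both copies must be kept.
val differing: ToySplit =
  shareIfEqual(Else("zero"), Else("succ"))((p, n) => Branch("x", "S", p, n))
```

In this model, sharing is only sound because the two copies are identical; when they differ, substituting one for the other would change which branch runs, which is the situation the second group of tests is meant to guard against.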
The `Split.Else` branch of the `++` extension was annotated as
impossible but silently discarded `those`. Replace the comment with a
`softAssert(false)` so that the invariant is enforced at runtime and
visible in stack traces if it ever breaks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
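
The change described in the commit message above can be sketched as follows. Everything here is a simplified stand-in: `ChainSplit` is a toy type, and `SoftAsserts.softAssert` is a hypothetical helper that records the failure and prints a stack trace rather than aborting; the codebase's own `softAssert` may have a different signature and behavior.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the codebase's `softAssert`: it records the failure and
// prints the call stack instead of aborting. The real helper may behave differently.
object SoftAsserts:
  val failures: ListBuffer[String] = ListBuffer.empty
  def softAssert(cond: Boolean, msg: String = "soft assertion failed"): Unit =
    if !cond then
      failures += msg
      new AssertionError(msg).printStackTrace() // make the violation visible

// Toy split: a chain of branches ending either in a default (`Else`) or in nothing (`End`).
enum ChainSplit:
  case Cons(branch: String, tail: ChainSplit)
  case Else(default: String)
  case End

extension (these: ChainSplit)
  // Append `those` after the last branch of `these`.
  def ++(those: ChainSplit): ChainSplit = these match
    case ChainSplit.Cons(b, t) => ChainSplit.Cons(b, t ++ those)
    case ChainSplit.End        => those
    case ChainSplit.Else(_)    =>
      // A default branch already ends the chain, so `those` would be dropped: flag it.
      SoftAsserts.softAssert(false, "++ reached Else; `those` would be discarded")
      these
```

With a plain comment, appending to a chain that already ends in `Else` would drop `those` without a trace; with the assertion, the same call still returns `these` but leaves a record of the broken invariant.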
`JoinPointCtx` tracks the pending set of `SplitSymbol`s awaiting a
`LetSplit` placement — a piece of state genuinely local to the walk.
Stashing the sharing threshold in the same case class forced a
plumbing trick where an inner `normalize` call had to reconstruct the
context with `JoinPointCtx.withThreshold` just to carry a config
option across an invocation boundary.

The threshold is the only `Config` field `normalize` needs, and the
caller already has `Config` in scope. Just add `Config` to the class
constructor's `using` clause so all methods can read it via the
global `config` helper, drop the parameter from `apply`, and simplify
`JoinPointCtx` back to the symbol set it was originally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
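
The refactoring described in the commit message above follows a common Scala 3 pattern, sketched here under stated assumptions: the `sharingThreshold` field, the `Normalizer` class, the `visit` method, and the threshold rule are all hypothetical names and behavior chosen for illustration, and `config` is a local stand-in for the global helper the message mentions.

```scala
// All names and fields here are illustrative assumptions, not the PR's definitions.
final case class Config(sharingThreshold: Int)

// Stand-in for the global `config` helper: returns whichever `Config` is in contextual scope.
def config(using cfg: Config): Config = cfg

// Walk-local state only: the split symbols still awaiting a `LetSplit` placement.
final case class JoinPointCtx(pending: Set[String]):
  def +(sym: String): JoinPointCtx = copy(pending = pending + sym)

// `Config` is taken once, in the constructor's `using` clause, so every method can read it.
class Normalizer(using Config):
  // Hypothetical rule: a split becomes pending once it has at least `sharingThreshold` uses.
  def visit(sym: String, uses: Int)(using ctx: JoinPointCtx): JoinPointCtx =
    if uses >= config.sharingThreshold then ctx + sym else ctx

def demo(): JoinPointCtx =
  given Config = Config(sharingThreshold = 2)
  val normalizer = Normalizer()
  given JoinPointCtx = JoinPointCtx(Set.empty)
  val afterFirst = normalizer.visit("s1", uses = 1)  // below the threshold: context unchanged
  normalizer.visit("s2", uses = 3)(using afterFirst) // at or above the threshold: recorded
```

Keeping the threshold out of `JoinPointCtx` means nested calls only thread the state that actually changes during the walk, while the configuration stays fixed for the lifetime of the `Normalizer` instance.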
@LPTK
Contributor

LPTK commented May 10, 2026

Pls fix the conflicts.

# Conflicts:
#	hkmc2/shared/src/test/mlscript/block-staging/Functions.mls
#	hkmc2/shared/src/test/mlscript/deforest/basic.mls
#	hkmc2/shared/src/test/mlscript/deforest/determinism.mls
#	hkmc2/shared/src/test/mlscript/deforest/todos.mls
#	hkmc2/shared/src/test/mlscript/ucs/normalization/ExcessiveDeduplication.mls
else if split.isFallback then
log(s"Case 1.1.3: $pattern is unrelated with $thatPattern")
rec(tail)
S(rec(tail).getOrElse(tail))
Contributor

Please respond to my original comment on this!

#457 (comment)


fun pred(x) = true

:sir
Contributor

I feel like there's already too much IR output in this file. Isn't the :ucs output already sufficient for many of these test cases? Please see if you can drop some :sirs here.


// * Currently, dedup' is implemented by hash-consing after the fact, which doesn't work when bindings are involved;
// * it fails here:
// * The last split is `y is [b] then ...`. It was duplicated because it serves
Contributor

This comment doesn't make sense. People reading it won't know that "was" refers to a previous state of the codebase!

Does this make more sense?

Suggested change
// * The last split is `y is [b] then ...`. It was duplicated because it serves
// * The last split is `y is [b] then ...`. It used to be duplicated because it serves

Comment on lines 15 to 16
// TODO: This should *not* be automatically deduplicated.
// * In fact, we should probably never try to deduplicate things that are already duplicated in the source.
Contributor

Suggested change
// * We never try to deduplicate things that are already duplicated in the source.
// * (A previous version of the UCS was using hash-consing to do so, which was a bad idea.)

// === Join point cases ===


// * Join point: non-trivial Else alternative on different scrutinee
Contributor

You haven't addressed any of my previous comments...

See #457 (comment) and the others.

2 participants