[mypyc] Fix incremental compilation with separate flag (1/3) #21299

Open
VaggelisD wants to merge 3 commits into python:master from VaggelisD:sqlglot-separate-fixes

VaggelisD commented Apr 23, 2026

This PR contains the minimum changes required to unblock incremental compilation in SQLGlot; I've verified it runs cleanly and builds much faster (benched on macOS arm64, Python 3.14, MYPYC_OPT=0):

| Scenario | Monolithic | separate=True (-j 11) | Speedup |
| --- | --- | --- | --- |
| Clean build | 110s | 60s | 1.8× |
| No-op rebuild | 110s | 1.4s | 80× |
| 1-file edit rebuild | 110s | 3.3s | 33× |
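
As a quick sanity check on the Speedup column, the ratios can be recomputed from the reported build times. This is only arithmetic on the numbers in the table; the variable and function names are illustrative.

```python
# Build times in seconds, copied from the benchmark table in this PR.
MONOLITHIC = 110.0
SEPARATE = {
    "Clean build": 60.0,
    "No-op rebuild": 1.4,
    "1-file edit rebuild": 3.3,
}


def speedup(baseline: float, measured: float) -> float:
    """Return how many times faster `measured` is than `baseline`."""
    return baseline / measured


speedups = {name: speedup(MONOLITHIC, t) for name, t in SEPARATE.items()}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

The no-op ratio comes out slightly under 80, so the table's figure is a rounded value.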

Note: The following bugs were mostly fixed by LLMs. I've tried to understand and review the code, do clean-up passes, and verify that it works, but overall the changes are still fairly complicated.

Background

Under separate=True each module gets its own group: one shared lib + shim .so. Cross-group calls go through an exports table (a struct of function pointers), published via PyCapsule and copied into a local exports_<group> at module init. Most of the fixes below are places where that indirection was missed, or where a fresh-build path assumed state that isn't there cross-group or when IR is loaded from cache.

Fixes

Codegen (most of the diff):

  1. Non-ext classes have no vtable, but is_method_final fell through to the vtable path and indexed into null. Short-circuit to True for non-ext classes.
  2. emit_method_call asserted get_method(name) is not None, which fails when the FuncIR body is in another group. Use method_decl(name) instead; the decl is enough for a direct call.
  3. About a dozen call sites hardcoded NATIVE_PREFIX + cname without going through get_group_prefix, so clang got undeclared identifier CPyDef_foo on cross-group calls. Added two helpers on Emitter and migrated every site. Also marked CPyPy_* wrapper decls needs_export=True so they reach the exports table.
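
A minimal sketch of the idea behind fix 3, assuming a simplified model in which each declaration records its group. The Decl class, the current_group parameter and the exact exports_<group>. spelling are assumptions for illustration; the real helpers live on Emitter and may differ.

```python
from dataclasses import dataclass

NATIVE_PREFIX = "CPyDef_"  # prefix named in the PR text


@dataclass(frozen=True)
class Decl:
    """Hypothetical stand-in for a function declaration."""

    cname: str
    group: str


def native_function_call(decl: Decl, current_group: str) -> str:
    """Return the C expression used to call `decl` from `current_group`.

    Same-group calls use the bare symbol; cross-group calls go through
    the local exports table (assumed here to be named exports_<group>).
    """
    if decl.group == current_group:
        return NATIVE_PREFIX + decl.cname
    return f"exports_{decl.group}.{NATIVE_PREFIX}{decl.cname}"


same_group = native_function_call(Decl("foo", "grp_a"), "grp_a")
cross_group = native_function_call(Decl("foo", "grp_a"), "grp_b")
```

Funnelling every call site through one helper means the same-group versus cross-group decision is made in a single place instead of being repeated at each site.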

Runtime:

  4. PyInit-time cross-group imports re-entered the enclosing package's __init__.py mid-bootstrap. Split the shared-lib's exec_ into a self-contained capsule-setup (at PyInit) plus a deferred ensure_deps_<group>() that the shim calls before module init. The shim now uses PyImport_ImportModuleLevel + PyObject_GetAttrString so nothing triggers a dotted-attribute walk.
  5. CPyImport_ImportFrom had a fallback that called PyObject_GetItem(module, fullname) where it meant PyImport_GetModule. Modules don't support subscription so the fallback always raised TypeError.

Incremental:

  6. compile_modules_to_ir now feeds freshly built ClassIR/FuncIR into deser_ctx so later cache-loaded SCCs can resolve cross-SCC references. load_type_map now tolerates mypy-synthetic TypeInfo entries (intersection types like <subclass of X and Y>) that have no matching ClassIR.
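
The CPyImport_ImportFrom fallback bug listed under Runtime has a direct pure-Python analogue: subscripting a module object corresponds to PyObject_GetItem, and a sys.modules lookup is roughly what PyImport_GetModule does. The helper names below are illustrative only.

```python
import json
import json.decoder
import sys


def lookup_by_subscript(module, fullname):
    """Mirror of the buggy fallback: subscripting a module object."""
    return module[fullname]


def lookup_in_sys_modules(fullname):
    """Mirror of the intended fallback: a lookup in sys.modules."""
    return sys.modules.get(fullname)


try:
    lookup_by_subscript(json, "json.decoder")
    subscript_error = None
except TypeError as exc:
    # Module objects do not implement __getitem__.
    subscript_error = exc

found = lookup_in_sys_modules("json.decoder")
```

The first lookup can never succeed for any module, which is why the fallback always raised instead of finding the submodule.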

Tests

Three new cases in run-multimodule.test, each of which fails on master and passes with these fixes:

  • testSeparateCrossGroupEnumMethod covers fix 1
  • testSeparateCrossGroupGenerator covers fix 2
  • testSeparateCrossGroupInheritedInit covers fix 3

Fixes 4–6 are harder to reproduce in toy fixtures; the signal ultimately was SQLGlot going from "won't build" to passing cleanly.

Follow up PR n.2 - Unblocking mypy itself

Mypy triggers three more bugs that SQLGlot doesn't. With those fixed locally, incremental rebuild technically works; however, I was surprised to find that on macOS it doesn't win against monolithic in practice (reason in PR n.3).

I think Linux and Windows should have SQLGlot-like results, though.

Follow up PR n.3 - Restoring the incremental speed win on macOS

After PR 2, separate=True technically works but isn't actually faster than monolithic in practice. Measured on mypy:

| Scenario | Monolithic | separate |
| --- | --- | --- |
| No-op rebuild | 73s | 102s |
| 1-line edit | 78s | 101s |

It should be a few seconds for the separate case. The culprit seems to be setuptools' copy_extensions_to_source: it unconditionally rewrites every extension .so on every build_ext invocation, which on macOS invalidates AMFI's code-signature cache for each file (relevant mypyc issue). The next import mypy then re-verifies 400+ files at ~100ms each, ~60s total. So, the rebuild is fast but the next Python invocation pays the cost.

My plan is to patch copy_extensions_to_source inside mypycify() to skip when size and mtime match, which preserves the destination inode and keeps AMFI's cache valid. With that:

| Scenario | separate + patch |
| --- | --- |
| No-op rebuild | 2s |
| 1-line edit | 4s |
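
A minimal sketch of the skip-when-unchanged idea, assuming a standalone helper rather than the actual patch to copy_extensions_to_source. The name copy_if_changed and the file names are hypothetical; only os.stat and shutil.copy2 from the standard library are used.

```python
import os
import shutil
import tempfile


def copy_if_changed(src: str, dst: str) -> bool:
    """Copy src to dst unless dst already matches in size and mtime.

    Returns True if a copy happened. shutil.copy2 carries the source
    mtime over to the destination, so a later call with an unchanged
    source sees matching metadata and leaves dst untouched.
    """
    src_stat = os.stat(src)
    try:
        dst_stat = os.stat(dst)
    except FileNotFoundError:
        dst_stat = None
    if (
        dst_stat is not None
        and dst_stat.st_size == src_stat.st_size
        and dst_stat.st_mtime_ns == src_stat.st_mtime_ns
    ):
        return False
    shutil.copy2(src, dst)
    return True


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "built.so")
    dst = os.path.join(tmp, "installed.so")
    with open(src, "wb") as f:
        f.write(b"first build")

    first = copy_if_changed(src, dst)    # destination missing: copies
    inode_before = os.stat(dst).st_ino
    second = copy_if_changed(src, dst)   # metadata matches: skipped
    inode_after = os.stat(dst).st_ino

    with open(src, "wb") as f:
        f.write(b"a rebuilt, longer extension")
    third = copy_if_changed(src, dst)    # size differs: copies again
```

Comparing size and mtime is a cheap heuristic rather than a content check; a hash comparison would be stricter but would have to read every file on every build.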

This is really a setuptools limitation, so I'm happy to take it upstream instead if reviewers prefer.

VaggelisD and others added 3 commits April 23, 2026 15:00
…sqlglot)

This is the minimal set of fixes needed for `separate=True` to build and run
correctly against sqlglot, a ~100-module project with cross-group class
inheritance, generator helper classes, non-ext subclasses with fast methods,
and mutually-dependent compiled modules. Each of the fixes below is a real
bug that was never hit by mypy itself (mypy's setup.py uses multi_file on
Windows only, never separate=True) or by the toy fixtures in mypyc's
TestRunSeparate.

1. Non-extension classes never have vtables -- short-circuit is_method_final
   to True for them so codegen doesn't try to index into a vtable that
   compute_vtable skipped.

2. emit_method_call: under separate=True, a method's FuncIR body may live in
   another group while only its FuncDecl is visible here. Use method_decl(name)
   instead of get_method(name).decl -- the decl is enough to emit a direct C
   call. Split native_function_type to accept a decl too.

3. Cross-group native/Python-wrapper calls weren't routing through the
   exports-table indirection at a dozen sites in emitwrapper / emitfunc /
   emitclass. Added Emitter.native_function_call(decl) and
   Emitter.wrapper_function_call(decl) helpers and migrated all offending
   sites. Also made CPyPy_* wrapper declarations needs_export=True so those
   symbols reach the exports table.

4. Defer cross-group imports to shim load time. The shared lib's exec_
   function used to PyImport_ImportModule sibling groups at PyInit time,
   which re-enters the enclosing package's __init__.py mid-flight and blows
   up on partial-init attribute walks. Split exec_ into a self-contained
   capsule-setup phase (runs in PyInit) and a deferred ensure_deps_<short>()
   (runs from the shim just before per-module init). Shim uses
   PyImport_ImportModuleLevel with a non-empty fromlist so the lookup
   returns the leaf directly via sys.modules, and fetches capsules via
   PyObject_GetAttrString instead of PyCapsule_Import (which itself performs
   the same dotted attribute walk).

5. Fix broken fallback in lib-rt CPyImport_ImportFrom: the code tried
   PyObject_GetItem(module, fullname) where it intended PyImport_GetModule
   (comment says as much). Modules don't implement __getitem__, so the
   fallback always raised TypeError. Also Py_XDECREF the potentially-NULL
   package_path in the error path.

6. Incremental-mode plumbing for separate=True: compile_modules_to_ir now
   syncs freshly built ClassIR/FuncIR into deser_ctx so later cache-loaded
   SCCs can resolve cross-SCC references. load_type_map tolerates mypy's
   synthetic TypeInfo entries (e.g. "<subclass of X and Y>") that have no
   corresponding mypyc ClassIR.
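
The fromlist behaviour that item 4 relies on can be observed from Python through the built-in __import__, which follows the same rule: with an empty fromlist a dotted import returns the top-level package, while a non-empty fromlist returns the leaf module. The example uses the standard-library json package purely as a stand-in for a compiled package.

```python
import json.decoder  # make sure the leaf module is already in sys.modules

# Empty fromlist: the top-level package comes back, so reaching the leaf
# would need an attribute walk (json -> json.decoder).
top = __import__("json.decoder")

# Non-empty fromlist: the leaf module itself is returned.
leaf = __import__("json.decoder", fromlist=["JSONDecoder"])

print(top.__name__)
print(leaf.__name__)
```

Avoiding the attribute walk matters during bootstrap because a partially initialized package may not yet have its submodules bound as attributes.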

Also adds three regression tests targeted to fail on TestRunSeparate
without the fixes above:

- testSeparateCrossGroupEnumMethod exercises fix #1.
- testSeparateCrossGroupGenerator exercises fix #2.
- testSeparateCrossGroupInheritedInit exercises fix #3.

Two gaps in the preceding commit that only showed up on the py314t CI
matrix (free-threading Python + compiled-mypy test harness):

1. The ensure_deps capsule call was only added to module_shim.tmpl,
   not to module_shim_no_gil_multiphase.tmpl. Free-threaded builds use
   the latter template, so cross-group exports tables were never
   populated on py314t and consumers called through NULL function
   pointers on the first cross-group invocation. Mirroring the
   non-GIL-disabled template fixes it.

2. The `_free_instance` slot (free-list for per-class fast allocation)
   was declared with needs_export=True, which puts it into the
   exports-table struct. Under Py_GIL_DISABLED, CPyThreadLocal
   expands to __thread, which can't legally appear inside a struct
   field (clang: "type name does not allow storage class to be
   specified"). The slot is only read/written by the class's own
   setup/dealloc code inside the defining group -- no cross-group
   access is needed -- so dropping needs_export is the right fix.