[mypyc] Fix incremental compilation with separate flag (1/3) #21299
Open
VaggelisD wants to merge 3 commits into python:master
…sqlglot)

This is the minimal set of fixes needed for `separate=True` to build and run correctly against sqlglot, a ~100-module project with cross-group class inheritance, generator helper classes, non-ext subclasses with fast methods, and mutually-dependent compiled modules. Each of the fixes below is a real bug that was never hit by mypy itself (mypy's setup.py uses multi_file on Windows only, never separate=True) or by the toy fixtures in mypyc's TestRunSeparate.

1. Non-extension classes never have vtables -- short-circuit is_method_final to True for them so codegen doesn't try to index into a vtable that compute_vtable skipped.
2. emit_method_call: under separate=True, a method's FuncIR body may live in another group while only its FuncDecl is visible here. Use method_decl(name) instead of get_method(name).decl -- the decl is enough to emit a direct C call. Split native_function_type to accept a decl too.
3. Cross-group native/Python-wrapper calls weren't routing through the exports-table indirection at a dozen sites in emitwrapper / emitfunc / emitclass. Added Emitter.native_function_call(decl) and Emitter.wrapper_function_call(decl) helpers and migrated all offending sites. Also made CPyPy_* wrapper declarations needs_export=True so those symbols reach the exports table.
4. Defer cross-group imports to shim load time. The shared lib's exec_ function used to PyImport_ImportModule sibling groups at PyInit time, which re-enters the enclosing package's __init__.py mid-flight and blows up on partial-init attribute walks. Split exec_ into a self-contained capsule-setup phase (runs in PyInit) and a deferred ensure_deps_<short>() (runs from the shim just before per-module init). Shim uses PyImport_ImportModuleLevel with a non-empty fromlist so the lookup returns the leaf directly via sys.modules, and fetches capsules via PyObject_GetAttrString instead of PyCapsule_Import (which itself performs the same dotted attribute walk).
5. Fix broken fallback in lib-rt CPyImport_ImportFrom: the code tried PyObject_GetItem(module, fullname) where it intended PyImport_GetModule (comment says as much). Modules don't implement __getitem__, so the fallback always raised TypeError. Also Py_XDECREF the potentially-NULL package_path in the error path.
6. Incremental-mode plumbing for separate=True: compile_modules_to_ir now syncs freshly built ClassIR/FuncIR into deser_ctx so later cache-loaded SCCs can resolve cross-SCC references. load_type_map tolerates mypy's synthetic TypeInfo entries (e.g. "<subclass of X and Y>") that have no corresponding mypyc ClassIR.

Also adds three regression tests targeted to fail on TestRunSeparate without the fixes above:
- testSeparateCrossGroupEnumMethod exercises fix #1.
- testSeparateCrossGroupGenerator exercises fix #2.
- testSeparateCrossGroupInheritedInit exercises fix #3.
for more information, see https://pre-commit.ci
Two gaps in the preceding commit that only showed up on the py314t CI matrix (free-threading Python + compiled-mypy test harness):

1. The ensure_deps capsule call was only added to module_shim.tmpl, not to module_shim_no_gil_multiphase.tmpl. Free-threaded builds use the latter template, so cross-group exports tables were never populated on py314t and consumers called through NULL function pointers on the first cross-group invocation. Mirroring the non-GIL-disabled template fixes it.
2. The `_free_instance` slot (free-list for per-class fast allocation) was declared with needs_export=True, which puts it into the exports-table struct. Under Py_GIL_DISABLED, CPyThreadLocal expands to __thread, which can't legally appear inside a struct field (clang: "type name does not allow storage class to be specified"). The slot is only read/written by the class's own setup/dealloc code inside the defining group -- no cross-group access is needed -- so dropping needs_export is the right fix.
This PR contains the minimum changes required to unblock incremental compilation in SQLGlot. I've verified that it runs cleanly and builds much faster (benched on macOS arm64, Python 3.14, `MYPYC_OPT=0`):

[Benchmark table not preserved; its only surviving column header is `separate=True` (`-j 11`).]

Note: The following bugs have been mostly fixed by LLMs. I've tried to understand and review the code, do clean-up passes, verify that it works, etc., but overall the changes are still fairly complicated.
Background
Under `separate=True` each module gets its own group: one shared lib + shim.so. Cross-group calls go through an exports table (a struct of function pointers), published via `PyCapsule` and copied into a local `exports_<group>` at module init. Most of the fixes below are places where that indirection was missed, or where a fresh-build path assumed state that isn't there cross-group or when IR is loaded from cache.
Fixes
Codegen (most of the diff):
1. Non-extension classes have no vtable, but `is_method_final` fell through to the vtable path and indexed into null. Short-circuit to `True` for non-ext classes.
2. `emit_method_call` asserted `get_method(name) is not None`, which fails when the `FuncIR` body is in another group. Use `method_decl(name)` instead; the decl is enough for a direct call.
3. About a dozen call sites emitted `NATIVE_PREFIX + cname` without going through `get_group_prefix`, so clang got `undeclared identifier CPyDef_foo` on cross-group calls. Added two helpers on `Emitter` and migrated every site. Also marked `CPyPy_*` wrapper decls `needs_export=True` so they reach the exports table.

Runtime:
4. Importing sibling groups at PyInit time re-entered the enclosing package's `__init__.py` mid-bootstrap. Split the shared lib's `exec_` into a self-contained capsule-setup phase (at PyInit) plus a deferred `ensure_deps_<group>()` that the shim calls before module init. The shim now uses `PyImport_ImportModuleLevel` + `PyObject_GetAttrString` so nothing triggers a dotted-attribute walk.
5. `CPyImport_ImportFrom` had a fallback that called `PyObject_GetItem(module, fullname)` where it meant `PyImport_GetModule`. Modules don't support subscription, so the fallback always raised `TypeError`.

Incremental:
6. `compile_modules_to_ir` now feeds freshly built `ClassIR`/`FuncIR` into `deser_ctx` so later cache-loaded SCCs can resolve cross-SCC references. `load_type_map` now tolerates mypy-synthetic `TypeInfo` entries (intersection types like `<subclass of X and Y>`) that have no matching `ClassIR`.

Tests
Three new cases in `run-multimodule.test`, each of which fails on master and passes with these fixes:

- `testSeparateCrossGroupEnumMethod` covers fix 1
- `testSeparateCrossGroupGenerator` covers fix 2
- `testSeparateCrossGroupInheritedInit` covers fix 3

Fixes 4–6 are harder to reproduce in toy fixtures; the signal ultimately was SQLGlot going from "won't build" to passing cleanly.
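Although the import-related runtime fixes are hard to capture in a mypyc fixture, the CPython behaviour they rely on can be observed from pure Python. The sketch below is an illustration only, not code from this PR. It uses the builtin `__import__` as the Python-level counterpart of `PyImport_ImportModuleLevel`, `module[name]` as the counterpart of `PyObject_GetItem(module, name)`, and `sys.modules.get` as a rough approximation of `PyImport_GetModule`. The helper names `broken_fallback` and `intended_fallback` are hypothetical, and `json.decoder` is just a convenient stdlib submodule.

```python
import sys
import json.decoder  # make sure the submodule is already in sys.modules

# With an empty fromlist, __import__ returns the top-level package ("json").
top = __import__("json.decoder")

# With a non-empty fromlist, __import__ returns the leaf module itself.
leaf = __import__("json.decoder", fromlist=["JSONDecoder"])


def broken_fallback(module, fullname):
    # Hypothetical stand-in for PyObject_GetItem(module, fullname):
    # subscripting a module object, which modules do not support.
    return module[fullname]


def intended_fallback(fullname):
    # Hypothetical stand-in for PyImport_GetModule(fullname):
    # look the module up by its full dotted name in sys.modules.
    return sys.modules.get(fullname)


try:
    broken_fallback(top, "json.decoder")
    subscript_error = None
except TypeError as exc:
    # Modules define no __getitem__, so subscripting raises TypeError.
    subscript_error = exc
```

Here `top` is the `json` package while `leaf` is the same object as `sys.modules["json.decoder"]`, which is why a non-empty fromlist avoids walking attributes on a partially initialised package. The subscript-based lookup fails every time, whereas the `sys.modules` lookup returns the leaf.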
Follow up PR n.2 - Unblocking mypy itself
Mypy triggers three more bugs that SQLGlot doesn't. With those fixed locally, incremental rebuild technically works; however, I was surprised to find that on macOS it doesn't win against monolithic in practice (the reason is covered under PR n.3).
I think Linux and Windows should have SQLGlot-like results, though.
Follow up PR n.3 - Restoring the incremental speed win on macOS
After PR 2, `separate=True` technically works but isn't actually faster than monolithic in practice. Measured on mypy:

[Timing table not preserved.]

It should be a few seconds for the separate case. The culprit seems to be setuptools' `copy_extensions_to_source`: it unconditionally rewrites every extension `.so` on every `build_ext` invocation, which on macOS invalidates AMFI's code-signature cache for each file (relevant mypyc issue). The next `import mypy` then re-verifies 400+ files at ~100ms each, ~60s total. So the rebuild is fast, but the next Python invocation pays the cost.

My plan is to patch `copy_extensions_to_source` inside `mypycify()` to skip when size and mtime match, which preserves the destination inode and keeps AMFI's cache valid. With that:

[Timing table not preserved.]

This is really a setuptools limitation, so happy to take it upstream instead if reviewers prefer.
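The skip-when-unchanged idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the planned patch and not setuptools code: `copy_if_changed` is a hypothetical helper, and it assumes the copy step preserves the source mtime (here via `shutil.copy2`) so that a later comparison of size and mtime can detect an unchanged file.

```python
import os
import shutil


def copy_if_changed(src: str, dst: str) -> bool:
    """Copy src to dst unless dst already has the same size and mtime.

    Returns True if a copy happened, False if it was skipped.
    Hypothetical helper for illustration only.
    """
    src_stat = os.stat(src)
    try:
        dst_stat = os.stat(dst)
    except FileNotFoundError:
        dst_stat = None

    if (
        dst_stat is not None
        and dst_stat.st_size == src_stat.st_size
        and dst_stat.st_mtime_ns == src_stat.st_mtime_ns
    ):
        # Destination looks identical: leave the file untouched.
        return False

    # copy2 also copies the mtime, so the next call can detect "unchanged".
    shutil.copy2(src, dst)
    return True
```

Calling it twice in a row on the same pair of paths copies the file the first time and skips it the second time, leaving the destination file and its inode untouched. Comparing size and mtime is a heuristic; a content hash would be stricter but requires reading every file on each build.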