Make crossbuilds faster #129791
Crossbuilds are slower because we need to build cross-targeting tools: when building arm64 from an x64 machine, we need an x64-hosted crossgen2 in addition to the standard arm64-hosted one, so that we can R2R compile CoreLib on the build machine.

However, the x64-hosted crossgen is limited in purpose: it only needs to be able to build for the target architecture. Until now we were building all the JIT flavors so that this native-hosted crossgen could target _anything_, same as the target-hosted one. But we don't ship the native-hosted crossgen, and we only need it to target the target architecture.

This updates the build so that the host-architecture crossgen has only one codegen: the one for the target. This should shave a good 5 minutes off the CI time.
…d jobs (dotnet#129553)" This reverts commit 95aa263.
Tagging subscribers to this area: @dotnet/runtime-infrastructure
@JulieLeeMSFT @hoyosjs @chcosta I'm doing this so that we can revert the 75-minute floor change on ARM64 Windows (this PR includes that revert). The change doesn't work: it enforces a 75-minute floor on everything and has been breaking native AOT outerloops (which have 300-minute timeouts) ever since it merged (see https://dev.azure.com/dnceng-public/public/_build?definitionId=265&_a=summary). I believe the problem is that the comparisons in #129553 are done with strings, not with numbers. Instead of figuring out yaml, we should just make builds faster.
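To illustrate the suspected failure mode, here is a minimal Python sketch of how a "floor" computed with string comparison can clamp a 300-minute timeout down to 75. This is not the pipeline's actual YAML expression logic; the function names and the lexicographic comparison semantics are assumptions made purely for illustration.

```python
# Hypothetical illustration only: models a "75-minute floor" applied with
# string comparison versus numeric comparison. It does not reproduce the
# actual template expressions in global-build-job.yml.

def floor_timeout_as_strings(timeout: str, floor: str = "75") -> str:
    # Strings compare character by character, so "300" sorts before "75"
    # because '3' < '7'. A 300-minute timeout therefore looks "too small".
    return timeout if timeout >= floor else floor


def floor_timeout_as_numbers(timeout: str, floor: int = 75) -> int:
    # Converting to integers first gives the intended behavior: only
    # timeouts below the floor are raised to it.
    return max(int(timeout), floor)


if __name__ == "__main__":
    for value in ("60", "90", "300"):
        print(value, floor_timeout_as_strings(value), floor_timeout_as_numbers(value))
```

Under these assumptions, a caller asking for 300 minutes ends up with 75 in the string version, which would be consistent with long-running outerloop jobs timing out early.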
Pull request overview
This PR adjusts CoreCLR AOT tool build inputs so that the host-architecture crossgen2/ILC used during cross-builds only requires (and packages) a single target-specific JIT, instead of building/copying all JIT flavors. The intent is to reduce cross-build wall-clock time by trimming unnecessary native build work and content copying.
Changes:
- Centralize crossgen2/ILC JIT/jitinterface copy logic into `AotCompilerCommon.props`, and switch cross-hosted packaging to a single-JIT copy/rename scheme.
- Update cross-build subset selection to build `ClrJitSubset` (single JIT component) instead of `ClrAllJitsSubset` for cross tools.
- Simplify `global-build-job.yml` timeout handling (but this likely reintroduces Windows ARM64 job timeouts for callers that don't override `timeoutInMinutes`).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/tools/aot/ILCompiler/ILCompiler.props | Removes per-project JIT copy logic (now handled by shared AOT props). |
| src/coreclr/tools/aot/crossgen2/crossgen2.props | Removes per-project JIT copy logic (now handled by shared AOT props). |
| src/coreclr/tools/aot/AotCompilerCommon.props | Adds shared logic to copy jitinterface + either all matching JITs (non-cross) or a single renamed JIT (cross-host). |
| eng/Subsets.props | Changes cross-tool dependency subset from ClrAllJitsSubset to ClrJitSubset. |
| eng/pipelines/common/global-build-job.yml | Removes Windows ARM64 timeout floor logic; risks regressions for templates that don’t pass a longer timeout. |
@MichalStrehovsky I know it can create stop energy, but I think we need to run this through a VMR build to make sure packaging is not impacted. The live + n-1 usage is quirky and it can take a full VMR cycle to get things back on track should we get it wrong.
I want to see native AOT testing working again; I don't mind the mechanism. If someone else wants to try the "make yaml compare numbers" angle or revert the change that broke everything, that works for me too, but the broken native AOT outerloops are blocking my work.
I think the change is worthwhile. I kicked off https://dev.azure.com/dnceng/internal/_build/results?buildId=3007111 |
The justification is the same as the comment above for alljits.
@jtschuster looks like ILCompiler.ReadyToRun.Tests.csproj took a dependency on crossgen2_inbuild being able to target any platform (not just the platform we're building), so I had to put workarounds in ba3edd9 and a2b9874. I only now realized ILCompiler.ReadyToRun.Tests.csproj is compiling for X targets using only a single CoreLib/Framework (i.e. CoreLib can be x64 Linux while the JIT is generating WASM). This is going to bite us. The CoreLib-VM contract is different per platform: x64 CoreLib will not have all the helpers that e.g. arm32 codegen needs, or vice versa. The current setup is problematic and at some point it is going to block someone's feature work. The CoreLib platform needs to match the JIT platform. I don't think clr.toolstests is the right place for this kind of testing; this should be in src/tests and run with appropriate JIT/CoreLib combinations.
<SingleJitLibraryName>$(_LibPrefix)clrjit_$(_TargetOSForJitLibraryName)_$(_TargetArchitectureForJitLibraryName)_$(TargetArchitectureForSharedLibraries)$(_LibSuffix)</SingleJitLibraryName>
<SingleJitLibrarySourceName>$(_LibPrefix)clrjit$(_LibSuffix)</SingleJitLibrarySourceName>
<SingleJitLibrarySourceName Condition="'$(_TargetOSForJitLibraryName)' == 'universal' and '$(TargetOS)' != 'windows'">$(SingleJitLibraryName)</SingleJitLibrarySourceName>
<CopyAllJitLibrariesToAotCompilerOutput Condition="'$(CopyAllJitLibrariesToAotCompilerOutput)' == '' and '$(CrossHostArch)' == ''">true</CopyAllJitLibrariesToAotCompilerOutput>
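As a reading aid, the name composition in the `SingleJitLibraryName` and `SingleJitLibrarySourceName` properties above can be sketched in Python. This is only a model of the three property lines as quoted. The function and parameter names are hypothetical, and the example values passed in (empty prefix, `.dll` suffix, `universal`, `arm64`) are assumptions chosen to match the file names mentioned elsewhere in this conversation, not values read from the build.

```python
# Hypothetical model of the quoted MSBuild properties; not part of the build.

def single_jit_names(lib_prefix: str, lib_suffix: str, jit_target_os: str,
                     jit_target_arch: str, shared_lib_arch: str,
                     target_os: str) -> tuple[str, str]:
    """Return (source_name, packaged_name) for the single JIT library."""
    # Mirrors SingleJitLibraryName: the fully qualified name that the AOT
    # compiler output is expected to contain.
    packaged_name = (f"{lib_prefix}clrjit_{jit_target_os}_{jit_target_arch}"
                     f"_{shared_lib_arch}{lib_suffix}")

    # Mirrors the unconditional SingleJitLibrarySourceName: the plain clrjit
    # library, which is then copied under the packaged name.
    source_name = f"{lib_prefix}clrjit{lib_suffix}"

    # Mirrors the conditional override: for a 'universal' JIT OS name on a
    # non-Windows TargetOS, the source already carries the full name.
    if jit_target_os == "universal" and target_os != "windows":
        source_name = packaged_name

    return source_name, packaged_name


if __name__ == "__main__":
    # Assumed example inputs, for illustration only.
    print(single_jit_names("", ".dll", "universal", "arm64", "arm64", "windows"))
```

With those assumed inputs the sketch yields `clrjit.dll` as the source and `clrjit_universal_arm64_arm64.dll` as the packaged name, which corresponds to the copy/rename scheme described in the review summary.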
There were additional commits since I ran a VMR build so this problem may be fixed. The packaging failure in win-arm64 highlights the need to make sure we're good there before merging.
I'm happy to keep helping to facilitate as it's not exactly an exciting task.
Thank you! Yep, cc6adf9 is the fix for that one.
I noticed the string/int comparison error and was working on it. #129814
@dotnet/jit-contrib could someone have a look? If the theory that the recent arm64 leg timeouts are caused by the addition of the WASM JIT is true, then this should put us more consistently under the 1 hour timeout.
Previously, cross legs would build: clrjit_universal_arm64_arm64.dll
With this change, we only build: clrjit.dll
And rename it to whatever is the right name. Ideally, cross legs should build a correctly named cross jit, but renaming works too...
I see the arm64 leg build is now shorter than other legs I looked at, but CI is not a stable benchmarking environment.