Cache flatc binary and schema extraction to fix 3x fbpkg export slowdown #19104
navsud wants to merge 1 commit into pytorch:main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19104
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV
There is 1 currently active SEV. If your PR is affected, please view it below:
✅ You can merge normally! (3 Unrelated Failures)
As of commit ad040eb with merge base c3f3d12:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@navsud has exported this pull request. If you are a Meta employee, you can view the originating Diff in D102214303.
Summary:
When running from a standalone PAR file (e.g. via fbpkg), the `flatc` binary used for XNNPACK flatbuffer serialization is extracted from the PAR zip archive on every invocation via `importlib.resources.as_file()`. For one of the llama transformer XNNPACK exports, this happened ~225 times (once per XNNPACK partition), adding ~8.5 seconds per extraction from the 3.4 GB PAR archive, for a total of **~32 minutes** of pure I/O overhead.

## Changes

### `executorch/exir/_serialize/_flatbuffer.py`
- Added `_get_flatc_path()`, which caches the extracted `flatc` binary path using a module-level `contextlib.ExitStack`. The `ExitStack` keeps the `importlib.resources.as_file()` context manager alive for the process lifetime, preventing the temp file from being cleaned up between calls.
- Simplified `_run_flatc()` to use the cached path directly.

### `executorch/backends/xnnpack/serialization/xnnpack_graph_serialize.py`
- Added a `_cached_schema_bytes` module-level cache for the `schema.fbs` content, avoiding repeated `importlib.resources.read_bytes()` calls against the PAR archive.

Differential Revision: D102214303
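The caching pattern described under Changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names `get_cached_resource_path()` and `get_cached_resource_bytes()` are hypothetical, the stdlib file `json/__init__.py` stands in for the `flatc` binary and `schema.fbs`, the `atexit` hook is an addition of this sketch, and the newer `importlib.resources.files()` API is used in place of `importlib.resources.read_bytes()`.

```python
import atexit
import contextlib
import importlib.resources
from pathlib import Path
from typing import Optional

# Stand-in resource (assumption): a file that ships with the standard library.
_PACKAGE = "json"
_RESOURCE = "__init__.py"

# The module-level ExitStack owns the as_file() context, so a file extracted
# from a zip-style archive is not cleaned up until the stack is closed.
_resource_stack = contextlib.ExitStack()
atexit.register(_resource_stack.close)  # assumption: clean up at interpreter exit

_cached_path: Optional[Path] = None
_cached_bytes: Optional[bytes] = None


def get_cached_resource_path() -> Path:
    """Extract the resource at most once and return the same path afterwards."""
    global _cached_path
    if _cached_path is None:
        ref = importlib.resources.files(_PACKAGE).joinpath(_RESOURCE)
        # enter_context() enters the context manager without a `with` block,
        # deferring its cleanup to _resource_stack.close().
        _cached_path = _resource_stack.enter_context(
            importlib.resources.as_file(ref)
        )
    return _cached_path


def get_cached_resource_bytes() -> bytes:
    """Read the resource content once and serve later calls from memory."""
    global _cached_bytes
    if _cached_bytes is None:
        ref = importlib.resources.files(_PACKAGE).joinpath(_RESOURCE)
        _cached_bytes = ref.read_bytes()
    return _cached_bytes
```

With this shape, the first call pays the extraction cost and each later call is a cheap module-global lookup. The sketch takes no lock, so concurrent first calls from several threads could each perform the extraction.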
fc68795 to ad040eb
@claude Review this PR
Claude finished @mergennachin's task in 1m 17s
Review of PR #19104: Cache flatc binary and schema extraction
Summary
This PR addresses a real and significant performance problem: extracting the
Issues
1. Thread safety