googleapis · hebaalazzeh · May 30, 2026 · Jun 4, 2026 · Jun 9, 2026
@@ -0,0 +1,136 @@
+# Python SDK Import Profiler: Documentation & Breakdown
+
+This document provides a comprehensive guide to the `import_profiler` scripts, directory files, and how to analyze the generated import trace logs to target optimization areas, along with the details of the implemented lazy-loading refactor.
+
+---
+
+## 1. File Guide & Directory Structure
+The profiling tool is located in the [scripts/import_profiler/](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/) directory:
+
+* **[profiler.py](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/profiler.py)**: The core executable script. It is designed as a single-file, self-spawning harness that performs process-isolated importing benchmarks and generates trace logs.
+* **[plan.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/plan.md)**: The current project phases and roadmap checklist.
+* **[status.md](file:///usr/local/google/home/hebaalazzeh/git/google-cloud-python/scripts/import_profiler/status.md)**: Tracks the active task state and hosts recorded baseline performance metrics.
+
+---
+
+## 2. Profiler Mechanism (`profiler.py`)
+To benchmark import times accurately, we must bypass Python's dynamic module cache:
+
+* **The Cache Problem:** When Python imports a module (e.g. `import os`), it registers it in `sys.modules`. Any subsequent import statement for the same module instantly retrieves it from `sys.modules` in `<0.1 ms`. 
+* **Process Isolation:** The script uses a **Master/Worker Architecture**. 
+  * The **Master** process loops $N$ times.
+  * In each iteration, it launches a new interpreter subprocess using `sys.executable` pointing to itself with the `--worker` flag.
+  * The **Worker** process executes a fresh, isolated import, records memory/time metrics natively, dumps them as a JSON string to stdout, and exits.
+  * This guarantees a 100% "cold start" environment for every single import measurement.
+* **Natively Captured Metrics:**
+  * **Peak Memory (RAM):** Tracked using `tracemalloc` to capture peak memory block allocation.
+  * **Load Time:** Tracked using `time.perf_counter()` to get high-resolution timing.
+
+---
+
+## 3. How to Interpret Python `-X importtime` Trace Logs
+When running with the `--trace` flag, the script captures the raw stderr trace produced by Python's `-X importtime` option. The trace looks like this:
+
+```
+import time: self [us] | cumulative | imported package
+import time:       536 |        536 |   _io
+import time:      1077 |       2385 | _frozen_importlib_external
+import time:    773659 |     793010 |                   google.cloud.compute_v1.types.compute
+```
+
+### Explaining the Fields:
+1. **`self [us]` (Microseconds):** The time spent importing the module itself, *excluding* any time spent importing its child dependencies.
+2. **`cumulative` (Microseconds):** The total time spent loading the module *including* all nested imports. This represents the total wait time introduced by this line.
+3. **Hierarchy Indentation:** Indented packages are sub-imports triggered by the parent module. A package with higher indentation is loaded deeper in the call stack.
+
+---
+
+## 4. Breakdown & Interpretation of Generated Traces
+
+### A. Compute Client Trace
+* **Total Import Latency:** ~10.4 seconds.
+* **Primary Bottleneck - Massive Types File:**
+  * `google.cloud.compute_v1.types.compute` takes **`773.6 ms` self time** (almost a full second on a single file compilation/execution).
+* **The Boilerplate Cascade:**
+  * Since compute contains over 100 submodules (`addresses`, `instances`, `firewalls`, `networks`, etc.), eager importing of `google.cloud.compute` imports *every single service client and transport*.
+  * Each service client executes a chain of duplicate helper modules (transport configurations, pagination, options, exceptions, and `urllib3`/`requests` dependencies), aggregating to a 10-second delay.
+
+### B. AI Platform Client Trace
+* **Total Import Latency:** ~13.3 seconds.
+* **Primary Bottleneck - Massive Types Module:**
+  * `google.cloud.aiplatform_v1.types` triggers a chain of individual proto type files, loading hundreds of model and job definitions which aggregates to **`377 ms` cumulative time**.
+* **Heavy Service Clients:**
+  * `data_foundry_service` alone introduces **`403 ms` cumulative latency**.
+  * `dataset_service` adds further transport and client initialization overheads.
+
+---
+
+## 5. Lazy-Loading Design and Implementation (PEP 562)
+
+To solve eager initialization cost, we implemented a lazy-loading layer using Python PEP 562's module-level `__getattr__` and `__dir__` definitions.
+
+```
+                  Eager Import (Before)             Lazy Import (After)
+                     [10s - 13s]                         [~6ms - 279ms]
+                ┌─────────────────────┐             ┌─────────────────────┐
+                │   import package    │             │   import package    │
+                └──────────┬──────────┘             └──────────┬──────────┘
+                           │                                   │ (Instant)
+              ┌────────────┴────────────┐                      ▼
+              │ Eagerly import all      │           ┌─────────────────────┐
+              │ submodules, types, LROs │           │ Return module proxy │
+              └─────────────────────────┘           └──────────┬──────────┘
+                                                               │
+                                                               │ (Only when class
+                                                               │  is accessed)
+                                                               ▼
+                                                    ┌─────────────────────┐
+                                                    │ Dynamically resolve │
+                                                    │ via __getattr__()   │
+                                                    └─────────────────────┘
+```
+
+### Key Solution Steps:
+1. **Module Level `__getattr__`**: Overrode the attributes lookup of `__init__.py` modules so they intercept class name lookups (e.g. `compute.InstancesClient`).
+2. **Registry Mapping (`_lazy_registry`)**: Maps each class name to its exact relative or absolute submodule file path.
+3. **Dynamic Compilation**: Uses `importlib.import_module` to import the targeted submodule and load the requested class dynamically, caching it in `globals()` to make subsequent accesses execute in `<0.002 ms`.
+4. **Deferred Monolith Imports**: Deferred the loading of massive type definitions (`google.cloud.compute_v1.types.compute` or `compat.types`) so they are compiled ONLY when a user actively instantiates a request/response type object.
+
+---
+
+## 6. Circular Import Resolution
+
+Transitioning to lazy-loading disrupted Python's natural eager import ordering, revealing a deep circular dependency loop:
+`base.py` ➔ `utils/__init__.py` ➔ `initializer.py` ➔ `metadata.py` ➔ `pipeline_jobs.py` ➔ `models.py` ➔ `jobs.py` ➔ `_publisher_models.py` ➔ `base.py`.
+
+* **The Problem:** The compiler was trying to instantiate class `_PublisherModel(base.VertexAiResourceNoun)` before class `base.VertexAiResourceNoun` was defined because `base.py` was stuck waiting for `initializer.py` to compile.
+* **The Resolution:** We analyzed the usage of `initializer` in `base.py` and `utils/__init__.py` and realized it was never called at the class definition or module-level scope. We commented out the top-level imports of `initializer` in both files and moved them **locally inside the methods** (`_submit`, `__init__`, `_instantiate_client`, `_list`, `full_resource_name`). This successfully broke the compilation cycle.
+
+---
+
+## 7. Performance Gains Summary
+
+Running `profiler.py` on the lazy-loaded implementation demonstrates the following savings:
+
+### google-cloud-compute:
+* **Eager Import Time**: Reduced from **`10,356.92 ms`** to **`6.28 ms`** (**99.9% savings**).
+* **Peak RAM Usage**: Reduced from **`96.29 MB`** to **`0.28 MB`** (**99.7% savings**).
+
+### google-cloud-aiplatform:
+* **Eager Import Time**: Reduced from **`13,342.05 ms`** to **`279.16 ms`** (**97.9% savings**).
+* **Peak RAM Usage**: Reduced from **`156.56 MB`** to **`4.88 MB`** (**96.9% savings**).
+
+---
+
+## 8. Execution Reference
+Ensure you are in the correct pyenv virtual environment where packages are installed in editable mode:
+
+```bash
+# 1. Run the profiler to get baseline/optimized outcomes (e.g. 5 iterations)
+PYENV_VERSION=py312 python profiler.py --module=google.cloud.compute --iterations=5
+PYENV_VERSION=py312 python profiler.py --module=google.cloud.aiplatform --iterations=5
+
+# 2. Run the profiler to generate trace logs
+PYENV_VERSION=py312 python profiler.py --module=google.cloud.compute --trace
+PYENV_VERSION=py312 python profiler.py --module=google.cloud.aiplatform --trace
+```