
Commit d8a0014

feat/python: upgrade to codeanalyzer-python 0.3.0, remove CodeQL (1.4.0) (#186)
* feat(python)!: upgrade to codeanalyzer-python 0.3.0 and remove CodeQL

  codeanalyzer-python 0.3.0 drops CodeQL in favor of PyCG for call-graph construction and removed the `using_codeql` option from AnalysisOptions, which broke CLDK's in-process Python backend (TypeError on every analysis).

  Upgrade the pin 0.2.0 -> 0.3.0 and remove CodeQL from CLDK entirely:

  - Drop the `use_codeql` knob from the public surface: PyCodeAnalyzerConfig, the deprecated CLDK(language).analysis(...) shim, the PyCodeanalyzer constructor, and the facade forwarding. Stop passing using_codeql to AnalysisOptions.
  - Remove the CodeQLDatabaseBuildException / CodeQLQueryExecutionException exception classes and their re-exports.
  - Scrub CodeQL from docstrings, the README, and the _jdk.py loader comments; describe the Python backend as Jedi + PyCG.
  - Drop the now-obsolete use_codeql forwarding test; fix the Neo4j parity fixture to not pass using_codeql.

  BREAKING CHANGE: removes the public `use_codeql` option and the CodeQL exception classes. Call-graph results may differ (PyCG vs CodeQL-augmented Jedi).

  Closes #185

* chore(release): 1.4.0

  codeanalyzer-python 0.3.0 upgrade and CodeQL removal (#185).
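Because `use_codeql` is removed rather than deprecated, calling code that still passes it would presumably fail with a `TypeError`. One possible way for downstream callers to bridge the upgrade is to filter the option out before forwarding keyword arguments. The sketch below is illustrative only: `drop_removed_options` and `REMOVED_OPTIONS` are hypothetical names that are not part of CLDK, and the dictionary stands in for whatever arguments the caller collects.

```python
import warnings
from typing import Any, Dict

# Hypothetical helper, not part of CLDK: option names removed in CLDK 1.4.0.
REMOVED_OPTIONS = ("use_codeql",)


def drop_removed_options(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs without options that CLDK 1.4.0 removed."""
    cleaned = dict(kwargs)
    for name in REMOVED_OPTIONS:
        if name in cleaned:
            cleaned.pop(name)
            warnings.warn(
                f"'{name}' was removed in CLDK 1.4.0 and is ignored",
                DeprecationWarning,
                stacklevel=2,
            )
    return cleaned


# Keyword arguments as older calling code might have assembled them.
legacy_kwargs = {"cache_dir": ".codeanalyzer", "use_codeql": False, "use_ray": False}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    print(drop_removed_options(legacy_kwargs))
```

The cleaned dictionary can then be passed on to the analysis entry point; the original dictionary is left untouched so the caller can still log what was dropped.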
1 parent bfdcee5 commit d8a0014

13 files changed

Lines changed: 83 additions & 147 deletions


CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [v1.4.0] - 2026-06-27
+
+### Changed
+- **Upgraded `codeanalyzer-python` 0.2.0 → 0.3.0**, which drops CodeQL and uses **PyCG** for
+  call-graph construction.
+
+### Removed
+- **`use_codeql` (BREAKING).** Because codeanalyzer-python 0.3.0 removed CodeQL, the `use_codeql`
+  knob no longer maps to anything and is removed from CLDK's public surface: the
+  `PyCodeAnalyzerConfig.use_codeql` field, the deprecated `CLDK(language).analysis(use_codeql=...)`
+  parameter, and the `PyCodeanalyzer(use_codeql=...)` argument. The `CodeQLDatabaseBuildException`
+  and `CodeQLQueryExecutionException` exception classes are removed as well. Call-graph results may
+  differ (PyCG vs CodeQL-augmented Jedi). See #185.
+
 ## [v1.3.0] - 2026-06-27

 ### Added

README.md

Lines changed: 4 additions & 4 deletions
@@ -25,7 +25,7 @@

 **A unified, multilingual program-analysis SDK for Code LLMs.** CLDK turns raw source code into structured, LLM-ready program facts — symbol tables, call graphs, type hierarchies, and more — behind a single Python API, so you can build analysis-augmented LLM pipelines without wrangling a different static-analysis tool for every language.

-Under the hood, CLDK orchestrates mature analysis engines (WALA, Tree-sitter, Jedi, CodeQL, ts-morph) and normalizes their output into consistent, typed [Pydantic](https://docs.pydantic.dev/) models. You get the same ergonomic interface whether you are analyzing Java, Python, or TypeScript.
+Under the hood, CLDK orchestrates mature analysis engines (WALA, Tree-sitter, Jedi, PyCG, ts-morph) and normalizes their output into consistent, typed [Pydantic](https://docs.pydantic.dev/) models. You get the same ergonomic interface whether you are analyzing Java, Python, or TypeScript.

 CLDK is:

@@ -125,12 +125,12 @@ Each language is analyzed by a dedicated `codeanalyzer-*` engine; CLDK normalize
 | Language | Analysis engine | What it provides |
 | --- | --- | --- |
 | **Java** | [`codeanalyzer-java`](https://github.com/codellm-devkit/codeanalyzer-java) | WALA + JavaParser. Bytecode-level call graphs, type hierarchies, symbol resolution, CRUD-operation and entry-point detection. Optional read-only **Neo4j** graph backend. |
-| **Python** | [`codeanalyzer-python`](https://github.com/codellm-devkit/codeanalyzer-python) | Jedi with optional CodeQL augmentation. Symbol tables, call graphs, and class/method resolution. Optional read-only **Neo4j** graph backend. |
+| **Python** | [`codeanalyzer-python`](https://github.com/codellm-devkit/codeanalyzer-python) | Jedi with PyCG-based call graphs. Symbol tables, call graphs, and class/method resolution. Optional read-only **Neo4j** graph backend. |
 | **TypeScript / JavaScript** | [`codeanalyzer-typescript`](https://github.com/codellm-devkit/codeanalyzer-typescript) | ts-morph with Jelly-based call graphs. Symbols, call graph, types, decorators, and call sites. Optional read-only **Neo4j** graph backend. |

 The backend is selected by the **type** of the `backend=` config you pass to a factory: the in-process analyzer (default) or a `Neo4jConnectionConfig` for the read-only graph backend.

-> **Analysis cache (Python):** caching is owned by `codeanalyzer-python` — the backend virtualenv, CodeQL database, and analysis cache live under `cache_dir` (default `<project>/.codeanalyzer`). CodeQL is on by default, so the first run is slow (it provisions a CodeQL DB) and later runs reuse a checksum-validated cache. Add the cache directory to your `.gitignore`.
+> **Analysis cache (Python):** caching is owned by `codeanalyzer-python` — the backend virtualenv and analysis cache live under `cache_dir` (default `<project>/.codeanalyzer`). The first run is slower (it provisions the backend virtualenv) and later runs reuse a checksum-validated cache. Add the cache directory to your `.gitignore`.

 ## Architecture

@@ -147,7 +147,7 @@ graph TD
 A --> T[cldk.analysis.typescript]

 J --> EJ[codeanalyzer-java<br/>WALA · JavaParser]
-P --> EP[codeanalyzer-python<br/>Jedi · CodeQL]
+P --> EP[codeanalyzer-python<br/>Jedi · PyCG]
 T --> ET[codeanalyzer-typescript<br/>ts-morph · Jelly]

 J -. read-only .-> N[(Neo4j)]

cldk/analysis/commons/backend_config.py

Lines changed: 0 additions & 2 deletions
@@ -67,11 +67,9 @@ class PyCodeAnalyzerConfig(CodeAnalyzerConfig):
     Adds the Python-only call-graph knobs on top of :class:`CodeAnalyzerConfig`.

     Attributes:
-        use_codeql: If ``True`` (default), augment Jedi-based call-graph resolution with CodeQL.
         use_ray: If ``True``, enable Ray-based parallel processing for large projects.
     """

-    use_codeql: bool = True
     use_ray: bool = False



cldk/analysis/java/codeanalyzer/_jdk.py

Lines changed: 5 additions & 7 deletions
@@ -16,9 +16,8 @@

 """Fetch + cache a self-contained Temurin JDK to run codeanalyzer.jar.

-Mirrors the codeanalyzer-python ``CodeQLLoader`` pattern (download a platform
-archive from a release, extract restoring exec bits, locate the binary), adapted
-for a JDK:
+Follows a platform-binary loader pattern (download a platform archive from a
+release, extract restoring exec bits, locate the binary), adapted for a JDK:

 * pinned to an exact Temurin release (reproducible) instead of "latest",
 * SHA256-verified,
@@ -55,7 +54,7 @@


 class JdkLoader:
-    """Resolve a Temurin JDK from the Adoptium API, mirroring CodeQLLoader."""
+    """Resolve a Temurin JDK from the Adoptium API."""

     _API = "https://api.adoptium.net/v3"

@@ -131,8 +130,7 @@ def download_and_extract(cls, dest: Path) -> Path:

         logger.info(f"Extracting JDK to {dest}")
         if archive.name.endswith(".zip"):
-            # zipfile.extractall drops the executable bit; copy each stored mode
-            # back (same fix the CodeQL loader applies).
+            # zipfile.extractall drops the executable bit; copy each stored mode back.
             with zipfile.ZipFile(archive) as zf:
                 for info in zf.infolist():
                     out = zf.extract(info, dest)
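The hunk above is cut off right after `zf.extract`, so the mode-restoring step itself is not visible in this diff. As a general illustration of the technique the comment describes (not a copy of `JdkLoader`), the sketch below reads the Unix permission bits that a zip entry stores in the upper 16 bits of `ZipInfo.external_attr` and reapplies them with `os.chmod`. The function name `extract_preserving_modes` and the demo archive are assumptions made for this example, and it assumes a POSIX filesystem.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def extract_preserving_modes(archive: Path, dest: Path) -> None:
    """Extract a zip archive and reapply the Unix mode stored for each file."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            out = zf.extract(info, dest)
            # For archives created on Unix, the upper 16 bits of external_attr
            # hold the file mode; it is 0 when no mode was recorded.
            mode = (info.external_attr >> 16) & 0o7777
            if mode and not info.is_dir():
                os.chmod(out, mode)


# Build a tiny archive containing one executable entry, then extract it.
work = Path(tempfile.mkdtemp())
archive = work / "demo.zip"
with zipfile.ZipFile(archive, "w") as zf:
    entry = zipfile.ZipInfo("bin/tool")
    entry.external_attr = 0o100755 << 16  # regular file, rwxr-xr-x
    zf.writestr(entry, "#!/bin/sh\necho ok\n")

extract_preserving_modes(archive, work / "out")
print(oct(os.stat(work / "out" / "bin" / "tool").st_mode & 0o777))
```

Directories are skipped on purpose: applying a restrictive stored mode to a directory before its children are written could make the rest of the extraction fail.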
@@ -160,7 +158,7 @@ def ensure_jdk(java_cache_dir: Path) -> Path:
     cached at ``<java_cache_dir>/jdk/<release>/`` -- the existing per-language
     cache root, not a new location.

-    Resolution order (mirrors codeanalyzer-python's ``_ensure_codeql_bin``):
+    Resolution order:
       1. the cached JDK under ``<java_cache_dir>/jdk/<release>/`` -- reused across runs;
       2. a system ``$JAVA_HOME`` that actually has ``jmods`` -- honored verbatim;
       3. otherwise download + extract the pinned Temurin JDK into the cache.
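The three-step resolution order in this docstring can be sketched as a small function. This is a simplified illustration under stated assumptions, not the body of `ensure_jdk`: `resolve_jdk`, its parameters, and `fake_download` are hypothetical, "cached" is approximated as "the release directory exists", and the download step is injected as a callable so the example runs without network access.

```python
import tempfile
from pathlib import Path
from typing import Callable, Mapping


def resolve_jdk(
    java_cache_dir: Path,
    release: str,
    env: Mapping[str, str],
    download: Callable[[Path], Path],
) -> Path:
    """Hypothetical sketch of the cache -> $JAVA_HOME -> download order."""
    cached = java_cache_dir / "jdk" / release
    # 1. Reuse the cached JDK if it is already present.
    if cached.is_dir():
        return cached
    # 2. Honor $JAVA_HOME only when it ships a jmods directory.
    java_home = env.get("JAVA_HOME")
    if java_home and (Path(java_home) / "jmods").is_dir():
        return Path(java_home)
    # 3. Otherwise fetch the pinned release into the cache.
    return download(cached)


def fake_download(dest: Path) -> Path:
    # Stand-in for the real download: it only creates the target directory.
    dest.mkdir(parents=True)
    return dest


cache = Path(tempfile.mkdtemp())
first = resolve_jdk(cache, "example-release", {}, fake_download)   # falls through to step 3
second = resolve_jdk(cache, "example-release", {}, fake_download)  # served by step 1
print(first == second)
```

Checking the cache before `$JAVA_HOME` keeps repeated runs on the same pinned release even if the environment changes between runs.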

cldk/analysis/python/codeanalyzer/codeanalyzer.py

Lines changed: 6 additions & 14 deletions
@@ -29,7 +29,7 @@

 The analysis leverages:
 - **Jedi**: For semantic code understanding and symbol resolution.
-- **CodeQL** (optional): For enhanced call graph resolution.
+- **PyCG**: For call-graph construction.
 - **Tree-sitter**: For fast syntactic parsing.

 Key features:
@@ -111,7 +111,6 @@ class PyCodeanalyzer(PythonAnalysisBackend):
         analysis_level (str): The depth of analysis performed.
         eager_analysis (bool): Whether to force regeneration of caches.
         target_files (List[str] | None): Specific files to analyze.
-        use_codeql (bool): Whether CodeQL is used for call graph enhancement.
         cache_dir (Path | None): Cache directory for the backend.
         analysis_json_path (Path | None): Path for persisting analysis results.
         application (PyApplication): The analyzed application model.
@@ -129,7 +128,6 @@ def __init__(
         eager_analysis: bool,
         cache_dir: Union[str, Path, None] = None,
         target_files: List[str] | None = None,
-        use_codeql: bool = True,
         use_ray: bool = False,
     ) -> None:
         """Initialize the Python code analyzer and run analysis.
@@ -154,27 +152,23 @@ def __init__(
                 its analysis from scratch, ignoring any cached results.
                 If ``False``, cached results are reused when available.
             cache_dir: Directory for codeanalyzer-python's caches, including
-                its virtualenv, CodeQL database, and analysis cache files.
-                If ``None``, defaults to ``<project_dir>/.codeanalyzer``.
+                its virtualenv and analysis cache files. If ``None``, defaults
+                to ``<project_dir>/.codeanalyzer``.
             target_files: Optional list of specific files to analyze. Note
                 that codeanalyzer-python currently supports only a single
                 target file; if multiple are provided, only the first is
                 used and a warning is logged.
-            use_codeql: If ``True`` (default), uses CodeQL to enhance call
-                graph resolution beyond what Jedi provides. Set to ``False``
-                for faster analysis without CodeQL.
             use_ray: If ``True``, enables Ray-based parallel processing for
-                analysis. Recommended for very large projects where Jedi/CodeQL
-                analysis would otherwise be slow. Requires Ray to be installed.
+                analysis. Recommended for very large projects where analysis
+                would otherwise be slow. Requires Ray to be installed.
                 Defaults to ``False``.

         Raises:
             ValueError: If ``project_dir`` is ``None``.

         Note:
             Analysis is performed synchronously during initialization.
-            For large projects, this may take significant time, especially
-            with ``use_codeql=True``.
+            For large projects, this may take significant time.
         """
         if project_dir is None:
             raise ValueError("project_dir is required for Python analysis.")
@@ -185,7 +179,6 @@ def __init__(
         self.analysis_level = analysis_level
         self.eager_analysis = eager_analysis
         self.target_files = target_files
-        self.use_codeql = use_codeql
         self.use_ray = use_ray

         # codeanalyzer-python owns all caching. CLDK forwards these paths
@@ -234,7 +227,6 @@ def _run_analyzer(self) -> PyApplication:
             input=self.project_dir,
             output=self.analysis_json_path,
             format=OutputFormat.JSON,
-            using_codeql=self.use_codeql,
             using_ray=self.use_ray,
             rebuild_analysis=self.eager_analysis,
             skip_tests=True,

cldk/analysis/python/python_analysis.py

Lines changed: 4 additions & 7 deletions
@@ -25,8 +25,7 @@
 combination of:
 - **Jedi**: For semantic code understanding, symbol resolution, and basic
   call graph construction.
-- **CodeQL** (optional): For enhanced call graph resolution and more
-  accurate inter-procedural analysis.
+- **PyCG**: For call-graph construction.
 - **Tree-sitter**: For fast syntactic parsing and AST operations.

 Key capabilities include:
@@ -183,7 +182,6 @@ def __init__(
                 eager_analysis=eager_analysis,
                 cache_dir=cache_path,
                 target_files=target_files,
-                use_codeql=getattr(cfg, "use_codeql", True),
                 use_ray=getattr(cfg, "use_ray", False),
             )

@@ -370,7 +368,7 @@ def get_call_graph(self) -> nx.DiGraph:

         The call graph is built using:
         - Jedi for semantic call resolution
-        - CodeQL (if enabled) for enhanced inter-procedural analysis
+        - PyCG for inter-procedural call-graph construction

         Returns:
             A ``networkx.DiGraph`` where:
@@ -380,9 +378,8 @@ def get_call_graph(self) -> nx.DiGraph:
             - Edge attributes may include call site information

         Note:
-            The completeness of the call graph depends on the analysis
-            configuration. With ``use_codeql=True``, more call relationships
-            are typically discovered at the cost of longer analysis time.
+            The completeness of the call graph depends on the analysis backend
+            (Jedi plus PyCG in codeanalyzer-python 0.3.0).

         See Also:
             :meth:`get_callers`: For finding callers of a specific method.

cldk/core.py

Lines changed: 3 additions & 4 deletions
@@ -25,8 +25,8 @@
 The CLDK supports the following languages:
 - **Java**: Full static analysis via CodeAnalyzer backend, including symbol
   tables, call graphs, and code metrics.
-- **Python**: Static analysis via codeanalyzer-python backend with optional
-  CodeQL-augmented call graph resolution.
+- **Python**: Static analysis via codeanalyzer-python backend (Jedi plus
+  PyCG call-graph construction).
 - **C**: Basic analysis via libclang for parsing and extracting code structure.

 Typical usage involves instantiating :class:`CLDK` with a target language, then
@@ -252,7 +252,6 @@ def analysis(
         analysis_backend_path: str | None = None,
         analysis_json_path: str | Path | None = None,
         cache_dir: str | Path | None = None,
-        use_codeql: bool = True,
         use_ray: bool = False,
         neo4j_config: "Neo4jConnectionConfig | None" = None,
     ) -> JavaAnalysis | PythonAnalysis | CAnalysis | TypeScriptAnalysis:
@@ -300,7 +299,7 @@ def analysis(
         elif self.language == "python":
             if source_code is not None:
                 raise CldkInitializationException("source_code mode is not supported for Python; please pass project_path.")
-            backend = neo4j_config if neo4j_config is not None else PyCodeAnalyzerConfig(cache_dir=cache_root, use_codeql=use_codeql, use_ray=use_ray)
+            backend = neo4j_config if neo4j_config is not None else PyCodeAnalyzerConfig(cache_dir=cache_root, use_ray=use_ray)
             return CLDK.python(
                 project_path=project_path,
                 analysis_level=analysis_level,

cldk/utils/exceptions/__init__.py

Lines changed: 0 additions & 4 deletions
@@ -21,13 +21,9 @@
 from .exceptions import (
     CldkInitializationException,
     CodeanalyzerExecutionException,
-    CodeQLDatabaseBuildException,
-    CodeQLQueryExecutionException,
 )

 __all__ = [
-    "CodeQLDatabaseBuildException",
-    "CodeQLQueryExecutionException",
     "CodeanalyzerExecutionException",
     "CldkInitializationException",
 ]

cldk/utils/exceptions/exceptions.py

Lines changed: 0 additions & 71 deletions
@@ -23,8 +23,6 @@
 - **Initialization Errors**: :class:`CldkInitializationException`
 - **Analysis Backend Errors**: :class:`CodeanalyzerExecutionException`,
   :class:`CodeanalyzerUsageException`
-- **CodeQL Errors**: :class:`CodeQLDatabaseBuildException`,
-  :class:`CodeQLQueryExecutionException`

 All exceptions inherit from Python's built-in :class:`Exception` class and
 include a descriptive message attribute.
@@ -89,75 +87,6 @@ def __init__(self, message: str) -> None:
         super().__init__(self.message)


-class CodeQLDatabaseBuildException(Exception):
-    """Exception raised for errors during CodeQL database building.
-
-    This exception is raised when the CodeQL database creation fails.
-    CodeQL databases are used for enhanced call graph analysis in Python.
-    Common causes include:
-    - CodeQL CLI not installed or not on PATH
-    - Invalid project structure for CodeQL
-    - Insufficient disk space
-    - Build errors in the target project
-
-    Attributes:
-        message (str): A descriptive error message explaining the
-            database build failure.
-
-    Note:
-        This exception is primarily relevant for Python analysis when
-        ``use_codeql=True`` is specified.
-
-    See Also:
-        :class:`~cldk.analysis.python.PythonAnalysis`: Python analysis
-            that may use CodeQL.
-    """
-
-    def __init__(self, message: str) -> None:
-        """Initialize the exception with a descriptive message.
-
-        Args:
-            message: A descriptive error message explaining what went wrong
-                during CodeQL database creation.
-        """
-        self.message = message
-        super().__init__(self.message)
-
-
-class CodeQLQueryExecutionException(Exception):
-    """Exception raised for errors during CodeQL query execution.
-
-    This exception is raised when a CodeQL query fails to execute against
-    a CodeQL database. Common causes include:
-    - Invalid query syntax
-    - Query timeout
-    - Database corruption
-    - Incompatible CodeQL version
-
-    Attributes:
-        message (str): A descriptive error message explaining the
-            query execution failure.
-
-    Note:
-        This exception is primarily relevant for Python analysis when
-        ``use_codeql=True`` is specified.
-
-    See Also:
-        :class:`CodeQLDatabaseBuildException`: Related exception for
-            database creation failures.
-    """
-
-    def __init__(self, message: str) -> None:
-        """Initialize the exception with a descriptive message.
-
-        Args:
-            message: A descriptive error message explaining what went wrong
-                during CodeQL query execution.
-        """
-        self.message = message
-        super().__init__(self.message)
-
-
 class CodeanalyzerUsageException(Exception):
     """Exception raised for incorrect CodeAnalyzer usage.


pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "cldk"
-version = "1.3.0"
+version = "1.4.0"
 description = "The official Python SDK for Codellm-Devkit."
 readme = "README.md"
 license = { text = "Apache-2.0" }
@@ -40,7 +40,7 @@ dependencies = [
     "tree-sitter-javascript==0.23.1",
     "clang==17.0.6",
     "libclang==17.0.6",
-    "codeanalyzer-python==0.2.0",
+    "codeanalyzer-python==0.3.0",
     "codeanalyzer-typescript==0.4.3",
 ]

@@ -88,7 +88,7 @@ include = [

 [tool.backend-versions]
 codeanalyzer-java = "2.4.1"
-codeanalyzer-python = "0.2.0"
+codeanalyzer-python = "0.3.0"
 codeanalyzer-typescript = "0.4.3"

 ########################################
