Commit c85c457

Merge main into feat/level2-pycg-remove-codeql

Incorporates the Neo4j emit target, the --emit/--app-name/--neo4j-* CLI options, the EmitTarget enum, the _install_into_venv helper, the uv dependency, the canpy rename, and _compute_external_symbols from main. Retains PyCG as the analysis level 2 backend (--analysis-level, --pycg-shard, --pycg-shard-ceiling, --pycg-shard-timeout) and filter_external_edges from this branch. CodeQL is kept as an optional augmentation pass (--codeql/--no-codeql) that enriches call sites before Jedi runs; at level 2, PyCG adds further edges on top of the Jedi+CodeQL merge.

Signed-off-by: Saurabh Sinha <sinha108@gmail.com>
2 parents 7b78484 + 2bae291 commit c85c457

31 files changed

Lines changed: 3326 additions & 434 deletions
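The commit message describes how the call-graph backends are layered. The following is a simplified model of that layering, written for illustration only: `CallSite`, `build_call_edges`, the `jedi_resolve` callback and the shape of the CodeQL hints are all hypothetical and are not taken from the codebase. It assumes CodeQL contributes an optional per-call-site hint that Jedi consumes, and that PyCG edges are unioned in only at analysis level 2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (caller signature, callee signature)


@dataclass(frozen=True)
class CallSite:
    """Hypothetical call-site record; real call sites carry much more context."""
    caller: str
    expr: str


def build_call_edges(
    sites: Iterable[CallSite],
    jedi_resolve: Callable[[CallSite, Optional[str]], Optional[str]],
    codeql_hints: Optional[Dict[CallSite, str]] = None,
    pycg_edges: Iterable[Edge] = (),
    analysis_level: int = 1,
) -> Set[Edge]:
    edges: Set[Edge] = set()
    for site in sites:
        # Optional CodeQL pass (--codeql): a hint enriches the call site before Jedi resolves it.
        hint = codeql_hints.get(site) if codeql_hints else None
        callee = jedi_resolve(site, hint)
        if callee is not None:
            edges.add((site.caller, callee))
    if analysis_level >= 2:
        # Level 2: PyCG adds further edges on top of the Jedi+CodeQL result.
        edges.update(pycg_edges)
    return edges


# Toy resolver: trust a CodeQL hint if present, otherwise resolve only "x()".
def toy_resolve(site: CallSite, hint: Optional[str]) -> Optional[str]:
    if hint is not None:
        return hint
    return "m.x" if site.expr == "x()" else None
```

Keeping the PyCG union behind the level check means level-1 runs are unaffected by PyCG output, matching the commit's description of PyCG as an add-on at level 2.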

.github/release.yml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# Configures GitHub's auto-generated release notes (the "What's Changed" section
+# appended by `generate_release_notes` in .github/workflows/release.yml). Merged
+# PRs are grouped under these emoji headings by label, mirroring the emoji
+# categories used by the codeanalyzer-typescript backend.
+changelog:
+  exclude:
+    authors:
+      - dependabot
+      - github-actions
+  categories:
+    - title: 🚀 Features
+      labels: [enhancement, kind/feature]
+    - title: 🐛 Fixes
+      labels: [bug, fix]
+    - title: ♻️ Refactoring
+      labels: [refactoring]
+    - title: ⚡️ Performance
+      labels: [performance]
+    - title: 📚 Documentation
+      labels: [documentation, doc]
+    - title: 🚦 Tests
+      labels: [test]
+    - title: 🚨 Breaking Changes
+      labels: [breaking]
+    - title: 🛠 Other Changes
+      labels: ["*"]
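To sanity-check which heading a merged PR will land under, the grouping rules in this config can be modeled in a few lines of Python. This is a sketch based on our reading of GitHub's documented behavior (a PR goes into the first category whose labels it carries, `"*"` matches anything not matched earlier, and PRs by excluded authors are dropped); `categorize` is a hypothetical helper, not part of GitHub or of this repository.

```python
from typing import Iterable, List, Optional, Tuple

# Mirrors .github/release.yml above: (heading, labels), checked in order.
CATEGORIES: List[Tuple[str, List[str]]] = [
    ("🚀 Features", ["enhancement", "kind/feature"]),
    ("🐛 Fixes", ["bug", "fix"]),
    ("♻️ Refactoring", ["refactoring"]),
    ("⚡️ Performance", ["performance"]),
    ("📚 Documentation", ["documentation", "doc"]),
    ("🚦 Tests", ["test"]),
    ("🚨 Breaking Changes", ["breaking"]),
    ("🛠 Other Changes", ["*"]),
]
EXCLUDED_AUTHORS = {"dependabot", "github-actions"}


def categorize(author: str, labels: Iterable[str]) -> Optional[str]:
    """Return the heading a PR would appear under, or None if it is excluded."""
    if author in EXCLUDED_AUTHORS:
        return None
    label_set = set(labels)
    for heading, wanted in CATEGORIES:
        # First match wins; "*" is the catch-all for anything not matched earlier.
        if "*" in wanted or label_set & set(wanted):
            return heading
    return None


print(categorize("alice", ["breaking", "enhancement"]))  # 🚀 Features
```

Under this model, ordering matters: a PR labeled both `breaking` and `enhancement` is listed under Features, not Breaking Changes, because Features is checked first.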

.github/workflows/release.yml

Lines changed: 116 additions & 28 deletions
@@ -32,7 +32,26 @@ jobs:
         run: uv sync --all-groups
 
       - name: Install dependencies
-        run: uv pip install -e .
+        run: uv pip install -e ".[neo4j]"
+
+      # Keep generated docs in lockstep with the code being released: regenerate the
+      # README `canpy --help` block and the Neo4j schema.json from source, and commit
+      # them back to main. Releases are cut from main HEAD, so this fast-forwards;
+      # best-effort if main moved.
+      - name: Sync generated docs (README --help + Neo4j schema)
+        if: startsWith(github.ref, 'refs/tags/')
+        run: |
+          uv run python scripts/update_readme.py
+          uv run canpy --emit schema > schema.neo4j.json
+          if git diff --quiet README.md schema.neo4j.json; then
+            echo "Generated docs already current."
+          else
+            git config user.name "github-actions[bot]"
+            git config user.email "github-actions[bot]@users.noreply.github.com"
+            git add README.md schema.neo4j.json
+            git commit -m "docs: sync README --help and Neo4j schema for ${GITHUB_REF#refs/tags/}"
+            git push origin HEAD:main || echo "::warning::could not push doc sync to main (diverged?)"
+          fi
 
       - name: Run tests
         id: test
@@ -51,45 +70,114 @@ jobs:
       - name: Build package
         run: uv build
 
+      # Platform-independent, version-locked release assets published alongside the
+      # wheels/sdist: the Neo4j schema contract (so a consumer can validate
+      # producer/consumer compatibility without installing the package) and the
+      # cargo-dist-style install script.
+      - name: Stage release assets (Neo4j schema + installer script)
+        run: |
+          mkdir -p release-assets
+          uv run canpy --emit schema > release-assets/schema.json
+          cp packaging/install/canpy-installer.sh release-assets/canpy-installer.sh
+          ls -lh release-assets
+
       - name: Get version from tag
         id: tag_name
         run: |
           echo "current_version=${GITHUB_REF#refs/tags/v}" >> $GITHUB_OUTPUT
         shell: bash
 
-      - name: Read Changelog Entry
-        id: changelog_reader
-        uses: mindsers/changelog-reader-action@v2
-        with:
-          validation_level: warn
-          version: ${{ steps.tag_name.outputs.current_version }}
-          path: ./CHANGELOG.md
-
-      - name: Build changelog
-        id: gen_changelog
-        uses: mikepenz/release-changelog-builder-action@v5
-        with:
-          failOnError: "true"
-          configuration: .github/workflows/release_config.json
+      # cargo-dist-style notes: install one-liners + a download table. The categorized
+      # "What's Changed" (merged PRs/issues grouped under emoji headings via
+      # .github/release.yml) is appended by generate_release_notes below. Indented code
+      # blocks avoid backticks in the heredoc.
+      - name: Compose release notes header (install + download)
         env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          VERSION: ${{ steps.tag_name.outputs.current_version }}
+        run: |
+          REPO="codellm-devkit/codeanalyzer-python"
+          BASE="https://github.com/$REPO/releases/download/v$VERSION"
+          cat > "$RUNNER_TEMP/RELEASE_BODY.md" <<EOF
+          ## Install codeanalyzer-python v$VERSION
+
+          Shell script (installs the canpy CLI via uv / pipx / pip):
+
+              curl --proto '=https' --tlsv1.2 -LsSf https://github.com/$REPO/releases/latest/download/canpy-installer.sh | sh
+
+          PyPI:
+
+              pip install codeanalyzer-python==$VERSION
+
+          For the optional live Neo4j push (--emit neo4j --neo4j-uri ...):
+
+              pip install 'codeanalyzer-python[neo4j]==$VERSION'
+
+          ## Download
+
+          | File | Description |
+          | --- | --- |
+          | [codeanalyzer_python-$VERSION-py3-none-any.whl]($BASE/codeanalyzer_python-$VERSION-py3-none-any.whl) | Python wheel |
+          | [codeanalyzer_python-$VERSION.tar.gz]($BASE/codeanalyzer_python-$VERSION.tar.gz) | Source distribution |
+          | [canpy-installer.sh]($BASE/canpy-installer.sh) | Shell installer (uv / pipx / pip) |
+          | [schema.json]($BASE/schema.json) | Neo4j schema contract |
+          EOF
+          echo "----- composed header -----"; cat "$RUNNER_TEMP/RELEASE_BODY.md"
 
       - name: Publish release on GitHub
-        uses: softprops/action-gh-release@v1
+        uses: softprops/action-gh-release@v2
         with:
-          files: dist/*
-          body: |
-            ## Release Notes (from CHANGELOG.md)
-
-            ${{ steps.changelog_reader.outputs.changes }}
-
-            ---
-
-            ## Detailed Changes (auto-generated)
-
-            ${{ steps.gen_changelog.outputs.changelog }}
+          files: |
+            dist/*
+            release-assets/*
+          body_path: ${{ runner.temp }}/RELEASE_BODY.md
+          generate_release_notes: true # appends categorized "What's Changed" (see .github/release.yml)
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Publish to PyPI via Trusted Publishing
         run: uv publish
+
+  # Regenerate the Homebrew formula and push it to the shared tap. Split into its
+  # own job (needs: release) so a tap-push failure -- e.g. a missing
+  # HOMEBREW_TAP_TOKEN -- is isolated from the PyPI and GitHub Release steps above.
+  # The non-Rust equivalent of what cargo-dist does for you.
+  homebrew:
+    needs: release
+    if: startsWith(github.ref, 'refs/tags/')
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v4
+
+      - name: Derive version from tag
+        id: ver
+        run: echo "version=${GITHUB_REF#refs/tags/v}" >> "$GITHUB_OUTPUT"
+
+      - name: Generate Homebrew formula
+        env:
+          REPO: ${{ github.repository }}
+          VERSION: ${{ steps.ver.outputs.version }}
+        run: |
+          chmod +x packaging/homebrew/generate_formula.sh
+          # The release job just published the sdist as a Release asset; hash the
+          # exact bytes users will download so the formula checksum always matches.
+          sdist="https://github.com/${REPO}/releases/download/v${VERSION}/codeanalyzer_python-${VERSION}.tar.gz"
+          SHA256="$(curl -fLsS "$sdist" | shasum -a 256 | cut -d' ' -f1)"
+          REPO="$REPO" VERSION="$VERSION" SHA256="$SHA256" \
+            ./packaging/homebrew/generate_formula.sh > codeanalyzer-python.rb
+          cat codeanalyzer-python.rb
+
+      - name: Push formula to codellm-devkit/homebrew-tap
+        env:
+          TAP_TOKEN: ${{ secrets.HOMEBREW_TAP_TOKEN }} # PAT with write access to homebrew-tap
+          VERSION: ${{ steps.ver.outputs.version }}
+        run: |
+          git clone "https://x-access-token:${TAP_TOKEN}@github.com/codellm-devkit/homebrew-tap.git" tap
+          mkdir -p tap/Formula
+          cp codeanalyzer-python.rb tap/Formula/codeanalyzer-python.rb
+          cd tap
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+          git add Formula/codeanalyzer-python.rb
+          git commit -m "codeanalyzer-python ${VERSION}" || { echo "no formula change"; exit 0; }
+          git push
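The release-notes step builds its download table from the tag-derived version with plain string substitution. The Python sketch below reproduces that substitution so the asset URLs can be checked locally against the filenames `uv build` produces; `version_from_ref` and `download_table` are hypothetical helpers, and `str.removeprefix` needs Python 3.9 or later.

```python
from typing import List, Tuple


def version_from_ref(ref: str) -> str:
    """Equivalent of bash ${GITHUB_REF#refs/tags/v}: drop the prefix only if present."""
    return ref.removeprefix("refs/tags/v")


def download_table(repo: str, version: str) -> str:
    """Markdown download table matching the heredoc in the release-notes step."""
    base = f"https://github.com/{repo}/releases/download/v{version}"
    assets: List[Tuple[str, str]] = [
        (f"codeanalyzer_python-{version}-py3-none-any.whl", "Python wheel"),
        (f"codeanalyzer_python-{version}.tar.gz", "Source distribution"),
        ("canpy-installer.sh", "Shell installer (uv / pipx / pip)"),
        ("schema.json", "Neo4j schema contract"),
    ]
    rows = ["| File | Description |", "| --- | --- |"]
    rows += [f"| [{name}]({base}/{name}) | {desc} |" for name, desc in assets]
    return "\n".join(rows)


version = version_from_ref("refs/tags/v0.2.1")
print(download_table("codellm-devkit/codeanalyzer-python", version))
```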
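The Homebrew job pins the formula to a SHA-256 of the exact sdist bytes (`curl ... | shasum -a 256`). For reference, the same streamed digest in Python looks like this; `sha256_of_stream` is a hypothetical helper, and any binary file-like object works, including the response returned by `urllib.request.urlopen`.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of everything readable from `stream`, read in fixed-size chunks."""
    digest = hashlib.sha256()
    # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Offline demo on an in-memory "download"; for the real sdist, pass
# urllib.request.urlopen(url) instead of the BytesIO object.
print(sha256_of_stream(io.BytesIO(b"abc")))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Reading in chunks keeps memory flat regardless of sdist size, which is the same property the `curl | shasum` pipe gets from streaming.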

.github/workflows/release_config.json

Lines changed: 0 additions & 65 deletions
This file was deleted.

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -180,3 +180,7 @@ analysis.json
 
 # UV
 uv.lock
+
+# Node / Astro docs-site build artifacts (never commit these)
+node_modules/
+.astro/

CHANGELOG.md

Lines changed: 34 additions & 0 deletions
@@ -5,6 +5,40 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.2.1] - 2026-06-22
+
+### Added
+- **Homebrew tap**`brew install codellm-devkit/tap/codeanalyzer-python`. The release workflow auto-generates a formula (`packaging/homebrew/generate_formula.sh`) that installs the pinned PyPI release as an isolated `uv` tool, and pushes it to `codellm-devkit/homebrew-tap`. Because the package is pure-Python with heavy native dependencies (`ray`, `pandas`, `numpy`), the formula depends on `uv` and runs the release via `uvx` rather than vendoring every transitive dependency as a Homebrew resource.
+- **First-class external symbols**`PyApplication.external_symbols` (a `{signature → PyExternalSymbol{name, module}}` map) records call-graph targets outside the analyzed project, mirroring the `codeanalyzer-typescript` backend. `analysis.json` now carries external info that was previously only a bare target string, and the Neo4j projection emits `:PyExternal` authoritatively from it ([#44](https://github.com/codellm-devkit/codeanalyzer-python/issues/44)).
+- **`--no-venv` / `--venv` flag** — skip virtualenv creation and dependency installation and resolve imports against the ambient Python interpreter. Useful in CI / containers where the project's dependencies are already installed, for sandboxed runs without network, and for speed ([#46](https://github.com/codellm-devkit/codeanalyzer-python/issues/46)).
+
+### Changed
+- The per-project analysis virtualenv is now installed with **`uv`** (parallel downloads + a shared global cache; falls back to `pip`), and is now **wired to Jedi** — previously `self.virtualenv` stayed `None`, so the install was never used by the symbol-table builder ([#47](https://github.com/codellm-devkit/codeanalyzer-python/issues/47)).
+- Neo4j `:PyExternal` gains a `module` property; `SCHEMA_VERSION` bumped `1.0.0 → 1.1.0` (additive) ([#44](https://github.com/codellm-devkit/codeanalyzer-python/issues/44)).
+
+### Fixed
+- `--emit neo4j` no longer drops call edges whose target is a bare imported module name (e.g. `os`, `re`, `json`): a `:PyPackage` name can no longer shadow a call target's `:PySymbol` signature, and the node-identity tracking is keyed by `(label, value)` so deferred `PY_EXTENDS` / `PY_RESOLVES_TO` edges can't be shadowed either ([#44](https://github.com/codellm-devkit/codeanalyzer-python/issues/44)).
+- `--emit neo4j` (Bolt) full-run orphan prune is now scoped to the `:PyApplication` anchor, so a full-run push for one application no longer deletes another application's modules from a shared database ([#45](https://github.com/codellm-devkit/codeanalyzer-python/issues/45)).
+
+## [0.2.0] - 2026-06-20
+
+### Added
+- **Neo4j property-graph output** (`--emit neo4j`). The same in-memory analysis (`PyApplication`) is projected to a labeled property graph, mirroring the `codeanalyzer-typescript` backend. Node labels are `Py`-prefixed and relationship types are `PY_`-prefixed (e.g. `:PyClass`, `PY_CALLS`) so multiple language analyzers can coexist in one database without label or relationship-type collisions. Two writers:
+  - **`graph.cypher` snapshot** (default) — a self-contained Cypher script (constraints + indexes, a scoped wipe of the project's prior subgraph, then batched `UNWIND … MERGE`). Load it with `cypher-shell < graph.cypher`. Needs no extra dependencies.
+  - **Live Bolt push** (`--neo4j-uri`) — an **incremental** writer: only modules whose `content_hash` changed are rewritten, and on a full run modules whose source file vanished are pruned. Requires the optional `neo4j` driver (`pip install 'codeanalyzer-python[neo4j]'`).
+- **`--emit schema`** — emit the machine-readable, version-stamped Neo4j schema contract (`schema.json`: node labels, relationships, properties, constraints, indexes). Needs no project; bundled in every release as a GitHub Release asset and checked in as `schema.neo4j.json`. A `schema_version` (`1.0.0`) is stamped onto every graph's `:PyApplication` node.
+- **New CLI options** mirroring the TypeScript analyzer's entrypoints: `--emit {json,neo4j,schema}`, `--app-name`, `--neo4j-uri`, `--neo4j-user`, `--neo4j-password`, `--neo4j-database`. `-i/--input` is now optional (not required for `--emit schema`). The four Neo4j connection options also read from the standard `NEO4J_URI` / `NEO4J_USERNAME` / `NEO4J_PASSWORD` / `NEO4J_DATABASE` environment variables when the flag is omitted (an explicit flag wins), so the password need not appear in shell history or the process list.
+- **`codeanalyzer.neo4j`** package: `catalog` (the single source-of-truth schema catalog), `project` (pure IR → graph rows), `cypher` (snapshot writer), `bolt` (incremental writer), and `rows` (the output-agnostic intermediate).
+- **Schema conformance test** (`test/test_neo4j_schema.py`, always runs) — asserts the emitter never produces a label/relationship/property the catalog doesn't declare, and that the checked-in `schema.neo4j.json` is regenerated.
+- **Neo4j Testcontainers integration test** (`test/test_neo4j_bolt.py`, opt-in via `RUN_CONTAINER_TESTS=1`) — spins up a real Neo4j and asserts the pushed graph, idempotent re-push, vanished-declaration cleanup, and full-run orphan pruning.
+- **Install script** (`packaging/install/canpy-installer.sh`) — a `curl … | sh` installer that provisions the CLI via uv / pipx / pip, published as a release asset.
+- **`schema-uml.drawio`** — a clean UML of the `analysis.json` schema (the `PyApplication` containment tree).
+
+### Changed
+- **The CLI command is now `canpy`** (was `codeanalyzer`), matching the `cants` (TypeScript) sibling. The PyPI package name is unchanged (`codeanalyzer-python`), as is the importable `codeanalyzer` module. The old `codeanalyzer` command is retained as a **deprecated alias** that prints a notice (to stderr) and then runs unchanged; it will be removed in a future release.
+- The README `canpy --help` block is now generated from the live CLI (`scripts/update_readme.py`, between `<!-- BEGIN/END canpy-help -->` markers) so it can't drift from the code.
+- The release workflow now installs the `[neo4j]` extra, syncs both the README `--help` block and `schema.neo4j.json` from source before publishing, and uploads the schema contract (`schema.json`) and installer script as GitHub Release assets.
+
 ## [0.1.15] - 2026-05-15
 
 ### Fixed
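The 0.2.1 entry describes `PyApplication.external_symbols` as a `{signature → PyExternalSymbol{name, module}}` map of call-graph targets outside the analyzed project. The sketch below shows one way such a map could be derived from call edges. The dataclass mirrors the two fields named in the changelog, but `compute_external_symbols` and its rule for splitting a dotted signature into module and name are assumptions for illustration, not the behavior of the project's `_compute_external_symbols`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class PyExternalSymbol:
    name: str
    module: str


def compute_external_symbols(
    edges: Iterable[Tuple[str, str]], project_signatures: Set[str]
) -> Dict[str, PyExternalSymbol]:
    """Map each out-of-project call target's signature to a PyExternalSymbol."""
    external: Dict[str, PyExternalSymbol] = {}
    for _caller, target in edges:
        if target in project_signatures:
            continue  # declared in the analyzed project, so not external
        module, _, name = target.rpartition(".")
        # Assumed rule: a bare target such as "json" is a module called as a whole.
        external[target] = PyExternalSymbol(name=name, module=module or target)
    return external
```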
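The `--emit neo4j` fix in 0.2.1 keys node-identity tracking by `(label, value)` rather than by value alone. A toy comparison shows why; both helpers are hypothetical and only illustrate the failure mode the changelog describes: when a `:PyPackage` and a `:PySymbol` share the string `os`, a value-only key lets one shadow the other.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (label, identifying value)


def index_by_value(nodes: List[Node]) -> Dict[str, str]:
    # Value-only keying: a later node silently overwrites an earlier one with the same value.
    return {value: label for label, value in nodes}


def index_by_label_and_value(nodes: List[Node]) -> Dict[Node, int]:
    # (label, value) keying: the same value under different labels stays distinct.
    ids: Dict[Node, int] = {}
    for node in nodes:
        ids.setdefault(node, len(ids))
    return ids


nodes = [("PyPackage", "os"), ("PySymbol", "os")]
print(index_by_value(nodes))            # {'os': 'PySymbol'}
print(index_by_label_and_value(nodes))  # {('PyPackage', 'os'): 0, ('PySymbol', 'os'): 1}
```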
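The Bolt writer's incremental behavior (0.2.0) and the application-scoped prune (0.2.1) reduce to a small planning step: rewrite only modules whose `content_hash` changed, and on a full run prune only the vanished modules that belong to the same `:PyApplication`. The sketch below models that with plain dictionaries; `plan_sync`, `SyncPlan` and the `(application, module path)` keying of stored hashes are assumptions for illustration, not the writer's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

StoredKey = Tuple[str, str]  # (application name, module path) -- assumed keying


@dataclass
class SyncPlan:
    rewrite: Set[str] = field(default_factory=set)
    prune: Set[str] = field(default_factory=set)


def plan_sync(
    app: str,
    stored: Dict[StoredKey, str],
    current: Dict[str, str],
    full_run: bool,
) -> SyncPlan:
    """Decide which modules of `app` to rewrite or prune, given content hashes."""
    # Only look at what this application previously wrote to the shared database.
    previous = {path: h for (owner, path), h in stored.items() if owner == app}
    plan = SyncPlan()
    for path, content_hash in current.items():
        if previous.get(path) != content_hash:
            plan.rewrite.add(path)  # new module or changed source
    if full_run:
        # Orphans: modules this app wrote before whose source file has vanished.
        plan.prune = set(previous) - set(current)
    return plan
```

Filtering `stored` by application before computing orphans is what keeps a full-run push for one application from touching another application's modules in a shared database.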
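The 0.2.0 CLI entry says each Neo4j connection option falls back to a standard environment variable and that an explicit flag wins. A minimal sketch of that precedence rule follows; `resolve_neo4j_option` is hypothetical, and CLI frameworks such as Click offer the same behavior through an option's `envvar` parameter.

```python
import os
from typing import Mapping, Optional

# Environment variables named in the changelog, keyed by option.
NEO4J_ENV_VARS = {
    "uri": "NEO4J_URI",
    "user": "NEO4J_USERNAME",
    "password": "NEO4J_PASSWORD",
    "database": "NEO4J_DATABASE",
}


def resolve_neo4j_option(
    option: str, flag_value: Optional[str], environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """An explicit flag wins; otherwise fall back to the matching environment variable."""
    if flag_value is not None:
        return flag_value
    return environ.get(NEO4J_ENV_VARS[option])
```

Passing the environment in as a mapping keeps the rule testable without mutating `os.environ`.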
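The README `--help` block is regenerated between `<!-- BEGIN/END canpy-help -->` markers by `scripts/update_readme.py`. The core of such a script is a marker-bounded replacement, sketched below. The exact marker strings and the `replace_between_markers` helper are assumptions; the real script may differ.

```python
# Assumed expansion of "<!-- BEGIN/END canpy-help -->"; check the README for the exact text.
BEGIN = "<!-- BEGIN canpy-help -->"
END = "<!-- END canpy-help -->"


def replace_between_markers(text: str, new_block: str, begin: str = BEGIN, end: str = END) -> str:
    """Swap whatever sits between the markers for `new_block`, keeping the markers.

    Raises ValueError (from str.index) if either marker is missing.
    """
    start = text.index(begin) + len(begin)
    stop = text.index(end, start)
    return text[:start] + "\n" + new_block.strip("\n") + "\n" + text[stop:]


readme = "intro\n<!-- BEGIN canpy-help -->\nold help\n<!-- END canpy-help -->\noutro\n"
print(replace_between_markers(readme, "Usage: canpy [OPTIONS]"))
```

Because the replacement is idempotent, rerunning it on an up-to-date README leaves the file byte-for-byte unchanged, which is what lets the workflow's `git diff --quiet` check skip the commit.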
