Progress towards tree-sitter feature by saniyafatima07 · Pull Request #3102 · mandiant/capa

saniyafatima07 · 2026-06-10T18:39:59Z

Addresses PR #1080 and #2931

Updated a few outdated tree-sitter queries
Ensured all related test cases pass

Checklist

No CHANGELOG update needed

No new tests needed

No documentation update needed

This submission includes AI-generated code and I have provided details in the description.

gemini-code-assist

Code Review

This pull request introduces Tree-Sitter Script Analysis to capa, enabling feature extraction from script languages such as C#, Python, HTML, and ASPX templates. It adds a new Tree-Sitter-based feature extractor, auto-detection capabilities, and signature-based tools, along with comprehensive tests and updated dependencies. The code review feedback primarily addresses compatibility issues with the upgraded tree-sitter library (version 0.25.0), specifically pointing out that QueryCursor has been removed and the Parser instantiation has changed in tree-sitter versions >= 0.21.0. The feedback provides actionable suggestions to execute queries directly and update parser usage. Additionally, it identifies a bug in integer suffix parsing and recommends replacing a deprecated importlib.resources API.


gemini-code-assist · 2026-06-10T18:42:53Z

+    def get_function_call_names(self, node: Node) -> Iterator[Node]:
+        cursor = QueryCursor(self.query.function_call_name)
+        yield from self.get_captured_nodes(cursor, node)
+
+    def get_imported_constants(self, node: Node) -> Iterator[Node]:
+        cursor = QueryCursor(self.query.imported_constant_name)
+        yield from self.get_captured_nodes(cursor, node)


Since QueryCursor is removed, you can pass the query directly to get_captured_nodes without instantiating a cursor.

Suggested change

-    def get_function_call_names(self, node: Node) -> Iterator[Node]:
-        cursor = QueryCursor(self.query.function_call_name)
-        yield from self.get_captured_nodes(cursor, node)
-
-    def get_imported_constants(self, node: Node) -> Iterator[Node]:
-        cursor = QueryCursor(self.query.imported_constant_name)
-        yield from self.get_captured_nodes(cursor, node)
+    def get_function_call_names(self, node: Node) -> Iterator[Node]:
+        yield from self.get_captured_nodes(self.query.function_call_name, node)
+
+    def get_imported_constants(self, node: Node) -> Iterator[Node]:
+        yield from self.get_captured_nodes(self.query.imported_constant_name, node)

gemini-code-assist · 2026-06-10T18:42:54Z

+    def get_namespaces(self, node: Optional[Node] = None) -> List[Tuple[Node, str]]:
+        target_node = self.tree.root_node if node is None else node
+        cursor = QueryCursor(self.query.namespace)
+        namespace_captures: List[Tuple[Node, str]] = []
+
+        for query_name, nodes in cursor.captures(target_node).items():
+            for namespace_node in nodes:
+                namespace_captures.append((namespace_node, query_name))
+
+        return sorted(namespace_captures, key=self.get_node_capture_sort_key)
+


Since QueryCursor is removed in tree-sitter >= 0.21.0, you should execute the query directly using self.query.namespace.captures(target_node). This returns a list of (Node, str) tuples representing the captured nodes and their capture names.

Suggested change

-    def get_namespaces(self, node: Optional[Node] = None) -> List[Tuple[Node, str]]:
-        target_node = self.tree.root_node if node is None else node
-        cursor = QueryCursor(self.query.namespace)
-        namespace_captures: List[Tuple[Node, str]] = []
-
-        for query_name, nodes in cursor.captures(target_node).items():
-            for namespace_node in nodes:
-                namespace_captures.append((namespace_node, query_name))
-
-        return sorted(namespace_captures, key=self.get_node_capture_sort_key)
+    def get_namespaces(self, node: Optional[Node] = None) -> List[Tuple[Node, str]]:
+        target_node = self.tree.root_node if node is None else node
+        captures = self.query.namespace.captures(target_node)
+        namespace_captures: List[Tuple[Node, str]] = []
+
+        for namespace_node, query_name in captures:
+            namespace_captures.append((namespace_node, query_name))
+
+        return sorted(namespace_captures, key=self.get_node_capture_sort_key)
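The original diff iterates the result of captures() as a mapping of capture name to nodes, while the suggestion iterates it as (node, name) tuples. Which shape applies depends on the installed py-tree-sitter release, so it is worth checking against the pinned version. A minimal sketch of a helper that accepts either shape is shown here; `FakeNode`, `flatten_captures`, and the sort key are illustrative assumptions and not part of capa or tree-sitter.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple, Union


@dataclass(frozen=True)
class FakeNode:
    """Stand-in for tree_sitter.Node that carries only the fields used below."""

    start_byte: int
    end_byte: int


Captures = Union[Dict[str, List[FakeNode]], Iterable[Tuple[FakeNode, str]]]


def flatten_captures(captures: Captures) -> List[Tuple[FakeNode, str]]:
    """Return (node, capture_name) pairs ordered by position in the source."""
    if isinstance(captures, dict):
        # Mapping shape: {capture_name: [node, ...]}
        pairs = [(node, name) for name, nodes in captures.items() for node in nodes]
    else:
        # Sequence shape: [(node, capture_name), ...]
        pairs = list(captures)
    # Assumed ordering; the PR's get_node_capture_sort_key may differ.
    return sorted(pairs, key=lambda pair: (pair[0].start_byte, pair[0].end_byte, pair[1]))
```

With a helper like this, the body of get_namespaces would not need to change if the capture result shape changes between library versions.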

gemini-code-assist · 2026-06-10T18:42:54Z

+def _contains_errors(ts_language, node: Node) -> bool:
+    query = Query(ts_language, "(ERROR) @error")
+    return bool(QueryCursor(query).captures(node))


Since QueryCursor is removed in tree-sitter >= 0.21.0, you should execute the query directly using query.captures(node).

Suggested change

-def _contains_errors(ts_language, node: Node) -> bool:
-    query = Query(ts_language, "(ERROR) @error")
-    return bool(QueryCursor(query).captures(node))
+def _contains_errors(ts_language, node: Node) -> bool:
+    query = Query(ts_language, "(ERROR) @error")
+    return bool(query.captures(node))
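Independent of how the query is executed, the same check can be sketched as a plain traversal that looks for nodes whose type is "ERROR". This is only an illustration of what the "(ERROR) @error" query is looking for; `FakeNode` is a stand-in that mimics the `type` and `children` attributes of a tree-sitter node, and `contains_errors` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeNode:
    """Stand-in for tree_sitter.Node with only `type` and `children`."""

    type: str
    children: List["FakeNode"] = field(default_factory=list)


def contains_errors(node: FakeNode) -> bool:
    """Return True if the subtree rooted at `node` contains an ERROR node."""
    stack = [node]
    while stack:
        current = stack.pop()
        if current.type == "ERROR":
            return True
        # Depth-first walk without recursion, so deep trees cannot hit the recursion limit.
        stack.extend(current.children)
    return False
```

A traversal like this avoids compiling a query on every call, at the cost of visiting nodes in Python.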

gemini-code-assist · 2026-06-10T18:42:54Z

+    def get_string_literals(self, node: Node) -> Iterator[Node]:
+        cursor = QueryCursor(self.query.string_literal)
+        yield from self.get_captured_nodes(cursor, node)
+
+    def get_integer_literals(self, node: Node) -> Iterator[Node]:
+        cursor = QueryCursor(self.query.integer_literal)
+        yield from self.get_captured_nodes(cursor, node)


Since QueryCursor is removed, you can pass the query directly to get_captured_nodes without instantiating a cursor.

Suggested change

-    def get_string_literals(self, node: Node) -> Iterator[Node]:
-        cursor = QueryCursor(self.query.string_literal)
-        yield from self.get_captured_nodes(cursor, node)
-
-    def get_integer_literals(self, node: Node) -> Iterator[Node]:
-        cursor = QueryCursor(self.query.integer_literal)
-        yield from self.get_captured_nodes(cursor, node)
+    def get_string_literals(self, node: Node) -> Iterator[Node]:
+        yield from self.get_captured_nodes(self.query.string_literal, node)
+
+    def get_integer_literals(self, node: Node) -> Iterator[Node]:
+        yield from self.get_captured_nodes(self.query.integer_literal, node)

gemini-code-assist · 2026-06-10T18:42:54Z

+    def parse(self) -> Tree:
+        parser = Parser(self.query.language)
+        return parser.parse(self.buf)


In tree-sitter >= 0.21.0, the Parser constructor does not accept a Language argument. You must set the language attribute after instantiating the parser.

Suggested change

-    def parse(self) -> Tree:
-        parser = Parser(self.query.language)
-        return parser.parse(self.buf)
+    def parse(self) -> Tree:
+        parser = Parser()
+        parser.language = self.query.language
+        return parser.parse(self.buf)
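The diff and the suggestion disagree on whether the parser takes the language as a constructor argument or as an attribute. If both styles need to be tolerated, one option is a small factory that tries the constructor form first and falls back to attribute assignment. This is a sketch under that assumption; `make_parser` is a hypothetical helper, and it takes the parser class as a parameter so it can be exercised without tree-sitter installed.

```python
from typing import Any, Callable


def make_parser(parser_factory: Callable[..., Any], language: Any) -> Any:
    """Create a parser bound to `language`, tolerating two constructor styles."""
    try:
        # Style 1: the language is passed to the constructor.
        return parser_factory(language)
    except TypeError:
        # Style 2: the constructor takes no arguments and the language is assigned afterwards.
        parser = parser_factory()
        parser.language = language
        return parser
```

Note that the fallback also triggers if the constructor raises TypeError for an unrelated reason, so pinning the dependency and using a single style is the simpler choice where that is possible.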

gemini-code-assist · 2026-06-10T18:42:54Z

+    def get_content_sections(self) -> Iterator[Node]:
+        cursor = QueryCursor(self.query.content)
+        yield from self.get_captured_nodes(cursor, self.tree.root_node)


Since QueryCursor is removed, you can pass the query directly to get_captured_nodes without instantiating a cursor.

Suggested change

-    def get_content_sections(self) -> Iterator[Node]:
-        cursor = QueryCursor(self.query.content)
-        yield from self.get_captured_nodes(cursor, self.tree.root_node)
+    def get_content_sections(self) -> Iterator[Node]:
+        yield from self.get_captured_nodes(self.query.content, self.tree.root_node)

gemini-code-assist · 2026-06-10T18:42:55Z

+import re
+from typing import List, Tuple, Iterator, Optional
+
+from tree_sitter import Node, Tree, Parser, QueryCursor


In tree-sitter >= 0.21.0, QueryCursor has been removed. Import Query instead to execute queries directly.

Suggested change

-from tree_sitter import Node, Tree, Parser, QueryCursor
+from tree_sitter import Node, Tree, Parser, Query

gemini-code-assist · 2026-06-10T18:42:55Z

+from typing import Optional
+from pathlib import Path
+
+from tree_sitter import Node, Tree, Query, Parser, Language, QueryCursor


In tree-sitter >= 0.21.0, QueryCursor has been completely removed. Executing queries is now done directly via Query.captures or Language.query. Importing QueryCursor will raise an ImportError at runtime.

Suggested change

-from tree_sitter import Node, Tree, Query, Parser, Language, QueryCursor
+from tree_sitter import Node, Tree, Query, Parser, Language

gemini-code-assist · 2026-06-10T18:42:55Z

+    def parse_integer(self, integer: str) -> int:
+        for suffix in self.integer_suffixes:
+            if integer.endswith(suffix):
+                integer = integer[:-1]
+        for prefix, base in self.integer_prefixes:
+            if integer.startswith(prefix):
+                return int(integer, base)
+        return int(integer)


There is a bug in parse_integer when stripping suffixes. If an integer literal has multiple suffixes (e.g., 100ul in C#), the loop only strips the last character of the suffix that matched first, leaving the other suffix character intact (e.g., leaving 100u), which then fails to parse as an integer and raises a ValueError. Using rstrip with a combined set of uppercase and lowercase suffixes is more robust and correctly handles multi-character suffixes.

Suggested change

-    def parse_integer(self, integer: str) -> int:
-        for suffix in self.integer_suffixes:
-            if integer.endswith(suffix):
-                integer = integer[:-1]
-        for prefix, base in self.integer_prefixes:
-            if integer.startswith(prefix):
-                return int(integer, base)
-        return int(integer)
+    def parse_integer(self, integer: str) -> int:
+        suffixes = "".join(self.integer_suffixes)
+        if suffixes:
+            integer = integer.rstrip(suffixes + suffixes.upper())
+        for prefix, base in self.integer_prefixes:
+            if integer.startswith(prefix):
+                return int(integer, base)
+        return int(integer)
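An alternative to rstrip is to remove exactly one suffix, trying the longest candidates first, so that malformed literals such as 100uuu still raise instead of being silently accepted. The following is a minimal standalone sketch; the suffix and prefix tables are assumed values for illustration and are not the ones defined in the PR.

```python
from typing import Sequence, Tuple

# Assumed tables for illustration only; the PR defines its own per language.
INTEGER_SUFFIXES: Tuple[str, ...] = ("ul", "lu", "u", "l")
INTEGER_PREFIXES: Tuple[Tuple[str, int], ...] = (("0x", 16), ("0b", 2))


def parse_integer(
    text: str,
    suffixes: Sequence[str] = INTEGER_SUFFIXES,
    prefixes: Sequence[Tuple[str, int]] = INTEGER_PREFIXES,
) -> int:
    """Parse an integer literal with an optional type suffix and base prefix."""
    literal = text.lower()
    # Longest suffix first, so "ul" is removed as a unit rather than as a trailing "l".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if literal.endswith(suffix):
            literal = literal[: -len(suffix)]
            break  # strip at most one suffix
    for prefix, base in prefixes:
        if literal.startswith(prefix):
            # int() accepts a "0x"/"0b" prefix when the matching base is given.
            return int(literal, base)
    return int(literal)
```

For example, parse_integer("100ul") and parse_integer("0xFFu") both parse, while parse_integer("100uuu") raises ValueError.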

gemini-code-assist · 2026-06-10T18:42:55Z

+    def load_import_signatures(self, signature_file: str) -> Dict[str, set[str]]:
+        signatures = json.loads(importlib.resources.read_text(capa.features.extractors.ts.signatures, signature_file))
+        return {category: set(names) for category, names in signatures.items()}


importlib.resources.read_text is deprecated since Python 3.11. Use importlib.resources.files instead, which is fully supported in Python 3.10+.

Suggested change

-    def load_import_signatures(self, signature_file: str) -> Dict[str, set[str]]:
-        signatures = json.loads(importlib.resources.read_text(capa.features.extractors.ts.signatures, signature_file))
-        return {category: set(names) for category, names in signatures.items()}
+    def load_import_signatures(self, signature_file: str) -> Dict[str, set[str]]:
+        ref = importlib.resources.files(capa.features.extractors.ts.signatures) / signature_file
+        signatures = json.loads(ref.read_text(encoding="utf-8"))
+        return {category: set(names) for category, names in signatures.items()}
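The files()-based pattern can be tried in isolation with a standalone function that takes the package as a parameter. This is a sketch rather than the PR's method: the `package` parameter is an assumption added so the function does not depend on capa being importable, and the JSON layout (category name mapped to a list of names) is inferred from the dict comprehension in the diff.

```python
import importlib.resources
import json
from typing import Dict, Set


def load_import_signatures(package: str, signature_file: str) -> Dict[str, Set[str]]:
    """Load a JSON signature file shipped inside `package`."""
    # files() returns a Traversable; joinpath() selects the resource within the package.
    resource = importlib.resources.files(package).joinpath(signature_file)
    signatures = json.loads(resource.read_text(encoding="utf-8"))
    # Convert each list of names to a set for fast membership checks.
    return {category: set(names) for category, names in signatures.items()}
```

Passing an explicit encoding keeps the result independent of the platform's default locale.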


gemini-code-assist Bot reviewed Jun 10, 2026


saniyafatima07 marked this pull request as ready for review June 10, 2026 18:58

saniyafatima07 marked this pull request as draft June 10, 2026 18:58

saniyafatima07 force-pushed the script-feature branch from 019b531 to 04bdd2d Compare June 17, 2026 12:40

saniyafatima07 and others added 7 commits June 17, 2026 18:30

Fix issues

43fdbcc

fix: address PR review feedback (regex optimization, typos, duplicate sigs)

ca8d2ba

Update CHANGELOG.md

c6c9dd6

Fix outdated code

70e725b

Fix outdated code

06c432a

Modernize Tree-sitter: Replace local C-compilation with native PyPI wheels

1d86108

feat: revive script analysis ts feature

b40bf14

Revives and supersedes old PR mandiant#1080. Resolves merge conflicts and brings up to date with current master.

saniyafatima07 force-pushed the script-feature branch from e71ca37 to 5c19383 Compare June 17, 2026 16:41

mr-tz reviewed Jun 18, 2026


Comment thread capa/capabilities/common.py

saniyafatima07 force-pushed the script-feature branch from 9f29deb to 918a9af Compare June 18, 2026 20:02

saniyafatima07 added 3 commits June 19, 2026 13:03

Update rules submodule

bcdbd40

Fix: CI builds

5f5be2a

Fix import errors

fd3e53f

saniyafatima07 force-pushed the script-feature branch from 918a9af to fd3e53f Compare June 19, 2026 08:46

Update submodules

578d8e7

saniyafatima07 force-pushed the script-feature branch 2 times, most recently from c78e0a6 to eb48b8a Compare June 19, 2026 11:39

Fix CI: trial-1

1bc45f5

saniyafatima07 force-pushed the script-feature branch from eb48b8a to 1bc45f5 Compare June 19, 2026 13:25

saniyafatima07 wants to merge 12 commits into mandiant:master from saniyafatima07:script-feature

3 participants
