# Composable graph tooling for analysis, construction, and refinement

A lightweight, embedded, openCypher-compatible graph engine for research and investigative workflows.
- Why GraphForge?
- Installation
- Quick Start
- Cypher Features
- Datasets
- Transactions
- Architecture
- Development
- Roadmap
- License
## Why GraphForge?

> We are not building a database for applications. We are building a graph execution environment for thinking.
Modern data science and ML workflows increasingly produce graph-shaped data — entity relationships extracted by LLMs, citation networks, dependency graphs, social connections, knowledge bases. Working with this data shouldn't require running a database server. GraphForge brings the full expressiveness of the openCypher query language to the Python notebook and script environment: zero configuration, single-file persistence, and first-class Python integration.
| | NetworkX | GraphForge | Neo4j / Memgraph |
|---|---|---|---|
| Setup | `pip install` | `pip install` | Run a server |
| Query language | Python API | Full openCypher | Full Cypher |
| Persistence | Manual | SQLite (automatic) | Native |
| Notebook-friendly | ✓ | ✓ | Requires connection |
| Graph size | Millions | up to ~20M edges† | Billions |
| TCK compliance | N/A | 100% (3,885/3,885) | ~100% |
Use GraphForge for: knowledge graphs, citation networks, research workflows, LLM output storage, social network analysis in notebooks.
Use a production database for: high throughput, multi-user access, or graphs beyond the limits in Scale Limits.
† Traversal queries with LIMIT scale to ~20M edges; full-scan aggregations are practical up to ~1M edges.
v0.3.9 delivers substantial performance improvements over v0.3.8: LALR(1) linear-time parsing, O(1) property equality index, LIMIT short-circuit for traversal and UNWIND, bulk ingestion API, SQLite PRAGMA tuning, and elementId(). TCK compliance is maintained at 3,885/3,885 (100%).
See CHANGELOG.md for the full list of changes.
## Installation

```shell
pip install graphforge
# or
uv add graphforge
```

Requirements: Python 3.10–3.14

Core dependencies: `pydantic>=2.6`, `lark>=1.1`, `msgpack>=1.0`
## Quick Start

```python
from graphforge import GraphForge

db = GraphForge()

# Create nodes and relationships
db.execute("""
CREATE (alice:Person {name: 'Alice', age: 30})
CREATE (bob:Person {name: 'Bob', age: 25})
CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
""")

# Query the graph
results = db.execute("""
MATCH (p:Person)-[:KNOWS]->(friend)
WHERE p.age > 25
RETURN p.name AS person, friend.name AS friend, p.age AS age
ORDER BY p.age DESC
""")

for row in results:
    print(f"{row['person'].value} (age {row['age'].value}) knows {row['friend'].value}")
```

Persistence is a single SQLite file:

```python
# Save to SQLite
db = GraphForge("research.db")
db.execute("CREATE (:Paper {title: 'Graph Neural Networks', year: 2024})")
db.close()

# Reload later
db = GraphForge("research.db")
result = db.execute("MATCH (p:Paper) RETURN p.title AS t")
print(result[0]['t'].value)  # Graph Neural Networks
```

Nodes and relationships can also be created directly from Python:

```python
alice = db.create_node(['Person', 'Employee'], name='Alice', age=30)
bob = db.create_node(['Person'], name='Bob', age=25)
db.create_relationship(alice, bob, 'KNOWS', since=2020)
```

Results contain CypherValue objects — use `.value` to get the Python value:

```python
results = db.execute("MATCH (p:Person) RETURN p.name AS name, p.age AS age")
for row in results:
    name: str = row['name'].value
    age: int = row['age'].value
```

## Cypher Features

GraphForge implements the full openCypher language (100% TCK compliant as of v0.3.8).
```cypher
// Reading
MATCH (n:Person)-[:KNOWS]->(friend)
OPTIONAL MATCH (n)-[:WORKS_AT]->(company)
WHERE n.age > 25
WITH n, count(friend) AS friends
RETURN n.name, friends
ORDER BY friends DESC
LIMIT 10

// Writing
CREATE (n:Person {name: 'Alice'})
MERGE (n:Person {name: 'Alice'})
SET n.age = 30
REMOVE n.temp
DELETE n
DETACH DELETE n

// Iteration
UNWIND [1, 2, 3] AS x
RETURN x * 2 AS doubled

// Subqueries
MATCH (n) WHERE EXISTS { MATCH (n)-[:KNOWS]->() }
RETURN n
```

Pattern syntax:

```cypher
(n)                      // Any node
(n:Person)               // Node with label
(n:Person {age: 30})     // Node with property
(a)-[r:KNOWS]->(b)       // Directed relationship
(a)-[r:KNOWS|LIKES]->(b) // Multiple types
(a)-[*1..3]->(b)         // Variable-length (1 to 3 hops)
(a)-[*]->(b)             // Any length
p = (a)-[*]->(b)         // Bind path to variable
```

| Category | Functions |
|---|---|
| String | toLower, toUpper, trim, split, replace, substring, left, right, reverse, size |
| Math | abs, ceil, floor, round, sqrt, pow, exp, log, sin, cos, tan, pi, e |
| List | head, tail, last, range, size, reverse, sort, collect, reduce, filter, extract |
| Aggregation | count, sum, avg, min, max, collect, stDev, percentileDisc |
| Predicate | all, any, none, single, exists, isEmpty |
| Temporal | date, datetime, localdatetime, time, localtime, duration, now |
| Spatial | point, distance |
| Graph | id, labels, type, keys, properties, nodes, relationships, startNode, endNode |
| Conversion | toInteger, toFloat, toString, toBoolean, coalesce |
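As a point of reference, a few of the list functions map directly onto Python builtins, with one notable difference: Cypher's `range(start, end)` includes the end value. This mapping is illustrative only, not GraphForge internals:

```python
from functools import reduce

# Python equivalents of a few Cypher list functions (illustrative mapping only).
lst = [1, 2, 3, 4]

head = lst[0]    # Cypher: head([1, 2, 3, 4])  -> 1
tail = lst[1:]   # Cypher: tail([1, 2, 3, 4])  -> [2, 3, 4]
last = lst[-1]   # Cypher: last([1, 2, 3, 4])  -> 4

# Cypher's range(1, 5) is inclusive of the end: [1, 2, 3, 4, 5]
rng = list(range(1, 5 + 1))

# Cypher: reduce(acc = 0, x IN [1, 2, 3, 4] | acc + x) -> 10
total = reduce(lambda acc, x: acc + x, lst, 0)
```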
```cypher
// Dates, times, datetimes
RETURN date('2024-01-15')
RETURN datetime('2024-01-15T14:30:00[Europe/London]') // IANA timezone
RETURN duration('P1Y2M3DT4H5M6.789S')

// Nanosecond precision
RETURN duration('PT0.000000789S').nanoseconds // 789

// Extreme years (outside Python's 1-9999 range)
RETURN localdatetime('+999999999-12-31T23:59:59')

// Arithmetic
RETURN date('2024-01-01') + duration('P1M') // 2024-02-01
RETURN duration.between(date('2020-01-01'), date('2024-01-01'))
```

## Datasets

Load 100+ real-world graphs instantly:
```python
from graphforge import GraphForge
from graphforge.datasets import load_dataset, list_datasets

db = GraphForge()

# Load any pre-registered dataset (auto-downloads and caches)
load_dataset(db, "snap-ego-facebook")  # Facebook ego networks (SNAP)
load_dataset(db, "ldbc-snb-sf0.1")     # Social network benchmark (LDBC)
load_dataset(db, "netrepo-karate")     # Karate club (NetworkRepository)

# Browse available datasets
for ds in list_datasets(source="snap")[:3]:
    print(f"{ds.name}: {ds.nodes:,} nodes, {ds.edges:,} edges")

# Analyze immediately
results = db.execute("""
MATCH (n)-[r]->()
RETURN n.id AS user, count(r) AS degree
ORDER BY degree DESC LIMIT 5
""")
```

Available sources:
- SNAP (Stanford): 95 social, web, email, citation, and collaboration networks
- LDBC: 10 social network benchmark datasets with temporal data
- NetworkRepository: 10 pre-registered datasets
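For intuition, the degree query in the snippet above (`count(r)` grouped by `n`) computes each node's out-degree. The same computation over a plain Python edge list, purely as an illustration:

```python
from collections import Counter

# Edge list standing in for the (n)-[r]->() matches: (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("c", "d")]

# Equivalent of:
#   MATCH (n)-[r]->() RETURN n.id AS user, count(r) AS degree
#   ORDER BY degree DESC LIMIT 5
degree = Counter(src for src, _ in edges)
top5 = degree.most_common(5)  # top entry: ('a', 3)
```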
## Transactions

```python
db = GraphForge("graph.db")

db.begin()
try:
    db.execute("MATCH (p:Person {id: 123}) SET p.status = 'inactive'")
    db.execute("CREATE (:AuditLog {action: 'deactivate', user_id: 123})")
    db.commit()
except Exception:
    db.rollback()
    raise
finally:
    db.close()
```

## Architecture

GraphForge is built in four independent layers:
```
┌─────────────────────────────────────────────────┐
│ Parser   cypher.lark + parser.py                │ Cypher → AST
├─────────────────────────────────────────────────┤
│ Planner  planner.py + operators.py              │ AST → Logical plan
├─────────────────────────────────────────────────┤
│ Executor executor.py + evaluator.py             │ Plan → Results
├─────────────────────────────────────────────────┤
│ Storage  memory.py + sqlite_backend.py          │ In-memory + SQLite
└─────────────────────────────────────────────────┘
```
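The layering can be read as a chain of independent transformations, where each stage consumes the previous one's output. A toy sketch of that flow (stand-in functions, not GraphForge's actual classes or signatures):

```python
# Toy illustration of the four-layer flow. Each layer is a separate
# transformation, so any one can be swapped or tested in isolation.

def parse(cypher: str) -> dict:
    """Parser layer: Cypher text -> AST (stand-in)."""
    return {"ast": cypher.strip()}

def plan(ast: dict) -> list:
    """Planner layer: AST -> logical operator plan (stand-in)."""
    return ["NodeScan", "Filter", "Project"]

def execute(plan_ops: list, storage: list) -> list:
    """Executor layer: run the plan against the storage layer (stand-in)."""
    return [row for row in storage]

rows = execute(plan(parse("MATCH (n) RETURN n")), [{"name": "Alice"}])
```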
Storage uses MessagePack for efficient binary encoding of graph properties. Persistence is a single SQLite file with WAL mode for durability.
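The effect of WAL mode is easy to see with the standard-library sqlite3 module. A minimal sketch against a throwaway database file (generic SQLite, not GraphForge's actual schema):

```python
import os
import sqlite3
import tempfile

# Open a file-backed SQLite database and switch it to write-ahead logging,
# the journal mode used for durable single-file persistence.
path = os.path.join(tempfile.mkdtemp(), "graph.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]  # active mode: "wal"
conn.close()
```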
## Development

```shell
# Install with dev dependencies
uv sync --dev

# Run all checks (mirrors CI)
make pre-push

# Run tests
uv run pytest tests/unit tests/integration
uv run pytest tests/tck/ -n auto  # Full TCK (3,885 scenarios)

# Coverage
make coverage
```

## Roadmap

| Version | Focus | Status |
|---|---|---|
| v0.3.8 | Full TCK compliance (3,885/3,885) | Released |
| v0.3.9 | Performance: LALR parser, property indexes, bulk ingest, SQLite tuning, LIMIT short-circuit | Released |
| v0.3.10 | Analytics integration: NetworkX/igraph export, parse/plan cache | Planned |
| v0.4.0 | Native SNA algorithms: PageRank, betweenness, WCC, shortest path via CALL gf.algo.* | Planned |
| v1.0 | Production-ready: thread safety, large graph support | Future |
See CHANGELOG.md for full release history.
## License

MIT © David Spencer — see LICENSE for details.
Built on Lark, Pydantic, MessagePack, and the openCypher specification.