DB v4 progress#2385

Draft
ljeub-pometry wants to merge 426 commits into master from db_v4

Conversation

@ljeub-pometry
Collaborator

What changes were proposed in this pull request?

Progress towards the new version of the underlying storage

Why are the changes needed?

Does this PR introduce any user-facing change? If yes is this documented?

How was this patch tested?

Are there any further changes required?

ljeub-pometry and others added 30 commits October 17, 2025 12:27
# Conflicts:
#	Cargo.lock
#	db4-graph/src/lib.rs
#	raphtory/Cargo.toml
ljeub-pometry and others added 9 commits April 16, 2026 17:00
* make sure we cancel all tasks when the running server is dropped

* update optd
remove ui test submodule
Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4.
- [Release notes](https://github.com/docker/setup-buildx-action/releases)
- [Commits](docker/setup-buildx-action@v3...v4)

---
updated-dependencies:
- dependency-name: docker/setup-buildx-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4 to 8.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v4...v8)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](docker/build-push-action@v6...v7)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ben Steer <b.a.steer@qmul.ac.uk>
Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](docker/login-action@v3...v4)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@CLAassistant

CLAassistant commented Apr 20, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
8 out of 9 committers have signed the CLA.

✅ fabubaker
✅ ljeub-pometry
✅ wyatt-joyner-pometry
✅ fabianmurariu
✅ arienandalibi
✅ miratepuffin
✅ shivamka1
✅ rachchan
❌ dependabot[bot]

miratepuffin and others added 20 commits April 20, 2026 12:05
* Exposed heavy queries and exclusive writes

* Add graphql schema flags

* Added batch controls

* Added flags for disabling lists

* add new flags to python

* Add tests

* chore: apply tidy-public auto-fixes

* Finish tests

* Final tests

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* switch connected components to use union find

* remove old imports

* remove extra python wrapper

* remove unused imports

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
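
For readers unfamiliar with the technique named in the commit above ("switch connected components to use union find"), here is a minimal sketch of connected components via a disjoint-set structure. It is not the code from this PR: `UnionFind`, `connected_components`, the dense `usize` node ids, and the choice to label each component by its smallest node id are all assumptions made for illustration.

```rust
/// Minimal disjoint-set (union-find) with path halving and union by size.
/// Illustrative only; names and layout are hypothetical, not taken from the PR.
pub struct UnionFind {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl UnionFind {
    /// Start with `n` singleton sets, one per node id in `0..n`.
    pub fn new(n: usize) -> Self {
        Self {
            parent: (0..n).collect(),
            size: vec![1; n],
        }
    }

    /// Find the representative of `x`, shortening the path as we walk up.
    pub fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            // Path halving: point `x` at its grandparent.
            self.parent[x] = self.parent[self.parent[x]];
            x = self.parent[x];
        }
        x
    }

    /// Merge the sets containing `a` and `b`; returns true if they were distinct.
    pub fn union(&mut self, a: usize, b: usize) -> bool {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        // Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb] {
            std::mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb] = ra;
        self.size[ra] += self.size[rb];
        true
    }
}

/// Label every node in `0..n` with the smallest node id in its component.
pub fn connected_components(n: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut uf = UnionFind::new(n);
    for &(u, v) in edges {
        uf.union(u, v);
    }
    // For each root, remember the smallest member seen.
    let mut min_of_root = vec![usize::MAX; n];
    for v in 0..n {
        let r = uf.find(v);
        if v < min_of_root[r] {
            min_of_root[r] = v;
        }
    }
    (0..n)
        .map(|v| {
            let r = uf.find(v);
            min_of_root[r]
        })
        .collect()
}
```

Calling `connected_components(6, &[(0, 1), (1, 2), (4, 5)])` groups nodes 0 to 2 together, leaves node 3 on its own, and groups 4 with 5. Each edge is processed once, so the work is close to linear in the number of edges.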
* make sure we cancel all tasks when the running server is dropped

* update optd

* add domain for NodeOp

* avoid unnecessarily re-filtering the domain when it is correct

* remove accidental pyo3 import

* should call list_filtered in nodes

* const_value_in_domain should be the same as const_value by default

* fix the dynamic nodeop implementation

* optimisations for and/or

* more domain optimisations

* add optimisations for id filter as well

* update optd

---------

Co-authored-by: Ben Steer <b.a.steer@qmul.ac.uk>
* init rbac

* impl introspection

* add introspection for schema as well, add test

* impl raphtory-auth member mod, impl permissions as gql apis, fix tests

* fix circular dep

* fmt

* fix build

* fix tests

* impl permissions read gql, add tests

* filters

* fix test, fmt

* fix test

* fix workflow

* fix tests

* chore: apply tidy-public auto-fixes

* impl hierarchy based permissions, fix tests

* ren "a" to "access"

* chore: apply tidy-public auto-fixes

* ref

* ref

* ref

* fmt

* add client tests

* skip none

* introspect should allow access to metagraph only to avoid loading graphs only for introspect

* chore: apply tidy-public auto-fixes

* impl graph metadata gql api, add tests

* gate vectorised graph and receive graph behind read gate

* impl namespace_permissions, add/fix tests, update postman collection

* ref

* ref

* fix permissions

* ref

* best match namespace permission resolution

* ref

* fix at least read/write

* intro levels and ordering of permissions

* remove fail-open in require_jwt_write_access

* req ns write perm

* gate permissions api

* ref

* rid dead results

* add discover tests

* change fail open to fail close, add test

* rid wild card, add test

* fix inference issues

* fix inference, add test

* impl filtered receiveGraph

* fix clam-core version

* make rbac explicit by passing permissions store, fix tests

* GraphAccessFilter is now a OneOfInput enum

* ref

* remove default PermissionsPlugin from schema; add conditional RBAC registration via AtomicBool entrypoints and OneOfInput GraphAccessFilter with And/Or composability

* raphtory-graphql/src/main.rs deleted. raphtory-server is now the single binary entry point.

* fix: permission denial in graph/graphMetadata returns null not error, remove unused GqlGraph::permission field

* fmt

* fix error messages

* intro AuthPolicyError

* fix tests

* support RSA

* use raphtory-server binary in stress test workflow

* fix test

* fix tests

* chore: apply tidy-public auto-fixes

* update clam-core submodule to 0.18.0

* fmt

* fix test

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
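
The RBAC commit above mentions ordered permission levels, best-match namespace resolution, and switching from fail-open to fail-closed. The sketch below shows one way those three ideas could fit together. Every name in it (`Access`, `NamespacePermissions`, `grant`, `resolve`, `allows`) and the exact set of levels are hypothetical; the real types live in the PR and may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical access levels. Deriving `Ord` ranks later variants higher,
/// so `Write >= Read >= Introspect >= None`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Access {
    None,
    Introspect,
    Read,
    Write,
}

/// Hypothetical store mapping a namespace path such as "team/private"
/// to the level granted on it.
pub struct NamespacePermissions {
    grants: HashMap<String, Access>,
}

impl NamespacePermissions {
    pub fn new() -> Self {
        Self {
            grants: HashMap::new(),
        }
    }

    /// Record an explicit grant for one namespace (no wildcards).
    pub fn grant(&mut self, namespace: &str, level: Access) {
        self.grants
            .insert(namespace.trim_matches('/').to_string(), level);
    }

    /// Best match: the longest granted prefix of `path` wins.
    /// If nothing matches, the result is `Access::None` (fail closed).
    pub fn resolve(&self, path: &str) -> Access {
        let mut current = path.trim_matches('/');
        loop {
            if let Some(level) = self.grants.get(current) {
                return *level;
            }
            match current.rfind('/') {
                // Drop the last path segment and try the parent namespace.
                Some(idx) => current = &current[..idx],
                None => return Access::None,
            }
        }
    }

    /// True only when the resolved level is at least `required`.
    pub fn allows(&self, path: &str, required: Access) -> bool {
        self.resolve(path) >= required
    }
}
```

With `team` granted `Read` and `team/private` granted `None`, a graph under `team/public` is readable but not writable, anything under `team/private` is denied by the more specific entry, and a path with no matching grant is denied by default.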
…#2588)

* Change temporal values from strings to prop values and add aggregates

* fmt

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Change temporal values from strings to prop values and add aggregates

* fmt

* Add tests for untested functions and added nodes.ids to list of blocked calls when lists are disabled

* Adding some missing graphql docs

* update graphql docs, create GraphqlNodeID

* added option to change graph semantics

* fix warnings and a couple of missing apis

* final fixes

* fix rust tests

* minor test fixes

* Naive datetimes converted to UTC

* fix breaking test

* add eq method for PyProp

* fix test

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* add fast path for last on properties

* some fixes for failing tests

* fix last and last_before for in memory segment

* remove TProp

* fixes for remaining failing tests

* fix some of the last_window issue and support Int32,UInt32 and UInt64 as time column

* avoid some work when loading edges

* variant of last_window for persistent graph

* chore: apply tidy-public auto-fixes

* update of last_window for tprop node persistent graph

* update of temporal_value for tprop edge

* add benchmark for temporal_value

* fixes post benchmark

* minor changes related to edge lists in PS

* remove useless print

* further improvements for last, removing allocations

* call is_empty on TCell

---------

Co-authored-by: Lucas Jeub <lucas.jeub@pometry.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Updating Parquet* structs to support manually passed export vids, eids, and layer_ids

* Allowed IDs to be passed to parquet serialization. Will allow us to pre-compute new IDs and turn them into RecordBatches

* Changed Parquet encoding to take GraphView instead of GraphStorage. Lock the graph to get parallel iterators over edges. We filter to respect GraphView filtering behaviour.

* Fixed node and edge parallel iterator creation

* Making the parquet encoders generic over the writer (now sink). We still use ArrowWriter<File> for now, but we will add support for loading into a graph

* Changed Parquet writer from ArrowWriter to generic sink for nodes, edges, and graph.

* Fixed possible ParquetDelEdge layer_id and layer_name issues by calling explode_layers() on each EdgeView.

* Fixed path error

* Made all the encode_* functions generic over the sink. A sink factory function can now be passed to these functions to determine how the sinks will be created. This will allow us to pass a sink which is a crossbeam_channel to send RecordBatches elsewhere.

* Adding Receiver side on materialize

* Hid new materialize behind IO feature and added a test to test the new materialize function

* Adding logic to ingest data using load_*_from_df functions

* Fixed deadlock. It had to do with LayerMappers being shared between edge_meta and node_meta.

* Removed unused variable bindings

* Fixed deadlock caused by DictMapper deep_clone not creating a new lock and reusing the old one.

* Working on making materialize stream RecordBatches properly instead of encoding everything and then ingesting everything (which would keep everything in memory at once).

* Changed std::thread::scope for a rayon::scope

* Added a test that times the old and new materialize functions

* Debugging materialize_using_recordbatches to see why it freezes/hangs when run on a big graph.

* Changed to make encoding using its own thread pool and ingestion use another thread pool.

* Switched materialize test to use graph paths and have disk backed storage so that it doesn't run out of memory

* Improved ingestion time on the "load_*_from_df" path by avoiding rescanning each segment for each row. Now using this path in the new materialize_using_recordbatches function.

* Switched assert_graph_equals to be parallel instead of multi-threaded as much as possible

* Rustfmt

* Use graph_equals instead of our custom GraphSummary. Update tests to separate out running materialize and parquet decoding. Test using SF10 for now.

* Set up environment variables to configure database properly before materialize test.

* Added Jemalloc

* Removed some unnecessary #[cfg(feature = "io")] gates. Use constants for parquet encoded column names.

* Added a test to time loading SF10 dumped parquet files using the df_loader functions

* Brought zips back in df_loaders/edges.rs for passing data such as vids, eids, flags, etc... to helper functions

* Removed flushing of graph before ingesting RecordBatch in df_loaders. General cleanup

* Removed unused imports, changed jemalloc to only be used on MacOS, and changed std::thread::scope for rayon::scope.

* Moving df_loaders out of io feature

* Move LOAD_POOL out of "io" feature

* Move ENCODE_POOL out of "io" feature

* Removing some #[cfg(feature = "io")] gates related to materialize_using_recordbatches

* Moved folder from serialise::parquet out of serialise folder (so out of "io" feature). Added serialise::parquet.rs file for everything that couldn't be moved out because it depended on dependencies from io feature.

* Fixed feature gating behind io and progress

* Moved SNB SF1, SF3, SF10 tests to their own separate file

* Added test for a filtered graph

* Renamed parquet folder to parquet_encoder

* Fixed encoders to pass relevant information in NodesT, EdgesC, and EdgesT. This includes Node GIDs and node types. Propagated changes to materialize_using_recordbatches. Filtered test passes.

* Lower channel size

* Fixes after merge

* Fixed test

* Fixed io feature gating

* Added layer creation before creating the temporal graph to ensure empty layers are created.

* Updated edges iteration in parquet encoders so that EIDs get resolved compacted for each layer. This saves a lot of disk space when saved to a directory.

* Clean up after filtered sf1 test

* No need to set the env vars for raphtory settings, they are imported and copied from the graph on disk.

* Added layer names to the parquet files to avoid filename collision when creating the arrow writer for parquet encoding.

* Cleaned up test_materialize.rs imports

* Switched old materialize for the new one to run tests

* Fix bug in resolve_node_and_meta_for_node_col where nodes were not being resolved, only looked up, which was causing failures related to metadata not being added for nodes that haven't already been resolved.

* Materialize edge deletions before edge c props (edge metadata) to fix materialization bug regarding persistent graphs

* Attempting to fix temporal properties not being serialized properly on persistent graphs

* Got rid of layer_n in parquet filenames. They were causing problems with ordering of parquet files when loading data. Instead, we now have atomically incrementing counters for file ids.

* Preserve property mappers in materialize

* Fix bugs in materialize. Switch rayon::scope for std::thread::scope to avoid a deadlock when the scope's num_threads is 1. Removed resizing of segments to the max eid to avoid empty segments when a graph is filtered. This was leading to empty graph errors.

* Remove sf3 paths in test_materialize_sf10.rs

* Remove channel for producer in materialize

* Added flag to resolve nodes when materializing in load_node_props_from_df, and internalise otherwise

* First try at is_materializing flag in load_node_props_from_df

* Fixed test_materialize_sf10.rs feature gating on imports

* Added t_len for NodeStorageInner

* Clean up imports a lil

* Fix normalise_temporal_map not properly defining a stable deterministic ordering for events at the same timestamp for Prop::List (Vec and Array should be the same) and Prop::Map (ordering of elements should be stable, previously depended on HashMap iteration order which is undefined).

* Added edge.properties().temporal().iter_ids() and used it in the serialization of ParquetTEdge. Cleaned up materialize tests so that they don't try to call an "old" materialize anymore

* Clean up test file

* Get rid of old materialize

* Revert edge endpoint VID parquet column names to "rap_src_id" and "rap_dst_id". GIDS are now "rap_src_id" and "rap_dst_id". This is inconsistent with other column's naming scheme, but it is backwards compatible with already encoded parquet files.

* Changing parquet column names so they're consistent

* Update parquet files

---------

Co-authored-by: Lucas Jeub <lucas.jeub@pometry.com>
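
The materialize commit above describes making the encoders generic over a sink, passing a sink factory, and streaming batches through a channel so that encoding and ingestion overlap instead of holding everything in memory. The following is a rough sketch of that pattern under several stated assumptions: `Batch` stands in for an Arrow `RecordBatch`, `std::sync::mpsc::sync_channel` stands in for the `crossbeam_channel` mentioned in the commit, and `BatchSink`, `encode_rows`, and `stream_sum` are hypothetical names that do not appear in the PR.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

/// Stand-in for an Arrow `RecordBatch`: just a chunk of rows.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch(pub Vec<u64>);

/// Hypothetical sink abstraction: anything that can accept a batch.
pub trait BatchSink {
    fn write(&mut self, batch: Batch) -> Result<(), String>;
}

/// In-memory sink, standing in for a file-backed writer.
impl BatchSink for Vec<Batch> {
    fn write(&mut self, batch: Batch) -> Result<(), String> {
        self.push(batch);
        Ok(())
    }
}

/// Channel sink: forwards each batch to whoever holds the receiver.
impl BatchSink for SyncSender<Batch> {
    fn write(&mut self, batch: Batch) -> Result<(), String> {
        self.send(batch).map_err(|e| e.to_string())
    }
}

/// Encode `rows` in chunks. The caller decides where batches go by
/// supplying a factory that creates the sink.
pub fn encode_rows<S, F>(rows: &[u64], chunk_size: usize, make_sink: F) -> Result<S, String>
where
    S: BatchSink,
    F: FnOnce() -> S,
{
    let mut sink = make_sink();
    for chunk in rows.chunks(chunk_size.max(1)) {
        sink.write(Batch(chunk.to_vec()))?;
    }
    Ok(sink)
}

/// Producer encodes on one thread while the consumer ingests batches as
/// they arrive. The bounded channel caps how many batches are in flight.
pub fn stream_sum(rows: Vec<u64>, chunk_size: usize, capacity: usize) -> u64 {
    let (tx, rx) = sync_channel::<Batch>(capacity);
    thread::scope(|s| {
        s.spawn(move || {
            // The returned sender is dropped here, which closes the channel.
            let _ = encode_rows(&rows, chunk_size, move || tx);
        });
        // "Ingestion" here is just summing; it ends when the channel closes.
        rx.iter().map(|b| b.0.iter().sum::<u64>()).sum::<u64>()
    })
}
```

The same `encode_rows` call works with `Vec::new` as the factory (collect everything) or with a closure that hands over a channel sender (stream). A small channel capacity bounds memory use at the cost of the producer blocking when the consumer falls behind.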
* add edge id to test query to make sure the sorting works (test should not depend on the order of edges)

* add sorting for neighbour ids
* make sure we cancel all tasks when the running server is dropped

* update optd

* add domain for NodeOp

* avoid unnecessarily re-filtering the domain when it is correct

* changes to better support Bn edge sized graphs

fixing last compile error

track count temporal edges

* remove accidental pyo3 import

* small import updates

* should call list_filtered in nodes

* const_value_in_domain should be the same as const_value by default

* possible improvements to UI for very large graphs

* still need to check that the edge exists in the layer, even if we have the edge ref already

* no optimisation in with_debug as they make debugging more annoying

* filtering by node is really bad for window so change this back

* fix materialize double-adding temporal edges

* for a persistent graph the update history and properties for exploded edges are not the same

* need to look at explode() for history on persistent graphs

* attempt at faster node_valid

* include updates from static graph in node_valid check for layers

* cleanup

* fix search feature

* make component test easier to debug on failure

* add our own union find implementation based on the old connected components algorithm (maybe can be optimised but at least it seems correct)

* clean up dependencies

* storage dependency is definitely used

* avoid compiling the vectors feature in benchmarks unless it is actually needed

* implement has_layer_inner directly

* optimise last for filtered additions

* add fast path for getting edge ref out again

* attempt to optimise SVM

* use optimised active check

* some inlines

* minimise the size of the MemEdgeRef while still including src/dst information

* add src/dst to MemEdgeEntry as well

* remove sorted_vector_map dependency and clean up

* no real reason to capture src/dst on the MemEdgeRef/MemEdgeEntry as these should be cheap to look up

* fix subgraph filtering

* chore: apply tidy-public auto-fixes

* more optimisations for windowing

* cleanup

* remove dbg

* when working with disk storage, in-memory references don't always exist

* minor cleanup

* bring num_nodes up to speed

* more fixes for layered graphs

* replace some kmerge with fast_merge

* more optimisations for windowing

* add check for filtering that excludes layer

---------

Co-authored-by: Ben Steer <b.a.steer@qmul.ac.uk>
Co-authored-by: Fabian Murariu <murariu.fabian@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* make sure we cancel all tasks when the running server is dropped

* update optd

* add domain for NodeOp

* avoid unnecessarily re-filtering the domain when it is correct

* changes to better support Bn edge sized graphs

fixing last compile error

track count temporal edges

* remove accidental pyo3 import

* small import updates

* should call list_filtered in nodes

* const_value_in_domain should be the same as const_value by default

* possible improvements to UI for very large graphs

* still need to check that the edge exists in the layer, even if we have the edge ref already

* no optimisation in with_debug as they make debugging more annoying

* filtering by node is really bad for window so change this back

* fix materialize double-adding temporal edges

* for a persistent graph the update history and properties for exploded edges are not the same

* need to look at explode() for history on persistent graphs

* attempt at faster node_valid

* include updates from static graph in node_valid check for layers

* cleanup

* fix search feature

* make component test easier to debug on failure

* add our own union find implementation based on the old connected components algorithm (maybe can be optimised but at least it seems correct)

* clean up dependencies

* storage dependency is definitely used

* avoid compiling the vectors feature in benchmarks unless it is actually needed

* implement has_layer_inner directly

* optimise last for filtered additions

* add fast path for getting edge ref out again

* attempt to optimise SVM

* use optimised active check

* some inlines

* minimise the size of the MemEdgeRef while still including src/dst information

* add src/dst to MemEdgeEntry as well

* remove sorted_vector_map dependency and clean up

* no real reason to capture src/dst on the MemEdgeRef/MemEdgeEntry as these should be cheap to look up

* fix subgraph filtering

* chore: apply tidy-public auto-fixes

* more optimisations for windowing

* cleanup

* remove dbg

* when working with disk storage, in-memory references don't always exist

* minor cleanup

* bring num_nodes up to speed

* more fixes for layered graphs

* replace some kmerge with fast_merge

* more optimisations for windowing

* add check for filtering that excludes layer

* make list properties always return numpy arrays

---------

Co-authored-by: Ben Steer <b.a.steer@qmul.ac.uk>
Co-authored-by: Fabian Murariu <murariu.fabian@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Test node types are served correctly by the server

* Run fmt

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add read only version of graph to allow python access
Add explicit flush for graph
Add fix for metadata in namespace

* tidy

* tidy

* Read only graph

* Test metadata

* chore: apply tidy-public auto-fixes

* Patch the cache

* read only index

* Adding tests for metadata segments

* added new tests

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add read only version of graph to allow python access
Add explicit flush for graph
Add fix for metadata in namespace

* tidy

* tidy

* Read only graph

* Test metadata

* chore: apply tidy-public auto-fixes

* Patch the cache

* read only index

* Adding tests for metadata segments

* added new tests

* chore: apply tidy-public auto-fixes

* Fixes for check metadata

* Function names

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

9 participants