
doc: add storing data with memcs chapter #5683

Open
maryiaLichko wants to merge 2 commits into latest from doc-memcs-engine


maryiaLichko commented Jun 4, 2026

maryiaLichko self-assigned this Jun 4, 2026
github-actions temporarily deployed to branch-doc-memcs-engine on June 4, 2026 at 10:22, 10:33, 10:44, and 11:00 (each deployment since destroyed).
* Apache Arrow support — data can be exported in Arrow format without conversion, enabling zero-copy interoperability.
* Dictionary encoding — reduces memory usage for string columns with repeated values.
* `LZ4 <https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)>`_ compression — compresses column data to reduce memory footprint.
* SQL integration — supports querying via Tarantool SQL engine.

Contributor comment:

SQL is not supported by memcs.

- Boolean: ``boolean``
- Temporal types: ``datetime``
- UUID: ``uuid``
- Decimal: ``decimal``

Contributor comment:

decimal, decimal32, decimal64, decimal128, and decimal256

- Integer types: ``uint64``, ``int64``, ``uint32``, ``int32``
- Floating-point types: ``double``, ``float``
- Strings: ``string``
- Boolean: ``boolean``

Contributor comment:

Booleans aren't supported correctly; let's remove them from here.


MemCS supports a wide range of data types, including:

- Integer types: ``uint64``, ``int64``, ``uint32``, ``int32``

Contributor comment:

int8, uint8, int16, uint16, int32, uint32, int64, uint64

MemCS supports a wide range of data types, including:

- Integer types: ``uint64``, ``int64``, ``uint32``, ``int32``
- Floating-point types: ``double``, ``float``

Contributor comment:

float32 and float64

- Strings: ``string``
- Boolean: ``boolean``
- Temporal types: ``datetime``
- UUID: ``uuid``

Contributor comment:

Not supported.
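The integer, floating-point, and sized decimal types suggested in the comments above all have fixed widths. The sketch below maps each name to its byte width, assuming memcs uses the standard Arrow-style widths; this matches the 1, 2, 4, 8, 16, or 32 bytes per value mentioned in the RLE comment further down. The helper ``memcs_fixed_width()`` is hypothetical and not part of any Tarantool API, and plain ``decimal`` is left out because its in-column width is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the per-value width in bytes of a
 * fixed-width type name, assuming Arrow-style widths. Returns 0 for
 * names it does not know, including variable-width types such as
 * "string". */
static size_t memcs_fixed_width(const char *type)
{
    static const struct { const char *name; size_t width; } widths[] = {
        {"int8", 1},        {"uint8", 1},
        {"int16", 2},       {"uint16", 2},
        {"int32", 4},       {"uint32", 4},
        {"int64", 8},       {"uint64", 8},
        {"float32", 4},     {"float64", 8},
        {"decimal32", 4},   {"decimal64", 8},
        {"decimal128", 16}, {"decimal256", 32},
    };
    for (size_t i = 0; i < sizeof(widths) / sizeof(widths[0]); i++)
        if (strcmp(type, widths[i].name) == 0)
            return widths[i].width;
    return 0;
}
```

Every width in the table is at most 32 bytes, which is the limit under which raw values can be stored directly in tree leaves, according to the later comment on the engine overview.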


- ``plain`` — default layout, no encoding
- ``null_rle`` — RLE encoding for nullable fields

Contributor comment:

and dict layout
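The ``dict`` layout, together with the "Dictionaries only grow" bullet later in the diff, can be illustrated with a generic dictionary encoder. This is a minimal sketch of the technique, not the memcs implementation: ``struct dict``, its fixed capacity, and the linear lookup are hypothetical simplifications. Because entries are only ever appended, a code handed out for an earlier batch keeps decoding to the same string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DICT_MAX 16

/* Hypothetical append-only string dictionary: entries are never removed
 * or reordered, so previously issued codes stay valid. The caller keeps
 * the strings alive; only pointers are stored. */
struct dict {
    const char *values[DICT_MAX];
    uint32_t size;
};

/* Returns the code for `s`, appending it if it is new; -1 when full. */
static int64_t dict_encode(struct dict *d, const char *s)
{
    for (uint32_t i = 0; i < d->size; i++)
        if (strcmp(d->values[i], s) == 0)
            return i;
    if (d->size == DICT_MAX)
        return -1;
    d->values[d->size] = s;
    return d->size++; /* code of the new entry, then grow */
}

/* Maps a code back to its string; NULL for an unknown code. */
static const char *dict_decode(const struct dict *d, uint32_t code)
{
    return code < d->size ? d->values[code] : NULL;
}
```

A column of repeated strings then stores small integer codes instead of the strings themselves, which is where the memory saving for repetitive string columns comes from.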


.. _memcs-memory:

Memory Consumption

Contributor comment:

Let's remove everything from here to the end; it reads as too AI-generated.

- Dictionaries only grow — previously produced batches remain compatible

.. _memcs-column:

Contributor comment:

Please add a chapter "RLE encoding of NULLs":

By default, NULL values are stored explicitly and occupy the same space as any other valid column value (1, 2, 4, 8, 16, or 32 bytes, depending on the exact field type); however, RLE encoding of NULLs (null_rle) is also supported. For reference, RLE encoding a column with 90% NULL values, evenly distributed, reduces that column's memory consumption roughly fivefold.
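To make the proposed ``null_rle`` description concrete, here is a generic run-length encoding of NULL flags: non-NULL values are packed densely, and the NULL/non-NULL pattern is stored as runs. It is a sketch of the general technique only; ``null_rle_encode()`` and ``struct null_run`` are hypothetical and do not describe how memcs actually lays out ``null_rle`` columns. The memory saved grows with the share of NULLs and shrinks as the number of runs grows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One run of consecutive rows that are all NULL or all non-NULL. */
struct null_run {
    bool is_null;
    uint32_t length;
};

/* Hypothetical NULL run-length encoder. `is_null[i]` marks row i.
 * Non-NULL values are copied densely into `values_out`, runs go to
 * `runs_out`; both must have room for `n` entries. Returns the number
 * of runs and stores the number of non-NULL values in `*n_values`. */
static size_t null_rle_encode(const int64_t *values, const bool *is_null,
                              size_t n, int64_t *values_out,
                              struct null_run *runs_out, size_t *n_values)
{
    size_t n_runs = 0;
    *n_values = 0;
    for (size_t i = 0; i < n; i++) {
        if (n_runs > 0 && runs_out[n_runs - 1].is_null == is_null[i]) {
            runs_out[n_runs - 1].length++; /* extend the current run */
        } else {
            runs_out[n_runs].is_null = is_null[i]; /* start a new run */
            runs_out[n_runs].length = 1;
            n_runs++;
        }
        if (!is_null[i])
            values_out[(*n_values)++] = values[i];
    }
    return n_runs;
}
```

For rows ``7, NULL, NULL, NULL, 9, NULL`` the encoder keeps two values and four runs instead of six full-width slots.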

.. admonition:: Enterprise Edition
   :class: fact

   The `memcs` engine uses a single-threaded transaction processor (TX thread), similar to `memtx`. However, unlike `memtx`,

Contributor comment:

Please change to something like:

The memcs engine uses a single-threaded transaction processor (TX thread), similar to memtx, and stores data in the memtx arena. Unlike memtx, however, it does not organize data in tuples; instead, it stores data in columns. Each format field is assigned its own BPS tree-like structure (a BPS vector), which stores only that field's values. If the field type fits in 32 bytes, raw field values are stored directly in the tree leaves without any encoding. Strings are stored in a format similar to the Arrow Variable-size Binary View Layout, also known as "German strings".
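For readers unfamiliar with "German strings", the sketch below follows the Variable-size Binary View layout from the Arrow columnar format specification: every string is a 16-byte view, strings of up to 12 bytes are stored inline, and longer ones keep a 4-byte prefix plus a buffer index and offset pointing at the full bytes. memcs only uses a format similar to this, so ``struct binary_view`` and ``make_view()`` are illustrative names, not the engine's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-byte view modeled on the Arrow Binary View layout. */
struct binary_view {
    int32_t length;
    union {
        uint8_t inlined[12];          /* short strings: bytes inline */
        struct {
            uint8_t prefix[4];        /* first 4 bytes, for fast compares */
            int32_t buffer_index;     /* which data buffer holds the bytes */
            int32_t offset;           /* where they start in that buffer */
        } ref;
    } u;
};

/* Builds a view for `len` bytes at `data`. For long strings the caller
 * says where the full bytes live (`buffer_index`, `offset`). */
static struct binary_view make_view(const char *data, int32_t len,
                                    int32_t buffer_index, int32_t offset)
{
    struct binary_view v;
    memset(&v, 0, sizeof(v)); /* inline bytes are zero-padded */
    v.length = len;
    if (len <= 12) {
        memcpy(v.u.inlined, data, (size_t)len);
    } else {
        memcpy(v.u.ref.prefix, data, 4);
        v.u.ref.buffer_index = buffer_index;
        v.u.ref.offset = offset;
    }
    return v;
}
```

Keeping short strings inline and a prefix for long ones lets many comparisons finish without dereferencing a separate buffer.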

The main benefit of this data organization is a significant speedup of sequential columnar scans compared to memtx, thanks to CPU cache locality. This is why memcs provides a dedicated C API for columnar scans: see box_index_arrow_stream() and box_raw_read_view_arrow_stream(). Peak performance is achieved when scanning embedded field types.
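The comment names ``box_index_arrow_stream()`` and ``box_raw_read_view_arrow_stream()`` but not their signatures, so the sketch below covers only the consumer side. It assumes, based on the function names, that these calls hand out batches through the Arrow C Data Interface. ``struct ArrowArray`` is copied from that specification; ``sum_int64_column()`` is a hypothetical helper that sums one int64 column while respecting the validity bitmap and the array offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Struct from the Arrow C Data Interface specification. */
struct ArrowArray {
    int64_t length;
    int64_t null_count;
    int64_t offset;
    int64_t n_buffers;
    int64_t n_children;
    const void **buffers;
    struct ArrowArray **children;
    struct ArrowArray *dictionary;
    void (*release)(struct ArrowArray *);
    void *private_data;
};

/* Hypothetical helper: sums a primitive int64 column. buffers[0] is the
 * LSB-first validity bitmap (NULL when the array has no nulls) and
 * buffers[1] holds the values; both are indexed from `offset`. */
static int64_t sum_int64_column(const struct ArrowArray *arr)
{
    const uint8_t *validity = arr->buffers[0];
    const int64_t *values = arr->buffers[1];
    int64_t sum = 0;
    for (int64_t i = arr->offset; i < arr->offset + arr->length; i++) {
        if (validity != NULL && !(validity[i / 8] & (1u << (i % 8))))
            continue; /* bit cleared: the value is NULL */
        sum += values[i];
    }
    return sum;
}
```

The loop reads one contiguous values buffer front to back, which is the access pattern that benefits from cache locality.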

Querying full tuples, as in memtx, is also supported, but it is slower than in memtx because each tuple has to be constructed on the runtime arena from individual field values gathered from every column tree.
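The cost difference between column scans and full-tuple reads can be shown with a deliberately simplified model in which each field is a plain contiguous array. memcs uses a BPS vector per field instead, so this is an assumption for illustration, and both helpers below are hypothetical. ``scan_sum()`` reads one column sequentially, while ``materialize_row()`` has to visit every column to rebuild a single row, which is the gather step described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_FIELDS 3

/* Simplified columnar table: one contiguous array per field
 * (an assumption; memcs keeps a BPS vector per field). */
struct column_table {
    const int64_t *columns[N_FIELDS];
    size_t n_rows;
};

/* Column scan: walks a single contiguous array. */
static int64_t scan_sum(const struct column_table *t, size_t field)
{
    int64_t sum = 0;
    for (size_t row = 0; row < t->n_rows; row++)
        sum += t->columns[field][row];
    return sum;
}

/* Full-tuple read: gathers one value from every column into `out`,
 * touching N_FIELDS separate memory regions for a single row. */
static void materialize_row(const struct column_table *t, size_t row,
                            int64_t out[N_FIELDS])
{
    for (size_t f = 0; f < N_FIELDS; f++)
        out[f] = t->columns[f][row];
}
```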

Other features include:

- Point lookup.
- Stable iterators.
- Insert, replace, delete, and update operations.
- Batch insertion in the Arrow format.
- Transactions, including cross-engine transactions with memtx (with memtx_use_mvcc_engine = false).
- Read view support.
- Secondary indexes, with the ability to specify covered columns and to sequentially scan indexed and covered columns.

Successfully merging this pull request may close these issues.

Dictionary-encoded columns in MemCS

2 participants