doc: add storing data with memcs chapter by maryiaLichko · Pull Request #5683 · tarantool/doc

maryiaLichko · 2026-06-04T10:21:17Z

Deployment: https://docs.d.tarantool.io/en/doc/doc-memcs-engine/platform/engines/memcs/

Done with AI help.

Gumix · 2026-06-04T16:21:48Z

+* Apache Arrow support — data can be exported in Arrow format without conversion, enabling zero-copy interoperability.
+* Dictionary encoding — reduces memory usage for string columns with repeated values.
+* `LZ4 <https://en.wikipedia.org/wiki/LZ4_(compression_algorithm)>`_ compression — compresses column data to reduce memory footprint.
+* SQL integration — supports querying via Tarantool SQL engine.


SQL is not supported by memcs.

Gumix · 2026-06-04T16:26:04Z

+- Boolean: ``boolean``
+- Temporal types: ``datetime``
+- UUID: ``uuid``
+- Decimal: ``decimal``


This should list decimal, decimal32, decimal64, decimal128, and decimal256.

Gumix · 2026-06-04T16:27:00Z

+- Integer types: ``uint64``, ``int64``, ``uint32``, ``int32``
+- Floating-point types: ``double``, ``float``
+- Strings: ``string``
+- Boolean: ``boolean``


Booleans aren't supported correctly; let's remove them from here.

Gumix · 2026-06-04T16:28:14Z

+
+MemCS supports a wide range of data types, including:
+
+- Integer types: ``uint64``, ``int64``, ``uint32``, ``int32``


The full list is int8, uint8, int16, uint16, int32, uint32, int64, and uint64.

Gumix · 2026-06-04T16:28:44Z

+MemCS supports a wide range of data types, including:
+
+- Integer types: ``uint64``, ``int64``, ``uint32``, ``int32``
+- Floating-point types: ``double``, ``float``


These should be float32 and float64.

Gumix · 2026-06-04T16:30:31Z

+- Strings: ``string``
+- Boolean: ``boolean``
+- Temporal types: ``datetime``
+- UUID: ``uuid``


UUID is not supported.

Gumix · 2026-06-04T16:31:48Z

+
+- ``plain`` — default layout, no encoding
+- ``null_rle`` — RLE encoding for nullable fields
+


There is also the dict layout.

Gumix · 2026-06-04T16:33:32Z

+
+.. _memcs-memory:
+
+Memory Consumption


Let's remove everything from here to the end; it reads as too AI-generated.

Gumix · 2026-06-04T16:41:51Z

+- Dictionaries only grow — previously produced batches remain compatible
+
+.. _memcs-column:
+


Please add a chapter "RLE encoding of NULLs":

By default, NULL values are stored explicitly and take up the same space as any other valid column value (1, 2, 4, 8, 16, or 32 bytes, depending on the exact field type). However, RLE encoding of NULLs is also supported (null_rle). For reference, RLE encoding of a column with 90% evenly distributed NULL values reduces the memory consumption of that column by a factor of about 5.
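
To make the idea concrete, here is a minimal Python sketch of run-length encoding the NULL positions of a column. It is an illustration only: the function names `rle_encode_nulls` and `rle_decode_nulls` and the `[is_null, run_length]` representation are hypothetical, and the sketch does not claim to mirror how memcs implements null_rle internally.

```python
def rle_encode_nulls(column):
    """Split a column into (runs, values).

    runs   -- list of [is_null, run_length] pairs covering the whole column
    values -- the non-NULL values, in their original order
    """
    runs = []
    values = []
    for v in column:
        is_null = v is None
        if runs and runs[-1][0] == is_null:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([is_null, 1])  # start a new run
        if not is_null:
            values.append(v)
    return runs, values


def rle_decode_nulls(runs, values):
    """Rebuild the original column from runs and non-NULL values."""
    out = []
    it = iter(values)
    for is_null, length in runs:
        for _ in range(length):
            out.append(None if is_null else next(it))
    return out


column = [None, None, None, 7, None, None, 8, 9, None, None]
runs, values = rle_encode_nulls(column)
restored = rle_decode_nulls(runs, values)
```

In this toy representation only the three non-NULL values are stored at full width, while the seven NULLs are folded into run counters. The actual savings depend on how the NULLs are distributed.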

Gumix · 2026-06-04T16:51:02Z

+..  admonition:: Enterprise Edition
+    :class: fact
+
+The `memcs` engine uses a single-threaded transaction processor (TX thread), similar to `memtx`. However, unlike `memtx`,


Please change to something like:

The memcs engine uses a single-threaded transaction processor (TX thread), similar to memtx, and stores data in the memtx arena. In contrast to memtx, however, it doesn’t organize data in tuples. Instead, it stores data in columns. Each format field is assigned its own BPS tree-like structure (BPS vector), which stores only the values of that field. If the field type fits in 32 bytes, raw field values are stored directly in tree leaves without any encoding. Strings are stored in a format similar to the "Arrow Variable-size Binary View Layout", also called "German Strings".

The main benefit of this data organization is a significant performance boost for sequential scans of columnar data compared to memtx, thanks to CPU cache locality. That’s why memcs supports a special C API for such columnar scans: see box_index_arrow_stream() and box_raw_read_view_arrow_stream(). Peak performance is achieved when scanning embedded field types.

Querying full tuples, as in memtx, is also supported, but performance is worse than in memtx because each tuple has to be constructed on the runtime arena from individual field values gathered from each column tree.
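
The difference between a columnar scan and a full-tuple read can be sketched in a few lines of Python. This is a conceptual model only: the `ColumnStore` class and its methods are hypothetical, and plain Python lists stand in for the per-field BPS vectors (there is no tree, no arena, and no index here).

```python
class ColumnStore:
    """Toy column-oriented store: one list per format field."""

    def __init__(self, field_names):
        self.field_names = list(field_names)
        self.columns = {name: [] for name in self.field_names}

    def insert(self, row):
        # A row is split apart: each value goes to its own column.
        for name, value in zip(self.field_names, row):
            self.columns[name].append(value)

    def scan_column(self, name):
        # Sequential scan touches a single column only.
        return iter(self.columns[name])

    def get_tuple(self, pos):
        # A full tuple must be gathered from every column.
        return tuple(self.columns[name][pos] for name in self.field_names)


store = ColumnStore(["id", "name", "price"])
store.insert((1, "apple", 10.5))
store.insert((2, "pear", 4.0))

total_price = sum(store.scan_column("price"))
second_row = store.get_tuple(1)
```

The scan reads one contiguous sequence of values, while `get_tuple` has to visit every column to assemble a single row, which is the extra work the paragraph above describes.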

Other features include:

- Point lookup.
- Stable iterators.
- Insert / replace / delete / update.
- Batch insertion in the Arrow format.
- Transactions, including cross-engine transactions with memtx (with memtx_use_mvcc_engine = false).
- Read view support.
- Secondary indexes with an ability to specify covered columns and sequentially scan indexed + covered columns.
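
For readers unfamiliar with the "German Strings" format mentioned above, the following Python sketch packs and reads 16-byte views as defined by the Arrow Variable-size Binary View Layout: strings of up to 12 bytes are stored inline, longer ones keep a 4-byte prefix plus a buffer index and offset. It shows the Arrow layout only. The text says memcs uses a format similar to it, so the exact memcs representation may differ, and the helper names `make_view` and `read_view` are hypothetical.

```python
import struct

INLINE_MAX = 12  # Arrow views store strings of up to 12 bytes inline


def make_view(data: bytes, buffer_index: int, offset: int) -> bytes:
    """Build a 16-byte view for `data` (little-endian)."""
    length = len(data)
    if length <= INLINE_MAX:
        # 4-byte length + data, zero-padded to 12 bytes
        return struct.pack("<i12s", length, data)
    # 4-byte length + 4-byte prefix + buffer index + offset in that buffer
    return struct.pack("<i4sii", length, data[:4], buffer_index, offset)


def read_view(view: bytes, buffers) -> bytes:
    """Resolve a view back to the full string."""
    (length,) = struct.unpack_from("<i", view, 0)
    if length <= INLINE_MAX:
        return view[4:4 + length]
    _prefix, buffer_index, offset = struct.unpack_from("<4sii", view, 4)
    return buffers[buffer_index][offset:offset + length]


buffers = [b"hello, columnar world"]
short_view = make_view(b"memcs", 0, 0)
long_view = make_view(buffers[0], 0, 0)
```

Because every view has the same fixed size, short strings can be compared or scanned without following a pointer, and the prefix allows many comparisons of long strings to be decided early.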

maryiaLichko self-assigned this Jun 4, 2026

maryiaLichko added the memcs label Jun 4, 2026

github-actions Bot temporarily deployed to branch-doc-memcs-engine June 4, 2026 10:22 Destroyed

github-actions Bot temporarily deployed to branch-doc-memcs-engine June 4, 2026 10:33 Destroyed

github-actions Bot temporarily deployed to branch-doc-memcs-engine June 4, 2026 10:44 Destroyed

doc: add storing data with memcs chapter

4719c89

maryiaLichko force-pushed the doc-memcs-engine branch from 9355c21 to 4719c89 Compare June 4, 2026 10:58

github-actions Bot temporarily deployed to branch-doc-memcs-engine June 4, 2026 11:00 Destroyed

doc: add storing data with memcs chapter

5471e9c

github-actions Bot deployed to branch-doc-memcs-engine June 4, 2026 11:05 View deployment

Gumix reviewed Jun 4, 2026

maryiaLichko wants to merge 2 commits into latest from doc-memcs-engine
