From 79e8657466bf75a37e14dc7302ed86b1c6062880 Mon Sep 17 00:00:00 2001
From: Sylvain Corlay
Date: Thu, 23 Apr 2026 15:31:21 +0200
Subject: [PATCH] Fix typos

---
 src/components/fundable/descriptions/Decimal32InArrowCpp.md | 2 +-
 .../fundable/descriptions/ParquetNullOptimizations.md       | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/components/fundable/descriptions/Decimal32InArrowCpp.md b/src/components/fundable/descriptions/Decimal32InArrowCpp.md
index b27adaa6..29c0bdff 100644
--- a/src/components/fundable/descriptions/Decimal32InArrowCpp.md
+++ b/src/components/fundable/descriptions/Decimal32InArrowCpp.md
@@ -2,7 +2,7 @@
 Apache Arrow is the universal columnar format and multi-language toolbox for fast
 data interchange and in-memory analytics.
 
-Fixed-width decimal data in Arrow is usually represented the Decimal128 data type.
+Fixed-width decimal data in Arrow is usually represented by the Decimal128 data type.
 This data type has non-trivial memory costs (16 bytes per value) and computational
 costs (operations on 128-bit integers must be emulated on most if not all architectures).
 Arrow recently gained Decimal32 and Decimal64 data types which, as their names suggest, encode fixed-width decimal data more compactly.
diff --git a/src/components/fundable/descriptions/ParquetNullOptimizations.md b/src/components/fundable/descriptions/ParquetNullOptimizations.md
index fb2aed2d..77767d45 100644
--- a/src/components/fundable/descriptions/ParquetNullOptimizations.md
+++ b/src/components/fundable/descriptions/ParquetNullOptimizations.md
@@ -2,7 +2,7 @@
 Apache Parquet is an open source, column-oriented data file format designed for
 efficient data storage and retrieval.
 Together with Apache Arrow for in-memory data,
-it has become for the *de facto* standard for efficient columnar analytics.
+it has become the *de facto* standard for efficient columnar analytics.
 
 While Parquet and Arrow are most often used together, they have incompatible
 physical representations of data with optional values: data where some values can be
@@ -18,7 +18,7 @@
 the data is declared nullable (optional) at the schema level.
 
 We propose to optimize the conversion of null values from Parquet in Arrow C++
 for flat (non-nested) data:
-1. decoding Parquet definition levels directly into a Arrow validity bitmap, rather than using an
+1. decoding Parquet definition levels directly into an Arrow validity bitmap, rather than using an
 intermediate representation as 16-bit integers;
 2. avoiding decoding definition levels entirely when a data page's statistics shows
@@ -27,7 +27,7 @@ for flat (non-nested) data:
 As a subsequent task, these optimizations may be extended so as to apply to schemas
 with moderate amounts of nesting.
 
-This work will benefit to applications using Arrow C++ or any of its language
+This work will benefit applications using Arrow C++ or any of its language
 bindings (such as PyArrow, R-Arrow...). Depending on the typology of Parquet
 data, this could make Parquet reading 2x
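For context on the second file touched by this patch: for a flat (non-nested) nullable column, decoding Parquet definition levels "directly into an Arrow validity bitmap" means packing one presence bit per value rather than materializing an intermediate array of 16-bit level integers. The following is a minimal Python/NumPy sketch of that idea only; it is not the Arrow C++ implementation, and the function name and `max_def_level` default are illustrative assumptions.

```python
import numpy as np

def def_levels_to_validity_bitmap(def_levels, max_def_level=1):
    # Hypothetical helper, not an Arrow API.
    # For a flat column, a definition level equal to max_def_level
    # means "value present"; anything lower means null.
    levels = np.asarray(def_levels, dtype=np.int16)
    valid = levels == max_def_level
    # Arrow validity bitmaps are LSB-ordered: value i maps to
    # bit (i % 8) of byte (i // 8).
    return np.packbits(valid, bitorder="little").tobytes()

# Values 1,0,1,1,0,0,1,1 pack (LSB first) into 0b11001101.
bitmap = def_levels_to_validity_bitmap([1, 0, 1, 1, 0, 0, 1, 1])
```

The production code avoids the intermediate boolean array as well, but the sketch shows why the direct route saves work: the 16-bit representation costs 2 bytes per value, while the bitmap costs 1 bit.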