Read (and eventually write) Apache Arrow and Parquet files to and from J. Uses C API.
- Ensure that you have installed the Arrow GLib (C) packages for your OS. Instructions can be found at: arrow.apache.org/install.
- From your J session:
install 'github:interregna/JArrow@main'
load 'data/arrow'
readParquetTable '~addons/data/arrow/test/test1.parquet'
┌─┬───────────────┐
│a│0 1 2 3 4 5 6 7│
├─┼───────────────┤
│b│8 7 6 5 4 3 2 1│
└─┴───────────────┘
readsParquetTable '~addons/data/arrow/test/test2.parquet'
┌────────┬──────────┬────────┬─────────┬───────┬────────┬───────┬───────┬────────┬────────┬────────┬──────────┬──────────┬───────────┬────────────┬─────────┬─────────┬───────┬───────────────┐
│Column 1│Column Two│shortCol│ushortCol│intcCol│uintcCol│int_Col│uintCol│int16Col│int32Col│int64Col│float32Col│float64Col│longlongCol│ulonglongCol│DoubleCol│StringCol│boolCol│datetime64Col │
├────────┼──────────┼────────┼─────────┼───────┼────────┼───────┼───────┼────────┼────────┼────────┼──────────┼──────────┼───────────┼────────────┼─────────┼─────────┼───────┼───────────────┤
│0 │ 100 │0 │0 │0 │100 │100 │100 │300 │500 │100 │ 600 │ 700 │100 │100 │ 100 │This │1 │946684800000000│
│1 │88.75 │1 │1 │1 │ 88 │ 90 │ 88 │263 │443 │ 88 │531.25 │613.75 │ 88 │ 88 │88.75 │ is │0 │946771200000000│
│2 │ 77.5 │2 │2 │2 │ 77 │ 80 │ 77 │227 │387 │ 77 │ 462.5 │ 527.5 │ 77 │ 77 │ 77.5 │all │0 │946857600000000│
│3 │66.25 │3 │3 │3 │ 66 │ 70 │ 66 │191 │331 │ 66 │393.75 │441.25 │ 66 │ 66 │66.25 │ valid │0 │946944000000000│
│4 │ 55 │4 │4 │4 │ 55 │ 60 │ 55 │155 │275 │ 55 │ 325 │ 355 │ 55 │ 55 │ 55 │text │1 │947030400000000│
│5 │43.75 │5 │5 │5 │ 43 │ 50 │ 43 │118 │218 │ 43 │256.25 │268.75 │ 43 │ 43 │43.75 │ │0 │947116800000000│
│6 │ 32.5 │6 │6 │6 │ 32 │ 40 │ 32 │ 82 │162 │ 32 │ 187.5 │ 182.5 │ 32 │ 32 │ 32.5 │data. │0 │947203200000000│
│7 │21.25 │7 │7 │7 │ 21 │ 30 │ 21 │ 46 │106 │ 21 │118.75 │ 96.25 │ 21 │ 21 │21.25 │ │0 │947289600000000│
└────────┴──────────┴────────┴─────────┴───────┴────────┴───────┴───────┴────────┴────────┴────────┴──────────┴──────────┴───────────┴────────────┴─────────┴─────────┴───────┴───────────────┘
readCSVTable '~addons/data/arrow/test/test1.csv'
┌──┬───────────────────────────...
│ID│1 2 3 4 5 8 10 11 12 14 15 ...
├──┼───────────────────────────...
│y │100.669 100.669 100.669 100...
└──┴───────────────────────────...
NB. Note: this is JSON Lines format, not plain JSON. See: https://jsonlines.org
readsJsonTable '~addons/data/arrow/test/test1.json'
┌───────┬──────────┐
│name │date │
├───────┼──────────┤
│Gilbert│12-13-2014│
│Alexa │09-04-1983│
│May │01-01-1924│
│Deloise│04-25-1894│
└───────┴──────────┘
readsFeatherTable '~addons/data/arrow/test/test1.feather'
┌────┬───┬──────┐
│team│pos│points│
├────┼───┼──────┤
│A │G │17 │
│A │F │17 │
│B │G │15 │
│B │F │ 5 │
│C │G │11 │
│C │F │10 │
│D │G │ 5 │
│D │F │14 │
└────┴───┴──────┘
(6!:16) and (6!:17) can be used to convert Arrow datetime64 types to and from ISO 8601 format (e.g. 2000-01-11T22:58:04).
fromdate32 can be used to convert Arrow date32 types to YYYY M D tuples.
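As a rough illustration of the arithmetic behind these conversions, the sketch below maps day counts and microsecond timestamps to Y M D rows using only the J standard library verbs todayno and todate. The names EPOCH1970, date32ToYMD and us64ToYMD are hypothetical and are not part of JArrow. The sketch assumes that date32 counts whole days since 1970-01-01 and that the datetime64Col values shown above are microseconds since the same epoch.

```j
NB. Sketch only: date32ToYMD and us64ToYMD are hypothetical names, not JArrow verbs.
NB. todayno and todate come from the J standard library (day 0 is 1800-01-01).
EPOCH1970 =: todayno 1970 1 1

NB. Assumed Arrow date32 layout: whole days since 1970-01-01 -> rows of Y M D
date32ToYMD =: 3 : 'todate EPOCH1970 + y'

NB. Assumed timestamp unit: microseconds since 1970-01-01 (time of day is dropped)
us64ToYMD =: 3 : 'date32ToYMD <. y % 86400000000'

date32ToYMD 0 10957
us64ToYMD 946684800000000 946771200000000
```

The library's own fromdate32 should be preferred in practice; this only shows the kind of epoch shift involved.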
readsTable minimizes display time in the UI but uses more space
readTable minimizes space but can take more time to display
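The two layouts can be imitated in plain J to see how they differ in shape. This is only an illustration modelled on the displayed results above, with hypothetical names (names, cols, compact, stacked); it does not describe how JArrow builds its results internally.

```j
NB. Illustration only: mimics the two display layouts, not JArrow's internals.
names =: 'a';'b'
cols  =: (i. 8) ; 8 - i. 8

NB. readTable-style: one row per column, each value list kept as a single row
compact =: names ,. cols

NB. readsTable-style: header row over one box per column, values stacked vertically
stacked =: names ,: ,.&.> cols

$ > (<0 1) { compact    NB. shape of the values for column a in the compact layout
$ > (<1 0) { stacked    NB. shape of the values for column a in the stacked layout
```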
- In Jqt, identify your path for ~Projects: jpath '~Projects'
- Git clone the JArrow repo within ~Projects.
- Restart Jqt and open the Arrow project: Project > Open > Projects > jarrow
- Re-build the addon: Ctrl + F9
- Run the addon: F9 (re-builds addon scripts, reloads, and runs tests)
Examples:
see test/test1.ijs
- Error catching for empty pointers, missing files, and general errors.
- Dereference / cleanup gobjects and allocated memory
- Additional data types
- Dictionaries (need to store lookup tables)
- Lists
- Maps
- Tensors
- Documentation (see: ~/addons/gui/cobrowser/scriptdoc.ijs)
- CSV reader
- JSONL reader
- Arrow Feather (IPC v1) reader
- IPC files (".arrow" files): readArrowTable, writeArrowTable
- IPC streams (".arrows" files): readArrowsTable, writeArrowsTable
- Feather v2 writer: writeFeatherTable (alias of writeArrowTable)
- Parquet writer: writeParquet
- Flight client
- Flight server
- Non-local filesystems (S3)
- IPC streaming with event-driven calls
writeArrowTable tablePtr;'~out.arrow' NB. IPC file format
writeArrowsTable tablePtr;'~out.arrows' NB. IPC streaming format
writeFeatherTable tablePtr;'~out.feather' NB. Feather v2 (= IPC file)
writeParquet tablePtr;'~out.parquet' NB. Parquet
A small, terse vocabulary giving JArrow a consistent handle-based columnar interface for J users.
NB. handle open / close / schema / read / project (gaps 1, 2, 7, 10)
h=: ho '~data.feather' NB. auto-detects format from extension
hs h NB. Arrow schema
tbl=: hr h NB. read full table
tbl=: 1000 hr h NB. read first 1000 rows
tbl=: ('price';'qty') hp h NB. project columns
hc h NB. close
NB. stream table fallback over .arrows files (gap 5)
sh=: sbo '~stream.arrows'
b=: sbn sh
sbc sh
NB. Arrow C ABI bridge — zero-copy hand-off (gap 3)
'sa aa'=: ax tbl NB. pending per-type J→Arrow builders
tbl=: ai sa;aa NB. import the same pair
NB. compute kernels via Arrow function registry (gap 6)
afsum col NB. sum kernel
afmean col NB. mean kernel
afmin col NB. min kernel
afmax col NB. max kernel
afcount col NB. count kernel
'percentile_99' afk col NB. any registered kernel by name
NB. CSV / JSONL writers (gap 8)
tbl wcsv '~out.csv'
tbl wjsonl '~out.jsonl'
NB. Custom extension types (gap 9)
extreg '' NB. register quant4 / quant8 / embed_f32
extq4 codes; scale; zero NB. decode quant4 to floats
The verb prefixes are the vocabulary:
| prefix | category | gap |
|---|---|---|
| h | handle: open / read / close | 1, 2, 7, 10 |
| sb | stream batches | 5 |
| a | Arrow C ABI bridge | 3 |
| md | schema metadata | 4 |
| af | Arrow function (compute) | 6 |
| wcsv | CSV / JSONL write | 8 |
| ext | Custom extension types | 9 |
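When trying the af compute verbs, it can help to cross-check their results against plain J on a small column. The definitions below are ordinary J primitives under hypothetical names (jsum, jmean, jmin, jmax, jcount); they are not JArrow verbs, and they assume a numeric list with no nulls, so any null handling done by the Arrow kernels is not reproduced here.

```j
NB. Plain-J counterparts for cross-checking; these are not JArrow definitions.
NB. Assumes col is a numeric list with no nulls.
jsum   =: +/
jmean  =: +/ % #
jmin   =: <./
jmax   =: >./
jcount =: #

col =: 17 17 15 5 11 10 5 14   NB. the points column from test1.feather above
jsum col
jmean col
(jmin , jmax , jcount) col
```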
Some verbs are skeletons that depend on extra GLib bindings. JArrow now attempts those optional binds at load time and leaves unsupported symbols inactive on older Arrow GLib installations.
sbo/sbn currently use a safe single-table fallback over
readArrowsTable. The lower-level GLib read_next iterator crashes this
J console build, so the public verb avoids that native path.
ax is intentionally guarded with a controlled assertion until the per-type
J table to GArrowArray builders are implemented. This avoids the prior crash
path through an unfinished read_next batch-reader bridge.
The IPC file (.arrow) and Feather v2 (.feather) formats are the
random-access-preserving choice. .arrows (streaming, no footer) is
the right shape for pipes and sockets. Parquet is the cross-ecosystem
format.
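The mapping from file extension to format described here can be sketched as a small lookup. formatOf is a hypothetical helper written for illustration only; it is not the detection logic used by ho or any other JArrow verb, and it looks only at the text after the last dot in the path.

```j
NB. Sketch only: formatOf is a hypothetical helper, not part of JArrow.
formatOf =: 3 : 0
  ext   =. tolower (>: y i: '.') }. y        NB. text after the last dot, or empty
  exts  =. 'arrow';'arrows';'feather';'parquet'
  kinds =. 'IPC file';'IPC stream';'Feather v2 (IPC file)';'Parquet';'unknown'
  > (exts i. <ext) { kinds                   NB. an unmatched extension selects 'unknown'
)

formatOf '~out.arrows'
formatOf '~out.feather'
```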
- The IPC reader verbs readArrowTable, readArrowsTable, and readFileBufferTable were defined with =. (local) instead of =: (global), so the public-interface transfers block at the bottom of arrow.ijs silently failed to expose them. Now fixed; these verbs are reachable as documented.
- Added IPC writer wrappers; writer close is used as the flush point before deallocation.
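The difference between =. and =: in a loaded script can be reproduced in isolation. The sketch below uses a hypothetical verb, loadSketch, to stand in for the way load executes script text from inside an explicit definition; localDouble and globalDouble are likewise made-up names. It is meant only to show why a name assigned with =. is gone once loading finishes, not to reproduce arrow.ijs.

```j
NB. Sketch: loadSketch is a hypothetical stand-in for how load runs a script
NB. inside an explicit verb. 0!:100 executes a string as script text.
loadSketch =: 3 : 0
  0!:100 'localDouble =. +:'    NB. =. binds only while loadSketch is running
  0!:100 'globalDouble =: +:'   NB. =: binds in the current locale and persists
  0
)

loadSketch ''
4!:0 'localDouble' ; 'globalDouble'   NB. name classes: _1 is undefined, 3 is a verb
```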