How it works

Most tools that open CSV files work like this:

  1. Open the file
  2. Read every byte
  3. Parse every row into strings
  4. Store all rows in memory
  5. Show the grid

For a 14 GB file this means 14+ GB of RAM consumed before you see a single row. On a machine with 16 GB RAM, this either crashes the app or triggers heavy swapping that makes the machine unusable.

Columnar uses a lazy loading architecture with three components:

The file is opened with mmap (memory mapping). The OS maps the file’s bytes into the process’s virtual address space without copying them into RAM. The file appears as a byte array your code can index into directly. The OS handles paging — only the parts you actually access are loaded from disk.

This means “opening” the file takes milliseconds regardless of size.

On open, a single pass through the file records the byte position of every row:

row 0 → byte 0
row 1 → byte 47
row 2 → byte 103
row 3 → byte 158
...
row 28,000,000 → byte 14,201,847,293

This index is a Vec<u64> — 8 bytes per row. For 28 million rows that is 225 MB, well under 1 GB on any modern machine.

The index is built using the csv crate’s ByteRecord iterator with position() to record offsets. This handles quoted newlines inside fields correctly — a naive newline scan would break on CSV files with embedded newlines.
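To illustrate why quote-awareness matters, here is a simplified, stdlib-only sketch of that indexing pass (a hypothetical `index_rows` helper, not Columnar's actual code, which delegates the parsing to the csv crate):

```rust
/// Record the byte offset of each row start, respecting quoted fields
/// so that newlines inside quotes do not start a new row.
/// A naive scan for b'\n' would split quoted multi-line fields.
fn index_rows(data: &[u8]) -> Vec<u64> {
    let mut offsets = vec![0u64]; // row 0 always starts at byte 0
    let mut in_quotes = false;
    for (i, &byte) in data.iter().enumerate() {
        match byte {
            // An escaped quote ("") toggles twice, a net no-op.
            b'"' => in_quotes = !in_quotes,
            b'\n' if !in_quotes => {
                // The next byte, if any, begins a new row.
                if i + 1 < data.len() {
                    offsets.push((i + 1) as u64);
                }
            }
            _ => {}
        }
    }
    offsets
}
```

Given `a,b\n"x\ny",z\n`, the embedded `\n` inside the quoted field is skipped and only two row starts are recorded.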

When the UI needs rows (because you scrolled, searched, or sorted), it calls get_page:

display positions 1000..1080
→ look up sorted_order[1000..1080] (display → original row index)
→ look up row_offsets[original_idx] (row index → byte position)
→ seek to that byte in the mmap
→ parse just those 80 rows
→ return

Parsing 80 rows takes microseconds. The grid stays smooth regardless of total file size.
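The lookup chain above can be sketched as follows (a hypothetical `get_page` signature; the real function returns parsed cells rather than raw row text):

```rust
/// Return `count` rows starting at display position `start`.
/// `data` stands in for the mmap'd file bytes; only the requested
/// rows are touched, so cost is independent of total file size.
fn get_page(
    data: &[u8],
    row_offsets: &[u64],  // row index -> byte position
    sorted_order: &[u32], // display position -> original row index
    start: usize,
    count: usize,
) -> Vec<String> {
    let end_pos = (start + count).min(sorted_order.len());
    sorted_order[start..end_pos]
        .iter()
        .map(|&orig| {
            let i = orig as usize;
            let begin = row_offsets[i] as usize;
            // The next offset (or EOF) bounds this row's bytes.
            let end = row_offsets.get(i + 1).map_or(data.len(), |&o| o as usize);
            // "Parse" just this row; real code splits it into cells.
            String::from_utf8_lossy(&data[begin..end]).trim_end().to_string()
        })
        .collect()
}
```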

Both sort and search require examining every row: sort reads one column, search reads every cell. Both do it with a single parallel pass using Rayon:

Sort: Extract the sort column from every row in parallel → build Vec<(row_idx, sort_key)> → parallel sort → write new display order.

Search: For each row in parallel → parse the row → check all cells for the query string → collect matching display positions.

One full-file pass instead of random access per comparison. On a 14 GB file this takes 5–15 seconds for sort, a few seconds for search.
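Both passes can be sketched as below (hypothetical helpers operating on pre-parsed rows for brevity; the real code parses each row from the mmap on the fly, and parallelizes by swapping `iter()` for Rayon's `par_iter()` and `sort_by` for `par_sort_by`):

```rust
/// Build a new display order sorted by one column.
fn sort_by_column(rows: &[Vec<String>], col: usize) -> Vec<u32> {
    // Extract (row index, sort key) for every row.
    let mut keyed: Vec<(u32, &str)> = rows
        .iter()
        .enumerate()
        .map(|(i, row)| (i as u32, row.get(col).map_or("", |s| s.as_str())))
        .collect();
    // Sort by key, then keep only the row indices as the display order.
    keyed.sort_by(|a, b| a.1.cmp(b.1));
    keyed.into_iter().map(|(i, _)| i).collect()
}

/// Collect display positions of rows where any cell contains `query`.
fn search(rows: &[Vec<String>], query: &str) -> Vec<u32> {
    rows.iter()
        .enumerate()
        .filter(|(_, row)| row.iter().any(|cell| cell.contains(query)))
        .map(|(i, _)| i as u32)
        .collect()
}
```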

Component            Memory
Row offset index     row_count × 8 bytes
Display order        row_count × 4 bytes
Visible rows (80)    80 × col_count × avg_cell_size
File mmap            counted by the OS page cache, not process RSS

For 28 million rows: 225 MB (offsets) + 112 MB (order) = ~337 MB total working set. The 14 GB file itself does not count against your RAM.