How it works
The problem with traditional CSV viewers
Most tools that open CSV files work like this:
- Open the file
- Read every byte
- Parse every row into strings
- Store all rows in memory
- Show the grid
For a 14 GB file this means 14+ GB of RAM consumed before you see a single row. On a machine with 16 GB RAM, this either crashes the app or triggers heavy swapping that makes the machine unusable.
How Columnar works instead
Columnar uses a lazy loading architecture with three components:
1. Memory-mapped file
The file is opened with mmap (memory mapping). The OS maps the file’s bytes into the process’s virtual address space without copying them into RAM. The file appears as a byte array your code can index into directly. The OS handles paging — only the parts you actually access are loaded from disk.
This means “opening” the file takes milliseconds regardless of size.
2. Byte-offset index
On open, a single pass through the file records the byte position of every row:
```
row 0          → byte 0
row 1          → byte 47
row 2          → byte 103
row 3          → byte 158
...
row 28,000,000 → byte 14,201,847,293
```

This index is a Vec<u64> — 8 bytes per row. For 28 million rows that is ~224 MB, well under 1 GB on any modern machine.
The index is built using the csv crate’s ByteRecord iterator with position() to record offsets. This handles quoted newlines inside fields correctly — a naive newline scan would break on CSV files with embedded newlines.
3. On-demand row parsing
When the UI needs rows (because you scrolled, searched, or sorted), it calls get_page:
```
display positions 1000..1080
  → look up sorted_order[1000..1080]   (display → original row index)
  → look up row_offsets[original_idx]  (row index → byte position)
  → seek to that byte in the mmap
  → parse just those 80 rows
  → return
```

Parsing 80 rows takes microseconds. The grid stays smooth regardless of total file size.
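A simplified, std-only sketch of that lookup chain. The naive comma split stands in for the real CSV parser (which handles quoting), and `sorted_order`/`row_offsets` mirror the names used above:

```rust
/// Fetch one page of rows for display. `sorted_order` maps display
/// position → original row index; `row_offsets` maps row index → byte
/// offset of the row's start in the mapped file.
fn get_page(
    data: &[u8], // the mmap'd file bytes
    row_offsets: &[u64],
    sorted_order: &[u32],
    range: std::ops::Range<usize>,
) -> Vec<Vec<String>> {
    range
        .map(|display_pos| {
            let row_idx = sorted_order[display_pos] as usize;
            let start = row_offsets[row_idx] as usize;
            // The next row's offset (or EOF) bounds this record.
            let end = row_offsets
                .get(row_idx + 1)
                .map(|&b| b as usize)
                .unwrap_or(data.len());
            // Naive split for the sketch; real code would hand this
            // slice to a CSV parser that understands quoting.
            data[start..end]
                .split(|&b| b == b',')
                .map(|f| String::from_utf8_lossy(f).trim_end().to_string())
                .collect()
        })
        .collect()
}

fn main() {
    let data = b"1,a\n2,b\n3,c\n";
    let row_offsets = [0u64, 4, 8];
    let sorted_order = [2u32, 1, 0]; // display order after a descending sort
    let page = get_page(data, &row_offsets, &sorted_order, 0..2);
    assert_eq!(page[0], vec!["3".to_string(), "c".to_string()]);
    assert_eq!(page[1], vec!["2".to_string(), "b".to_string()]);
}
```

Only the bytes of the requested rows are touched, which is why page fetches stay fast on arbitrarily large files.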
Sort and search
Both sort and search require examining every row’s data for a given column. They do this with a single parallel pass using Rayon:
Sort: Extract the sort column from every row in parallel → build Vec<(row_idx, sort_key)> → parallel sort → write new display order.
Search: For each row in parallel → parse the row → check all cells for the query string → collect matching display positions.
One full-file pass instead of random access per comparison. On a 14 GB file this takes 5–15 seconds for sort, a few seconds for search.
Memory usage
| Component | Memory |
|---|---|
| Row offset index | row_count × 8 bytes |
| Display order | row_count × 4 bytes |
| Visible rows (80) | 80 × col_count × avg_cell_size |
| File mmap | Counted by OS, not process RSS |
For 28 million rows: 224 MB (offsets) + 112 MB (order) = ~336 MB total working set. The 14 GB file itself does not count against your RAM.
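The table’s arithmetic for 28 million rows, as a quick check:

```rust
// Back-of-envelope check of the working-set numbers above.
fn main() {
    let rows: u64 = 28_000_000;
    let offsets_mb = rows * 8 / 1_000_000; // Vec<u64> byte-offset index
    let order_mb = rows * 4 / 1_000_000;   // u32 display order
    assert_eq!(offsets_mb, 224);
    assert_eq!(order_mb, 112);
    println!(
        "offsets: {offsets_mb} MB, order: {order_mb} MB, total: {} MB",
        offsets_mb + order_mb
    );
}
```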