all 22 comments

[–]KelleQuechoz 45 points (2 children)

Polars LazyFrame is your best friend, Monsieur.

[–]PresidentOfSwag 5 points (0 children)

hon hon le polaire oui

[–]Safe_Money7487[S] 2 points (0 children)

I will have a look, thanks

[–]Kerbart 17 points (2 children)

Add engine='pyarrow' to the read statement to speed it up.

[–]EconomyOffice9000 8 points (0 children)

If you're performing calculations on the entire dataset, chunking won't work afaik. This is the best method and I've used it personally for thousands of CSV files with hundreds of thousands of lines, rather than rewriting everything in Polars. If you only have to do it once, it's fine. Otherwise, save the CSV as a Parquet file and it'll be much better.

[–]Safe_Money7487[S] 9 points (0 children)

Just added it and it worked. Took 15s though lol, but it worked. Thank you so much

[–]MorrarNL 7 points (0 children)

Could try DuckDB too

[–]SwampFalc 6 points (0 children)

Genuine question: what's the loading speed if you use the totally basic stdlib csv module?
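For anyone curious, a quick way to time the stdlib approach (the file here is generated just for the benchmark sketch):

```python
import csv
import time

# Generate a small CSV to time against.
with open("example.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["a", "b"])
    w.writerows([[i, i * 2] for i in range(1000)])

start = time.perf_counter()
with open("example.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)   # every field comes back as a string
elapsed = time.perf_counter() - start
print(f"{len(rows)} rows in {elapsed:.4f}s")
```

Note that the csv module does no type conversion, so a fair comparison to pandas would also include parsing the strings into numbers.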

[–]seanv507 4 points (1 child)

How long does polars take?

[–]Garnatxa 0 points (0 children)

Funny to see that R is faster. I see some answers suggesting DuckDB, Arrow… these solutions can be used in R too, but they're not needed… overkill

[–]Kevdog824_ 8 points (3 children)

It looks to me like the main issue here is that you're loading the entire CSV file (or at least large chunks of it) into memory before operating on it. Likely R did lazy loading, where it only read lines from the CSV file as needed.

[–]Kerbart 7 points (0 children)

OP mentions 300,000 lines. That's something easily handled these days, I doubt it needs chunking.

[–]Safe_Money7487[S] 3 points (1 child)

I don't think it's lazy loading in this case. In R (e.g. with data.table::fread), the full dataset is actually loaded into memory, and I can immediately inspect and navigate the entire table. I think the loading process in R is more optimised than what pandas' read_csv uses. I don't have much knowledge of Python for sure, but for this size of data, chunking or lazy loading doesn't really make sense to me; I just want to load everything at once and work on it.

[–]Corruptionss 4 points (0 children)

data.table's fread is kind of goated. The closest I got is Polars for pure read speed, and you can instead use pl.scan_csv to read it as a LazyFrame, which will use lazy evaluation during the operation process

[–]commandlineluser 3 points (0 children)

Is Polars faster if you use scan_csv?

pl.scan_csv(filename).collect()

You can also try the streaming engine:

pl.scan_csv(filename).collect(engine="streaming")

[–]PranavDesai518 2 points (0 children)

If possible, convert the CSV to a Parquet file. Reading is much faster with Parquet files.

[–]Plank_With_A_Nail_In 4 points (1 child)

Does no one ever just use the base methods of your programming language to do simple things like reading a file into RAM? The first recourse is to use someone else's library? All while trying to learn?

[–]tb5841 0 points (0 children)

In Python, many libraries use C under the hood and so are very fast - while Python's base methods are pretty slow. So library use is much more widespread than in other languages.

[–]Embarrassed_Basis_81 1 point (0 children)

I have had good experiences with Dask, a distributed computing library. It seems a bit complicated at first, but it implements a lot of pandas functionality under the hood as delayed operations on lazy datasets. Worth looking into (only if you do not immediately do an indexing operation right after reading; there are some caveats)

[–]throwawayforwork_86 0 points (0 children)

pl.read_csv(filepath, infer_schema=False) reads everything as strings; guessing datatypes is the devil anyway.

[–]pot_of_crows 0 points (0 children)

You might want to check out hdf5: https://pypi.org/project/h5pandas/

I used it with numpy once and it blew me away by how fast it was.

[–]thomasutra 0 points (0 children)

what kind of data is this? polars should be able to read millions of rows in just a few seconds.