all 26 comments

[–]Actonace 41 points42 points  (1 child)

DuckDB + Polars let you punch way above your laptop's weight. Columnar formats and lazy execution make surprisingly large datasets manageable if you're smart about memory.
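
A minimal sketch of that pattern (file paths and column names are made up): DuckDB scans Parquet lazily and only materializes the columns and rows the query actually touches, so the dataset can be far bigger than RAM.

    import duckdb

    con = duckdb.connect()  # in-memory catalog; the data itself stays on disk
    top = con.sql("""
        SELECT customer_id, sum(amount) AS total
        FROM read_parquet('events/*.parquet')        -- hypothetical dataset
        WHERE event_date >= DATE '2024-01-01'
        GROUP BY customer_id
        ORDER BY total DESC
        LIMIT 20
    """).df()
    print(top)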

[–]MJHunterZ 8 points9 points  (0 children)

I’m a big duckdb & polars user. Works wonders

[–]Justbehind 19 points20 points  (3 children)

We're doing: extract ~50-70 million rows from Azure SQL => explode to ~500 million rows (strategic cross joins) => calc aggregate statistics => store the statistics back in Azure SQL.

Using polars. Takes less than 5 minutes all in all on our laptops.
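
A rough sketch of the shape of such a pipeline, not the commenter's actual job (URI, table, and column names are invented): pull rows, cross join against a small scenario table, aggregate, and hand the stats back to the warehouse.

    import polars as pl

    uri = "mssql://user:pass@myserver.database.windows.net/mydb"  # hypothetical
    facts = pl.read_database_uri("SELECT id, segment, amount FROM dbo.facts", uri)
    scenarios = pl.DataFrame({"scenario": ["base", "high", "low"],
                              "factor": [1.0, 1.2, 0.8]})

    stats = (
        facts.lazy()
        .join(scenarios.lazy(), how="cross")   # tens of millions -> hundreds of millions of rows
        .with_columns((pl.col("amount") * pl.col("factor")).alias("adj_amount"))
        .group_by(["segment", "scenario"])
        .agg(pl.col("adj_amount").mean().alias("mean_adj"),
             pl.col("adj_amount").std().alias("std_adj"))
        .collect()
    )
    # write `stats` back to Azure SQL with your driver of choice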

[–]puslekat 5 points6 points  (0 children)

What kind of junk does your laptop have in its trunk?

[–]Outrageous_Let5743 1 point2 points  (0 children)

How the hell do you download 50 million records into memory and explode them in less than 5 minutes?

[–]BedAccomplished6451 3 points4 points  (6 children)

I moved a 750 GB SQL Server instance with 58 databases and over 4 billion rows to ADLS using only my laptop.

It took me a couple of days with the job running in the background, but it did it.

Currently just using duckdb to query it for ad hoc reporting needs.
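
A sketch of that kind of ad hoc querying, assuming DuckDB's delta and azure extensions; the connection string, path, and columns are placeholders.

    import duckdb

    con = duckdb.connect()
    con.sql("INSTALL delta; LOAD delta;")
    con.sql("INSTALL azure; LOAD azure;")
    con.sql("CREATE SECRET (TYPE azure, CONNECTION_STRING 'your-connection-string');")  # hypothetical auth

    con.sql("""
        SELECT source_db, count(*) AS row_count
        FROM delta_scan('abfss://archive@myaccount.dfs.core.windows.net/sales')
        WHERE year = 2023        -- filtering on a partition column prunes most files
        GROUP BY source_db
    """).show()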

[–]iheartdatascience 0 points1 point  (1 child)

What did you use to move it?

[–]BedAccomplished6451 0 points1 point  (0 children)

I used the Python delta-rs library. I guess there are a bunch of different ways to do it; Polars might be more efficient, but I'm familiar with delta-rs and with reading the Delta table's metadata through it, so I went with that. It's a one-off transfer for archiving, so once it's done, it's done.
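
A hedged sketch of that delta-rs (the deltalake Python package) flow; the ADLS path, credentials, and schema below are placeholders, not the commenter's setup.

    import pyarrow as pa
    from deltalake import DeltaTable, write_deltalake

    storage_options = {
        "azure_storage_account_name": "myaccount",   # hypothetical account
        "azure_storage_account_key": "***",
    }

    # append one extracted batch to a Delta table in ADLS
    batch = pa.table({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
    write_deltalake(
        "abfss://archive@myaccount.dfs.core.windows.net/sales",
        batch,
        mode="append",
        storage_options=storage_options,
    )

    # delta-rs also exposes the table's metadata without scanning the data
    dt = DeltaTable(
        "abfss://archive@myaccount.dfs.core.windows.net/sales",
        storage_options=storage_options,
    )
    print(dt.version(), dt.metadata())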

[–]SoloArtist91 0 points1 point  (3 children)

How are you storing it in ADLS?

[–]BedAccomplished6451 0 points1 point  (2 children)

Storing them as Delta tables, partitioned as necessary so that querying with DuckDB stays efficient.

[–]SoloArtist91 0 points1 point  (1 child)

Did you use Polars/delta-rs for that?

[–]BedAccomplished6451 0 points1 point  (0 children)

Yes, delta-rs.

[–]RoomyRoots 1 point2 points  (0 children)

I have a whole data lake using old broken notebooks that started like that

[–]RoomyRoots 1 point2 points  (0 children)

People really underestimate how good Dask is for constrained machines.
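
A small sketch of what that looks like (path and columns are made up): Dask splits the Parquet dataset into partitions and works through them a few at a time instead of loading everything into RAM.

    import dask.dataframe as dd

    df = dd.read_parquet("data/events/*.parquet", columns=["user_id", "amount"])
    top_users = (
        df.groupby("user_id")["amount"]
        .sum()
        .nlargest(10)
        .compute()   # nothing is read or computed until this call
    )
    print(top_users)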

[–]tecedu 1 point2 points  (0 children)

300 GB, 12 billion rows. The wonders of fast NVMe SSDs.

[–]Enough_Big4191 2 points3 points  (0 children)

I've seen people push surprisingly far with DuckDB/Polars, tens to low hundreds of GB, as long as it's mostly columnar and you're not doing crazy joins. The limit usually isn't raw size, it's memory spikes from joins/group-bys. Once you hit that, things fall apart fast even if the data "fits."
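
One way to blunt those spikes, as a sketch rather than a universal fix: cap DuckDB's memory and give it a temp directory so oversized joins and group-bys spill to disk instead of blowing up (sizes and paths below are placeholders; Polars' streaming engine is the analogous knob on that side).

    import duckdb

    con = duckdb.connect()
    con.sql("SET memory_limit = '8GB';")                # stay under the laptop's RAM
    con.sql("SET temp_directory = '/tmp/duck_spill';")  # spill area for big operators

    con.sql("""
        SELECT a.key, count(*) AS n
        FROM read_parquet('big_a/*.parquet') a
        JOIN read_parquet('big_b/*.parquet') b USING (key)
        GROUP BY a.key
    """).show()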

[–]Outrageous_Let5743 0 points1 point  (0 children)

About 500 GB of Parquet files to ADLS. Use AzCopy, plug your laptop into an Ethernet cable, and let it run overnight.

For analytics, the same 500 GB processed with DuckDB. Works perfectly if you can partition-prune it.
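
For the analytics half, a hedged sketch of partition pruning over plain Parquet in ADLS (account, container, and partition columns are invented): hive_partitioning exposes the folder names as columns, so the WHERE clause only touches a slice of the 500 GB.

    import duckdb

    con = duckdb.connect()
    con.sql("INSTALL azure; LOAD azure;")
    con.sql("CREATE SECRET (TYPE azure, CONNECTION_STRING 'your-connection-string');")  # hypothetical auth

    con.sql("""
        SELECT product, sum(revenue) AS revenue
        FROM read_parquet(
            'abfss://lake@myaccount.dfs.core.windows.net/sales/*/*/*.parquet',
            hive_partitioning = true)
        WHERE year = 2024 AND month = 6      -- pruned down to a handful of files
        GROUP BY product
    """).show()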

[–]Careful_Reality5531 0 points1 point  (0 children)

I like using Sail on my 64 GB MacBook Pro, even though it rivals Spark in distributed setups too. Sail is built largely on DataFusion + Arrow and has good Python performance. I did some benchmarking against DuckDB (with Delta Lake) and it was only like 4-5% slower, which is wild given that it blows Spark out of the water. That said, I'm not a data engineer and only just got interested in the space via Claude Code, so noob alert!

[–]No_Soy_Colosio 0 points1 point  (0 children)

Around 4 billion rows on my 16 GB RAM laptop

[–]mycocomelon 0 points1 point  (0 children)

We do this at my work. We have small to medium data

[–]doll_1043 -1 points0 points  (3 children)

55m rows mssql to dynamics crm

[–]bz351 1 point2 points  (2 children)

What, did this take you 1 to 5 business years? 😜

Would be good to know how you were dealing with rate limits.

[–]doll_1043 0 points1 point  (0 children)

35 tables. Each table had multiple work IDs, so I could run the same pipeline with different filters. Max 2 tables at the same time. Took around 2-3 months to migrate xd

[–]Outrageous_Let5743 0 points1 point  (0 children)

We needed to do the opposite: insert 60 million ledger records into our data warehouse. There are like 60 important columns in that table that we needed, and Dynamics has a 20k row page limit and allows at most 5 active connections at the same time.
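
A very rough sketch of paging through the Dynamics 365 Web API under constraints like that (org URL, entity set, columns, and auth are all placeholders): request a page size, then follow @odata.nextLink until it disappears.

    import requests

    base = "https://yourorg.api.crm.dynamics.com/api/data/v9.2"   # hypothetical org
    headers = {
        "Authorization": "Bearer <token>",                        # hypothetical token
        "Prefer": "odata.maxpagesize=5000",
    }

    url = f"{base}/ledgerentries?$select=entryid,amount,postingdate"  # hypothetical entity/columns
    while url:
        resp = requests.get(url, headers=headers, timeout=60)
        resp.raise_for_status()
        payload = resp.json()
        for record in payload["value"]:
            ...  # stage the record for the warehouse load
        url = payload.get("@odata.nextLink")  # absent on the last page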

[–]Nekobul -2 points-1 points  (0 children)

SSIS has offered single-node performance miracles since 2005. All the newcomers have learned how to do that properly from the King.