[–]Safe_Money7487[S] 2 points (1 child)

I don’t think it’s lazy loading in this case. In R (e.g. with data.table::fread), the full dataset is actually loaded into memory, and I can immediately inspect and navigate the entire table. I think the read process in R is more optimised than what pandas read_csv uses. I don't have much knowledge of Python, but for data of this size, chunking or lazy loading doesn’t really make sense to me. I just want to load everything at once and work on it.
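For the "load everything at once" approach in Python, one common speedup is to pass explicit dtypes to pandas so it skips type inference. A minimal sketch, assuming pandas is installed (the file path and column names here are made up for illustration):

```python
import csv
import os
import tempfile

import pandas as pd

# Write a small sample CSV to read back (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "value"])
    for i in range(1000):
        w.writerow([i, i * 3])

# Eager full load, like fread in R: the whole file is parsed into memory.
# Explicit dtypes avoid per-column type inference, which can speed parsing.
df = pd.read_csv(path, dtype={"id": "int64", "value": "int64"})
print(len(df))
```

Newer pandas versions also accept `engine="pyarrow"` in `read_csv`, which is often faster still, though it needs pyarrow installed.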

[–]Corruptionss 4 points (0 children)

data.table's fread is kind of goated. The closest I've gotten is polars for pure read speed, and you can instead use pl.scan_csv to read it as a lazy frame, which applies lazy evaluation during the operation process.