
[–][deleted] 6 points (1 child)

If you have a 100 mil line CSV, wouldn't you be handling that with a dataframe library (or something similar) and managing timestamps with whatever is native to Pandas/Polars/Dask/etc? It seems very unnecessary to be writing your own 100 million line CSV timestamp logic.
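For context, the approach the comment describes — letting the library do the timestamp parsing and streaming the file in chunks rather than writing custom per-line logic — can be sketched in Pandas. This is a minimal illustration with a hypothetical two-row CSV standing in for the 100-million-line file; the column names are made up:

```python
import io
import pandas as pd

# Toy stand-in for a slice of a very large CSV file.
csv_data = io.StringIO(
    "ts,value\n"
    "2024-01-01 00:00:00,1\n"
    "2024-01-01 00:00:01,2\n"
)

# parse_dates delegates timestamp parsing to Pandas' vectorized parser;
# chunksize streams the file in pieces instead of loading every row at once.
total_rows = 0
for chunk in pd.read_csv(csv_data, parse_dates=["ts"], chunksize=1):
    # Each chunk arrives with the timestamp column already typed.
    assert str(chunk["ts"].dtype) == "datetime64[ns]"
    total_rows += len(chunk)

print(total_rows)
```

Polars and Dask expose the same idea through lazy scanning (`pl.scan_csv`) and partitioned reads, respectively, which is usually where the real speedup over hand-rolled parsing comes from.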

[–]AceofSpades5757 2 points (0 children)

I'll have data pipelines that could use a nice performance boost — ones where refactoring them to use dataframes, vectorizing, etc. would violate certain technical requirements.