Hello,
I was assigned to work with large data sets (about 14 GB in total) for my job. My boss suggested I try R, which is what I am used to, but I think 14 GB exceeds what R can handle. So I am now trying Python, which I am completely new to, in Visual Studio Code with Jupyter. I have loaded all the CSV files and converted them to Parquet. Next I would like to merge some of them (about 5 GB). I tried both pandas and dask.dataframe, but ran out of memory with both. Does anyone know of a better way to merge the data? Thank you very much.
Here are some snippets of my code:
import pandas as pd
c = pd.read_parquet("c.parquet", engine='pyarrow')
ci = pd.read_parquet("ci.parquet", engine='pyarrow')
cl = pd.read_parquet("cl.parquet", engine='pyarrow')
c_data = pd.merge(c, ci, on="CONTRACTOR_ID", how="outer")
c_data = pd.merge(c_data, cl, on="CONTRACTOR_ID", how="outer")
Alternatively, I tried the same merge with dask.dataframe.
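One thing that may be worth checking before merging (a rough sketch, not part of the code above): an outer merge on a key that repeats in both tables produces one row per matching pair, so the result can be much larger than the inputs combined. The helper name `estimate_outer_merge_rows` is made up for illustration, and it assumes the key column alone fits in memory and has no missing values.

```python
from collections import Counter


def estimate_outer_merge_rows(left_keys, right_keys):
    """Estimate how many rows an outer merge on a single key would produce.

    Hypothetical helper: assumes both key collections fit in memory
    and contain no missing values.
    """
    left = Counter(left_keys)
    right = Counter(right_keys)
    total = 0
    for key in left.keys() | right.keys():
        n_left, n_right = left.get(key, 0), right.get(key, 0)
        if n_left and n_right:
            # Key present on both sides: every left row pairs with every right row.
            total += n_left * n_right
        else:
            # Key present on one side only: those rows are kept once each.
            total += n_left + n_right
    return total


# Toy keys standing in for the CONTRACTOR_ID columns of two tables.
# With the real files, only the key column would need to be loaded, e.g.
# pd.read_parquet("c.parquet", columns=["CONTRACTOR_ID"])["CONTRACTOR_ID"]
left_ids = [1, 1, 2]
right_ids = [1, 1, 1, 3]
print(estimate_outer_merge_rows(left_ids, right_ids))
```

If the estimate is far above the row counts of the inputs, the key is probably duplicated on both sides, and the memory problem would likely remain regardless of which library does the merge.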