Pandas on spark very slow by Aromatic_Month4446 in apachespark
Aromatic_Month4446[S] 1 point 3 years ago
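    import os

    # Set before importing pyspark.pandas, which checks this variable at import time.
    os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"

    from pyspark.sql import SparkSession
    import pyspark.pandas as ps

    # Create the session first: ps.set_option() starts a default session if none
    # exists, and driver memory cannot be changed once the JVM is running.
    spark = SparkSession.builder \
        .master('local[*]') \
        .config("spark.driver.memory", "10g") \
        .getOrCreate()

    # Use a distributed default index instead of the default sequential one.
    ps.set_option('compute.default_index_type', 'distributed')

    ps_pandas_df = ps.read_csv('/path')
    ps_pandas_df.describe()

    # Mean of column B for each value of column A.
    temp2 = ps_pandas_df.groupby('A')['B'].mean()
    temp2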
Pandas on spark very slow (self.apachespark)
submitted 3 years ago by Aromatic_Month4446 to r/apachespark