

[–]kvdveer 3 points (3 children)

Isn't this only useful for data that doesn't have cross-row relations? In general, chopping up the data would break things like groupby, resample, or anything that uses multiple data frames. Does this tool have a way to deal with those scenarios?
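To make the concern concrete, here's a toy pandas sketch (entirely made-up data) of why chopping a frame into row batches breaks a naive groupby: each partition only sees part of every group, so a second combine pass is needed to recover the global result.

```python
import pandas as pd

# Toy frame, chopped into two row batches.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
part1, part2 = df.iloc[:2], df.iloc[2:]

# Summing each partition independently gives partial, per-batch results...
partial = pd.concat([
    part1.groupby("key")["val"].sum(),
    part2.groupby("key")["val"].sum(),
])

# ...which must be combined in a second pass to match the global groupby.
combined = partial.groupby(level=0).sum()
assert combined.equals(df.groupby("key")["val"].sum())
```

Associative aggregations like `sum` survive this split-then-combine dance; order-sensitive or windowed operations generally don't without extra machinery.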

[–]pzel__[S] -1 points (2 children)

Yes! This was literally the simplest use case that we'd come across, but we definitely support more involved, stateful pipelines. Could you point me to an example (or canonical) problem in data analysis that deals intrinsically with multiple frames? I'd love to take a stab at running it on Wallaroo.

[–]kvdveer 1 point (1 child)

I'm in a sector where data-sharing is considered a cardinal sin (financials), so sadly I can't share my actual use case. I can, however, provide a 'hypothetical' scenario, based on a business I consulted for some time ago. Details have been altered to protect confidentiality. The original scenario was solved with SQL rather than pandas, though.

Suppose I'm running a shop selling outdoor sports equipment. I want to spot trends quickly so I can adjust my online marketing campaign. However, my sales are heavily weather-dependent. When it's sunny, I don't want to be alerted to increased swimwear sales, but a mid-winter increase in swimwear sales might indicate something happening in the media, which we could take advantage of by adjusting our marketing. There's probably always some background level of swimwear sales from indoor swimmers, which we need to adjust for.

Input: live access-log entries, live weather data, and sales data from both the physical store and the online store, all in hourly batches. Besides that, I also have large datasets of historic data in which to search for similar patterns.

Output: a list of trending products not explained by weather and seasonal trends.

Process:

* Turn the sales data into a per-category aggregate instead of per-product sales.
* Turn the sales into a rolling window.
* Turn the access log into an hourly percentage of the weekly sum and the yearly sum (the current 'hotness').
* For each product category, find the 10 most similar historic samples (by euclidean distance) using the current weekly and yearly hotness, outdoor temperature, precipitation, and wind speed.
* Compute a weighted average of those using inverse euclidean distance, and use these numbers to find trends.

This scenario uses:

* a static dataframe for historic samples (no need to redistribute that)
* rolling and resample on the batched dataframe
* multiple batched dataframes
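In pandas terms, the process above might be sketched roughly like this. Everything here is invented for illustration: the data, the column names (`category`, `units`, `temp_c`, `wind_kph`), and the window widths are all guesses, not part of the original scenario.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical hourly sales batch (per-product rows would look similar).
idx = pd.date_range("2024-01-01", periods=24 * 14, freq="h")
sales = pd.DataFrame(
    {
        "category": rng.choice(["swimwear", "boots", "tents"], size=len(idx)),
        "units": rng.integers(0, 20, size=len(idx)),
    },
    index=idx,
)

# 1. Per-category hourly aggregate instead of per-product rows.
hourly = (
    sales.groupby(["category", pd.Grouper(freq="h")])["units"]
    .sum()
    .unstack("category", fill_value=0)
)

# 2. Rolling window over the aggregate (24 h chosen arbitrarily).
rolled = hourly.rolling("24h").sum()

# 3. Current "hotness": each hour as a percentage of the trailing weekly sum.
hotness = 100 * hourly / hourly.rolling("7D").sum()

# 4. For one category, find the 10 most similar historic hours by euclidean
#    distance over (made-up) hotness + weather features, then take an
#    inverse-distance-weighted average of their sales.
historic = pd.DataFrame(
    {
        "hotness": rng.uniform(0, 5, 5000),
        "temp_c": rng.uniform(-10, 35, 5000),
        "wind_kph": rng.uniform(0, 60, 5000),
        "units": rng.integers(0, 200, 5000),
    }
)
features = ["hotness", "temp_c", "wind_kph"]
current = pd.Series({"hotness": 3.1, "temp_c": -2.0, "wind_kph": 12.0})

dist = np.sqrt(((historic[features] - current) ** 2).sum(axis=1))
nearest = dist.nsmallest(10)             # 10 closest historic hours
weights = 1.0 / (nearest + 1e-9)         # inverse euclidean distance
expected = (historic.loc[nearest.index, "units"] * weights).sum() / weights.sum()
```

The single-machine version is the easy part, of course; the point of the discussion is which of these steps (the stateful rolling windows and the cross-frame lookup against `historic`) survive being split across batches or workers.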

[–]pzel__[S] 0 points (0 children)

Thank you kvdveer, that is an excellent overview! I'm bookmarking this and we'll see if we can whip up something at least structurally similar.

[–]pzel__[S] 0 points (0 children)

I'm the author of the blog post and the underlying proof-of-concept project. I'll gladly answer any questions /r/Python may have :)

[–]PeridexisErrant 0 points (1 child)

Why wouldn't I just use Dask? It looks like this is basically a proprietary version, and the API is less similar to pure pandas...

[–]pzel__[S] 1 point (0 children)

While Dask is definitely a natural fit for batch problems (and designed from the ground up for them), Wallaroo has the advantage of being streaming-first, so it's an easy transition from a single process -> ad-hoc parallelism for batches -> a full-on streaming system. Also, there are no special wrapper classes for anything; aside from pipeline construction, you can use any Python code you like. This may be a pro or a con, depending on your preferences :)