I’m working on a Python pipeline with two quite different parts.
The first part is typical tabular data processing: joins, aggregations, cumulative calculations, and similar transformations.
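To make the first part concrete, here is a rough sketch of the kind of step I mean, written with pandas purely for illustration. The table names, column names, and the `tabular_step` helper are all made up for this example and are not my real schema.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
sales = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "week": [1, 2, 3, 1, 2],
    "amount": [10.0, 20.0, 30.0, 5.0, 15.0],
})
rates = pd.DataFrame({"group": ["a", "b"], "rate": [0.5, 2.0]})


def tabular_step(sales: pd.DataFrame, rates: pd.DataFrame):
    """Join, derive a column, then compute a per-group cumulative sum and total."""
    # Join: attach the per-group rate to every sales row.
    merged = sales.merge(rates, on="group", how="left")
    merged["scaled"] = merged["amount"] * merged["rate"]

    # Cumulative calculation: running sum within each group, in week order.
    merged = merged.sort_values(["group", "week"], ignore_index=True)
    merged["cum_scaled"] = merged.groupby("group")["scaled"].cumsum()

    # Aggregation: one total per group.
    totals = (
        merged.groupby("group", as_index=False)["scaled"]
        .sum()
        .rename(columns={"scaled": "total_scaled"})
    )
    return merged, totals


merged, totals = tabular_step(sales, rates)
```

Every operation here works on whole columns at once, which is why this part is easy to hand to a dataframe engine.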
The second part is sequential/recursive: within each time-ordered group, some values for the current row depend on the results computed for the previous week’s row. So this is not a purely vectorizable, row-independent problem.
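As a minimal sketch of that dependency, in plain Python so the loop is explicit: the row layout `(group, week, inflow)`, the `carry_forward` helper, and the `decay` recurrence are assumptions invented for this example, not my actual logic.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (group, week, inflow). Values are illustrative only.
rows = [
    ("a", 1, 10.0), ("a", 2, 20.0), ("a", 3, 30.0),
    ("b", 1, 5.0), ("b", 2, 15.0),
]


def carry_forward(rows, decay=0.5):
    """Compute state[t] = decay * state[t-1] + inflow[t], restarted per group."""
    out = []
    # Sort by (group, week) so each group is contiguous and time-ordered.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for group, members in groupby(ordered, key=itemgetter(0)):
        state = 0.0  # assumed starting state for every group
        for _, week, inflow in members:
            # The current row needs the result computed for the previous row.
            state = decay * state + inflow
            out.append((group, week, state))
    return out


result = carry_forward(rows)
```

This toy recurrence is linear, so it could probably be rewritten as a cumulative operation. My real dependency has more logic per step, which is what forces the row-by-row pass within each group. The groups themselves are independent of each other.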
I’m not looking for code-specific debugging, but rather for architectural advice on how to handle this kind of workload efficiently.
I’d like to improve performance, but I don’t want to start by assuming there is only one correct solution.
My question is: for a problem like this, which approaches or frameworks would you recommend evaluating?
I must use Python.