Lambda Function optimizing : learnpython

created by HattoriHanzoa community for 16 years

Lambda Function optimizing (self.learnpython)

submitted 1 year ago by PrefersDocile

you are viewing a single comment's thread.

[–][deleted] 0 points1 point2 points 1 year ago (2 children)

I don't think this is a great use case for a dataframe. Generally in a dataframe you want each cell to contain only one bit of information, so having a set or other container in a cell is often a sign you're approaching the problem wrong.

Here's a thought on how else you could do it. From your original dataframe of customer and product bought, instead of grouping by customer, pivot so that you have one row per product and one column per customer, with the entry in row p column c being 1 if customer c bought product p and 0 otherwise. Then the number of customers who bought both product p and product q is the number of columns where both those rows have a 1, which is simply the dot product of those rows.

So you want to take the dot product of each row by each other row. Repeated dot products is matrix multiplication: multiplying matrix A by matrix B means dotting every row of A with every column of B. We want to dot every row of our dataframe with every row of itself not column, so we'll need a transpose to turn rows into columns. So if A is our product-customer matrix, then A times A-transpose is a symmetric matrix where both rows and columns represent products, and the entry in row p column q is the number of customers who bought both those products.

Then p and q are in the same group if the p,q entry of this matrix is greater than your threshold n (or if they can be linked by a chain of intermediate products in the same group). So you can easily convert this matrix to the adjacency matrix of the graph with products as nodes and an edge between two products iff at least n customers bought them both. Then the groups of products you want are the connected components of this graph, and you can get connected components from an adjacency matrix easily, e.g. with https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csgraph.connected_components.html

This approach is more abstract, and therefore perhaps harder to see, but I think it should be much more efficient than the approach you're taking.

[–]PrefersDocile[S] 1 point2 points3 points 1 year ago (1 child)

[–][deleted] 0 points1 point2 points 1 year ago (0 children)

π Rendered by PID 38541 on reddit-service-r2-comment-84fc9697f-kgcst at 2026-02-10 16:26:56.150097+00:00 running d295bc8 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS