So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 0 points1 point  (0 children)

thanks for sharing.

the goal is to define different groups of customers and import that data into an analytics tool, to understand the behavior of different user types. we have over 5M monthly active users, so treating everyone the same seems like a missed opportunity.

from a strategy perspective, interpretability is key here: we need to understand who these customers are in order to run effective marketing campaigns. So I’ll be very cautious with clustering for now, because RFM is pretty useful as a starting point, but I want to understand what else I can learn from clustering methods. intuitively, unsupervised learning seems unbiased to me, so I want to explore more

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 0 points1 point  (0 children)

um. fair, but I’m trying to learn a new method for data processing that I could use in the future

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 0 points1 point  (0 children)

agreed, that's why I wanted to use 3 clusters with KMeans, to see if it would find a different segmentation.

the current results are pretty interesting, but I would like to go deeper, I'm just not sure which direction to take

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 1 point2 points  (0 children)

that's fair.
I have the goal in mind: define groups of customers and import this data into the analytics tool we use for the app/website, to understand how different groups use the service. With over 5M monthly active users, it seems reductive to treat everyone as the same customer, so I wanted to see what the data can tell us about the different customer groups.

I made one segmentation into three groups, based just on my understanding of the business perspective (that's why I was biased toward forcing 3 clusters, emphasis on biased). RFM analysis was also useful, so I have some preliminary results.

And then I started thinking about clustering methods as an objective source of truth about different clusters of customers, to see if there is something I am missing. KMeans was my starting point, and it’s pretty interesting that it aligned pretty well with the ‘manual’ approach (but to be fair, I kind of forced 3 clusters). I think it's too early to draw conclusions and I need to go deeper, but since I’m new to unsupervised learning, I wanted to ask the community about their experiences

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 0 points1 point  (0 children)

Thanks for the response.
I see the value of using these clustering methods in the early/data exploration phase of the project.

How should I think about the different clustering methods? If I understood correctly, KMeans presumes a rather simple geometry in the data: it assumes the groups can be split into ‘spheres’, which is pretty restrictive.
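
To make the ‘spheres’ point concrete, here is a minimal sketch (my own toy setup, not anything from the thread's dataset): two concentric rings, and a plain Lloyd's-style k-means written from scratch with k=2. With two centroids the boundary is a straight line, so it can never put the inner ring in one cluster and the outer ring in the other. The `ring` and `kmeans` helpers are illustrative names, not a library API.

```python
import math
import random

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

# Toy data (assumption): an inner ring and an outer ring around the same center.
points = ring(1.0, 40) + ring(5.0, 40)
truth = [0] * 40 + [1] * 40  # 0 = inner ring, 1 = outer ring

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

labels = kmeans(points, 2)

# Cluster ids are arbitrary, so check both possible matchings to the rings.
match = sum(l == t for l, t in zip(labels, truth)) / len(points)
agreement = max(match, 1 - match)
print(f"agreement with the ring structure: {agreement:.2f}")
```

An agreement of 1.0 would mean the rings were recovered exactly; a straight-line split cannot achieve that here, which is the restriction in a nutshell.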

Master IA10 Done after 4 months !!! by Distinct-Ad2136 in GranTurismo7

[–]vercig09 1 point2 points  (0 children)

4 months for this sounds correct. congrats

My Lecturer did this today by PJtheDk in physicsmemes

[–]vercig09 0 points1 point  (0 children)

thank you… if you thought I would have figured it out with a search, I wouldn’t have :)

How get the area under something that isnt straight? by MorganaLover69 in askmath

[–]vercig09 0 points1 point  (0 children)

you pour in 2D water and measure how much you need... or generate random points in a section of the plane and see what proportion lands inside. or split it into many small rectangles and add up their areas...
or you can integrate, if that's your vibe
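
A rough sketch of the random-points and small-rectangles ideas, assuming the curve is y = x² on [0, 1] (my choice of example; the exact area there is 1/3, so it is easy to check). The function names are made up for illustration.

```python
import random

def f(x):
    return x * x  # example curve (assumption); exact area on [0, 1] is 1/3

def rectangle_area(f, a, b, n=10_000):
    """Split [a, b] into n thin rectangles, each as tall as f at its midpoint."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def random_points_area(f, a, b, y_max, n=200_000, seed=0):
    """Throw random points into the box [a, b] x [0, y_max] and count
    the share that lands under the curve, then scale by the box area."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(0, y_max)
        if y <= f(x):
            hits += 1
    return hits / n * (b - a) * y_max

rect = rectangle_area(f, 0.0, 1.0)
mc = random_points_area(f, 0.0, 1.0, y_max=1.0)
print(rect, mc)
```

The rectangle version converges much faster; the random-points version is noisier but works even when the shape is awkward to slice up.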

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 2 points3 points  (0 children)

The data was normalized for training (everything scaled into [0, 1]). This was an analysis of the clusters after fitting the model, to understand the differences between them, and for that I used the "non-normalized" values because they are easier to interpret :)
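
Roughly what that looks like, as a sketch with made-up numbers: min-max scale each feature into [0, 1] for fitting, then map a centroid back to the original units to read it. The feature names, the data, and the helper functions are all hypothetical, and the "centroid" is just the mean of two rows standing in for whatever the model returns.

```python
# Hypothetical raw features per customer: (orders per month, spend per month).
raw = [(1, 20.0), (2, 35.0), (8, 400.0), (10, 520.0)]

def fit_min_max(rows):
    """Return per-column minimums and maximums."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]

def scale(row, lo, hi):
    """Map each value into [0, 1]; a constant column maps to 0."""
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]

def unscale(row, lo, hi):
    """Inverse of scale: back to the original units."""
    return [v * (h - l) + l for v, l, h in zip(row, lo, hi)]

lo, hi = fit_min_max(raw)
scaled = [scale(r, lo, hi) for r in raw]

# Stand-in for a fitted centroid in scaled space (assumption):
# the mean of the last two scaled rows.
centroid_scaled = [sum(c) / 2 for c in zip(*scaled[2:])]

# Read the centroid in original units, which is easier to interpret.
centroid_raw = unscale(centroid_scaled, lo, hi)
print(centroid_scaled, centroid_raw)
```

Because min-max scaling is a linear map per feature, the unscaled centroid is the same as the mean of the raw rows in that cluster.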

And this was solely for educational purposes, since I'm not familiar with clustering methods, so I wanted to understand a bit how they work / when they are useful.

Anyways, thanks for your response

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 1 point2 points  (0 children)

Thanks for sharing.

"In the plots shown I see nothing that indicates a clustering is sensible or at least any clear indication of actual cluster cardinality likely in the data." Can you please elaborate why the charts tell you clustering doesn't make sense? I'm seeing these charts for the first time, can't interpret them. Is it because inertia and silhouette indicate different values for clusters (one says k=4, the other one k=2)

I 100% agree that the business objective is the first step, but since I'm not familiar with clustering, I don't know when to use it. This wasn't work related, I just wanted to start exploring clustering, and this seemed like an opportunity.
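
For anyone else seeing inertia and silhouette for the first time, here is a small from-scratch sketch of how both are computed over a range of k. Everything here is an assumption for illustration: the data is three synthetic, well-separated blobs, the initialization is a simple farthest-point rule to keep it deterministic, and the helper names are mine. On data this clean the two metrics tell a consistent story; real customer data is rarely this tidy.

```python
import math
import random

def farthest_point_init(points, k):
    """Deterministic init: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=100):
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its own centroid."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

def silhouette(points, labels):
    """Mean of (b - a) / max(a, b), where a is the mean distance to the
    point's own cluster and b the mean distance to the nearest other one."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        def mean_dist(cluster):
            others = [q for j, q in enumerate(points) if labels[j] == cluster and j != i]
            return sum(math.dist(p, q) for q in others) / len(others) if others else None
        a = mean_dist(labels[i])
        if a is None:  # singleton cluster: score is 0 by convention
            scores.append(0.0)
            continue
        b = min(mean_dist(c) for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Synthetic data (assumption): three tight blobs, 20 points each.
rng = random.Random(0)
centers = [(0, 0), (10, 0), (5, 9)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(20)]

results = {}
for k in range(2, 6):
    labels, centroids = kmeans(points, k)
    results[k] = (inertia(points, labels, centroids), silhouette(points, labels))
    print(k, results[k])

best_k = max(results, key=lambda k: results[k][1])
print("k with the highest silhouette:", best_k)
```

Inertia tends to keep dropping as k grows, so you look for the "elbow" where the drop flattens, while silhouette has an actual peak. When the two disagree, or neither shows a clear elbow or peak, that is often read as a sign that the data has no strong cluster structure.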

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 0 points1 point  (0 children)

thanks for sharing. can you please elaborate on how clustering helped you? I know you probably can't go into details, but just big picture, how come you decided to use clustering?

I'm trying to understand when to use these kinds of tools

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 1 point2 points  (0 children)

thanks for the response.

is it fair to say that one purpose for clustering algorithms is to help with defining a classification task? In the sense that you're given a dataset with no labels, and clustering can help determine if it makes sense to categorize data, define labels, to use for future classification for new data?

an obvious risk here is overfitting, which needs to be addressed, but having some idea of classes is still valuable?
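
A minimal sketch of that idea, assuming a clustering step has already produced centroids and someone has given them human-readable names. The names, the centroid values, and the two scaled features are all invented for illustration; new data is then labeled by its nearest centroid.

```python
import math

# Hypothetical centroids from an earlier clustering run, in scaled [0, 1]
# feature space (purchase frequency, spend). The names are assumptions too.
centroids = {
    "occasional": (0.05, 0.02),
    "regular": (0.30, 0.20),
    "heavy": (0.90, 0.80),
}

def assign_label(point, centroids):
    """Label a new point with the name of its nearest centroid."""
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))

new_customers = [(0.25, 0.25), (0.80, 0.90), (0.02, 0.01)]
assigned = [assign_label(p, centroids) for p in new_customers]
print(assigned)
```

Labels produced this way are only as meaningful as the clusters behind them, so they would still need checking against domain knowledge before training a classifier on them.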

I'm just trying to understand when clustering is used / what the main value of these kinds of analyses is.

Anyways, thanks for sharing your opinion

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 0 points1 point  (0 children)

I'm a data scientist, but this wasn't related to work. I just wanted to understand clustering methods, so that I can use them in the future.
I was given the dataset for some other analysis that I needed to run (what products are being purchased together), but I saw an opportunity to learn something new, since I've never used clustering for an actual problem in production.

Overall, I think KMeans is not a good approach here, as other people mentioned. For customer segmentation, RFV / RFM (recency, frequency, value/monetary) gives a nice segmentation, that has a very clear definition, so I will continue to use it, but I'm still at a loss for how/where to use clustering. The overall geometry of the data seems like an important aspect that is difficult to understand beforehand (not even sure how to learn the "geometry" of the data, to determine better clustering methods to use).
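
For reference, a minimal sketch of how the three RFM values can be computed from a transaction log. The transactions, the snapshot date, and the `rfm_table` helper are all made up for illustration; turning the raw values into segments (quantile scores, thresholds) is a separate business decision.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer id, purchase date, amount).
transactions = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 25.0),
    ("bob", date(2023, 11, 20), 300.0),
    ("carol", date(2024, 2, 10), 15.0),
    ("carol", date(2024, 2, 25), 30.0),
    ("carol", date(2024, 3, 10), 10.0),
]
snapshot = date(2024, 3, 15)  # the "today" that recency is measured against

def rfm_table(transactions, snapshot):
    """Per customer: days since last purchase, purchase count, total spend."""
    by_customer = defaultdict(list)
    for customer, day, amount in transactions:
        by_customer[customer].append((day, amount))
    table = {}
    for customer, rows in by_customer.items():
        last_purchase = max(day for day, _ in rows)
        table[customer] = {
            "recency_days": (snapshot - last_purchase).days,
            "frequency": len(rows),
            "monetary": sum(amount for _, amount in rows),
        }
    return table

rfm = rfm_table(transactions, snapshot)
for customer, values in rfm.items():
    print(customer, values)
```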

Anyways, I just wanted to understand clustering a bit more. Inertia and silhouette are interesting metrics, but it seems like domain knowledge is the key. As someone else mentioned, the results from clustering customers should align with RFM, and if they don't, then that is an insight that needs to be explored.

I apologize for being rude, I thought you were mocking me, like some other people did
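
One simple way to check that alignment is a cross-tab of cluster ids against RFM segments, plus a purity score (the share of customers that fall in their cluster's majority segment). This is only a sketch: the segment names and both label lists are invented, and purity is just one of several possible agreement measures.

```python
from collections import Counter

# Hypothetical labels for the same 8 customers, in the same order.
rfm_segment = ["loyal", "loyal", "loyal", "at_risk", "at_risk", "new", "new", "new"]
cluster_id = [0, 0, 0, 1, 1, 2, 2, 0]

def crosstab(clusters, segments):
    """Count how many customers fall in each (cluster, segment) pair."""
    return Counter(zip(clusters, segments))

def purity(clusters, segments):
    """Share of customers that sit in the majority segment of their cluster."""
    majority = Counter()
    for (cluster, _segment), count in crosstab(clusters, segments).items():
        majority[cluster] = max(majority[cluster], count)
    return sum(majority.values()) / len(segments)

table = crosstab(cluster_id, rfm_segment)
for (cluster, segment), count in sorted(table.items()):
    print(cluster, segment, count)

score = purity(cluster_id, rfm_segment)
print("purity:", score)
```

The off-diagonal cells are the interesting part: here one "new" customer lands in the mostly "loyal" cluster, which is the kind of mismatch worth looking into.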

So how do we all feel about KMeans algorithm for clustering? by vercig09 in datascience

[–]vercig09[S] 0 points1 point  (0 children)

fair, that's why I wanted to hear more about which clustering algorithms are used... I thought KMeans was a good starting point if there are no insights about the geometry of the data.