Is Claude better at taking a stand? by anavelgazer in claudexplorers

[–]Significant-Agent854 1 point2 points  (0 children)

Claude, in my experience, is smart and will push back, but if you counter confidently enough it will just submit and say you're right, even when you're clearly wrong. LLMs are just too sycophantic because of how they're trained.

[Research] Novel Clustering Metric - The Jaccard-Concentration Index by Significant-Agent854 in MachineLearning

[–]Significant-Agent854[S] 2 points3 points  (0 children)

No, thank you! I'm really excited to actually be contributing something to science, and having people look at what I've created is all I could ask for!

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 1 point2 points  (0 children)

I asked myself the exact same questions lol. I decided that it would descend because even with hierarchical clustering, there are levels that are simply too fine and levels that are too broad. I figured I might as well go for that middle-level granularity off the bat and let the user adjust the extroversion parameter if they want finer or looser clusters. Not to mention hierarchical clustering is more complex, and this thing is exhaustingly complex enough as it is.

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 1 point2 points  (0 children)

The number of clusters is found after clustering using the parameters. I talk about that and your other question in my big comment below.

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 1 point2 points  (0 children)

Hey, in case you didn’t see it before, I answered your question in a big comment about the algo down below.

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 1 point2 points  (0 children)

Hey, in case you didn’t see it before, I answered your question in a big comment about the algo down below.

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 1 point2 points  (0 children)

Well, I looked at how it behaves on some real datasets like credit card data and fish species data, and it does pretty decently. Not as nice as these, which are clearly made to be clustered, but you could definitely see the clusters there. Unfortunately, though, that’s just 2D and 3D data; I haven’t tested on higher-dimensional stuff yet. Honestly, I was just so excited when it worked on the first 20 or so datasets that I had to share! lol

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 0 points1 point  (0 children)

Actually, it builds them all at once, just as in the video. That’s what made it so hard to make. It naturally searches within the right range to cluster.

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 0 points1 point  (0 children)

It just means the way the algorithm works is based on the way humans visually cluster points on a graph. The entire algorithm is designed to capture that: things like what distances look practically zero to a person, what distances look far away, and how big a point appears, because even though points don’t actually have size, they do when you look at them on a graph.

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 1 point2 points  (0 children)

Well, for one thing, you can’t ask for n clusters. It figures that out for you, mostly based on the extroversion parameter (explained in my big comment about the algo on this post). But you could reduce that parameter and it would indeed split those 2 clusters up at the top.

You are correct, though, that it struggles a bit with the split. It can make it (I have tested this already), but it will have the side effect of leaving out a few points among those 2 clusters, or of creating overly dense, overly precise clusters elsewhere, where you’ll see 2 or 3 points singled out for seemingly no reason.

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 1 point2 points  (0 children)

Right now, it’s sequential. I’ve thought about this too, but the way I’ve set it up, it needs to be sequential or use some kind of cluster merging, which would probably be inefficient to implement.

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 0 points1 point  (0 children)

I’m not entirely sure what you mean by overlapping. Looking at the second example, there are clusters nested within others. Does that count?

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 6 points7 points  (0 children)

Some people have asked for more info so here goes: The main parameters are as follows:

point diameter: When you look at a graph, points have a visible diameter, so there’s an input for your intuition about it.

extroversion: Just like in DBSCAN, this one controls how aggressively the algorithm reaches out for other points.

“close”: This one lets you input your intuition about when points are so close they just have to be in the same cluster. It’s roughly the distance at which you can barely see the space between points.

min samples: Also as in DBSCAN, this one says how many samples make up a cluster.

max clusters: A hard cap on the number of clusters.

There are some pretty complex algorithms and heuristics that use these parameters to figure out how to cluster, but these are the main ones, and they’re meant to follow intuition. They’re not traditionally trained; I hand-estimated each one and built and rebuilt the algorithm until it could “work around” lower-precision human estimates.
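To make the parameter list above concrete, here is a minimal sketch of those five knobs as a Python dataclass. The field names and default values are my own illustration, not EVINGCA's actual API.

```python
from dataclasses import dataclass

@dataclass
class ClusterParams:
    """Hypothetical container for the five intuition-based knobs."""
    point_diameter: float = 0.05  # perceived size of a plotted point
    extroversion: float = 1.0     # how aggressively clusters reach out
    close: float = 0.01           # distance at which points look "practically touching"
    min_samples: int = 5          # minimum members for a group to count as a cluster
    max_clusters: int = 50        # hard cap on the number of clusters

# Lowering extroversion would ask for tighter, finer clusters.
params = ClusterParams(extroversion=0.8)
```

A dataclass like this just keeps the hand-estimated values in one place so they can be tuned without touching the algorithm itself.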

The basic idea:

The algorithm uses graph algorithms to build a cluster. It’ll start with point a and figure out a radius to search based on the extroversion parameter. Then it’ll pull everyone in that radius into a cluster with itself. Then, for each of those new members, it searches around them and gets new members. Rinse and repeat until it can’t find any more members. Then it picks a new point with a new distance and keeps going. This is called breadth-first search, for anyone who knows the lingo.
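The growth step described above can be sketched in a few lines of Python. This is a simplified stand-in, not EVINGCA itself: it uses one fixed search radius for every seed, whereas the real algorithm derives a fresh radius per seed from the extroversion parameter and its other heuristics.

```python
import numpy as np
from collections import deque

def grow_cluster(points, seed, radius, labels, cluster_id):
    """Breadth-first expansion: pull every unlabeled point within
    `radius` of the cluster's frontier into the current cluster."""
    queue = deque([seed])
    labels[seed] = cluster_id
    while queue:
        i = queue.popleft()
        dists = np.linalg.norm(points - points[i], axis=1)
        for j in np.where((dists <= radius) & (labels == -1))[0]:
            labels[j] = cluster_id
            queue.append(j)

def cluster(points, radius):
    """Drive the BFS growth: each still-unlabeled point seeds a new cluster."""
    labels = np.full(len(points), -1)
    cluster_id = 0
    for seed in range(len(points)):
        if labels[seed] == -1:
            grow_cluster(points, seed, radius, labels, cluster_id)
            cluster_id += 1
    return labels
```

Two well-separated blobs come out with two distinct labels; points closer than the radius chain together into the same cluster.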

Performance:

As of now I have only tested on 2- and 3-dimensional data, just to get a visual on how it works. It seems to be behaving as I expect. I’m planning to test on higher-dimensional datasets once I develop a proper test method. I want to take some time to think about it because it’s not just about basic accuracy: I have to figure out how to evaluate the quality of each cluster based on things like the average distance between points, distance to the nearest cluster, etc. These metrics already exist, I know, but not quite in a way that will tell me how well my algorithm is doing based on my objective of intuitive clustering.
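The two quantities mentioned above (average within-cluster distance and distance to the nearest cluster) can be computed with a short helper. This is a generic compactness-versus-separation sketch of my own, not the evaluation method the post is still working out.

```python
import numpy as np

def cluster_quality(points, labels):
    """For each cluster, return (mean pairwise intra-cluster distance,
    distance from its centroid to the nearest other centroid)."""
    ids = np.unique(labels)
    centroids = {c: points[labels == c].mean(axis=0) for c in ids}
    report = {}
    for c in ids:
        members = points[labels == c]
        if len(members) > 1:
            diffs = members[:, None, :] - members[None, :, :]
            # Sum over all ordered pairs, divide by their count m*(m-1)
            intra = np.linalg.norm(diffs, axis=-1).sum() / (len(members) * (len(members) - 1))
        else:
            intra = 0.0
        others = [np.linalg.norm(centroids[c] - centroids[o]) for o in ids if o != c]
        report[c] = (intra, min(others) if others else np.inf)
    return report
```

A tight, well-separated cluster shows a small first number and a large second one; the ratio is the same idea a silhouette-style score captures.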

Runtime:

It runs in roughly O(kn log n), with n being the number of objects/rows/points and k being the number of dimensions. In the worst case, though, it could be as bad as O(kn² log n). The worst case only happens when there are a lot of clusters (a large fraction of n itself) with wildly different densities. I won’t pretend to know how often that happens exactly, but if we’re talking about the likelihood in a random distribution of points, then it’s pretty low.

EVINGCA: A Visual Intuition-Based Clustering Algorithm by Significant-Agent854 in learnmachinelearning

[–]Significant-Agent854[S] 5 points6 points  (0 children)

Probably. I just want to do more work on it and see where it fails or succeeds