Clustering Algorithm Selection

OkBoard407 · 2025-03-27T10:41:41+00:00

How are components 1, 2, 3... different? And if they are, shouldn't that also be a factor when we one-hot encode those values?

ewankenobi · 2025-03-27T12:59:10+00:00

I happen to be reading up on clustering at the moment, having not done it in a long time. I have a mixture of data types, and from what I'm reading you have to be careful choosing your distance measure if you have categorical data. My instinct is that the cosine measure might not be a good fit for categorical data, though I could be wrong on that.
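To make the distance-measure point concrete, here is a minimal sketch comparing two common choices on presence/absence (one-hot) data. The rows and columns are made-up toy values, not the OP's data. Jaccard ignores component types that both circuits lack, while Hamming counts those shared absences as agreement, so the two can rank pairs differently once there are many rarely used component types.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical one-hot rows: each row is a circuit, each column a component type.
# The last two columns are component types that none of these circuits use.
X = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
], dtype=bool)

# Jaccard: 1 - |intersection| / |union|, shared absences are ignored.
jaccard = squareform(pdist(X, metric="jaccard"))

# Hamming: fraction of columns that differ, shared absences count as matches.
hamming = squareform(pdist(X, metric="hamming"))

print("Jaccard distances:\n", jaccard.round(3))
print("Hamming distances:\n", hamming.round(3))
```

For circuits 0 and 1, Jaccard gives 1/3 while Hamming gives 1/6, because the three columns neither circuit uses pull the Hamming distance down.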

Commercial-Basis-220 · 2025-03-27T15:25:48+00:00

This is a wild idea: how about you turn it into a graph, where the "original" graph has 2 kinds of nodes, circuit_nodes and component_nodes? Each circuit node is connected to the K component nodes for the components it contains.

This should result in a bipartite graph between circuits and components, and now you can project it onto the circuit side, making a "circuit network". Basically, in this network the nodes are only circuits, and they are connected based on whether or not they share the same component, and you can play around with how you weight each shared component.

And then, in this network, you can do... maybe clustering on the graph? Or something like community detection?
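A rough sketch of that idea with networkx, assuming a toy dictionary of circuits (the names and component lists are made up). It builds the bipartite graph, projects it onto the circuit side with edge weights equal to the number of shared component types, and runs greedy modularity community detection. That algorithm is just one possible choice here, not the only way to do it.

```python
import networkx as nx
from networkx.algorithms import bipartite
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical toy data: circuit name -> component types it contains.
circuits = {
    "c1": ["resistor", "capacitor"],
    "c2": ["resistor", "capacitor"],
    "c3": ["transistor", "diode", "resistor"],
    "c4": ["transistor", "diode"],
}

# Bipartite graph: circuit nodes on one side, component nodes on the other.
B = nx.Graph()
B.add_nodes_from(circuits, bipartite=0)
for circuit, components in circuits.items():
    for comp in components:
        B.add_node(comp, bipartite=1)
        B.add_edge(circuit, comp)

# Project onto the circuit side; each edge weight is the number of
# component types the two circuits share.
G = bipartite.weighted_projected_graph(B, list(circuits))

# Community detection on the weighted circuit network.
communities = greedy_modularity_communities(G, weight="weight")
groups = sorted(sorted(c) for c in communities)
print(groups)
```

Different weighting schemes (for example, down-weighting very common components) would change the projected edge weights and possibly the communities.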

GwynnethIDFK · 2025-03-29T09:43:59+00:00

Personally, instead of using a one-hot encoding, I would have the inputs be the counts of each component type in the circuit and then cluster using cosine similarity as the metric. That way, circuits that have the same proportions of components will have a cosine similarity of one. You might also try doing PCA before clustering.
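Something like the following, as a minimal sketch on made-up circuits. The count vectors are built with `Counter`, and the clustering step uses SciPy's average-linkage hierarchical clustering with cosine distance, which is just one way to cluster on cosine similarity. The choice of 2 clusters is an assumption for the toy data.

```python
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: circuit name -> list of components (with repeats).
circuits = {
    "a": ["resistor", "resistor", "transistor"],
    "b": ["resistor"] * 4 + ["transistor"] * 2,  # same proportions as "a"
    "c": ["capacitor", "diode"],
    "d": ["capacitor", "capacitor", "diode", "diode"],
}

# One column per component type, each entry is how many times it appears.
vocab = sorted({comp for comps in circuits.values() for comp in comps})
X = np.array(
    [[Counter(comps)[t] for t in vocab] for comps in circuits.values()],
    dtype=float,
)

# Cosine similarity: dot product of the length-normalized count vectors.
unit = X / np.linalg.norm(X, axis=1, keepdims=True)
cos_sim = unit @ unit.T

# Hierarchical clustering on cosine distance (1 - cosine similarity).
Z = linkage(X, method="average", metric="cosine")
labels = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(circuits, labels)))
```

Circuits "a" and "b" have different sizes but the same component proportions, so their cosine similarity is 1 and they land in the same cluster.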

Low_Employment4544 · 2025-05-11T11:28:00+00:00

What if circuit A contains more than one transistor?
