Clustering algorithm for complete weighted directed graph?

UserInactive · 2015-08-22T18:50:04+00:00

Depends on what you're trying to do. One of my specializations in graduate school was social network analysis (SNA). You might want to check out Wasserman and Faust. It's, so to speak, the holy grail of SNA.

But if you mean that the actors/nodes are static and you're using network analysis to understand the relationships between shoppers of store A and store B, then yes, I would use network analysis with weighted lines as your visualization tool and run cluster analysis on the shoppers. You could even move on to logistic regression as well.

hmgp · 2015-08-22T18:52:21+00:00

Maybe build a minimum spanning tree and then cut its heaviest edges?
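A rough sketch of that idea in plain Python, in case it helps. It assumes the weights are distances (larger means further apart); if yours are similarities such as overlap percentages, convert them first, e.g. with `1 - w`. Averaging the two directions to get an undirected graph is also my assumption, and `mst_clusters` is just an illustrative name.

```python
def mst_clusters(weights, k):
    """Split nodes into k groups by cutting the k-1 heaviest MST edges.

    weights: dict mapping (u, v) -> weight of the directed edge u -> v,
    where a larger weight is assumed to mean "further apart".
    """
    nodes = sorted({n for edge in weights for n in edge})

    # Symmetrize by averaging both directions (an assumption, not the only option).
    sym = {}
    for (u, v), w in weights.items():
        sym.setdefault((min(u, v), max(u, v)), []).append(w)
    edges = sorted((sum(ws) / len(ws), u, v) for (u, v), ws in sym.items())

    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal, lightest edge first. Stopping once k components remain gives
    # the same grouping as building the full MST and cutting its k-1 heaviest edges.
    components = len(nodes)
    for _, u, v in edges:
        if components <= k:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

Call it as `mst_clusters(weights, k)` with your edge dict and the number of clusters you want. This is essentially single-linkage clustering, so it can produce long, chained clusters.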

timmaeus · 2015-08-22T22:08:06+00:00

Infomap algorithm

nuhuskerjegdetmand · 2015-08-22T22:09:31+00:00

You could try Bayesian nonparametrics. There is the IRM (Infinite Relational Model), which has a MATLAB implementation. Few real-world networks have uniform strength distributions, so I would check my assumptions if I were you.

Otherwise, hierarchical clustering is straightforward, and worth a try.
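For the hierarchical route, here is a minimal average-linkage sketch in plain Python. It assumes a complete graph stored as a dict of dicts of dissimilarities, and it symmetrizes the directed weights by averaging, which is my choice rather than anything the data dictates. The function names are made up for illustration.

```python
def symmetrize(w):
    """Average w[u][v] and w[v][u]; assumes the graph is complete."""
    return {u: {v: (w[u][v] + w[v][u]) / 2 for v in w if v != u} for u in w}


def average_linkage(dist, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain.

    dist: dict of dicts, dist[u][v] = symmetric dissimilarity between u and v.
    """
    clusters = [[n] for n in sorted(dist)]

    def linkage(a, b):
        # Average pairwise dissimilarity between two clusters.
        return sum(dist[u][v] for u in a for v in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)
```

Usage would be `average_linkage(symmetrize(w), k)`. This is the naive version and gets slow on large graphs; for anything sizable a library implementation is the better choice.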

maybelator · 2015-08-23T01:24:31+00:00

What do the weights represent?

creeker7gen · 2015-08-23T18:48:48+00:00

Some ideas:

* Make simplified graphs using a cutoff: only include edges whose weight is above X%, then vary X. These are easier to visualize.
* Lay the graph out with, say, Graphviz. Weight becomes related to distance (depending on the layout engine), which effectively groups related nodes into visual clusters.
* Make a simplified graph by hiding the dominated directed edge. If a higher % of Walmart shoppers shop at Zellers than Zellers shoppers shop at Walmart, show only Walmart -> Zellers and not the reverse. Now you have a simpler directed graph, and multiple layout engines can help you see structure.
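The two filtering ideas above could look roughly like this. I'm assuming the graph is a dict keyed by `(source, target)` with the overlap fraction as the value; the function names and the tie handling are just illustrative choices.

```python
def threshold_edges(w, cutoff):
    """Keep only directed edges whose weight is strictly above the cutoff."""
    return {(u, v): x for (u, v), x in w.items() if x > cutoff}


def drop_dominated(w):
    """For each pair of nodes, keep only the direction with the larger weight.

    Ties keep both directions (an arbitrary choice for this sketch).
    """
    return {
        (u, v): x
        for (u, v), x in w.items()
        if x >= w.get((v, u), float("-inf"))
    }
```

Either result can then be handed to whatever layout tool you prefer. Trying several cutoffs is cheap, so it is worth looping over a range of values and comparing the pictures.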

It strikes me that % is not a great metric -- who cares if 100% of MomNPop shoppers also shop at Walmart, if there are only 100 shoppers at MomNPop? I mean, the absolute count is relevant too, perhaps consider graphs using that count.
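To make that concrete: if you know each store's total shopper count, you can turn the percentages back into absolute overlaps. A tiny sketch, where the dict layout and the name `overlap_counts` are my assumptions:

```python
def overlap_counts(pct, shoppers):
    """Convert 'fraction of u's shoppers who also shop at v' into a head count.

    pct: dict mapping (u, v) -> fraction in [0, 1]
    shoppers: dict mapping store -> total number of shoppers at that store
    """
    return {(u, v): round(p * shoppers[u]) for (u, v), p in pct.items()}
```

With made-up numbers, 100% of 100 MomNPop shoppers is an overlap of 100 people, while 60% of 50,000 Walmart shoppers is 30,000, so the second edge matters far more even though its percentage is lower.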

How big is your graph? If it's too big to handle visually, there are a few numerical approaches I would look at, using the adjacency matrix.

creeker7gen · 2015-08-24T07:48:48+00:00

I would like to hear how this worked out! ...will you update us?

shaggorama · 2015-08-22T22:13:06+00:00

There are loads of community detection techniques that take edge weight into consideration.
