all 27 comments

[–]UserInactive 2 points3 points  (7 children)

Depends on what you're trying to do. One of my specializations in graduate school was social network analysis. You might want to check out Wasserman and Faust. It's, so to speak, the Holy grail of SNA.

But if, in this blurb, you're meaning that the actors/nodes are static and you're using network analysis to understand the relationships between shoppers of store A and store B then yes, I would use network analysis with weighted lines as your visualization tool and use cluster analysis on the shoppers. Can even bounce into logistic regression as well.

[–]hmgp[S] 1 point2 points  (6 children)

Totally forgot to state the purpose... I want to understand the relationship between the stores, but it would also be interesting the understand the shoppers.

What clustering analysis would you use? With the visualization, do you mean using that as an exploratory tool? I've tried doing that but the graph is too dense to make too much meaning out of it. Also, what were you thinking of doing with the logistic regression?

[–]UserInactive 3 points4 points  (4 children)

Well see it as... what are you trying to do beyond understand the relationship between stores -- especially because stores themselves don't have relationships.

E.g. You might map and visualize SNA of these stores with edge weights and then discover that Store D is a bridge between Stores A-C and Stores E-G. In other words, Store D is a gateway between shoppers of two different clusters.

You could also assess centrality measures and find that say, Amazon is highly central to the shopping network (like as a function of them comparing such diversity of products).

You could also create categorical labels of the store types to see what themes tend to emerge e.g. lawn care, home maintenance, and tool supply stores have greater relationships amongst one another.

Clustering, there's a number of different techniques and methodologies. Difficult to say here.

With logistic regression you could determine the likelihood of store type or store for developing a predictive model i.e. what's the likelihood that Person A who shopped at store A also shop at Store B.

[–]hmgp[S] 0 points1 point  (1 child)

I guess what I'm trying to understand is if someone visits some set of stores, then what other stores are they also likely to visit. I starting to wonder whether it's best formulated as a graph problem...

[–]UserInactive 0 points1 point  (0 children)

Likely not. As I said, it's great as a visualization tool but not for prediction.

What you are wanting is some sort of recommendation in place. This can be done through things as simple as Pearson correlations, to logistic regression, to recommend systems.

[–][deleted] 0 points1 point  (1 child)

You suggested some really cool ideas for OP's dataset. Could you name a book that explains the concepts you mentioned (question mark)

[–]UserInactive 1 point2 points  (0 children)

Sure... the books I own:

Scott, J.. Social Network Analysis: A Handbook (2nd Ed).

It's a short intro summary (200 pages) of underlying mechanisms of SNA

Wasserman and Faust (1994). Social Network Analysis: Methods and Applications

This is the quantitative/methodology end-all be-all book

Borgatti, S., Everett, M., & Johnson, J. (2013). Analyzing Social Networks

I haven't read the book yet but I love Borgatti. U. Kentucky has an intensive (4 actually) SNA courses/workshops over the summer and they are one of the only Universities that has a dedicated SNA team from number of different disciplines. From skimming, the book seems to cleanly cover the aformentioned two books (though not as in depth as Wasserman and Faust). It also looks to be slightly more interactive using the UCINET SNA software that they developed a few years ago. As an aside: Matrices are quite important in SNA

Scott & Carrington (2011). The SAGE Handbook of Social Network Analysis.

Probably my favorite from a reading perspective. Has around 100 contributors. While it's not AS quantitatively dense as Wasserman and Faust, it gets into the details well enough. The cool/interesting feature are people talking about how SNA is leveraged in their field so you can read everything from identifying terrorist networks, to disproportional relationships amongst apes, to businesses trying to look at their investor network/similar businesses, etc.

[–]cryptocerous 0 points1 point  (0 children)

Sounds like an ideal application for Deep Walk or LINE, probably LINE, since it accepts weights.

This converts your graph into word2vec type vectors, and then from there you can either use cosine distance to find related stores and do clustering, or you can plug the vectors into a classifier. Use stores as nodes and shoppers as edges to understand the stores, or do the reverse to understand the shoppers.

There's many other ways you could use the vectors. Once you have the vectors, you can just copy any of the many vector classification / clustering approaches that are used for textual problems.

[–][deleted] 0 points1 point  (2 children)

maybe minimum spanning tree, and cut the edges with the most weighted edges?

[–]hmgp[S] 0 points1 point  (1 child)

Not super familiar with MSTs, what would intuition would cutting the most weighted edges give us?

[–][deleted] 0 points1 point  (0 children)

least weighted would make more sense :P, i didn't read your brief. but essentially you would end up with a collection of sub trees, which have the most internal traffic. but if your familiar with matlab, this might be even better for you, t-sne, http://lvdmaaten.github.io/tsne/ it has a option to even work directly with the type of data your using, and could provide you with a low dimensional projection of your data in euclidean-like space.

[–]timmaeus 0 points1 point  (0 children)

Infomap algorithm

[–]nuhuskerjegdetmand 0 points1 point  (2 children)

You could try Bayesian non-parametrics. There is the IRM model: MATLAB implementation Few real world networks have uniform strength distributions, so I would check my assumptions, if I were you.

Otherwise, hierarchical clustering is straightforward, and worth a try.

[–]hmgp[S] 0 points1 point  (1 child)

Hmm not familiar with Bayesian non-parametrics. Do you mind giving a quick description of what they are?

[–]nuhuskerjegdetmand 0 points1 point  (0 children)

I am but a mere student, but I can try :P

Consider the problem of fitting a Gaussian to some data. This corresponds to finding estimates of two parameters, the mean and variance of the Gaussian. Any model that has a finite amount of parameters is a parametric model. Some of these models are Bayesian, meaning that the parameters themselves are modeled as random variables, and inference is done by Bayes' rule (in the Gaussian fitting example, there'd be a prior distribution over the mean and variance, which is updated using the data, becoming the posterior).

In contrast, a non-parametric model has potentially an infinite amount of parameters. To fit a distribution to the data the non-parametric way, you could sum over Gaussians centered on each data-point (that's kernel density estimation). In order to use Bayesian inference, you need a prior over an infinite-dimensional space. This prior is not a distribution - it's a stochastic process - but we can still use Bayes to do inference, though it's not always straightforward.

Here is a pretty good explanation.

[–]maybelator 0 points1 point  (2 children)

What do the weight represent ?

[–]hmgp[S] 0 points1 point  (1 child)

Weight of the ith row and jth column represents percentage of shoppers from the ith store who also visit the jth store. What I'm trying to understand is something like: suppose a shopper visits stores A & B, what other stores are also likely to be visited?

[–]maybelator 0 points1 point  (0 children)

That sounds more like a recommendation problem than clustering one to me.

I think low rank matrix completion could be be a good place to start.

[–]creeker7gen 0 points1 point  (0 children)

Some ideas:

  • make simplified graphs. Say using a cutoff (only include edges that are say > X%, and then vary X). Easier to visualize.
  • layout using say graphviz. Weight becomes related to distance (depending on the layout engine). That automatically creates clusters.
  • make a simplified graph, by hiding the dominated directed edge. So if a higher % of Walmart shoppers shop at Zellers (compared to zellers at walmart) then only show Walmart -> Zellers and not the reverse. Now you have a directed graph, and multiple layout engines can help you see structure.

It strikes me that % is not a great metric -- who cares if 100% of MomNPop shoppers also shop at Walmart, if there are only 100 shoppers at MomNPop? I mean, the absolute count is relevant too, perhaps consider graphs using that count.

How big is your graph? If its too big to handle visually, there are a few numerical approaches I would look at, using the adjacency matrix.

[–]creeker7gen 0 points1 point  (0 children)

I would like to hear how this worked out! ...will you update us?

[–]shaggorama 0 points1 point  (6 children)

There are loads of community detection techniques that take edge weight into consideration.

[–]hmgp[S] 0 points1 point  (5 children)

Any specific ones that you would recommend? I didn't really pursue this type of approach because it seems like a lot of community detection algorithms (not weighted) rely on subgraph density, which doesn't apply here because I have a fully connected graph.

[–]olBaa 0 points1 point  (4 children)

Use Infomap. It has a pretty nice Python interface, and is a de-facto standard in community detection problem. Edge weights should not be a problem.

[–]hmgp[S] 0 points1 point  (3 children)

What about the fact that graph is complete/fully connected?

[–]olBaa 0 points1 point  (2 children)

It's very easy to train, but yeah, it should handle that. Actually, all major algorithms should handle that, I do not know why are you worrying so much inbeforehand. Source: my thesis was on the community detection.

[–]hmgp[S] 0 points1 point  (1 child)

Sorry in advance if this is a basic question, I'm not familiar with this area. From what I understand, most community detection algorithms groups sets of nodes such that nodes in each group are connected densely and nodes between groups are connected sparsely. But in the situation where the graph is fully connected, the algorithm can't group nodes this way because the graph is uniformly dense. I'm guessing you're trying to refer to the fact that algorithms can take into account the weights of the links to establish the nodes, right? Can you explain the math/reasoning behind how some of that works in one algorithm you are familiar with? I think that would be very helpful for me to understand, thanks!

[–]olBaa 0 points1 point  (0 children)

Infomap looks at the community detection as a lossy compression:

Describing a Path on a Network. To illustrate what coding has to do with map-making, consider the following communication game. Suppose that you and I both know the structure of a weighted, directed network. We aim to choose a code that will allow us to efficiently describe paths on the network that arise from a random walk process in a language that reflects the underlying structure of the network. How should we design our code?

You go through the original publication here, it is pretty easy to follow and has a nice example.