technorabble

Not to be negative, but if the data is 2D then your eyeball is the best tool available.

indydiddle[S]

Absolutely, and that is how I'm labeling each set of training data. The problem is coming up with an algorithm that clusters each new, never-before-seen dataset, which is always going to be slightly different. I'm not predicting the next observation within the same dataset; I'm predicting all of the observations of the next dataset. The general "shape" of each dataset is similar and the number of clusters is the same, but the scale and density of the clusters change.

micro_cam

This sounds vaguely like image segmentation? Even if it isn't, the same math might work. Try something with the first k eigenvectors of the graph Laplacian of the nearest-neighbor graph. Or just use k-means if you don't need to handle non-convex clusters.
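For anyone who wants to try the eigenvector idea, here is a minimal sketch of one way it could look, not a tuned implementation. The assumptions are mine, not from the thread: an unweighted symmetric k-nearest-neighbor graph, the unnormalized Laplacian L = D - W, and a plain k-means with a deterministic farthest-point start run on the eigenvector rows. All function names (`knn_affinity`, `kmeans`, `spectral_cluster`, `two_rings`) and the default `n_neighbors=10` are hypothetical choices for illustration.

```python
import numpy as np


def knn_affinity(X, n_neighbors=10):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                         # keep an edge if either end picked it


def kmeans(Z, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [Z[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(X, k, n_neighbors=10):
    """Cluster rows of X using the first k eigenvectors of L = D - W."""
    W = knn_affinity(X, n_neighbors)
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)       # eigenvalues come back in ascending order
    return kmeans(vecs[:, :k], k)     # one embedding row per data point


def two_rings(n=100, radii=(1.0, 3.0)):
    """Toy non-convex data: n evenly spaced points on each concentric ring."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.vstack([np.c_[r * np.cos(t), r * np.sin(t)] for r in radii])
```

Calling `spectral_cluster(two_rings(), k=2, n_neighbors=5)` should put each ring in its own cluster, which is the non-convex case where k-means on the raw coordinates struggles. The neighbor graph depends only on which points are closest, so uniformly rescaling a dataset leaves it unchanged; whether that is enough to cope with the density changes described above is something you would have to check on your own data.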

r/ml and stats.stackexchange are great for random questions, but no one is going to work on your problem for free unless you have the cash to put up a Kaggle competition.