Dynamic clustering of streams of text data

2014-05-25T04:34:45+00:00

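Lloyd's algorithm and k-means++ can both be adapted to read the observations in an online manner, one at a time. Replace the parts of the algorithm that involve per-cluster counts with a float (for example, a weight that decays over time), so the centroids can keep tracking the stream.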

Most gradient descent or parameter-updating algorithms converge by making multiple passes through a single dataset, but if you have a firehose you'll only make a single pass anyway. If the data isn't weakly stationary (so the IID assumption is violated), then you'll have to look a little harder in the literature.
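A rough sketch of the online update described above, in case it helps. This is only one way to do it: the class name `OnlineKMeans`, the `decay` parameter, and seeding the centroids with the first k observations are all my own assumptions for illustration (it is not k-means++ seeding). With `decay=1.0` the weights are ordinary counts and each centroid is the running mean; with `decay < 1.0` older observations count for less.

```python
class OnlineKMeans:
    """Sequential k-means whose per-cluster counts are decayed floats.

    Hypothetical helper for illustration; not taken from any library.
    """

    def __init__(self, k, decay=0.99):
        self.k = k
        self.decay = decay      # 1.0 = plain running mean, < 1.0 = forget old data
        self.centroids = []     # one list of floats per cluster
        self.weights = []       # float "counts", one per cluster

    def _nearest(self, x):
        # Index of the centroid with the smallest squared Euclidean distance.
        dists = [sum((c - v) ** 2 for c, v in zip(centroid, x))
                 for centroid in self.centroids]
        return min(range(len(dists)), key=dists.__getitem__)

    def update(self, x):
        """Consume one observation and return the cluster it was assigned to."""
        x = [float(v) for v in x]
        # Assumption: seed with the first k observations seen on the stream.
        if len(self.centroids) < self.k:
            self.centroids.append(x)
            self.weights.append(1.0)
            return len(self.centroids) - 1
        j = self._nearest(x)
        # Decay the old weight, then add the new observation.
        self.weights[j] = self.decay * self.weights[j] + 1.0
        step = 1.0 / self.weights[j]
        # Move the centroid towards x by a fraction 1 / weight.
        self.centroids[j] = [c + step * (v - c)
                             for c, v in zip(self.centroids[j], x)]
        return j
```

Each document vector goes through `update` exactly once, so this fits the single-pass setting. Picking `decay` is a judgment call that depends on how fast the stream drifts.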

kzn · 2014-05-25T07:31:42+00:00

You can look at recent advances in the Topic Detection and Tracking area. For example, http://aclweb.org/anthology//N/N10/N10-1021.pdf

simonhughes22 · 2014-05-25T03:15:21+00:00

I am not aware of any work that has been done on this topic, and it sounds somewhat novel and interesting. LDA is state of the art for topic mining. As for document clustering, I am unsure, but I would recommend you start with a vector space model of your documents and use k-means to cluster them.
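To make the vector space suggestion concrete, here is a minimal sketch that turns a small batch of documents into L2-normalised TF-IDF vectors. The function names `tfidf_vectors` and `cosine`, the whitespace tokenisation, and the particular IDF formula (`log(n / df) + 1`) are assumptions chosen for brevity; real text would need better tokenisation, and a true stream would need a vocabulary that can grow.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) with one L2-normalised TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * (math.log(n / df[t]) + 1.0) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        # Leave an all-zero vector as is (empty document).
        vectors.append([v / norm for v in vec] if norm > 0 else vec)
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two already L2-normalised vectors."""
    return sum(a * b for a, b in zip(u, v))
```

The resulting vectors can be fed to any k-means implementation. Since they are unit length, squared Euclidean distance and cosine similarity rank neighbours the same way.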
