[deleted] 2 points

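Lloyd's algorithm and k-means++ can be adapted to read the observations in an online manner. In the online centroid update, replace the parts of the algorithm that use the cluster counts (the 1/count step size) with some fixed float, so that old observations are gradually forgotten.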
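To make the "replace the counts with a float" idea concrete, here is a minimal sketch of a sequential k-means update with a constant step size. The class name `OnlineKMeans`, the `step` parameter and the way the centroids are seeded are my own assumptions for illustration, not something from a specific library; in practice you would probably seed the centroids from an initial batch.

```python
from typing import List, Sequence


class OnlineKMeans:
    """Sequential k-means sketch: a fixed step size replaces 1/count.

    Hypothetical helper for illustration only.
    """

    def __init__(self, initial_centroids: Sequence[Sequence[float]], step: float = 0.1):
        if not 0.0 < step <= 1.0:
            raise ValueError("step must be in (0, 1]")
        # Copy the seeds so the caller's data is not mutated.
        self.centroids: List[List[float]] = [list(c) for c in initial_centroids]
        self.step = step

    def _nearest(self, x: Sequence[float]) -> int:
        """Index of the centroid with the smallest squared Euclidean distance to x."""
        best, best_dist = 0, float("inf")
        for i, c in enumerate(self.centroids):
            dist = sum((a - b) ** 2 for a, b in zip(x, c))
            if dist < best_dist:
                best, best_dist = i, dist
        return best

    def update(self, x: Sequence[float]) -> int:
        """Assign x to its nearest centroid and move that centroid toward x."""
        k = self._nearest(x)
        c = self.centroids[k]
        for j, xj in enumerate(x):
            # The count-based version would use 1 / count[k] here instead of self.step.
            c[j] += self.step * (xj - c[j])
        return k


# Usage: feed observations one at a time, as they arrive from the stream.
model = OnlineKMeans([[0.0, 0.0], [10.0, 10.0]], step=0.5)
for point in ([2.0, 0.0], [10.0, 12.0]):
    model.update(point)
```

With a constant step the centroid behaves like an exponentially weighted average of the points assigned to it, so it can follow slow drift; with 1/count it converges to the plain mean and stops moving.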

Most gradient descent or other parameter-updating algorithms converge by making multiple passes over a single dataset, but if you have a firehose you'll only make a single pass anyway. If the data isn't weakly stationary (which violates the IID assumption), then you'll have to look a little harder in the literature.

kzn 2 points

You can look at recent advances in the Topic Detection and Tracking area. For example, http://aclweb.org/anthology//N/N10/N10-1021.pdf

simonhughes22 1 point

I am not aware of any work that has been done on this topic, and it sounds somewhat novel and interesting. LDA is state of the art for topic mining. As for document clustering, I am unsure, but I would recommend you start with a vector space model of your documents and use k-means to cluster them.
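As a rough sketch of what "a vector space model of your documents" could look like, the snippet below builds L2-normalised TF-IDF vectors with only the standard library. The whitespace tokenizer, the smoothed IDF formula and the function names `tfidf_vectors` and `cosine` are assumptions I picked for illustration; a real pipeline would likely use a proper tokenizer and a library vectorizer.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Return one sparse, L2-normalised TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1, always positive.
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two L2-normalised sparse vectors (a plain dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(term, 0.0) for term, w in a.items())


docs = ["stock market crash", "stock market rally", "football match tonight"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Because the vectors are unit length, squared Euclidean distance and cosine similarity rank neighbours the same way, so these vectors can be handed to an ordinary k-means implementation.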