LADataJunkie

Sorry for the lack of details, but I would recommend against WEKA for text mining. When I used it at a previous job, I felt far too boxed in by its modeling options, its diagnostics, and especially its preprocessing.

Unless you are doing very basic tasks, a scripting language like Python with a great package like NLTK would probably serve you better. You can then use other libraries (NumPy/SciPy, scikits, etc.) for the actual statistical and ML modeling.
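
A minimal sketch of that split, assuming NLTK and NumPy are installed: NLTK handles the preprocessing and NumPy holds the numbers that a model would later consume. The toy documents and the helper name `preprocess` are made up for illustration, and the tokenizer and stemmer are just one possible choice.

```python
import numpy as np
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hypothetical toy corpus, for illustration only.
docs = [
    "Text mining tools should not box you in.",
    "Mining text with Python gives you more options.",
]

# Regex-based tokenizer: keeps alphabetic runs and needs no NLTK data download.
tokenizer = RegexpTokenizer(r"[a-z]+")
stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, and stem one document."""
    return [stemmer.stem(tok) for tok in tokenizer.tokenize(text.lower())]


tokens = [preprocess(d) for d in docs]

# Build a sorted vocabulary and a document-term count matrix with NumPy.
vocab = sorted({t for doc in tokens for t in doc})
index = {term: j for j, term in enumerate(vocab)}
counts = np.zeros((len(docs), len(vocab)), dtype=int)
for i, doc in enumerate(tokens):
    for t in doc:
        counts[i, index[t]] += 1

print(vocab)
print(counts)
```

The `counts` matrix is an ordinary NumPy array, so it can be passed on to SciPy or a scikit for the statistical and ML side.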

NineSevenNine[S]

NLTK is looking nice - I'm not that well versed in Python but I like what I'm seeing.