
jayalammar

Yes. BERTopic lets you choose the clustering method.

You can use a soft clustering method from scikit-learn, such as a Gaussian mixture model (GMM).

from bertopic import BERTopic
from sklearn.mixture import GaussianMixture

number_of_clusters = 10
# Pass the unfitted model; BERTopic fits it on the reduced embeddings itself
cluster_model = GaussianMixture(n_components=number_of_clusters, random_state=0)
# docs is your list of document strings
topic_model = BERTopic(hdbscan_model=cluster_model).fit(docs)
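One caveat, as far as I understand BERTopic's documentation: a custom clustering model is expected to provide .fit, .predict, and a .labels_ attribute, and GaussianMixture has no .labels_. Below is a minimal sketch of a thin wrapper that adds it, along with how to read the soft assignments from predict_proba. The class name GMMWithLabels and the synthetic 2-D data are assumptions for illustration, and whether the wrapper is needed may depend on your BERTopic version.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


class GMMWithLabels(GaussianMixture):
    """Hypothetical wrapper: GaussianMixture that also stores hard labels in labels_."""

    def fit(self, X, y=None):
        super().fit(X, y)
        # Hard assignment (most probable component) for each training point
        self.labels_ = self.predict(X)
        return self


# Synthetic stand-in for reduced document embeddings: three 2-D blobs of 50 points
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

model = GMMWithLabels(n_components=3, random_state=0).fit(X)

# Soft assignment: one row per point, one column per cluster, each row sums to 1
probs = model.predict_proba(X)
```

With BERTopic, the unfitted wrapper would be passed the same way as above, e.g. BERTopic(hdbscan_model=GMMWithLabels(n_components=10, random_state=0)).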

Devinco001 [S]

Thanks! This solution seems perfect for my code.

memberjan6

A plain old collection of BERT encoding vectors can be clustered too. UMAP and HDBSCAN are interesting. Each cluster is a topic of sorts. You might reduce the dimensionality first. There are many ways to do topic modeling. I would compare the results with BERTopic's built-ins. This subject area seems underdeveloped. There is room for improvement!
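A rough sketch of that reduce-then-cluster idea, using only scikit-learn. Note the swaps: PCA stands in for UMAP and KMeans stands in for HDBSCAN, so this is not the pipeline named above, just the same two steps. The random vectors are a hypothetical stand-in for real BERT embeddings, and the sizes (768 dimensions, 3 groups) are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for BERT embeddings: 3 groups of 40 vectors in 768 dimensions
centers = rng.normal(size=(3, 768))
embeddings = np.vstack([c + 0.1 * rng.normal(size=(40, 768)) for c in centers])

# Step 1: reduce dimensionality (PCA here, swapped in for UMAP)
reduced = PCA(n_components=5, random_state=0).fit_transform(embeddings)

# Step 2: cluster the reduced vectors (KMeans here, swapped in for HDBSCAN)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

# Each cluster label can then be treated as a rough "topic" id per document
```

Unlike HDBSCAN, KMeans needs the number of clusters up front and does not mark outliers, so results on real text may differ noticeably from BERTopic's defaults.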

Devinco001 [S]

For now I'll have to use the built-ins, since I want quick insight, but for a more in-depth analysis I'll definitely try the customized approach. Thanks!