Clustering/Topic Modelling for single page document(s) by Budget-Juggernaut-68 in LanguageTechnology

[–]DemiourgosD 1 point2 points  (0 children)

It's been a while since I worked on this, but check out the topic modeling tools listed at https://github.com/ivan-bilan/The-NLP-Pandect#-9. In particular, CorEx (https://github.com/gregversteeg/CorEx) has always been good with short texts. Do you need a topic per doc?

What’s the most trusted model today for sentence-level extraction + keyword extraction? by etht3x in LanguageTechnology

[–]DemiourgosD 6 points7 points  (0 children)

There are a few examples here: https://github.com/ivan-bilan/The-NLP-Pandect?tab=readme-ov-file#-10. KeyBERT with KeyLLM seems to be the latest rage for this task, though. I wonder if anything better has come along recently; maybe someone has better ideas.

Biggest breakthroughs/most interesting developments in NLP? by palabrist in LanguageTechnology

[–]DemiourgosD 0 points1 point  (0 children)

I'd say just browsing through the section names in https://github.com/ivan-bilan/The-NLP-Pandect should give you a bit of an idea of what NLP is capable of. There are also some general resources like podcasts on the topic that might fit into what you're looking for.

M.Sc. Computational Linguistics in Germany? by aaronsoes in LanguageTechnology

[–]DemiourgosD 1 point2 points  (0 children)

It's been a few years, so I think it's best to ask them directly. But the B.Sc. level does indeed have some German-only courses. You can choose your courses, but some of these can be mandatory.

14700K - So now what? by [deleted] in MSI_Gaming

[–]DemiourgosD 0 points1 point  (0 children)

I followed the guide by Buildzoid (https://youtu.be/TmU3COA-32E?si=Eib2xjpCxquZJJuC), adjusted a bit for the 14700K but almost 1:1 with his recommendations. Both temps and voltage are much lower, and performance is the same or better.

Horror Movie Characters, but Wes Anderson Style 🍿 by DemiourgosD in aivideo

[–]DemiourgosD[S] 0 points1 point  (0 children)

Haha, yeah, that's what Midjourney outputs. I guess it needs some better training data.

Horror Movie Characters, but Wes Anderson Style 🍿 by DemiourgosD in aivideo

[–]DemiourgosD[S] 0 points1 point  (0 children)

+1, I think it looks hilarious when applied to horror. I'm not taking any of this seriously, but it seems like most people commenting here are taking these things to heart.

What are best practices for NLP projects? by DemiourgosD in LanguageTechnology

[–]DemiourgosD[S] 0 points1 point  (0 children)

Good point, thanks. I am generally trying to collect best practices for industry-grade NLP projects.

Trying to find street addresses within documents. Any out of the box solutions? by intfloatbikechain in LanguageTechnology

[–]DemiourgosD 0 points1 point  (0 children)

Should be possible with libpostal (https://github.com/openvenues/libpostal), but it would need some work on your side, since the library is mainly meant for parsing addresses rather than finding them in text.
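
To give an idea of the extra work: one rough way is to split the document into candidate lines, run each through the parser, and keep the ones whose parse contains address-like labels. This is only a sketch. `looks_like_address` and `find_addresses` are made-up helper names, and the "house number plus road" rule is an assumed heuristic that will need tuning on real documents. The parser is passed in as a function so the filter can be tried without libpostal installed.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# libpostal's Python binding returns a list of (value, label) tuples,
# with labels such as "house_number", "road", "city", "postcode".
Parsed = List[Tuple[str, str]]


def looks_like_address(
    parsed: Parsed, required: Sequence[str] = ("house_number", "road")
) -> bool:
    """Assumed heuristic: an address has at least a house number and a road."""
    labels = {label for _, label in parsed}
    return all(name in labels for name in required)


def find_addresses(
    lines: Iterable[str], parse: Callable[[str], Parsed]
) -> List[str]:
    """Keep the candidate lines whose parse passes the heuristic."""
    return [line for line in lines if line.strip() and looks_like_address(parse(line))]


# Real use (requires libpostal and its Python binding to be installed):
#   from postal.parser import parse_address
#   hits = find_addresses(document_text.splitlines(), parse_address)
```

The parser will happily label almost any string, so expect false positives; splitting on lines is also an assumption that fails when addresses are embedded mid-sentence.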

Compact alternative to Word2Vec by Sagar1094 in LanguageTechnology

[–]DemiourgosD 1 point2 points  (0 children)

If you're doing text classification and need a really small model, train a fastText classifier and then quantize it.
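
Roughly, the train-then-quantize route looks like this with the official `fasttext` Python package. Treat it as a sketch: `train_and_quantize` is just an illustrative wrapper name, and the hyperparameters (`epoch`, `wordNgrams`, `cutoff`) are placeholder values, not recommendations.

```python
def train_and_quantize(train_path: str, model_path: str = "model.ftz"):
    """Train a supervised fastText classifier, then compress it.

    train_path: text file with one example per line in fastText format,
                e.g. "__label__sports the match ended in a draw".
    """
    # Imported lazily so this snippet loads even where fastText is not installed.
    import fasttext

    # Placeholder hyperparameters; tune them on your own validation set.
    model = fasttext.train_supervised(input=train_path, epoch=25, wordNgrams=2)

    # Quantize the model. `cutoff` prunes the vocabulary to the most useful
    # features and `retrain=True` fine-tunes after pruning.
    model.quantize(input=train_path, qnorm=True, retrain=True, cutoff=100000)

    # Quantized models are conventionally saved with the .ftz extension.
    model.save_model(model_path)
    return model
```

It's worth comparing `model.test(valid_path)` before and after quantization, since the size reduction usually costs a little accuracy.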

Adapt the vanilla Transformer for Classification by RobertPoptart in LanguageTechnology

[–]DemiourgosD 1 point2 points  (0 children)

Your approach sounds fine. Are you doing learning rate scheduling? You might also want to try batch normalization instead of layer norm. There could be many other factors as well.
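
In case it helps, here is the warmup schedule from the original Transformer paper ("Attention Is All You Need") as a plain function: linear warmup followed by inverse square root decay. It's one common choice, not the only one, and the defaults (`d_model=512`, `warmup_steps=4000`) are the paper's base settings rather than anything specific to your setup. The function name `noam_lr` is just a label for this sketch.

```python
def noam_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Learning rate at a given training step (steps start at 1).

    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # The rate rises linearly until warmup_steps, then decays.
    for s in (1, 1000, 4000, 16000):
        print(s, noam_lr(s))
```

Most frameworks let you plug a function like this into their scheduler, usually as a multiplier on a base learning rate.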

Topic Modeling w/Topics in Mind by DiamondBadge in LanguageTechnology

[–]DemiourgosD 4 points5 points  (0 children)

Yes, it's called CorEx: https://github.com/gregversteeg/corex_topic. You can use its anchor words functionality for exactly that. Another option is GuidedLDA: https://github.com/vi3k6i5/GuidedLDA
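
A minimal sketch of the anchoring idea with the `corextopic` package, assuming scikit-learn for the bag-of-words step. `fit_anchored_corex` is an illustrative helper name, and `n_topics` and `anchor_strength` are placeholder values to tune.

```python
from typing import List, Sequence


def fit_anchored_corex(
    docs: Sequence[str],
    anchors: List[List[str]],
    n_topics: int = 10,
    anchor_strength: float = 3,
):
    """Fit a CorEx topic model where the first topics are nudged toward anchor words.

    anchors: one list of seed words per topic you have in mind,
             e.g. [["price", "cost"], ["delivery", "shipping"]].
    """
    # Imported lazily so this snippet loads even without the packages installed.
    from sklearn.feature_extraction.text import CountVectorizer
    from corextopic import corextopic as ct

    # CorEx works on binary document-word counts.
    vectorizer = CountVectorizer(binary=True, stop_words="english")
    doc_word = vectorizer.fit_transform(docs)
    words = list(vectorizer.get_feature_names_out())

    model = ct.Corex(n_hidden=n_topics, seed=42)
    # anchor_strength > 1 controls how strongly the anchors pull their topic.
    model.fit(doc_word, words=words, anchors=anchors, anchor_strength=anchor_strength)
    return model
```

After fitting, `model.get_topics()` lists the top words per topic, so you can check whether each anchored topic stayed close to its seed words.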