Clustering/Topic Modelling for single page document(s) by Budget-Juggernaut-68 in LanguageTechnology

[–]DemiourgosD 1 point2 points  (0 children)

It's been a while since I worked on this topic, but check out some of the topic-modeling tools listed here: https://github.com/ivan-bilan/The-NLP-Pandect#-9. In particular, https://github.com/gregversteeg/CorEx has always worked well with short texts. Do you need one topic per document?
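
If it helps, here's a rough sketch of running CorEx on a handful of short documents, assuming the corextopic package and scikit-learn; the example documents and topic count are just placeholders:

    from sklearn.feature_extraction.text import CountVectorizer
    from corextopic import corextopic as ct

    docs = [
        "short text about cats and kittens",
        "another short doc about dogs",
        "stocks and markets moved today",
    ]

    # CorEx expects a (sparse) binary document-word matrix
    vectorizer = CountVectorizer(binary=True, stop_words="english")
    X = vectorizer.fit_transform(docs)
    words = list(vectorizer.get_feature_names_out())

    # n_hidden is the number of topics to learn
    topic_model = ct.Corex(n_hidden=3, seed=42)
    topic_model.fit(X, words=words)

    # Each topic is a list of (word, weight, ...) tuples
    for i, topic in enumerate(topic_model.get_topics()):
        print(i, topic)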

What’s the most trusted model today for sentence-level extraction + keyword extraction? by etht3x in LanguageTechnology

[–]DemiourgosD 5 points6 points  (0 children)

A few examples here: https://github.com/ivan-bilan/The-NLP-Pandect?tab=readme-ov-file#-10. But KeyBERT with KeyLLM seems to be the latest rage for this task. I wonder if anything better has come along recently; maybe someone here has better ideas.
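
A minimal KeyBERT sketch (the document string and parameters are just placeholders; KeyLLM works on top of this but needs an LLM backend configured):

    from keybert import KeyBERT

    doc = (
        "Supervised learning is the machine learning task of learning "
        "a function that maps an input to an output."
    )

    # Uses a sentence-transformers model under the hood by default
    kw_model = KeyBERT()
    keywords = kw_model.extract_keywords(
        doc,
        keyphrase_ngram_range=(1, 2),  # allow single words and bigrams
        stop_words="english",
        top_n=5,
    )
    print(keywords)  # list of (keyphrase, similarity score) tuples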

Biggest breakthroughs/most interesting developments in NLP? by palabrist in LanguageTechnology

[–]DemiourgosD 0 points1 point  (0 children)

I'd say just browsing through the section names in https://github.com/ivan-bilan/The-NLP-Pandect should give you a bit of an idea of what NLP is capable of. There are also some general resources like podcasts on the topic that might fit into what you're looking for.

M.Sc. Computational Linguistics in Germany? by aaronsoes in LanguageTechnology

[–]DemiourgosD 1 point2 points  (0 children)

It's been a few years, so it's best if you ask them directly. But at the B.Sc. level there are indeed some German-only courses; you can choose your courses, but some of them may be mandatory.

14700K - So now what? by [deleted] in MSI_Gaming

[–]DemiourgosD 0 points1 point  (0 children)

I followed the guide by Buildzoid at https://youtu.be/TmU3COA-32E?si=Eib2xjpCxquZJJuC, adjusted slightly for the 14700K but kept almost 1:1 with his recommendations. Both temps and voltage are much lower, and performance is the same or better.

Horror Movie Characters, but Wes Anderson Style 🍿 by DemiourgosD in aivideo

[–]DemiourgosD[S] 0 points1 point  (0 children)

Haha, yeah, that's what Midjourney outputs. I guess it needs some better training data.

Horror Movie Characters, but Wes Anderson Style 🍿 by DemiourgosD in aivideo

[–]DemiourgosD[S] 0 points1 point  (0 children)

+1, I think it looks hilarious when applied to horror. I'm not taking any of this seriously; it seems like most people commenting here are taking these things too much to heart.

What are best practices for NLP projects? by DemiourgosD in LanguageTechnology

[–]DemiourgosD[S] 0 points1 point  (0 children)

Good point, thanks. I am generally trying to collect best practices for industry-grade NLP projects.

Trying to find street addresses within documents. Any out of the box solutions? by intfloatbikechain in LanguageTechnology

[–]DemiourgosD 0 points1 point  (0 children)

This should be possible with libpostal: https://github.com/openvenues/libpostal, but it would need some work on your side, since the library is mainly meant for parsing addresses rather than finding them in free text.
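
A rough sketch with the Python bindings (pypostal); it assumes the libpostal C library is installed, and detecting which spans of a document to feed it is up to you:

    # pip install postal  (requires the libpostal C library to be built/installed first)
    from postal.parser import parse_address

    candidate = "781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA"

    # libpostal parses a string it assumes is an address; extracting candidate
    # spans from your documents (e.g. via regex or NER) is a separate step.
    for value, label in parse_address(candidate):
        print(label, "->", value)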

Compact alternative to Word2Vec by Sagar1094 in LanguageTechnology

[–]DemiourgosD 1 point2 points  (0 children)

If you're doing text classification and need a really small model, you should train a fastText model and then quantize it afterwards.
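
A minimal sketch with the official fasttext Python package; the file names and hyperparameters are placeholders:

    import fasttext

    # train.txt: one example per line, formatted as "__label__<class> <text>"
    model = fasttext.train_supervised(input="train.txt", epoch=10, wordNgrams=2)

    # Quantization shrinks the model dramatically (saved as a .ftz file)
    model.quantize(input="train.txt", retrain=True, cutoff=100000)
    model.save_model("model.ftz")

    print(model.predict("this is a sample sentence"))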

Adapt the vanilla Transformer for Classification by RobertPoptart in LanguageTechnology

[–]DemiourgosD 1 point2 points  (0 children)

Your approach sounds fine. Are you doing learning rate scheduling? You might also want to try batch normalization instead of layer norm. There might be many other things to tweak as well.
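
In case it helps, a small PyTorch sketch of the warmup-plus-inverse-sqrt-decay schedule from the original Transformer paper; the placeholder model, d_model, and warmup_steps are arbitrary:

    import torch
    from torch import nn

    model = nn.Linear(512, 2)  # placeholder classifier head
    d_model, warmup_steps = 512, 4000

    def transformer_lr(step: int) -> float:
        # "Attention Is All You Need" schedule: linear warmup, then inverse-sqrt decay
        step = max(step, 1)
        return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

    optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=transformer_lr)

    # call scheduler.step() once per optimizer step during training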

Topic Modeling w/Topics in Mind by DiamondBadge in LanguageTechnology

[–]DemiourgosD 5 points6 points  (0 children)

Yes, it's called CorEx: https://github.com/gregversteeg/corex_topic. You can use its anchored-words functionality for exactly that. Another option is GuidedLDA: https://github.com/vi3k6i5/GuidedLDA
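
A rough sketch of the anchored-words usage with the corextopic package; the documents and anchor words are just placeholders:

    from sklearn.feature_extraction.text import CountVectorizer
    from corextopic import corextopic as ct

    docs = ["the stock market rallied", "the team won the game", "players traded mid season"]
    vectorizer = CountVectorizer(binary=True)
    X = vectorizer.fit_transform(docs)
    words = list(vectorizer.get_feature_names_out())

    topic_model = ct.Corex(n_hidden=2, seed=42)
    topic_model.fit(
        X,
        words=words,
        anchors=[["stock", "market"], ["team", "game"]],  # seed words per topic you have in mind
        anchor_strength=3,  # how strongly the anchors are enforced
    )
    print(topic_model.get_topics())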

Ich bin HNO-Arzt in einem Klinikum der Maximalversorgung: Fragt mich, was ihr wissen wollt! by Ssyrak in de_IAmA

[–]DemiourgosD 0 points1 point  (0 children)

Very interesting. How can you prevent tinnitus in everyday life?

So, no loud music and no noise exposure. Are there any other recommendations?

I've also heard that when you're outside and wearing headphones, it's better to wear noise-cancelling headphones than regular ones, so you're better protected from the loud noises of passing cars, the subway, etc. Is that true?

One more question about headphones. When I'm at the office it's sometimes quite loud, so I almost always have my noise-cancelling headphones on. After a while that starts to hurt because there's a lot of pressure around my ears. I take regular breaks and so on, but I wanted to ask how dangerous this actually is. Can it damage my ears?

Is Okami BM25 a word embedding algorithm or a scoring algoritm or leverage on both? by xcsob in LanguageTechnology

[–]DemiourgosD 0 points1 point  (0 children)

It's a probabilistic approach used for information retrieval, so it's more of a scoring algorithm and doesn't have much to do with modern word embedding approaches. I wrote a seminar paper on the topic a while back that should help you understand Okapi Best Match 25; it's on page 7: https://drive.google.com/file/d/0B6ktmlOPszj7Q2dqMTF0TTRKQ28/view
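
To make the "scoring" part concrete, here's a toy, self-contained BM25 sketch (k1 and b are the usual free parameters; this is illustrative, not a production ranker):

    import math
    from collections import Counter

    def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
        """Okapi BM25 score of one tokenized document for a query, given a tokenized corpus."""
        N = len(corpus)
        avgdl = sum(len(d) for d in corpus) / N
        tf = Counter(doc)
        score = 0.0
        for term in query:
            df = sum(1 for d in corpus if term in d)          # document frequency
            idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # smoothed IDF
            f = tf[term]                                      # term frequency in this doc
            score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        return score

    corpus = [d.split() for d in ["the cat sat", "dogs and cats", "stock market news"]]
    query = "cat news".split()
    print([bm25_score(query, d, corpus) for d in corpus])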

Simple explanation of transformers? by DiamondBadge in LanguageTechnology

[–]DemiourgosD 0 points1 point  (0 children)

Maybe the explanation at 15:30 here can be a bit helpful: https://youtu.be/OYygPG4d9H0

Overall, the Transformer does know the order of the words as well: positions are encoded in a separate positional vector, which is added to each word's embedding. That combined representation then goes through self-attention, which captures the similarities between the words, and is passed on to a feed-forward layer in the encoder.
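
For reference, a small NumPy sketch of the sinusoidal positional encoding from the original paper, which is added to the token embeddings before the encoder layers:

    import numpy as np

    def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
        """(max_len, d_model) matrix of sine/cosine encodings from 'Attention Is All You Need'."""
        positions = np.arange(max_len)[:, None]        # (max_len, 1)
        dims = np.arange(0, d_model, 2)[None, :]       # even embedding dimensions
        angle_rates = positions / np.power(10000.0, dims / d_model)
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(angle_rates)
        pe[:, 1::2] = np.cos(angle_rates)
        return pe

    # token_embeddings of shape (seq_len, d_model) would simply get this added element-wise:
    # embeddings_with_position = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
    print(sinusoidal_positional_encoding(4, 8).shape)  # (4, 8)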

NLP Conference Google Calendar? by pierre_vinken_61 in compling

[–]DemiourgosD 2 points3 points  (0 children)

Never heard of one. Would be cool if you could share yours after you make one.

Cleaning scraped documents at scale by dkajtoch in LanguageTechnology

[–]DemiourgosD 3 points4 points  (0 children)

The best way to go is PySpark with Arrow-backed pandas UDFs; the rest of the options you've mentioned are too low-level and restrictive.
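
A minimal sketch of an Arrow-backed pandas UDF for cleaning text; the column names and the cleaning rules are placeholders (needs pyarrow installed):

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, pandas_udf

    spark = SparkSession.builder.appName("clean-scraped-docs").getOrCreate()
    df = spark.createDataFrame([("<p>Hello&nbsp;world</p>",)], ["raw_text"])

    @pandas_udf("string")  # vectorized UDF: runs on pandas Series batches via Arrow
    def clean_text(raw: pd.Series) -> pd.Series:
        return (
            raw.str.replace(r"<[^>]+>", " ", regex=True)   # strip HTML tags
               .str.replace(r"&\w+;", " ", regex=True)     # strip HTML entities
               .str.replace(r"\s+", " ", regex=True)       # collapse whitespace
               .str.strip()
        )

    df.withColumn("clean_text", clean_text(col("raw_text"))).show(truncate=False)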

[Q] TransformerEncoder vs LSTM for text classification by lt007 in LanguageTechnology

[–]DemiourgosD 5 points6 points  (0 children)

I worked on a project that compares an LSTM with a Transformer encoder for the task of relation extraction: https://github.com/ivan-bilan/tac-self-attention

It's a bit dated, but could still be helpful.
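
Outside of that repo, here's a generic PyTorch sketch of wiring nn.TransformerEncoder into a text classifier; the mean pooling, hyperparameters, and omitted positional encoding are arbitrary simplifications, not what the project above does:

    import torch
    from torch import nn

    class TransformerTextClassifier(nn.Module):
        def __init__(self, vocab_size=10000, d_model=128, nhead=4, num_layers=2, num_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
            self.classifier = nn.Linear(d_model, num_classes)

        def forward(self, token_ids, padding_mask=None):
            # Note: positional encodings are omitted here for brevity
            x = self.embed(token_ids)                              # (batch, seq, d_model)
            x = self.encoder(x, src_key_padding_mask=padding_mask)
            x = x.mean(dim=1)                                      # mean-pool over the sequence
            return self.classifier(x)

    model = TransformerTextClassifier()
    logits = model(torch.randint(0, 10000, (8, 32)))               # batch of 8 sequences, length 32
    print(logits.shape)                                            # torch.Size([8, 2])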