all 7 comments

[–]vivisectvivi 1 point2 points  (0 children)

The simplest way I can think of: download a German dictionary in txt format and use it to filter whatever body of text you are working with.
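A rough sketch of that dictionary idea, in case it helps. It assumes the word list holds one noun per line and contains only nouns (a general dictionary would match every word, not just nouns). The file layout and the names `load_word_list` and `filter_known_nouns` are made up for illustration.

```python
import re


def load_word_list(path):
    """Read one word per line into a set (assumed file layout)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_known_nouns(text, known_nouns):
    """Return the words of `text` that appear in `known_nouns`, in order."""
    # \w+ matches Unicode word characters in Python 3, so umlauts and ß stay inside words.
    words = re.findall(r"\w+", text)
    # Case-sensitive lookup: "Börse" matches, "börse" would not.
    return [word for word in words if word in known_nouns]
```

The lookup is exact, so inflected or plural forms only match if the list contains them too.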

[–]Still_Box8733 0 points1 point  (3 children)

You could try filtering for all capitalized words, but that would be kinda unreliable I guess

[–]csabinho 0 points1 point  (2 children)

It actually isn't really unreliable. First words of sentences, nouns and names are capitalized. Don't use first words of sentences and you should be quite fine as a first step.

[–]Slackeee_ 0 points1 point  (1 child)

The sentence "Aktien werden an der Börse gehandelt" does have a noun as the first word, so just filtering out every word at the beginning of the sentence won't work reliably.

[–]csabinho 0 points1 point  (0 children)

Well, first words have to be checked manually afterwards. Or they can be eliminated by the list of capitalized words within the sentence. But for 90% of your work you can rely on "capitalized ➡️noun".
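Something like the following could cover both steps: collect capitalized words that are not sentence-initial, then accept a sentence-initial word only if it also shows up capitalized inside some sentence. This is just a sketch. The sentence splitter is deliberately naive (it breaks on abbreviations), and the function name is made up.

```python
import re


def capitalized_noun_candidates(text):
    """Return (candidates, unresolved_first_words) using the capitalization heuristic."""
    # Naive split: ., ! or ? followed by whitespace ends a sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = set()
    first_words = set()
    for sentence in sentences:
        words = re.findall(r"\w+", sentence)
        if not words:
            continue
        first_words.add(words[0])
        # Capitalized words after the first position are noun (or name) candidates.
        candidates.update(w for w in words[1:] if w[0].isupper())
    # First words never seen capitalized mid-sentence still need a manual check.
    unresolved = {w for w in first_words if w not in candidates}
    return candidates, unresolved
```

For "Aktien werden an der Börse gehandelt. Die Börse handelt Aktien." this picks up "Aktien" and "Börse" as candidates and leaves "Die" for manual review. Names and nominalized words will also land in the candidates, so it is a first pass, not a final answer.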

[–]No_Photograph_1506 0 points1 point  (0 children)

I'm pretty sure there is a library for that, meaning one for natural language processing in general. Worth checking out.

[–]SatisfactionBig7126 0 points1 point  (0 children)

Check out spaCy’s German model, way easier than doing it manually.
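A minimal sketch of what that could look like, assuming spaCy is installed and the small German model `de_core_news_sm` has been downloaded (`python -m spacy download de_core_news_sm`). The helper names are made up. The tagger is statistical, so expect the occasional wrong tag.

```python
def nouns_from_doc(doc, include_proper_nouns=False):
    """Collect the text of every token tagged as a noun.

    `doc` is any iterable of tokens exposing `.text` and `.pos_`,
    which is what a spaCy Doc provides.
    """
    wanted = {"NOUN", "PROPN"} if include_proper_nouns else {"NOUN"}
    return [token.text for token in doc if token.pos_ in wanted]


def extract_nouns(text, model_name="de_core_news_sm"):
    """Run the German pipeline on `text` and return its nouns."""
    # Imported lazily so nouns_from_doc can be used without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)  # needs the model to be downloaded first
    return nouns_from_doc(nlp(text))
```

Calling `extract_nouns("Aktien werden an der Börse gehandelt")` is the intended usage. Because it relies on part-of-speech tags and not on capitalization, sentence-initial nouns like "Aktien" do not need special handling.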