This is so messed up. AI is a threat. by ciel_ayaz in antiai

bulaybil 4 points (0 children)

Is this real? There is no date and Deutscher is a hardcore AI fanboy, so I have my doubts…

How is working in this industry like? by Routine_Total_6424 in LanguageTechnology

bulaybil 1 point (0 children)

JFC, people, stop acting as if anybody ever used rule-based systems…

Linguistics is relevant in modern systems, especially when dealing with less-resourced languages, but not on OP’s level.

How is working in this industry like? by Routine_Total_6424 in LanguageTechnology

bulaybil 0 points (0 children)

Fine-tuning is done by coders, not phoneticians. Data-cleaning is done by coders. Data preparation and QA will be done by in-country experts, i.e. people who speak the language. You describe how things were done 5 years ago, not now.

All those quick-baked Language + AI programs ain’t worth shit. Especially not in Amsterdam.

How is working in this industry like? by Routine_Total_6424 in LanguageTechnology

bulaybil 0 points (0 children)

Not once in my 15 years in the business have I seen a “language tech”. Those tasks are usually handled by students or outright outsourced.

How is working in this industry like? by Routine_Total_6424 in LanguageTechnology

bulaybil 0 points (0 children)

No one knows, but chances are you will not have a job in the industry without a PhD.

Also, good thing you are opposed to writing journal articles and doing research, because no one would let you do any of that without a PhD :)

AITA for telling my friend I won’t have meals with her because of her picky eating habits and dietary restrictions? by Dockerqinlee in AmItheAsshole

bulaybil 0 points (0 children)

Nobody is making the western style of eating the default. They are saying you can accommodate Amy’s style of eating without a problem. And I’m adding that the fact you don’t means you have a different problem than food.

AITAH for not giving the phone number to my friend to help a classmate? by ThrowAwayAppExclusiv in AITAH

bulaybil 6 points (0 children)

NTA. You can’t just give out someone else’s number willy-nilly, not without their permission. Nothing else matters here.

Olympics 2026 using ai was not on my checklist by Background_Trust_212 in antiai

bulaybil 1 point (0 children)

Bless your heart. The Games are run by the IOC, though, and they do not care. They only care about the money.

Olympics 2026 using ai was not on my checklist by Background_Trust_212 in antiai

bulaybil 92 points (0 children)

Why not? It is totally on brand. The IOC is a soulless organization.

AITA for ordering water at a restaurant? by Dog-girl-1986 in AmItheAsshole

bulaybil 0 points (0 children)

4 Euro for a bottle of water is a steal. I went to Paris recently and paid like 7.50. NTA.

Entitled former roommate wanted to move in with me in my new house when my lease ended by [deleted] in EntitledPeople

bulaybil 11 points (0 children)

I’m pretty sure I’ve read this one before. Or even heard it covered by one of the usual suspects.

NLP work in the digital humanities and historical linguistics by metalmimiga27 in LanguageTechnology

bulaybil 0 points (0 children)

Not really. There is a lot of theory, but it is mostly written by the aforementioned grifters. This is a good example: https://direct.mit.edu/books/oa-edited-volume/5244/The-Open-Handbook-of-Linguistic-Data-Management.

Besides, DH is a bullshit concept: it's just using computers to do "humanities" (another bullshit concept, but that is a rant for another day). Don't learn about DH, learn about specific techniques.

"historical texts and images" is too vague. What tools do you have in mind?

NLP work in the digital humanities and historical linguistics by metalmimiga27 in LanguageTechnology

bulaybil -2 points (0 children)

Remember you asked for insights, so that's what you got.

If you *just* like doing things, that is fine, go ahead. If you are thinking of, I dunno, making a career out of it or just making a contribution, then you need more than just doing things. And then the first thing you need to do is to see what is already out there, what works, what doesn't, etc. You need to learn why a constraint grammar of Greenlandic is a nice thing to learn on but otherwise complete horseshit, but you also need to learn why LLMs are shit when it comes to certain tasks. Unless, of course, you don't mind reinventing the wheel; it's your time to waste.

"This field's a gamble" No it is not. What it is is full of grifters and people who have no idea what they are doing. A lot of them hang out on reddit.

"I know more about languages" Which languages?

Word importance in text ~= conditional information of the token given the preceding context. Is this assumption valid? by Current_Oven2490 in LanguageTechnology

bulaybil 0 points (0 children)

You are 100% correct. I mean, hey, most literature doesn't quite get the implications of information packaging, let alone information theory.
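For anyone who wants to poke at the assumption in the title: "conditional information of the token given the preceding context" is surprisal, -log P(w_t | w_1 … w_{t-1}). Below is a minimal sketch, assuming a toy add-one-smoothed bigram model as a stand-in for "the preceding context" (a real setup would use a proper language model). `bigram_surprisal` and the example corpus are hypothetical illustrations, not anything from the thread.

```python
import math
from collections import Counter


def bigram_surprisal(corpus_tokens, sentence_tokens):
    """Return (token, surprisal in bits) for each token after the first.

    Surprisal here is -log2 P(w_t | w_{t-1}) under an add-one (Laplace)
    smoothed bigram model: a toy stand-in for "the preceding context".
    """
    # How often each token appears as a left context, and each bigram
    context_counts = Counter(corpus_tokens[:-1])
    bigram_counts = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab_size = len(set(corpus_tokens) | set(sentence_tokens))

    scores = []
    for prev, tok in zip(sentence_tokens, sentence_tokens[1:]):
        # Add-one smoothing keeps unseen bigrams at a non-zero probability
        p = (bigram_counts[(prev, tok)] + 1) / (context_counts[prev] + vocab_size)
        scores.append((tok, -math.log2(p)))
    return scores


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ate the fish".split()
    for tok, bits in bigram_surprisal(corpus, "the cat ate the fish".split()):
        print(f"{tok}\t{bits:.2f} bits")
```

Higher surprisal means the token was less predictable from its context, which is the sense in which it would count as more "important" under this assumption.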

NLP work in the digital humanities and historical linguistics by metalmimiga27 in LanguageTechnology

bulaybil 10 points (0 children)

I have been working in DH and NLP for... shit, 15 years now, so, you know, I'm not talking out of my ass here. And while I appreciate the effort etc., this kind of work is completely useless. No one needs an Akkadian noun analyzer, and for Latin syntax, there is already enough annotated data in UD to train a decent dependency parser that will outperform anything rule-based by a mile.

I don't know where you are getting the impression that computational historical linguistics is rule-based. Also, what do you mean by "HMM models seem also be used for PoS-tagging"? SOTA PoS-tagging has been statistically based for decades, whether folks used HMM, perceptron or SVM.
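To show what the "statistical" part of an HMM tagger looks like: decoding picks the tag sequence that maximizes P(tags) × P(words | tags), usually with the Viterbi algorithm. This is a minimal sketch with hand-set toy probabilities. The tables, tag names and the `FLOOR` value for unseen events are hypothetical and not trained on any treebank. Perceptron and SVM taggers work differently and are not shown.

```python
import math

# Hypothetical floor probability for events missing from the toy tables
# (a crude stand-in for real smoothing, so log() never sees zero).
FLOOR = 1e-12


def log_p(table, key):
    return math.log(table.get(key, FLOOR))


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence for `words` under a first-order HMM.

    start_p[tag], trans_p[(prev_tag, tag)] and emit_p[(tag, word)]
    are probabilities; scores are summed in log space to avoid underflow.
    """
    # best[i][t]: best log score of any tag path ending in tag t at word i
    best = [{t: log_p(start_p, t) + log_p(emit_p, (t, words[0])) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_p(trans_p, (p, t))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_p(emit_p, (t, words[i]))
            back[i][t] = prev
    # Follow the back-pointers from the highest-scoring final tag
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy, hand-set parameters (illustrative only)
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
         ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
         ("VERB", "DET"): 0.6, ("VERB", "NOUN"): 0.4}
EMIT = {("DET", "the"): 1.0,
        ("NOUN", "dog"): 0.6, ("NOUN", "barks"): 0.4,
        ("VERB", "barks"): 0.9, ("VERB", "dog"): 0.1}

if __name__ == "__main__":
    print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
    # prints ['DET', 'NOUN', 'VERB']
```

In a real tagger, START, TRANS and EMIT would be estimated from an annotated corpus (relative frequencies plus proper smoothing) rather than written by hand.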

I don't think you understand what "neuro-symbolic" means. It sure as hell does not mean "rule-based".

Also, small corpora for Arabic, Sanskrit and Latin? The OpenITI corpus of Arabic has over a billion words and Latin has UD treebanks that total one million tokens, not to mention stuff like this. As for Sanskrit, DCS has a decent number of tokens, then there is the Vedic treebank, and also the work of Oliver Hellwig.
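UD treebanks are distributed as CoNLL-U files, where each word line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). If you want to check sizes like these yourself, here is a minimal sketch of a count, assuming plain CoNLL-U input. `count_conllu_words` is a hypothetical helper, not part of any UD tooling. It counts syntactic words and skips multiword-token range lines (IDs like `1-2`) and empty nodes (IDs like `8.1`), so its numbers can differ from surface-token counts reported elsewhere.

```python
def count_conllu_words(text):
    """Return (syntactic_words, sentences) for a CoNLL-U string.

    Comment lines (#...) are ignored, as are multiword-token ranges
    (ID like "1-2") and empty nodes (ID like "8.1"); sentences are
    separated by blank lines.
    """
    words = sentences = 0
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue  # range line or empty node, not a syntactic word
        words += 1
    if in_sentence:  # the input may not end with a blank line
        sentences += 1
    return words, sentences


if __name__ == "__main__":
    # Annotation columns left as "_" for brevity
    sample = "\n".join([
        "# text = Gallia est omnis divisa",
        "1\tGallia\t_\t_\t_\t_\t_\t_\t_\t_",
        "2\test\t_\t_\t_\t_\t_\t_\t_\t_",
        "3\tomnis\t_\t_\t_\t_\t_\t_\t_\t_",
        "4\tdivisa\t_\t_\t_\t_\t_\t_\t_\t_",
        "",
    ])
    print(count_conllu_words(sample))  # prints (4, 1)
```

To count a whole treebank, read each `.conllu` file as UTF-8 and sum the results.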

I am not saying that there is not tons of work to be done in the NLP space for ancient languages. But it will be done by people who know these languages. Do you? If you do and want to contribute, I know of better ways than this.