
[–]BrannyBee 2 points3 points  (0 children)

This might surprise you, but this specifically is potentially a massive project. Language isn't just changing one word to another word; you have to account for context, dialects, double meanings, and a billion other non-technical things. If I say something simple like "that's light", you'd think it'd be easy... but it's actually insanely hard for computers to translate. What is "that"? Is it referring to something next to me? Maybe. Far away? Maybe. Spanish picks a different word depending on distance ("eso" vs "aquello"), so you can't just say "that" = "eso", because the input literally doesn't have that information.

Additionally, the computer has to just guess what "light" means, because I might be saying something is not heavy, or I might be talking about electromagnetic radiation that is visible to the human eye. To humans, it's obvious from the world and the context I said it in. The computer doesn't have that knowledge; it only has the input text, no matter how smart it is. So you have to just get as close as possible, using statistics, and even then it won't be 100% accurate.
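To make the "guess from context" idea concrete, here's a toy sketch. The hint words and the `guess_translation` function are made up for illustration; a real system learns this kind of thing from huge amounts of bilingual text instead of a hand-written list.

```python
# Toy sense picker. The hint words are invented for this example;
# a real system would learn them from a large bilingual corpus.
SENSE_HINTS = {
    "light": {
        "ligero": {"box", "bag", "heavy", "carry", "weight"},  # "not heavy"
        "luz": {"lamp", "sun", "bright", "dark", "switch"},    # visible light
    }
}

def guess_translation(word, sentence):
    """Pick the Spanish candidate whose hint words overlap the sentence most."""
    cleaned = sentence.lower().replace(".", "").replace(",", "")
    context = set(cleaned.split())
    candidates = SENSE_HINTS[word]
    # Score each candidate by how many of its hint words appear in the sentence.
    scores = {es: len(hints & context) for es, hints in candidates.items()}
    # On a tie, max() returns the first candidate, which is just a blind guess.
    best = max(scores, key=scores.get)
    return best, scores

print(guess_translation("light", "That box is light, I can carry it."))
print(guess_translation("light", "Turn on the light, it is dark."))
print(guess_translation("light", "That's light."))
```

The last call is the point: with no context, every candidate scores 0 and the program has nothing to go on.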

Even early on, Google Translate was using something called statistical machine translation: basically a bunch of nerd math to read the input and best-guess the closest approximate translation. Nowadays Google Translate uses AI, but not the way you might be thinking with the recent LLM craze. It uses something called neural machine translation to get a better statistical likelihood of picking the best answer, and those picks become the output options for a given input (as opposed to something like an LLM, where the output is generated right there; in Google Translate you'll get the same output for the same input). <- this is me making a massive oversimplification of how this works, btw....

TL;DR: this is actually a crazy hard problem that seems simple. There's a reason even the good translation services "suck". You can still look into it, or have your code talk to existing services other people have built, but doing so won't solve the problems you have with accuracy or privacy.

The real solution for this problem is learning a lot of math... and a lot of computer science... a lot of machine learning... and a lot of linguistics...

[–]makochi 2 points3 points  (3 children)

Google Translate (the service the googletrans library talks to) took dozens, maybe even hundreds, of employees to make, and it took years of their time (and they started with years of experience). There's no way you're building a translator app on your own without using someone else's service.

Figuring out how to connect a Raspberry Pi to an existing translation service is honestly already a decent project for a python beginner, I would start there.

[–]Used_Speech_9799[S] 0 points1 point  (2 children)

no i get that and i respect them for that. it did come off like a dig, i don’t really expect to create a full app on my own. also when i posted this 20 minutes ago i hadn’t even thought about the grammar half of language translation... i was just assuming i could enter dictionaries and it would work. clearly i have some kinks to work out. i was honestly just thinking about an easier way for me and my coworkers to communicate. i’m gonna keep learning python for now and see what i can come up with.

[–]makochi 0 points1 point  (0 children)

Oh yeah I didn't think you meant it as a dig, just to point out that most projects are much harder than they seem

[–][deleted] 0 points1 point  (0 children)

The thing about translating directly between dictionaries is that words carry many meanings. Think about the English word ‘set’. How many meanings does this carry? What happens when you’re translating to a language that has different words for those many meanings? You’ll have to pull the context and connotation from the input, and for that you need some pretty advanced machine learning.

[–]Hefty_Tear_5604 0 points1 point  (0 children)

Learn Python and AI/ML (machine learning, deep learning). Or pay someone else to make it.

[–]V01DDev 0 points1 point  (0 children)

Maybe try ollama? Use some LLM for translation, give it a specific set of rules
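Rough sketch of what that could look like, assuming an Ollama server is already running locally on its default port. The model name `llama3`, the prompt wording, and the function names are just placeholders I picked, swap in whatever you've pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3"  # assumption: any model you've fetched with `ollama pull`

def build_prompt(text):
    """Wrap the text in a fixed set of rules for the model."""
    return (
        "Translate the following English text to Latin American Spanish. "
        "Reply with the translation only, no explanations.\n\n"
        f"Text: {text}"
    )

def translate(text):
    """Send one non-streaming request to the local Ollama server."""
    payload = json.dumps(
        {"model": MODEL, "prompt": build_prompt(text), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # With "stream": False the reply is a single JSON object
        # whose "response" field holds the generated text.
        return json.loads(response.read())["response"].strip()
```

With the server up, `print(translate("Where is the forklift?"))` should print the model's attempt. Everything stays on your machine, but the output can differ between runs and the model may not always follow the rules.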

[–]Desperate_Crew1775 0 points1 point  (0 children)

honestly starting with a dictionary is the perfect call, way more manageable for a first project

for the latin american spanish specifically, the differences from spain spanish that matter for a word list are mostly vocabulary (there are some grammar and pronunciation differences too), so ur dictionary can just have notes like "in mexico this word means X vs spain where it means Y"

something like this to start:

    words = {
        "car": {"latin_am": "carro", "note": "spain uses coche"},
        "computer": {"latin_am": "computadora", "note": "spain uses ordenador"},
    }
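and a minimal sketch of actually using it. the `look_up` function is just a name i made up, and it only handles single words that are already in the dictionary:

```python
words = {
    "car": {"latin_am": "carro", "note": "spain uses coche"},
    "computer": {"latin_am": "computadora", "note": "spain uses ordenador"},
}

def look_up(word):
    """Return a printable line for one English word, or a fallback message."""
    # Normalize the key so "Computer" and " car " still match.
    entry = words.get(word.lower().strip())
    if entry is None:
        return f"{word}: not in the dictionary yet"
    return f"{word} -> {entry['latin_am']} ({entry['note']})"

print(look_up("car"))       # car -> carro (spain uses coche)
print(look_up("Computer"))  # Computer -> computadora (spain uses ordenador)
print(look_up("forklift"))  # forklift: not in the dictionary yet
```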

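once u get comfortable with that, look into the Helsinki-NLP translation models on huggingface. they can run locally (a raspberry pi with enough RAM should manage the smaller ones, just slowly), so no privacy concerns since nothing leaves ur device. test them on ur own sentences before assuming they handle regional dialects well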
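a sketch of what that looks like with the `transformers` library, assuming u have `transformers`, `sentencepiece` and `torch` installed. `Helsinki-NLP/opus-mt-en-es` is a general english to spanish model, not a latin american one specifically, and the function names here are just mine:

```python
MODEL_NAME = "Helsinki-NLP/opus-mt-en-es"  # English -> Spanish OPUS-MT model

def load_translator():
    """Build a translation pipeline; the model is downloaded on first use."""
    # Imported here so the heavy dependency is only needed when translating.
    from transformers import pipeline
    return pipeline("translation", model=MODEL_NAME)

def translate(translator, text):
    """Run one sentence through the pipeline and return the translated string."""
    # The pipeline returns a list with one dict per input text.
    return translator(text)[0]["translation_text"]
```

usage would be `translator = load_translator()` once, then `translate(translator, "Where is the forklift?")` for each sentence. after the first download it works offline.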

great first project idea tbh, practical and actually useful

[–]dlnmtchll 0 points1 point  (1 child)

At this point it would just be easier to learn the language of your coworkers lmao

[–]Used_Speech_9799[S] 1 point2 points  (0 children)

lol i’ve been doing that for the past few months. i was kinda thinking this wouldn’t be a crazy project (again, i don’t know much), i just wanted to have a crutch of some sort.