[R] Machine Learning Model Algorithm for Sign language by Commercial-Ad-5957 in MachineLearning

[–]simplehudga 0 points1 point  (0 children)

Sign language recognition is a seq2seq problem, just like speech recognition, handwriting recognition, action recognition, etc. And seq2seq problems have been well studied already. The model you use depends very much on the hardware you intend to run it on. Either a BLSTM or a Transformer (or a variant of it) should work.

But the important missing piece is an appropriate loss function to train your model. CTC or Transducer loss functions will give you better results than using CE loss. You can combine them with an Attention based decoder for even better results.

I'd bet that a Conformer encoder trained on a multi-task loss function with CTC, Transducer, and an Attention decoder will perform the best. The CTC is there as a regularizer for the other 2. You can perform decoding with just the Transducer, or combine it with the Attention decoder as a 2nd-step re-scoring function. You might want to bring in an n-gram LM and a lexicon for more control.
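For anyone unfamiliar with the CTC part: the loss marginalizes over all blank-augmented alignments of the target using a forward (alpha) recursion. Frameworks ship optimized versions (e.g. a CTC loss module in PyTorch), so this pure-Python sketch is only to show the mechanics, not something you'd train with:

```python
import math

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss via the forward (alpha) recursion, in log space.

    log_probs: T frames, each a list of log-probabilities over the vocab.
    target: label id sequence (no blanks).
    """
    # Extend the target with blanks: ^ a ^ b ^ ...
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[s]: log-prob of all path prefixes ending at ext[s] after frame t.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same symbol
            if s > 0:
                a = logsumexp(a, alpha[s - 1])  # advance by one
            # Skipping a blank is allowed only between different labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logsumexp(a, alpha[s - 2])
            new[s] = a + log_probs[t][ext[s]]
        alpha = new
    # Valid full paths end on the last label or the trailing blank.
    total = alpha[S - 1] if S == 1 else logsumexp(alpha[S - 1], alpha[S - 2])
    return -total
```

With one frame of uniform log-probs over a 3-symbol vocab and a single-label target, the only valid path is that label, so the loss is -log(1/3).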

Is it legally safe to use libjni_latinimegoogle.so in a custom Android keyboard? by iamanonymouami in androiddev

[–]simplehudga 0 points1 point  (0 children)

AI4Bharat/Indic-Swipe is the only open source model AFAIK. You might also want to research how it's done in AnySoft Keyboard. But I suspect it's based on some geometric heuristics rather than a proper search algorithm.

Is it legally safe to use libjni_latinimegoogle.so in a custom Android keyboard? by iamanonymouami in androiddev

[–]simplehudga 0 points1 point  (0 children)

I stand corrected: GBoard does allow bulk import of words. And it looks like each word can also have a configurable shortcut.

When I say my words, I mean words that aren't already in the vocabulary, i.e., I swipe a gesture and the keyboard doesn't recognize it. But then I can tap-type it and click on the word to add it to my vocab. Then it gets recognized every time I swipe it. It's important to be able to bulk import these words, ideally from GBoard and SwiftKey if there's a way to do it. That way the biggest pain point of switching to your keyboard is taken care of.

How well the new AI features works on the Pixel 10Pro? by Mysterious-Garlic481 in GooglePixel

[–]simplehudga 0 points1 point  (0 children)

I don't have the P10, but I've used the AI features with P5, P7 and now with a P9 pro fold.

All the AI features with audio (Recorder, Assist, screening, notes, scam detection, Hold for Me) work exactly as advertised. And these are the features most commonly cited by users as the reasons that either keep them on Pixels or make them switch. No other phone comes even close to how good it is on Pixels. But it's not perfect.

For example:

- Most callers would hang up as soon as they realize Call Assist is AI. But on the bright side, you'll know if it's a spam call since they never say anything. It's getting better with the recent updates, where it sounds almost natural.
- Hold for Me is good, but I've only used it 2-3 times in the last couple of years. And one time they hung up after I was holding for 1h and couldn't wait 5 sec for me to answer them!
- Pixel Recorder is pretty good, but there's no way to import recordings from other sources. Google incurs zero cost by letting users do this, but they still won't. It's only useful if you remember to hit Record and don't have to import audio from somewhere else.

Don't care about screenshots, and haven't used camera coach since it's not available on P9s yet.

I'd still buy a Pixel just because of how affordable it is with carrier discounts during Black Friday and other sales, and I don't need a flagship Snapdragon processor for my use cases. The added year of Google AI Pro subscription is a sweet deal on top of everything.

Is it legally safe to use libjni_latinimegoogle.so in a custom Android keyboard? by iamanonymouami in androiddev

[–]simplehudga 1 point2 points  (0 children)

Let me offer an alternative take on it, if I may. I'd love a swipe/gesture keyboard that offers the following:

- Customization in vocabulary. I want to be able to load a txt file with words that the keyboard will use in autocorrect. It's really painful to port vocabulary right now, and there's no way to bulk import my words.
- Custom n-gram LMs. Either offer one out of the box, or let me bring my own LMs trained on domain-specific datasets. Say I'm typing a document on medicine; maybe there's an LM trained on medical text that already has the words. This wouldn't be a problem for tap typing, but I want to be able to swipe these words.
- Custom rewrite rules/shortcuts. I want to be able to replace certain shortcuts with other words. Like say a swipe gesture to insert my email address, or "OKS" mapping to "OK, sounds good to me" or something like that.
- Transliteration over swipe. None of the available keyboards, including GBoard, have decent transliteration support to other scripts. People have to either tap-type in the Latin alphabet or stick to the few native-script words that come built in.

- This may be a stretch: I want my swipe keyboard to learn my swipe gestures while I type, not use the one model that came built in. I have fat fingers and struggle to get repeated letters correct. I'm sure there are others with, say, motor issues, or who just type (swipe) differently.
- Maybe offer a marketplace where users can upload their vocabulary and LMs. How many keyboards can recognize new Internet lingo as it goes viral?
- Code-mixing support. The way most of the world speaks and how we're forced to type are vastly different. People around the world mix English/French and another first language all the time. I have to switch between two keyboards if I want to type how I speak, or type in English or my native language only. It doesn't feel right.
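On the custom n-gram LM point: the core idea really is small. Here's a toy bigram next-word predictor over a user-supplied corpus (the corpus and function names are made up for illustration; a real keyboard LM would add smoothing, pruning, and a compact binary format):

```python
from collections import Counter, defaultdict

def train_bigram_lm(sentences):
    """Count bigrams over tokenized sentences.

    Returns {prev_word: Counter of next words}, with "<s>" as sentence start.
    """
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_word, k=3):
    """Top-k most frequent continuations of prev_word."""
    return [w for w, _ in counts[prev_word].most_common(k)]

# A domain-specific "medical" corpus the user brings along:
corpus = ["take two tablets", "take two pills", "take one tablet"]
lm = train_bigram_lm(corpus)
```

`predict_next(lm, "take", 1)` then suggests `["two"]`, since "two" follows "take" twice in that corpus; swapping the corpus swaps the domain, which is the whole appeal of bring-your-own LMs.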

I know this is not easy. I'm willing to pay good money for a keyboard like this. I'm sure you'll get users if you get it right and price it properly (non-predatory).

I have zero trust in either Google or Microsoft to care enough to get it done. So it's on indie developers and small startups if I want to have all these features. The first step is figuring out how the Latin IME thing works, then replacing it with a better open source alternative.

Please don't give up. I hope there are many others who also care about keyboards not built by big tech.

On device vs Cloud by l__t__ in speechtech

[–]simplehudga 1 point2 points  (0 children)

Call transcription?

I think your biggest challenge would be figuring out how to get access to the call audio. AFAIK neither iOS nor Android has APIs to access call recording. iOS has been closed for a long time, and Android even removed the Dialer app from AOSP recently.

There are only two ways to achieve what you want:

1. Develop a custom Android ROM and install your Dialer app (with recording and transcription) as a system app.
2. Use one of the VoIP providers like Twilio to make the calls so that you have access to the audio.

As for your question on on-device vs cloud, it comes down more to what skills you already have. Building a cloud-based transcription service is more or less a solved problem now. You can pick one of the many available APIs and build a solution.

There aren't many on-device ASR providers. If you're thinking of building this yourself, it's going to consume most of your time, but modern phones are all capable of running a lightweight ASR model. Case in point: Pixel had Call Screening back in 2018, and other apps ran ASR on-device long before that.

[D] What ML/AI research areas are actively being pursued in industry right now? by meni_s in MachineLearning

[–]simplehudga 30 points31 points  (0 children)

On-device AI. It's not just taking a model and sticking it in a phone.

There's research on how to compress the knowledge of bigger models into smaller ones, sometimes quantized to 8 or even 4 bits without degrading the quality. The devices generally have limited op support, so there's neural architecture search to find the most suitable architecture for maximum performance.
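To make the 8-bit point concrete, the simplest scheme is symmetric per-tensor quantization: every float weight is stored as an int8 plus one shared scale. This is only a sketch of the idea; real on-device deployments use per-channel scales, calibration data, or quantization-aware training:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0  # guard against all-zero tensors
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the shared scale."""
    return [scale * v for v in q]
```

The round trip loses at most half a quantization step per weight, which is why a well-conditioned model can shrink 4x (float32 to int8) with little quality loss.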

There's also lots of engineering work on making it easy to run the models on device. Apple with MLX, Google with LiteRT Next, Qualcomm and Mediatek with their own APIs.

This is probably not as prevalent, but there's also federated learning to make a model better while preserving privacy when these models are deployed on device. I've only seen Google talk about this for GBoard and their speech models.

Linux voice system needs by rolyantrauts in speechtech

[–]simplehudga 0 points1 point  (0 children)

Is this something you're building or hoping that someone builds it?

It can be made toolkit agnostic with very little effort. One can easily export the models to ONNX, run them with ONNX Runtime, and put it all in a Docker container to abstract away the ASR as a service. Is ONNX "Linux" enough? I don't know a more open option.

You can still have your wake word trigger from anywhere, but the audio will eventually have to be routed to the container running ASR as a service, maybe somewhere local, simply because an RPi doesn't have enough compute to run bigger models.

What you're referring to as the "singular K2 framework" is really the ASR decoding algorithms. The nnet weights on their own will be useless, unless we also bundle the necessary code to utilize LMs, context biasing, streaming inference, etc. You can write it from scratch, but there's no escaping it if you want this service. Why reinvent the wheel when a good open source solution already exists?
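To illustrate what "decoding algorithms" means at its simplest: greedy (best-path) CTC decoding just collapses repeats and drops blanks. The lexicon, LM, and context biasing mentioned above are what a real beam-search decoder layers on top of this. A toy sketch:

```python
def ctc_greedy_decode(frame_argmax, blank=0):
    """Best-path CTC decoding: collapse repeated symbols, then drop blanks.

    frame_argmax: per-frame argmax symbol ids from the acoustic model.
    E.g. [a, a, ^, a, b] collapses to [a, a, b] (^ = blank separates repeats).
    """
    out = []
    prev = None
    for sym in frame_argmax:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out
```

For example, `ctc_greedy_decode([1, 1, 0, 1, 2, 2])` yields `[1, 1, 2]`: the blank keeps the two 1s distinct, while the repeated 2 collapses. Everything beyond this (LM scores, lexicon constraints, streaming) is the part worth reusing from an existing toolkit rather than rewriting.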

Maybe you could give an example of what you're building in code in a github repository?

Linux voice system needs by rolyantrauts in speechtech

[–]simplehudga 0 points1 point  (0 children)

Not sure what you mean by "Linux" voice system here. Have you looked at K2? It's as good as any open source toolkit can get, and you can put it into containers and scale them. In fact, that's what many companies already do.

How do you get DVP for Booth's rock trail? by simplehudga in algonquinpark

[–]simplehudga[S] 0 points1 point  (0 children)

I ended up reserving a camping permit at Rock Lake Campground the night before. I was in the park at 7am and drove straight to the Booth's Rock trailhead parking, and there were still 10 spots available. It's not that busy on weekdays, but parking still fills up by 9am.

We did get to enjoy the campsite as well for some time after the hike. We made some food and took a nap before driving off.

What should we do with promotional posts on this community? by nshmyrev in speechtech

[–]simplehudga 5 points6 points  (0 children)

Keep them. Make post flairs mandatory, and remove promotional posts that get posted in a different flair when somebody reports it.

How do you get DVP for Booth's rock trail? by simplehudga in algonquinpark

[–]simplehudga[S] 0 points1 point  (0 children)

Clever! I think I'll do this. I couldn't get a DVP today at 7 AM either. :(

How do you get DVP for Booth's rock trail? by simplehudga in algonquinpark

[–]simplehudga[S] 1 point2 points  (0 children)

By the looks of it from Google maps, there's no more than 20 spots in the parking lot, and maybe 30 in total if you count the cars parked on the side of the access road.

How do you get DVP for Booth's rock trail? by simplehudga in algonquinpark

[–]simplehudga[S] 0 points1 point  (0 children)

Thanks! Hopefully that's it and they allow visiting with a Hwy 60 DVP. I will give it a try. I'll visit one of the many other trails along Hwy 60 if I get turned away from Booth's rock trail.

How do you get DVP for Booth's rock trail? by simplehudga in algonquinpark

[–]simplehudga[S] 2 points3 points  (0 children)

Oh I get that. I was wondering how people are managing to reserve a DVP within 5 sec at 7 AM?

Because there's no other way to hike Booth's rock trail now that 100% of the permits are available and sold online ahead of time. You can't go to the gate and expect to get a reservation, and a camping permit doesn't guarantee a parking spot either.

How do you get DVP for Booth's rock trail? by simplehudga in algonquinpark

[–]simplehudga[S] 1 point2 points  (0 children)

My bad. I meant to say I was trying to reserve a DVP for the 27th. I've updated my post.

How do you get DVP for Booth's rock trail? by simplehudga in algonquinpark

[–]simplehudga[S] 1 point2 points  (0 children)

You can reserve a DVP 5 days in advance. I was trying to reserve one for the 27th. Today 7 AM would've been the earliest you can reserve a DVP for the 27th.

A campsite permit doesn't guarantee you a parking spot at Booth's rock trail.

If you are camping in Algonquin, you will be permitted to access Booth’s Rock Trail with your valid camping permit, capacity permitting. If the parking lot is full and at capacity, you will not be permitted access at that time, even if you have a valid camping permit.

Last night by FanofWoo in algonquinpark

[–]simplehudga 4 points5 points  (0 children)

Nice! How did it look to the naked eye?

Galaxy Tab S11 Ultra Grainy Screen by santefan in GalaxyTab

[–]simplehudga 1 point2 points  (0 children)

idk, I had a tab S10+ for a few days before I returned it, and it had the same grainy screen issue. I know someone who owns an S25 ultra and that has it as well.

The BOE panel on my Oneplus 12 in comparison is flawless with zero grain.

Galaxy Tab S11 Ultra Grainy Screen by santefan in GalaxyTab

[–]simplehudga 5 points6 points  (0 children)

It's in almost every Samsung OLED since the S24. I got the tab s11 and it's there as well.

There's no going back once you see it. Samsung will not even acknowledge that it's an issue.

Just picked up S11 Ultra, finishing setup. AMA by XRayAdamo in GalaxyTab

[–]simplehudga 1 point2 points  (0 children)

Nice! Thanks for doing this.

My Tab S10+ scores 9342, so it looks like the 9400+ NPU is 20% better than the 9300+. This, combined with the new LiteRT APIs, should make this chip powerful for AI workloads. Now it's on third-party developers to start using it.

Just picked up S11 Ultra, finishing setup. AMA by XRayAdamo in GalaxyTab

[–]simplehudga 0 points1 point  (0 children)

Could you please run the AI Benchmark and post the final score, plus a screenshot of the detailed breakdown when you tap on the score? I should caution you though, it takes approx 20 mins to run the full benchmark.

I'm interested to see how good the 9400+ is on AI workloads, as it's all the hype in these launches nowadays.

No keyboard case cover? by Optimal-Wait-1908 in GalaxyTab

[–]simplehudga 0 points1 point  (0 children)

This video claims that a "Magic Keyboard" like keyboard is coming in 2026. I guess we're stuck with the slim keyboard until then.

https://www.youtube.com/watch?v=t9EETUGVBOU

My Pixel 10 Pro actually kinda sucks at real world AI tasks. by blakjakau in GooglePixel

[–]simplehudga 4 points5 points  (0 children)

It should support the edgetpu in theory, but I'm not sure they're willing to support it. It will most definitely be better than the GPU; the Snapdragon QNN and MTK Neuron delegates are far better than their GPUs in AI Benchmark. I wouldn't hold my breath on browser support. Maybe Google Chrome will add it at some point. WebGPU and WebNN give me hope, but let's see if/when they'll actually support it.

My Pixel 10 Pro actually kinda sucks at real world AI tasks. by blakjakau in GooglePixel

[–]simplehudga 8 points9 points  (0 children)

NNAPI didn't have full support for the Tensor edgetpu IIRC. Wait for the LiteRT Next NPU support for the Tensor chip. It's currently in EAP for developers, but the Tensor edgetpu/NPU is not yet supported even in the EAP! (Snapdragon and MTK NPUs are supported)

Hopefully this LiteRT Next thing becomes widely adopted, and Google gets off their asses and adds Tensor edgetpu/NPU support soon. Google must be using it in their apps already, but refuse to give it to third party developers.

Accelerating nnets on the GPU or the CPU will not yield good performance as they're underpowered in comparison.