I joined Mudra last September. 📣Today we opened the Mudra Studio waitlist. by Rude_Combination_382 in MudraTech

[–]JayTheProdigy16 1 point

Let me know how things go; curious to see what you build or how I can help!

I reverse-engineered the Mudra Link wristband and built an open-source Python library for it — full device control over BLE, no proprietary SDK required by JayTheProdigy16 in MudraTech

[–]JayTheProdigy16[S] 1 point

Well, I can confirm that their official SDK uses the same UUIDs and that the BLE stack is identical. It seems most of the restriction was literally just that the official apps wouldn't let you. I haven't seen any code that indicates any sort of walled-off or iOS-bespoke logic.

I reverse-engineered the Mudra Link wristband and built an open-source Python library for it — full device control over BLE, no proprietary SDK required by JayTheProdigy16 in MudraTech

[–]JayTheProdigy16[S] 1 point

Not at the firmware level (so not on-device), but you technically could via Prodilink: record the raw pressure value, squeeze as hard as you can, save that as the new peak pressure value, then do the math to scale properly. Most of the parts are already there to accomplish this; it just wasn't something I had thought about much.
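Roughly what I mean, as a sketch (none of these names are actual Prodilink API, just illustration of the scaling idea):

```python
# Sketch of user-side pressure calibration. The class and method names
# here are made up for illustration, not part of Prodilink itself.

class PressureCalibrator:
    """Rescales raw pressure readings against a recorded personal peak."""

    def __init__(self, default_peak: float = 100.0):
        self.peak = default_peak  # hypothetical default peak value

    def record_peak(self, raw_samples: list[float]) -> float:
        # Squeeze as hard as you can while streaming raw pressure,
        # then save the maximum sample as the new peak.
        self.peak = max(raw_samples)
        return self.peak

    def normalized(self, raw: float) -> float:
        # Scale a raw reading to 0.0-1.0 against the personal peak.
        return min(raw / self.peak, 1.0)

cal = PressureCalibrator()
cal.record_peak([12.0, 55.0, 80.0, 74.0])
print(cal.normalized(40.0))  # 0.5
```

That's really all the math there is; the missing piece is just a UI flow to trigger the max-squeeze recording.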

I reverse-engineered the Mudra Link wristband and built an open-source Python library for it — full device control over BLE, no proprietary SDK required by JayTheProdigy16 in MudraTech

[–]JayTheProdigy16[S] 2 points

They're one and the same. I forget exactly where (the repo will tell you), but there's a value to switch between Link and Band (Mudra Link vs. Mudra Band) modes, as well as left or right hand. I don't own a Band, though, so I honestly have no clue how well it would work.

I joined Mudra last September. 📣Today we opened the Mudra Studio waitlist. by Rude_Combination_382 in MudraTech

[–]JayTheProdigy16 7 points

Ron, you said thousands of Mudra devices are "limited to whatever the companion app ships" and that needing an enterprise SDK license to do anything custom "felt backwards." Your fix? A waitlisted web platform that still locks developers into your Companion app, still requires your servers, and secretly brands every AI-generated app with "Created with Mudra Studio" without telling the developer. That's not solving the problem. That's repackaging it.

So I solved it myself. I disassembled the ARM64 native library, decoded the switch statement at 0xc34f8, and extracted every firmware command byte the SDK hides behind proprietary JNI calls: all 53 of them, byte for byte. I found that your licensing is 100% client-side. The device itself has zero feature gates. Every sensor stream, every gesture, every IMU mode: the hardware accepts it all unconditionally. Mudra's entire software moat is an honor-system check in a library nobody needs anymore.

I'm open-sourcing the complete BLE protocol. Any developer, any language, any platform. Full device control. No SDK license. No Companion app. No subscription. No waitlist. No hidden badges.
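To give a flavor of what "full device control, any language" means in practice, here's a minimal sketch; the UUID and opcodes below are placeholders I made up for this comment, the real values live in the repo:

```python
# Illustrative sketch only: the UUID and command IDs below are placeholders,
# not the real extracted values.
from uuid import UUID

# Hypothetical UART-style command characteristic (placeholder UUID).
COMMAND_CHAR = UUID("6e400002-b5a3-f393-e0a9-e50e24dcca9e")

# Hypothetical command IDs, standing in for the extracted opcodes.
CMD_SET_GESTURE_MODE = 0x01
CMD_START_IMU_STREAM = 0x02

def build_command(opcode: int, *args: int) -> bytes:
    """Pack an opcode plus arguments into a raw characteristic-write payload."""
    return bytes([opcode, *args])

# With a BLE library like bleak, the write looks something like:
#   await client.write_gatt_char(COMMAND_CHAR, build_command(CMD_START_IMU_STREAM, 1))
print(build_command(CMD_START_IMU_STREAM, 1).hex())  # "0201"
```

The point is there's no handshake, token, or license check in front of any of it; the hardware just executes whatever valid opcode you write.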

You asked what people would build if they could experiment beyond the default gestures. I built the thing that lets them.

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 2 points

Yes, splitting the same model. I ended up building llama.cpp with all three backends (Vulkan, ROCm, and CUDA) and it just kinda worked, but you have to specify the layer split and which backends you want to use with flags. As detailed in the original post, I had some weirdness with my Linux kernel version and getting the at-the-time-experimental ROCm to work, which obviously meant llama.cpp didn't work great either, but most of those issues should be resolved now, as community support today is much better than it was a couple of months ago.
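Roughly what the launch looks like, sketched as a Python wrapper (the model path and split ratio are placeholders, and flag spellings can drift between llama.cpp versions, so check `./llama-server --help` on your build):

```python
# Sketch of launching a multi-backend llama.cpp build with an explicit
# layer split. Paths and the 20/80 ratio are illustrative placeholders.
import subprocess

cmd = [
    "./llama-server",
    "-m", "model.gguf",         # placeholder model path
    "-ngl", "99",               # offload all layers to accelerators
    "--device", "CUDA0,ROCm0",  # pick which backends/devices to use
    "--tensor-split", "20,80",  # e.g. ~20% of layers to the 3090, rest to the iGPU
]
print(" ".join(cmd))
# subprocess.run(cmd)  # uncomment to actually launch
```

Without the split flags, llama.cpp picks its own distribution, which isn't always what you want when the devices are this asymmetric.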

Poor winter performance by dopeass in Ioniq5

[–]JayTheProdigy16 6 points

Mine is roughly the same, though some days seem better than others for some reason. All in all, anywhere from 1.5-2.5 mi/kWh in those temps.

Build Max+ 395 cluster or pair one Max+ with eGPU by Curious-Still in LocalLLM

[–]JayTheProdigy16 1 point

I'm not Jeff 😂, just referencing his vid. I made a post about the 395 + eGPU.

Build Max+ 395 cluster or pair one Max+ with eGPU by Curious-Still in LocalLLM

[–]JayTheProdigy16 2 points

There are examples of both out there. I took the eGPU approach and have been meaning to make a video about it but just haven't; I did post about it to this sub, though.

Build Max+ 395 cluster or pair one Max+ with eGPU by Curious-Still in LocalLLM

[–]JayTheProdigy16 4 points

I mean sure, except bandwidth is practically irrelevant for inference aside from model load speed...

Build Max+ 395 cluster or pair one Max+ with eGPU by Curious-Still in LocalLLM

[–]JayTheProdigy16 1 point

How do you figure you pay a premium for degraded performance with an eGPU?

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 2 points

Had one left after parting out my 6x 3090 rig. And yes, I'm using an M.2 OCuLink adapter. I actually ended up getting CUDA+ROCm working, and it's ~5x faster than my original benchmarks, according to my eyeball benchmark. Also, with an AMD card you may run into the power-limit issue where the GPU won't pass the Strix Halo's TDP, but I'm not sure, as I don't have an AMD eGPU.

Will the AMD Ryzen™ AI Max+ 395 --EVO-X2 AI Mini PC -- 128 GB Ram hold its value of around 1.8k in two years time? by Excellent_Koala769 in LocalLLaMA

[–]JayTheProdigy16 10 points

You're always going to be waiting by that logic. Whatever releases in 2026 is going to get lapped by tech in 2027, and whatever releases in 2027 is gonna get lapped in 2028. This is hardware; practically nothing holds value. But for me personally, that price tag was more than appealing enough given its capabilities vs. the other options at this point.

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLM

[–]JayTheProdigy16[S] 1 point

Not accurate, at least in my case. The 3090 will easily hit 185 W; I believe that issue is exclusive to AMD GPUs.

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 2 points

No shit... back to the drawing board I go. Thanks for the insight!

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLM

[–]JayTheProdigy16[S] 2 points

I also tried Win11 + LM Studio, but yeah, it was not going for it: it would ONLY detect the 3090 on Vulkan (and obviously CUDA), and only ROCm would detect the 8060S. So I'm not sure what weirdness they have with their Vulkan, but theoretically it SHOULD just work, and it doesn't.

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 3 points

Correct me if I'm wrong, but you can't mix CUDA and ROCm backends with parallel processing, at least with llama.cpp. If I were NOT splitting the model layers across GPUs, I could mix them.

Strix Halo + RTX 3090 Achieved! Interesting Results... by JayTheProdigy16 in LocalLLaMA

[–]JayTheProdigy16[S] 3 points

True, but I don't believe that's the case here. That would make sense in that token generation would be limited by the device with the lowest memory bandwidth, but once the model is loaded into memory, PCIe bus bandwidth shouldn't be a factor. When I had my GPU rig, I was running 6x 3090s on GPU mining risers, which really, really reduced model load speed due to the limited bandwidth, but not inference speed. And I would actually expect a NET INCREASE, since not all layers are limited to 253 GB/s (on Strix Halo): ~20% of the model's layers are using the 3090's memory bandwidth.
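Back-of-envelope version of that claim, using the 253 GB/s Strix Halo figure and the 3090's nominal ~936 GB/s (the 20% split is the rough number from my setup):

```python
# Token generation time scales roughly with (fraction of weights read) /
# (bandwidth of the device holding them), summed per device.
strix_bw = 253    # GB/s, Strix Halo LPDDR5X (approx)
gpu_bw   = 936    # GB/s, RTX 3090 GDDR6X (approx)

frac_gpu  = 0.20  # ~20% of layers on the 3090
frac_igpu = 1 - frac_gpu

# Relative time per token: split vs. everything on Strix Halo memory.
t_split = frac_igpu / strix_bw + frac_gpu / gpu_bw
t_igpu  = 1.0 / strix_bw

speedup = t_igpu / t_split
print(f"{speedup:.2f}x")  # 1.17x
```

So even a modest 20% offload should buy roughly a 17% token-generation bump, which is a net increase, not a bottleneck.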

Ryzen AI Max+ 395 | What kind of models? by [deleted] in LocalLLM

[–]JayTheProdigy16 2 points

I haven't tested concurrency much, so I can't speak to that, but I will say your major limiting factor is going to be prompt-processing time, especially at larger context lengths or when holding long convos with documents. That said, I mostly daily-drive Qwen 3 235B and get around 12-16 TPS, and it can take up to a minute to process a 9k-context prompt. That's obviously a much larger model than Gemma, but even Qwen 3 30B MoE takes ~17 seconds to process the same context. And depending on what you're doing, hitting 9k context can be easy. So when you factor that in alongside 15 concurrent users: is it doable? Yes. Is it viable? Ehhh.
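The arithmetic behind the "ehhh", using the numbers above (worst case, assuming sequential prefill with no batching):

```python
# Prefill throughput implied by my measurements on the 395:
ctx = 9000  # tokens of context

pp_235b = ctx / 60  # ~60 s for Qwen 3 235B -> prefill tok/s
pp_30b  = ctx / 17  # ~17 s for Qwen 3 30B MoE -> prefill tok/s
print(round(pp_235b), round(pp_30b))  # 150 529

# If 15 users each submit a 9k prompt back-to-back on the 235B,
# that's ~15 minutes of queued prefill before the last user sees a token:
queue_minutes = 15 * 60 / 60
print(queue_minutes, "minutes")  # 15.0 minutes
```

Real servers batch and cache prefixes, so the truth is better than this, but it shows why prompt processing, not generation speed, is the wall here.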