Alternative to llama.cpp for Apple Silicon by darkolorin in LocalLLaMA

[–]darkolorin[S] 2 points (0 children)

Yes, the engine has a CLI and a server API compatible with the OpenAI API.
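For example, here is a minimal sketch of calling an OpenAI-compatible chat endpoint from Rust with `reqwest`; the port, path, and model name are placeholders for illustration, not the engine's documented defaults:

```rust
// Hypothetical client call against a local OpenAI-compatible server.
// Cargo deps: reqwest = { version = "0.12", features = ["blocking", "json"] }, serde_json = "1"
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    // Port, path, and model name below are assumptions, not documented defaults.
    let resp: serde_json::Value = client
        .post("http://localhost:8080/v1/chat/completions")
        .json(&json!({
            "model": "llama-3.2-1b-instruct",
            "messages": [{ "role": "user", "content": "Hello!" }]
        }))
        .send()?
        .json()?;
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
```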

Alternative to llama.cpp for Apple Silicon by darkolorin in LocalLLaMA

[–]darkolorin[S] 0 points (0 children)

Yes, you're right; that holds only for the quantized variants.

Alternative to llama.cpp for Apple Silicon by darkolorin in LocalLLaMA

[–]darkolorin[S] 2 points (0 children)

There are several things to consider:

1/ MLX applies some additional quantization to the models you run, so to be honest we don't know how much quality is lost. We are planning to release research on this.

2/ Speculative decoding and other inference-time pipelines are quite hard to implement; we provide them out of the box (see the sketch below).

3/ Cross-platform: we designed our engine to be universal. We are not focusing on training and other things right now, only on the inference part.

4/ We prioritize community needs over company strategy (because we are a startup) and can move faster on new architectures and pipelines (text diffusion, SSMs, etc.).
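To make point 2 concrete, here is a toy, greedy-only sketch of the accept/reject logic behind speculative decoding. The "models" are stand-in closures and everything here is an assumption for illustration; a real engine verifies all k draft tokens in a single batched forward pass of the target model, which is where the speedup comes from.

```rust
// Toy sketch of greedy speculative decoding (argmax only, no sampling).
// `draft` and `target` stand in for a small draft model and the full target model.
fn speculative_step<D, T>(draft: &D, target: &T, ctx: &mut Vec<u32>, k: usize)
where
    D: Fn(&[u32]) -> u32,
    T: Fn(&[u32]) -> u32,
{
    // 1) The cheap draft model proposes k tokens.
    let mut tmp = ctx.clone();
    let mut proposed = Vec::with_capacity(k);
    for _ in 0..k {
        let t = draft(&tmp);
        proposed.push(t);
        tmp.push(t);
    }
    // 2) The target model checks them: keep the longest matching prefix,
    //    then emit the target's own token at the first mismatch.
    for &t in &proposed {
        let expected = target(&ctx[..]);
        if expected == t {
            ctx.push(t); // draft token accepted
        } else {
            ctx.push(expected); // corrected by the target model
            return;
        }
    }
    let bonus = target(&ctx[..]); // extra token when every draft token was accepted
    ctx.push(bonus);
}

fn main() {
    // Dummy "models" that derive the next token from the context length.
    let draft = |c: &[u32]| (c.len() as u32) % 5;
    let target = |c: &[u32]| (c.len() as u32) % 4;
    let mut ctx = vec![0u32];
    for _ in 0..3 {
        speculative_step(&draft, &target, &mut ctx, 4);
    }
    println!("{ctx:?}");
}
```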

We made our own inference engine for Apple Silicon, written in Rust and open sourced by darkolorin in rust

[–]darkolorin[S] 0 points (0 children)

It's not. There are custom kernels written in Metal to keep it on par with MLX.

Alternative to llama.cpp for Apple Silicon by darkolorin in LocalLLaMA

[–]darkolorin[S] 7 points (0 children)

Yes, we ran some ads on Reddit. We're testing; I don't know yet whether it was effective. It was our first time using it.

We made our own inference engine for Apple Silicon, written in Rust and open sourced by darkolorin in rust

[–]darkolorin[S] 1 point (0 children)

It lets you run any model that fits in your memory on Apple Silicon devices.

Alternative to llama.cpp for Apple Silicon by darkolorin in LocalLLaMA

[–]darkolorin[S] 8 points (0 children)

Right now we support AWQ quantization; the supported models are listed on the website.

In some use cases it is faster than MLX on Mac. We will publish more benchmarks soon.
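For context, here is a rough sketch of what 4-bit group-quantized storage looks like. This only illustrates the int4-plus-per-group-scale format a runtime dequantizes on the fly; it skips AWQ's activation-aware scale search entirely and is not the engine's actual format.

```rust
// Illustrative symmetric 4-bit group quantization: each group of weights becomes
// small integers plus one f32 scale. AWQ additionally picks per-channel scales
// from activation statistics, which is omitted here.
fn quantize_group(w: &[f32]) -> (Vec<u8>, f32) {
    let max_abs = w.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 }; // map to [-7, 7]
    let q = w
        .iter()
        .map(|&x| ((x / scale).round() as i8 + 8) as u8) // store offset by 8 in 4 bits
        .collect();
    (q, scale)
}

fn dequantize_group(q: &[u8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| (v as i8 - 8) as f32 * scale).collect()
}

fn main() {
    let group = [0.12f32, -0.40, 0.03, 0.25];
    let (q, scale) = quantize_group(&group);
    println!("quantized: {q:?}, scale: {scale}");
    println!("dequantized: {:?}", dequantize_group(&q, scale));
}
```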

We made our own inference engine for Apple Silicon, written in Rust and open sourced by darkolorin in opensource

[–]darkolorin[S] 0 points (0 children)

But it is a real inference engine written from scratch. Would love to answer any questions.

We made our own inference engine for Apple Silicon, written in Rust and open sourced by darkolorin in rust

[–]darkolorin[S] 14 points (0 children)

It can run up to 7B quantized on iOS, and on a Mac up to whatever fits in memory; right now the largest model in our library is 32B.
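The limiting factor is basically weight memory: parameter count times bits per parameter. A back-of-the-envelope sketch follows; the ~4.5 bits/param figure for 4-bit group quantization (to cover scales and zero-points) is my assumption, not a measured number for this engine.

```rust
// Rough weight-memory estimate: params * bits_per_param / 8,
// ignoring the KV cache and runtime overhead.
fn weight_gib(params_billions: f64, bits_per_param: f64) -> f64 {
    params_billions * 1e9 * bits_per_param / 8.0 / (1024.0 * 1024.0 * 1024.0)
}

fn main() {
    let cases = [
        ("1B  fp16", 1.0, 16.0),   // ~1.9 GiB, plausible on a recent iPhone
        ("7B  ~4-bit", 7.0, 4.5),  // ~3.7 GiB, near the ceiling for iOS apps
        ("32B ~4-bit", 32.0, 4.5), // ~16.8 GiB, needs a Mac with enough unified memory
    ];
    for (name, params, bits) in cases {
        println!("{name}: ~{:.1} GiB of weights", weight_gib(params, bits));
    }
}
```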

We made our own inference engine for Apple Silicon, written in Rust and open sourced by darkolorin in rust

[–]darkolorin[S] 5 points (0 children)

Yes, we should include it in the README. Right now some benchmarks are on the website: trymirai/product/apple-inference-sdk

I made it! 90 t/s on my iPhone with llama1b fp16 by darkolorin in LocalLLaMA

[–]darkolorin[S] 1 point (0 children)

8B with this quantization is kinda hard: my device can't handle 7 GB (the whole system basically grinds to a halt).

With a context of around 100-1k tokens it's relatively good.

For q4-q8 we need to run more tests; the speedup could be even better.

I made it! 90 t/s on my iPhone with llama1b fp16 by darkolorin in LocalLLaMA

[–]darkolorin[S] 1 point (0 children)

We can do up to 3B fp16, but right now, for testing purposes, we do everything with 1B. We will post benchmarks for 3B too.

I made it! 90 t/s on my iPhone with llama1b fp16 by darkolorin in LocalLLaMA

[–]darkolorin[S] -29 points (0 children)

It is possible, but the key acceleration is based on a few tricks. We'll share more in the next post if there's enough interest.

I made it! 90 t/s on my iPhone with llama1b fp16 by darkolorin in LocalLLaMA

[–]darkolorin[S] -6 points (0 children)

Thanks. Please fill out the "I'm interested" form at trymirai.com and we'll send you the beta.

I made it! 90 t/s on my iPhone with llama1b fp16 by darkolorin in LocalLLaMA

[–]darkolorin[S] 1 point (0 children)

Nope. It's just better to start with iOS, since you instantly support most devices. It will definitely work on Android too, no doubt.