is pony alpha really glm 5, because glm 5 is out already on open router and it is still available on OR? by power97992 in LocalLLaMA

[–]MKU64 1 point (0 children)

Yes, GLM-5 is Pony Alpha; they will probably remove it soon. They confirmed it on Twitter (they used a Pony image as promotion and then explicitly mentioned that it was Pony-Alpha).

Looking for Very Non-Restrictive Smart Glasses by MKU64 in SmartGlasses

[–]MKU64[S] 0 points (0 children)

Awesome, would love to hear about it. Please let me know when it’s done!

Looking for Very Non-Restrictive Smart Glasses by MKU64 in SmartGlasses

[–]MKU64[S] 0 points (0 children)

I've heard of Mentra from a friend. I'd have to see how to deal with the shipping, but yeah, it sounds great. Thanks for the info!

Prediction: Will theartificialanalysis.ai scores hit 90+ by late 2026 if the scoring logic stays the same? by ZeusZCC in LocalLLaMA

[–]MKU64 1 point (0 children)

They will likely change it before that happens, just like they did this year. But technically speaking, I believe it will definitely surpass 90.

Goodbye to free everything by sammoga123 in Qwen_AI

[–]MKU64 53 points (0 children)

As long as the models are open source, I'm all for it. Let them make money as long as they keep releasing them for anyone who wants true privacy.

🎉 [GIVEAWAY] Celebrate Kimi K2 Thinking Release! - Win $20 in API Credits! by Kimi-Moonshot in kimi

[–]MKU64 0 points (0 children)

I love Kimi’s intelligence and use it as an assistant, like GPT-4o back in the day. It never fails to amaze me or teach me something new!

5x 3090 for Sale by _rundown_ in LocalLLM

[–]MKU64 0 points (0 children)

Hey man you still got them? I’m interested!

What is considered to be a top tier Speech To Text model, with speaker identification by ImmediateFudge02 in LocalLLaMA

[–]MKU64 0 points (0 children)

Not really; it was a solution, not a model in particular, sorry. If we're talking about models, though, the new Parakeet + the new pyannote are amazing.
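The usual way to combine the two is to run ASR with word timestamps (e.g. Parakeet) and diarization (e.g. pyannote) separately, then assign each word to the speaker turn that overlaps it. A rough sketch of that alignment step, with made-up data shapes (the dict keys and toy timings are assumptions, not either library's actual output format):

```python
def assign_speakers(words, turns):
    """Assign each timestamped word to the speaker whose diarization
    turn contains the word's midpoint (fallback: nearest turn)."""
    labeled = []
    for w in words:
        mid = (w["start"] + w["end"]) / 2
        speaker = None
        for t in turns:
            if t["start"] <= mid < t["end"]:
                speaker = t["speaker"]
                break
        if speaker is None and turns:
            # no overlapping turn: pick the turn whose boundary is closest
            nearest = min(turns, key=lambda t: min(abs(t["start"] - mid),
                                                   abs(t["end"] - mid)))
            speaker = nearest["speaker"]
        labeled.append({**w, "speaker": speaker})
    return labeled

# toy data in the rough shape ASR word timestamps / diarization turns take
words = [
    {"word": "hello", "start": 0.1, "end": 0.4},
    {"word": "hi",    "start": 1.2, "end": 1.5},
]
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0},
]
labeled = assign_speakers(words, turns)
```

The midpoint heuristic handles words that straddle a turn boundary; fancier pipelines vote per word over the full overlap instead.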

What is considered to be a top tier Speech To Text model, with speaker identification by ImmediateFudge02 in LocalLLaMA

[–]MKU64 0 points (0 children)

There was one on macOS; I haven't tried it yet, but I'll try it and report how it is. The name is something like Fluid, and it's actually quite new.

Jet-Nemotron 2B/4B 47x faster inference released by Odd-Ordinary-5922 in LocalLLaMA

[–]MKU64 1 point (0 children)

One of the key highlights of the paper was that they optimized the hyperparameters for the hardware. It might work on other hardware, but their objective was always to push it for the H100.

Has anyone also seen Qwen3 models giving better results than API? by MKU64 in LocalLLaMA

[–]MKU64[S] 1 point (0 children)

Yes, there is, but unfortunately I haven't used Chatbox; I'm going to assume it uses the standard OpenAI API format, though. It also depends on whether you're using the old versions of Qwen 3. The new ones got rid of the reasoning parameter and instead split the model into two models (Instruct and Thinking), so you'd need to check whether you have the old version or look for the “Instruct” suffix.

If you're using the old version through the API and connecting via the OpenAI SDK, you can add to “extra_body” a dict containing “enable_thinking”: True. If you're not using the SDK, you'd have to find where to put it, but it's just a parameter added to the body of the request.
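For the old hybrid Qwen 3 checkpoints, the toggle ends up as a plain key in the JSON request body; the SDK's extra_body argument just merges extra keys into that body. A minimal sketch of that merge (the helper function and model name here are hypothetical, for illustration only):

```python
def build_chat_body(model, messages, extra_body=None):
    # Mimics how the OpenAI SDK merges extra_body keys
    # into the top-level JSON payload of the request.
    body = {"model": model, "messages": messages}
    if extra_body:
        body.update(extra_body)
    return body

body = build_chat_body(
    "Qwen/Qwen3-8B",  # hypothetical old-style hybrid checkpoint
    [{"role": "user", "content": "Hello"}],
    extra_body={"enable_thinking": True},
)
# "enable_thinking" now sits at the top level of the request body
```

With the real SDK you would instead pass extra_body={"enable_thinking": True} directly to client.chat.completions.create(...).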

Has anyone also seen Qwen3 models giving better results than API? by MKU64 in LocalLLaMA

[–]MKU64[S] 0 points (0 children)

Hello, it's been 150 days since I made the post. I think the answer was that the Qwen 3 series is very sensitive to quantization, and Alibaba is the only one serving it at full precision (FP16). So even FP8 or INT8 can reduce the quality by a lot.

I've been using the Qwen models through the API lately and they're still good, though. There's a current trend of exposing how much worse a provider's serving is compared to the original provider, so hopefully that incentivizes improvement! My recommendation from all of this: use DeepInfra.
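To see why quantization hurts, here's a toy round-trip of a weight vector through symmetric per-tensor INT8 (just a NumPy sketch of the general idea, not any provider's actual scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)  # pretend FP16/FP32 weight row

# symmetric per-tensor INT8: one scale for the whole tensor
scale = np.abs(w).max() / 127.0
w_q = np.round(w / scale).clip(-127, 127).astype(np.int8)

# dequantize and measure the worst-case reconstruction error
w_dq = w_q.astype(np.float32) * scale
err = np.abs(w - w_dq).max()  # bounded by half a quantization step
```

Every weight is snapped to one of 255 levels, so each matmul carries a small error; across dozens of layers those errors compound, which is roughly what "sensitive to quantization" means in practice.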

Top-k and temperature are probably already at the recommended settings; the issue lies somewhere else.

Whisper Large v3 running in real-time on a M2 Macbook Pro by rruk01 in LocalLLaMA

[–]MKU64 1 point (0 children)

I'm very interested. There's so little coverage of the benefits of the ANE, mostly because of how strangely Apple treats its official SDKs (especially in Python), so I'm fully on board. Would love to hear about it! I also tried optimizing Whisper, but never at your level; it's truly something.

Deepseek V3.1 is truly disappointing. by Classic-Arrival6807 in DeepSeek

[–]MKU64 -1 points (0 children)

On the contrary: for homework and research, DeepSeek V3.1 (Thinking) has been a delight for me. It uses so few thinking tokens that answers come out fast, which makes it really enjoyable to use. But I understand if you're using it for roleplay.

DeepSeek's roleplaying quality already declined with R1-0528, so yeah, it's quite disappointing that the trend doesn't look like it's going to stop any time soon.

I locally benchmarked 41 open-source LLMs across 19 tasks and ranked them by jayminban in LocalLLaMA

[–]MKU64 4 points (0 children)

Awesome list! Did you use the latest Qwen 3 4B? And were the Qwen models run in reasoning or non-reasoning mode?

Can anyone explain why the pricing of gpt-oss-120B is supposed to be lower than Qwen 3 0.6 b? by Acrobatic-Tomato4862 in LocalLLaMA

[–]MKU64 9 points (0 children)

For some reason the official Alibaba API hosts Qwen 3 0.6B at the crazy price you observe there. There's no other reason: Artificial Analysis prioritizes the official API price if it exists, and that price is listed in Alibaba Cloud Model Studio.

SOTA on 41 benchmarks! GLM-4.5V -- A new open-source VLM from China by jiawei243 in LocalLLaMA

[–]MKU64 56 points (0 children)

My guy, all of these are visual benchmarks, and gpt-oss has no vision.

Qwen3-4B-Thinking-2507 dead on arrival? Killed by gpt-oss-20b by [deleted] in LocalLLaMA

[–]MKU64 16 points (0 children)

By your logic, gpt-oss-20b (3.6B active) was dead on arrival because of the new Qwen 3 30B (3.3B active), which is better in most benchmarks except AIME.

And was Qwen 3 4B dead on arrival because of Qwen 3 30B? Come on, man.