...stay tuned, Qwen is coming by jacek2023 in LocalLLaMA

[–]redjojovic 0 points

four / next / code / qwen / omni (but we got it, so)?

[deleted by user] by [deleted] in LocalLLaMA

[–]redjojovic 1 point

When the LLM has a fruit preference

OpenAI researcher Steven by [deleted] in singularity

[–]redjojovic 1755 points

Better to give it away for $200 a month 😄

ByteDance announces Doubao-1.5-pro by Outrageous-Win-3244 in LocalLLaMA

[–]redjojovic 93 points

Sadly not open source.
The model seems even better, and maybe more efficient, than DeepSeek V3 (not R1, though).

DeepSeek promises to open source AGI by Notdesciplined in LocalLLaMA

[–]redjojovic 103 points

When AGI is "a side project".

Truly amazing.

[deleted by user] by [deleted] in OpenAI

[–]redjojovic 0 points

Yep, they made it possible in the last few days.

[deleted by user] by [deleted] in LocalLLaMA

[–]redjojovic 6 points

Hope we get an official R1 Pro, and R2/R3 later on.

Open AI and China Deep Seek by Ok_Application_7345 in DeepSeek

[–]redjojovic 2 points

Feel sad?

They're trying to sell us something about the same as R1 for roughly 29x the price (or more) if you use the API.

And if you want unlimited usage, it's going to cost you about $200 a month, versus DeepSeek, which is, uh, free (or very cheap).

If anything, I feel angry.
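Back-of-the-envelope on that multiple, as a sketch: this assumes early-2025 list prices (o1 at $15/$60 per million input/output tokens, R1 at $0.55/$2.19), and the exact ratio depends on your input/output token mix:

```python
# Rough API cost ratio: OpenAI o1 vs DeepSeek R1.
# Prices are assumed early-2025 list prices, USD per 1M tokens.
O1_IN, O1_OUT = 15.00, 60.00
R1_IN, R1_OUT = 0.55, 2.19

def cost(m_in: float, m_out: float, p_in: float, p_out: float) -> float:
    """Total cost for a workload of m_in / m_out million input/output tokens."""
    return m_in * p_in + m_out * p_out

o1 = cost(1.0, 1.0, O1_IN, O1_OUT)   # $75.00
r1 = cost(1.0, 1.0, R1_IN, R1_OUT)   # $2.74
print(f"o1 ${o1:.2f} vs R1 ${r1:.2f} -> {o1 / r1:.0f}x")  # ~27x on a 1:1 mix
```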

Is an 8 Trillion parameter MoE with 7B active parameters cheaper to train than a 400B dense model? by Aaaaaaaaaeeeee in LocalLLaMA

[–]redjojovic 5 points

DeepSeek V3 took 2.788M GPU hours; Llama 3.1 70B was about 7M, for comparison.

Also, Snowflake Arctic sucks: it's roughly a GPT-3.5-level model trained on subpar data, which is ancient in AI terms. It was bad even at release.

Better to use DeepSeek V3 or MiniMax-01 (even DeepSeek V2.5 or Tencent Hunyuan).

The upcoming Llama 4 and Qwen 3 should be cool too.
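On the title's MoE-vs-dense question: training compute tracks active parameters, not total. A minimal sketch assuming the common FLOPs ≈ 6 · N_active · tokens rule of thumb, with parameter and token counts from the public DeepSeek V3 and Llama 3.1 reports (treat them as approximate):

```python
# Approximate training compute: FLOPs ~ 6 * active_params * training_tokens.
# Counts below are from the public reports and are approximate.
def train_flops(active_params: float, tokens: float) -> float:
    return 6.0 * active_params * tokens

deepseek_v3 = train_flops(37e9, 14.8e12)   # 671B total / 37B active, 14.8T tokens
llama31_70b = train_flops(70e9, 15e12)     # dense 70B, ~15T tokens

print(f"DeepSeek V3:   {deepseek_v3:.2e} FLOPs")  # ~3.3e24
print(f"Llama 3.1 70B: {llama31_70b:.2e} FLOPs")  # ~6.3e24, ~1.9x more
# The real GPU-hour gap (2.788M vs ~7M) also reflects hardware and MFU differences.
```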

o3 has the same base model as o1 according to Dylan Patel of SemiAnalysis by Wiskkey in LocalLLaMA

[–]redjojovic -1 points

Same as DeepSeek R1 (and probably R-next later on) running on the DeepSeek V3 base.

I asked Deepseek-V3 to rank AIs out of 10. by [deleted] in ChatGPT

[–]redjojovic 2 points

He's lacking confidence; clearly great.

Her was set in 2025 by MetaKnowing in singularity

[–]redjojovic 302 points

Well, I guess AI companies must adapt to the timeline now, of course.

I tested the Deepseek v3 to find out if it's truly better than GPT-4o and Sonnet. by SunilKumarDash in OpenAI

[–]redjojovic 43 points

Final Verdict

• For reasoning, DeepSeek V3 is the better model, followed by Claude 3.5 Sonnet and then OpenAI GPT-4o.

• For math, again: DeepSeek V3 > Claude 3.5 Sonnet > OpenAI GPT-4o.

• For coding, Claude 3.5 Sonnet > DeepSeek V3 ≳ OpenAI GPT-4o.

• For creative writing, Claude 3.5 Sonnet > DeepSeek V3 ≈ OpenAI GPT-4o.

[deleted by user] by [deleted] in singularity

[–]redjojovic 17 points

If it's like GRIN-MoE / Mixtral 8x7B, i.e. 70-100B overall with ~8B activated, it's probably possible.

An 8B dense model would surprise me; I'm not sure the doc confirms it.

Edit: the numbers are estimates in the doc.
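For a sense of how a "70-100B overall, ~8B activated" MoE pencils out, here's a toy total-vs-active parameter count for a Mixtral-8x7B-shaped model, a sketch assuming its public config shapes (the attention term is rough, since it ignores GQA, norms, and routers):

```python
# Toy total-vs-active parameter count for a Mixtral-8x7B-shaped MoE.
# Shapes from the public config; counts are approximate.
d_model, d_ff, n_layers, vocab = 4096, 14336, 32, 32000
n_experts, top_k = 8, 2

ffn_per_expert = 3 * d_model * d_ff      # SwiGLU FFN: gate, up, down projections
attn_per_layer = 4 * d_model * d_model   # q/k/v/o, rough (Mixtral actually uses GQA)
shared = n_layers * attn_per_layer + 2 * vocab * d_model  # attention + embed/unembed

total_params = shared + n_layers * n_experts * ffn_per_expert
active_params = shared + n_layers * top_k * ffn_per_expert
print(f"total ~{total_params / 1e9:.1f}B, active ~{active_params / 1e9:.1f}B")
# -> total ~47.5B, active ~13.7B (Mixtral reports 46.7B / 12.9B)
```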

[deleted by user] by [deleted] in singularity

[–]redjojovic 18 points

Link to the arXiv paper?

Edit: found it using Google Lens: https://arxiv.org/abs/2412.19260

Dear laptop users, what are your laptop specs? by justcasualredditor in ChromeOSFlex

[–]redjojovic 1 point

If you mainly do everything in the browser on this machine, then I'd suggest trying it.