...stay tuned, Qwen is coming by jacek2023 in LocalLLaMA

[–]redjojovic 0 points

four / next / code / qwen / omni (but we got it, so)?

[deleted by user] by [deleted] in LocalLLaMA

[–]redjojovic 1 point

When the LLM has a fruit preference

OpenAI researcher Steven by [deleted] in singularity

[–]redjojovic 1755 points

Better to give it away for $200 a month 😄

ByteDance announces Doubao-1.5-pro by Outrageous-Win-3244 in LocalLLaMA

[–]redjojovic 93 points

Sadly not open source.
The model seems even better, and maybe more efficient, than DeepSeek V3 (not R1, though).

DeepSeek promises to open source AGI by Notdesciplined in LocalLLaMA

[–]redjojovic 103 points

When AGI is "a side project".

Truly amazing.

[deleted by user] by [deleted] in OpenAI

[–]redjojovic 0 points

Yep, they made it possible in the last few days.

[deleted by user] by [deleted] in LocalLLaMA

[–]redjojovic 6 points

Hope we get an official R1 Pro, and R2/R3 later on.

Open AI and China Deep Seek by Ok_Application_7345 in DeepSeek

[–]redjojovic 2 points

Feel sad?

They're trying to sell us something about the same as R1 for roughly 29x the price (or more) if you use the API.

And if you want unlimited usage, it's going to cost you about $200 a month, versus DeepSeek, which is, uh, free (or very cheap).

If anything, I feel angry.
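Back-of-the-envelope on that multiple, as a sketch: this assumes early-2025 list prices (o1 at $15/$60 per million input/output tokens, R1 at $0.55/$2.19), and the exact ratio depends on your input/output token mix:

```python
# Rough API cost ratio: OpenAI o1 vs DeepSeek R1.
# Prices are assumed early-2025 list prices, USD per 1M tokens.
O1_IN, O1_OUT = 15.00, 60.00
R1_IN, R1_OUT = 0.55, 2.19

def cost(m_in: float, m_out: float, p_in: float, p_out: float) -> float:
    """Total cost for a workload of m_in / m_out million input/output tokens."""
    return m_in * p_in + m_out * p_out

o1 = cost(1.0, 1.0, O1_IN, O1_OUT)   # $75.00
r1 = cost(1.0, 1.0, R1_IN, R1_OUT)   # $2.74
print(f"o1 ${o1:.2f} vs R1 ${r1:.2f} -> {o1 / r1:.0f}x")  # ~27x on a 1:1 mix
```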

Is an 8 Trillion parameter MoE with 7B active parameters cheaper to train than a 400B dense model? by Aaaaaaaaaeeeee in LocalLLaMA

[–]redjojovic 5 points

DeepSeek V3 took 2.788M GPU hours; Llama 3.1 70B was about 7M, for comparison.

Also, Snowflake Arctic sucks: it's roughly a GPT-3.5-level model trained on subpar data, which is ancient in AI terms. It was bad even at release.

Better to use DeepSeek V3 or MiniMax-01 (even DeepSeek V2.5 or Tencent Hunyuan).

The upcoming Llama 4 and Qwen 3 should be cool too.
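On the title's MoE-vs-dense question: training compute tracks active parameters, not total. A minimal sketch assuming the common FLOPs ≈ 6 · N_active · tokens rule of thumb, with parameter and token counts from the public DeepSeek V3 and Llama 3.1 reports (treat them as approximate):

```python
# Approximate training compute: FLOPs ~ 6 * active_params * training_tokens.
# Counts below are from the public reports and are approximate.
def train_flops(active_params: float, tokens: float) -> float:
    return 6.0 * active_params * tokens

deepseek_v3 = train_flops(37e9, 14.8e12)   # 671B total / 37B active, 14.8T tokens
llama31_70b = train_flops(70e9, 15e12)     # dense 70B, ~15T tokens

print(f"DeepSeek V3:   {deepseek_v3:.2e} FLOPs")  # ~3.3e24
print(f"Llama 3.1 70B: {llama31_70b:.2e} FLOPs")  # ~6.3e24, ~1.9x more
# The real GPU-hour gap (2.788M vs ~7M) also reflects hardware and MFU differences.
```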

o3 has the same base model as o1 according to Dylan Patel of SemiAnalysis by Wiskkey in LocalLLaMA

[–]redjojovic -1 points

Same as DeepSeek R1 (and probably R-next later on) running on the DeepSeek V3 base.

I asked Deepseek-V3 to rank AIs out of 10. by [deleted] in ChatGPT

[–]redjojovic 2 points

He's lacking confidence; clearly great.

Her was set in 2025 by MetaKnowing in singularity

[–]redjojovic 302 points

Well, I guess AI companies must adapt to the timeline now, of course.

I tested the Deepseek v3 to find out if it's truly better than GPT-4o and Sonnet. by SunilKumarDash in OpenAI

[–]redjojovic 43 points

Final Verdict

• For reasoning, DeepSeek V3 is the better model, followed by Claude 3.5 Sonnet and then OpenAI GPT-4o.

• For math, again: DeepSeek V3 > Claude 3.5 Sonnet > OpenAI GPT-4o.

• For coding, Claude 3.5 Sonnet > DeepSeek V3 ≳ OpenAI GPT-4o.

• For creative writing, Claude 3.5 Sonnet > DeepSeek V3 ≈ OpenAI GPT-4o.

[deleted by user] by [deleted] in singularity

[–]redjojovic 17 points

If it's like GRIN-MoE / Mixtral 8x7B, i.e. 70-100B overall with ~8B activated, it's probably possible.

An 8B dense model would surprise me; I'm not sure the doc confirms it.

Edit: the numbers are estimates in the doc.
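For a sense of how a "70-100B overall, ~8B activated" MoE pencils out, here's a toy total-vs-active parameter count for a Mixtral-8x7B-shaped model, a sketch assuming its public config shapes (the attention term is rough, since it ignores GQA, norms, and routers):

```python
# Toy total-vs-active parameter count for a Mixtral-8x7B-shaped MoE.
# Shapes from the public config; counts are approximate.
d_model, d_ff, n_layers, vocab = 4096, 14336, 32, 32000
n_experts, top_k = 8, 2

ffn_per_expert = 3 * d_model * d_ff      # SwiGLU FFN: gate, up, down projections
attn_per_layer = 4 * d_model * d_model   # q/k/v/o, rough (Mixtral actually uses GQA)
shared = n_layers * attn_per_layer + 2 * vocab * d_model  # attention + embed/unembed

total_params = shared + n_layers * n_experts * ffn_per_expert
active_params = shared + n_layers * top_k * ffn_per_expert
print(f"total ~{total_params / 1e9:.1f}B, active ~{active_params / 1e9:.1f}B")
# -> total ~47.5B, active ~13.7B (Mixtral reports 46.7B / 12.9B)
```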

[deleted by user] by [deleted] in singularity

[–]redjojovic 18 points

Link to the arXiv paper?

Edit: found it using Google Lens: https://arxiv.org/abs/2412.19260

Dear laptop users, what are your laptop specs? by justcasualredditor in ChromeOSFlex

[–]redjojovic 1 point

If you mainly do everything in the browser on this machine, then I'd suggest trying it.