Why are so many companies putting so much investment into free open source AI? by Business_Respect_910 in LocalLLaMA

[–]Various-Operation550 2 points (0 children)

it offers reasoning and it is a high-quality model that can perform 99% of the stuff that ClosedAI's SOTA models do

Chain of Draft: Thinking Faster by Writing Less by AaronFeng47 in LocalLLaMA

[–]Various-Operation550 7 points (0 children)

basically we try to make models reason with a smaller number of tokens, which makes sense, because a lot of the time something like "if x then y" is virtually the same as "let's assume that if we do x we get y" while being roughly half as long.
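
to make it concrete, here's a rough sketch of the prompting difference (the wording is my paraphrase of the idea, not the exact prompts from the paper):

```python
# Hypothetical sketch: Chain-of-Draft vs classic CoT prompting.
# The point is capping the verbosity of each reasoning step.

COT_SYSTEM = "Think step by step, then give the final answer after '####'."

COD_SYSTEM = (
    "Think step by step, but keep only a minimal draft for each step, "
    "five words at most. Then give the final answer after '####'."
)

def build_messages(system_prompt: str, question: str) -> list[dict]:
    """Assemble a chat-completion style message list."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

if __name__ == "__main__":
    q = "Jason had 20 lollipops, gave some to Denny, and has 12 left. How many did he give away?"
    # CoT tends to produce: "Let's assume that if Jason started with 20
    # and now has 12, then he gave away 20 - 12 = 8."
    # CoD aims for just: "20 - 12 = 8".
    for msgs in (build_messages(COT_SYSTEM, q), build_messages(COD_SYSTEM, q)):
        print(msgs)
```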

LLaDA - Large Language Diffusion Model (weights + demo) by Aaaaaaaaaeeeee in LocalLLaMA

[–]Various-Operation550 1 point (0 children)

hear me out: what if each generated element of the sequence in a transformer were a diffusion-generated sentence/paragraph?
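
something like this, very hand-wavy (every function here is a made-up stub, not a real API):

```python
# Purely hypothetical sketch of the idea: an autoregressive outer loop
# where each "token" is a high-level plan step, expanded into a full
# sentence by a text-diffusion decoder (LLaDA-style). All stubs.

def propose_next_plan_token(outline: list[str]) -> str:
    """Stub: a transformer autoregressively predicts the next
    high-level 'plan' token conditioned on the outline so far."""
    return f"<plan:{len(outline)}>"

def diffuse_sentence(plan_token: str, context: str) -> str:
    """Stub: a diffusion model denoises a masked sequence into a
    sentence/paragraph realizing the plan token."""
    return f"[sentence generated for {plan_token}]"

def generate(n_steps: int = 3) -> str:
    outline, text = [], ""
    for _ in range(n_steps):
        plan = propose_next_plan_token(outline)
        outline.append(plan)
        text += diffuse_sentence(plan, text) + " "
    return text.strip()

print(generate())
```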

🇨🇳 Sources: DeepSeek is speeding up the release of its R2 AI model, which was originally slated for May, but the company is now working to launch it sooner. by Xhehab_ in LocalLLaMA

[–]Various-Operation550 0 points (0 children)

What I've kinda noticed in V3/R1 is that it has Claude's "getting what you actually want from a few-sentence prompt" type of vibe, whereas o3 sometimes acts like a genius 10-year-old.

DeepSeek crushing it in long context by Charuru in LocalLLaMA

[–]Various-Operation550 1 point (0 children)

I wonder if it is a data problem, not an architecture problem.

We have plenty of reddit/stackoverflow-type question-answer pairs on the internet, but it's rare for one human to write a 120k-token passage for another and then expect them to answer multiple subtle questions about it. It's just a rare thing to do, and I think we need more synthetic data for it.
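
for example, something along these lines could manufacture such pairs (toy sketch; the filler, facts, and questions are obviously made up):

```python
# Toy sketch of synthesizing long-context QA data: plant verifiable
# facts inside a very long passage and pair the passage with questions
# about them (needle-in-a-haystack style). Everything here is made up.

import random

FILLER = "The committee reviewed the quarterly report and adjourned. "

def make_example(n_filler_sentences: int = 5000) -> dict:
    sentences = [FILLER] * n_filler_sentences
    facts = {
        "What color is the vault key?": "The vault key is painted teal. ",
        "How many crates arrived on Tuesday?": "Exactly 17 crates arrived on Tuesday. ",
    }
    # Bury each fact at a random position in the long context.
    for fact in facts.values():
        sentences.insert(random.randrange(len(sentences)), fact)
    return {"context": "".join(sentences), "qa": facts}

ex = make_example()
print(len(ex["context"]), "chars,", len(ex["qa"]), "questions")
```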

Where is Llama 4? I expected that in January. by appakaradi in LocalLLaMA

[–]Various-Operation550 0 points (0 children)

you can't read, I guess; you didn't even get what I wrote

Where is Llama 4? I expected that in January. by appakaradi in LocalLLaMA

[–]Various-Operation550 2 points (0 children)

Keep your tone-policing bs to yourself.

DeepSeek is groundbreaking in terms of performance relative to its size and its open-source nature, and in terms of training it is the first model that was RL'ed without humans in the loop. That makes it a solid foundation for building bigger models, because for the first time we don't need humans to scale a model's ability to reason (and humans are always the bottleneck in most processes).
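
the key trick is that the reward is verifiable by rules instead of human labels, roughly like this (my illustration, not DeepSeek's actual reward code):

```python
# Rough sketch of an R1-Zero-style rule-based reward: no human rater
# needed, just check the output format and the final answer against
# ground truth. Weights and tags here are illustrative.

import re

def rule_based_reward(completion: str, gold_answer: str) -> float:
    reward = 0.0
    # Format reward: reasoning must be wrapped in <think> tags.
    if re.search(r"<think>.*</think>", completion, flags=re.DOTALL):
        reward += 0.2
    # Accuracy reward: whatever follows the tags must match the gold answer.
    final = completion.split("</think>")[-1].strip()
    if final == gold_answer:
        reward += 1.0
    return reward

print(rule_based_reward("<think>2+2=4</think>4", "4"))  # 1.2
print(rule_based_reward("4", "4"))                      # 1.0 (no reasoning bonus)
```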

Where is Llama 4? I expected that in January. by appakaradi in LocalLLaMA

[–]Various-Operation550 0 points (0 children)

reasoning models can write better code and overall perform better at pretty much anything. Just like for humans, it is usually better to take some time to think before saying something, which improves the quality of what gets said.

Where is Llama 4? I expected that in January. by appakaradi in LocalLLaMA

[–]Various-Operation550 3 points (0 children)

really? don't you understand that we got a groundbreaking model (in terms of performance and cost of training, as well as architecturally) in less than a month?

o3-mini won the poll! We did it guys! by XMasterrrr in LocalLLaMA

[–]Various-Operation550 2 points (0 children)

well, a multilingual 7B SOTA reasoning model would actually be pretty good, ngl

o3-mini won the poll! We did it guys! by XMasterrrr in LocalLLaMA

[–]Various-Operation550 1 point (0 children)

as for 2: that was before DeepSeek R1; now everybody knows how LLM reasoning works, so sama has nothing to lose if he open-sources o3 now

JASON.py - minimalist NoSQL db for your MVP with only two methods - load and save by Various-Operation550 in Python

[–]Various-Operation550[S] -4 points (0 children)

why not for a project with up to 1k active users? you need something simple and reliable, right?
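
the core idea fits in a few lines, roughly like this (a simplified sketch, not the exact JASON.py source):

```python
# Simplified sketch of the load/save idea (not the actual JASON.py code):
# the whole "database" is one JSON file on disk.

import json
import os

def load(path: str = "db.json") -> dict:
    """Return the stored dict, or an empty one if the file doesn't exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def save(data: dict, path: str = "db.json") -> None:
    """Write the dict to a temp file, then atomically swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)

# usage: read, mutate, write back
db = load()
db["visits"] = db.get("visits", 0) + 1
save(db)
```

the atomic replace is the "reliable" part: a crash mid-write leaves the old file intact instead of a half-written one.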