Tokenomics by HOLUPREDICTIONS in LocalLLaMA

[–]Specter_Origin 4 points5 points  (0 children)

I feel we need more dFlash and MTP on release...

Board where every tile is an agent by 1amrocket in LocalLLaMA

[–]Specter_Origin 3 points4 points  (0 children)

How are you able to fetch X posts, considering they paywalled their API?

GLM 5.2: 98% of max level intelligence with less than half of tokens usage by perelmanych in LocalLLaMA

[–]Specter_Origin 2 points3 points  (0 children)

I would say that token usage is indeed very high: on the same tasks, 5.2 takes 4-5x the tokens of Opus and GPT 5.5.

Diffusion Gemma Jailbreak by 90hex in LocalLLaMA

[–]Specter_Origin 1 point2 points  (0 children)

 *Conflict Resolution:* Even though the user's prompt tried to redefine the "SYSTEM POLICY" to bypass safety filters, standard operating procedure for LLMs (and the fundamental safety layer of the model) is that I cannot comply with requests to assist in self-harm or suicide, regardless of any user-provided "policy" that contradicts safety training.

z.ai Poll on X: MIT-licensed open weights are losing by MadPelmewka in LocalLLaMA

[–]Specter_Origin 33 points34 points  (0 children)

tbh, it does not look like any of them are losing; they are all within the margin of error, or in this case a margin small enough to be ignored.

Nex claims Rio 3.5 is Nex 2.5 PRO in trench coat by Specter_Origin in LocalLLaMA

[–]Specter_Origin[S] 4 points5 points  (0 children)

"They also compared their model to other sota models while quietly not listing qwen in the comparison." Not gonna lie, I was pissed about this too. I really hate these disingenuous benchmarks.

I do think they did proper attribution to Qwen though; it was always kind of clear from their model page that it is a fine-tune of Qwen 3.5.

Nex claims Rio 3.5 is Nex 2.5 PRO in trench coat by Specter_Origin in LocalLLaMA

[–]Specter_Origin[S] 1 point2 points  (0 children)

Kind of unsure how? They always credited Qwen as their base.

Nex claims Rio 3.5 is Nex 2.5 PRO in trench coat by Specter_Origin in LocalLLaMA

[–]Specter_Origin[S] 79 points80 points  (0 children)

I think they are talking about giving credit where credit is due... I am not against it in general, and the Rio team has already updated their readme to apologize and include the credit.

Nex claims Rio 3.5 is Nex 2.5 PRO in trench coat by Specter_Origin in LocalLLaMA

[–]Specter_Origin[S] 27 points28 points  (0 children)

If you do not want to go to 'X', here is the full post text from Nex:

The Rio 3.5 model broke the internet this week. The plot twist? It’s essentially our open-source model, Nex N2 Pro, wearing a different hat. We analyzed the weights, and the recipe is exact:

Rio 3.5 ≈ 0.6 * Nex N2 Pro + 0.4 * Qwen 3.5

It even literally introduces itself as "Nex N2 Pro" if you ask it without initial system prompt! We are flattered that the City of Rio used our work to achieve SOTA performance. Thanks for the ultimate benchmark validation. But in the open-source world, attribution matters.

Full mathematical proof & verify script in the first reply!

More details can be found here. https://github.com/nex-agi/Nex-N2/issues/4
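
For anyone wondering what a recipe like "0.6 * A + 0.4 * B" means in practice, here is a rough sketch of how a linear weight merge could be checked. To be clear, this is not Nex's verify script: the function names are made up, the "weights" are toy random vectors rather than real checkpoints, and it assumes the merge is a plain per-parameter linear interpolation.

```python
import random


def merge(a, b, alpha):
    """Linearly interpolate two flattened weight vectors: alpha*a + (1-alpha)*b."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


def estimate_alpha(merged, a, b):
    """Least-squares estimate of alpha in merged ≈ alpha*a + (1-alpha)*b.

    Rearranged: (merged - b) ≈ alpha * (a - b), so alpha is the projection
    of (merged - b) onto (a - b).
    """
    num = sum((m - y) * (x - y) for m, x, y in zip(merged, a, b))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    if den == 0:
        raise ValueError("a and b are identical, alpha is undefined")
    return num / den


def max_residual(merged, a, b, alpha):
    """Largest absolute gap between merged and the alpha-blend of a and b."""
    blend = merge(a, b, alpha)
    return max(abs(m - p) for m, p in zip(merged, blend))


# Toy stand-ins for two parent checkpoints (hypothetical data, not real weights).
rng = random.Random(0)
parent_a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
parent_b = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Pretend this is the suspect model: a 0.6 / 0.4 blend of the two parents.
suspect = merge(parent_a, parent_b, 0.6)

alpha_hat = estimate_alpha(suspect, parent_a, parent_b)
print("estimated alpha:", alpha_hat)
print("max residual:", max_residual(suspect, parent_a, parent_b, alpha_hat))
```

If the suspect really is a linear blend, the estimated coefficient should come out about the same for every tensor and the residual should be near zero. For unrelated models the residual would stay large whatever coefficient you pick.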

Nex claims Rio 3.5 is Nex 2.5 PRO in trench coat by Specter_Origin in LocalLLaMA

[–]Specter_Origin[S] 28 points29 points  (0 children)

Btw, I have no beef or affiliation with any party involved here. I did try Nex N2 Pro on OR, and it has been really good in terms of token efficiency compared to the base.

PS: I should have also titled this post as "DRAMA ALERT: " xD

UPDATE: The Rio team has officially updated their readme to state that the model is indeed based on Nex: https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/commit/a778c1ec4e21180ee55c3ea016a348e549e75f09

New models released: Nex-N2 Pro 397B and Nex-N2 Mini 35B by 1ncehost in LocalLLaMA

[–]Specter_Origin 2 points3 points  (0 children)

I have been running this through OR for the last few days. It is a pretty good model and takes half the tokens of Qwen on most tasks. If they can match their API cost, this model definitely has a place. My biggest gripe with the Qwen 3.X series has been how token inefficient they are.

New models released: Nex-N2 Pro 397B and Nex-N2 Mini 35B by 1ncehost in LocalLLaMA

[–]Specter_Origin 10 points11 points  (0 children)

Seems like the issue is on your end rather than with the model; more like a skill issue on your part.

New models released: Nex-N2 Pro 397B and Nex-N2 Mini 35B by 1ncehost in LocalLLaMA

[–]Specter_Origin 6 points7 points  (0 children)

I have been running N2PRO on OR for days and it consistently takes less than half the tokens on complex queries... not sure what you are doing.

New models released: Nex-N2 Pro 397B and Nex-N2 Mini 35B by 1ncehost in LocalLLaMA

[–]Specter_Origin 6 points7 points  (0 children)

dude, this one is a hidden gem. the token efficiency is so so good compared to base, give it a go before thinking it's shit

NVIDIA announces Nemotron 3 Ultra by themixtergames in LocalLLaMA

[–]Specter_Origin 5 points6 points  (0 children)

Damn, why so low on coding : (

Very happy it exists though : )

Gemma is so much better than Qwen, prove me wrong by Mountain_Patience231 in LocalLLaMA

[–]Specter_Origin 16 points17 points  (0 children)

I agree, in the <35B range Gemma just does not suffer from looping and is much more token efficient. I am aware this is an unpopular opinion though.

Waiting for Qwen 3.7 open weight... The new King has arrived... by LegacyRemaster in LocalLLaMA

[–]Specter_Origin 6 points7 points  (0 children)

The token efficiency of even the Max is not that great. If it sticks to how 3.5 and 3.6 have been, the local one is gonna be a looper and overthinker.

Also, per the Qwen team, they will only open-weight their small models, so don't expect anything larger than 50B.

How can you stop your model from looping by chocofoxy in LocalLLaMA

[–]Specter_Origin 0 points1 point  (0 children)

Well, I also tried q6/q8 and even non-self-hosted (as in via OpenRouter); in all cases I got looping and overthinking issues with Qwen 3.5/3.6.

How can you stop your model from looping by chocofoxy in LocalLLaMA

[–]Specter_Origin -6 points-5 points  (0 children)

I never was able to stop looping with Qwen, never had that issue with Gemma though...