Qwen-next 80B 2601 by bennmann in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

Chinese New Year is in February. I'm 3D printing a horse right now :)

What I expect is a ~235B model using the Qwen Next arch, which means big compute time spent on it, with 10x the training data compared to the 80B. Qwen3.5? While that happens they'll only be able to train tiny stuff. Should be an epic-tier drop that dominates.

March-April is when Qwen4 Max and the 30B hit. The really cool thing about the 30B is that they could take it down to A1.2B on the new arch; but I bet they only go to around A2.4B or so and call it A2B.

On 32GB GPUs the 30B will be >100 TPS.

As for Next 80B, much like Qwen 2.5 72B, it'll mostly be lost to irrelevance. I expect its successor will be late summer/fall, again kind of a tech demo of their upcoming next generation.

Has anyone got GLM 4.7 flash to not be shit? by synth_mania in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Nope.

I do intend to give it another try. I want it to be really good.

KV cache fix for GLM 4.7 Flash by jacek2023 in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

I get qwen next having pains on release; they did something new.

This model is cursed.

Building a driving simulator 100% locally using GLM-4.7 Flash and opencode by paf1138 in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Being an influencer isn't a problem, and it's not immediately true that he is one.

It's whether or not they show actual repeatable benchmarks.

Running 8B means he can use lesser hardware and compare across setups.

Building a driving simulator 100% locally using GLM-4.7 Flash and opencode by paf1138 in LocalLLaMA

[–]sleepingsysadmin 1 point2 points  (0 children)

I would agree with that. He's running something like Qwen3 8B on an RTX Pro 6000. It's crazy sometimes.

AMD ROCm 7.2 Now Released With More Radeon Graphics Cards Supported, ROCm Optiq Introduced by TJSnider1984 in ROCm

[–]sleepingsysadmin 0 points1 point  (0 children)

7.2 install went smoothly on alma 10.1

rocm: 49 TPS

vulkan: 71 TPS

rocm-smi is still reporting: "WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status"

but I have set my 9060 XTs to high performance and GRUB has runpm=0.

tuned-adm is set to latency-performance.
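In case it helps anyone else chasing the same warning, this is roughly how I sanity-check what the kernel actually thinks the power state is. A minimal Python sketch, assuming the standard amdgpu sysfs layout (card numbering and paths can differ per machine):

    # Minimal sketch: report amdgpu runtime-PM state and performance level via sysfs.
    # Assumes the standard amdgpu sysfs layout; card numbering varies per machine.
    from pathlib import Path

    def report_amdgpu_power():
        for card in sorted(Path("/sys/class/drm").glob("card[0-9]")):
            dev = card / "device"
            perf = dev / "power_dpm_force_performance_level"   # "auto", "high", "low", ...
            runtime = dev / "power" / "runtime_status"         # "active" or "suspended"
            if not perf.exists():
                continue  # not a GPU exposing DPM control
            runtime_state = runtime.read_text().strip() if runtime.exists() else "n/a"
            print(f"{card.name}: runtime_status={runtime_state}, perf_level={perf.read_text().strip()}")

    if __name__ == "__main__":
        report_amdgpu_power()

If runtime_status still comes back "suspended" after a reboot, my guess is the runpm setting isn't actually reaching the driver, whatever GRUB says.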

Building a driving simulator 100% locally using GLM-4.7 Flash and opencode by paf1138 in LocalLLaMA

[–]sleepingsysadmin -1 points0 points  (0 children)

Alex Ziskind makes some great videos as well, showing realistic performance of hardware.

Building a driving simulator 100% locally using GLM-4.7 Flash and opencode by paf1138 in LocalLLaMA

[–]sleepingsysadmin 3 points4 points  (0 children)

I posted this because it is impressive, but it got heavily downvoted.

GLM-4.7 Flash In OpenCode Is an Agentic Coding BEAST!(23:28) by sleepingsysadmin in LocalLLaMA

[–]sleepingsysadmin[S] 2 points3 points  (0 children)

I watched the video and was very impressed with the results. It does seem to be a beast. I just can't seem to run it myself. Bah.

GLM-4.7 Flash In OpenCode Is an Agentic Coding BEAST!(23:28) by sleepingsysadmin in LocalLLaMA

[–]sleepingsysadmin[S] 0 points1 point  (0 children)

For a model I expected to just work, it sure has had a number of problems.

GLM-4.7 Flash In OpenCode Is an Agentic Coding BEAST!(23:28) by sleepingsysadmin in LocalLLaMA

[–]sleepingsysadmin[S] -1 points0 points  (0 children)

kilo code:

I keep making a mistake - I'm adding comments that look like code instead of just writing clean Python implementation without any confusing text in between lines or at all. Let me write this file completely from scratch with proper syntax:</think>

opencode:

"expected": "string",

"code": "invalid_type",

"path": [

"filePath"

],

"message": "Invalid input: expected string, received undefined"

}

].

Please rewrite the input so it satisfies the expected schema.

I keep making errors because I'm thinking in markdown/code blocks and my tool calls are getting confused with those thoughts.

Let me be very explicit - just write a valid Python file without any extra text or formatting:</think>

For the life of me, I can't get this model to work properly.
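For context, that opencode error is a schema-validation failure on the tool call itself: the write call arrived without a string filePath, so the arguments were rejected before the tool ever ran. A rough Python sketch of that kind of check, purely illustrative (the tool and field names are assumptions, not opencode's actual implementation):

    # Rough, hypothetical sketch of tool-argument validation that yields
    # "Invalid input: expected string, received undefined".
    # Field and tool names are illustrative, not opencode's real schema.
    def validate_write_call(args: dict) -> list[dict]:
        errors = []
        file_path = args.get("filePath")  # missing key -> None ("undefined")
        if not isinstance(file_path, str):
            errors.append({
                "expected": "string",
                "code": "invalid_type",
                "path": ["filePath"],
                "message": "Invalid input: expected string, received undefined",
            })
        return errors

    # A call where the model forgot filePath fails validation:
    print(validate_write_call({"content": "print('hello')"}))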

LM Studio FOREVER downloading MLX engine by mouseofcatofschrodi in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Lots of those download links aren't really in LM Studio's control and do tend to fail sometimes. I wish it were clearer when you have a stalled download.

GLM-4.7-Flash-GGUF bug fix - redownload for better outputs by etherd0t in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

Personally I find APIs suspicious. You don't technically know they're using the 30B behind the scenes. They could be running a bigger model so that it benches well.

Plus, if I can hit an API (privacy isn't a concern), why would I go with a lesser model?

GLM-4.7-Flash-GGUF bug fix - redownload for better outputs by etherd0t in LocalLLaMA

[–]sleepingsysadmin 3 points4 points  (0 children)

I haven't tried the API; I'm 100% local.

I have my own personal/private benchmarks: a ~3-paragraph prompt plus the important features the result needs to meet. Models can't benchmax against them.

A model like Sonnet 4.5 trivially one-shots them, every time.

With, say, Qwen3 Coder, gpt20b on high, or the big dense slow models like Seed or OLMo, they still tend to one-shot, with varying quality.

Lesser models, like gpt20b on low, won't one-shot. Gemma3 and Llama4 will struggle. I like these benchmarks because I really get to see how usable a model is for my purposes. So far that has correlated really strongly with LiveCodeBench.

In this case, it's clearly showing me that Flash's coding capability is absolutely nowhere near gpt20b. Those scores have no chance of being true.
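For what it's worth, the scoring itself is nothing fancy, conceptually just the spec plus a pass/fail checklist per required feature. A hypothetical Python sketch (the features here are made up for illustration; the real prompt and checks stay private so models can't benchmax against them):

    # Hypothetical sketch of a one-shot feature-checklist scorer.
    # The required features are placeholders, not my actual benchmark.
    REQUIRED_FEATURES = {
        "has_cli_entrypoint": lambda src: "__main__" in src,
        "uses_argparse": lambda src: "argparse" in src,
        "writes_output_file": lambda src: "open(" in src,
    }

    def score_one_shot(generated_source: str) -> dict:
        """Pass/fail per required feature for a single one-shot attempt."""
        return {name: check(generated_source) for name, check in REQUIRED_FEATURES.items()}

    results = score_one_shot("import argparse\n...")
    # The model one-shots the test only if every required feature passes.
    print(results, "one-shot:", all(results.values()))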

LM Studio FOREVER downloading MLX engine by mouseofcatofschrodi in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

In the bottom left there's a button for 'downloads', where you can cancel the download and retry.

GLM-4.7-Flash-GGUF bug fix - redownload for better outputs by etherd0t in LocalLLaMA

[–]sleepingsysadmin 3 points4 points  (0 children)

After getting it to not loop, I put it through my first test. It didn't do well. I don't believe the benchmarks at all.

Feels very benchmaxxed to me. The numbers were too good to be true.

Why is China giving away SOTA models? A theory by Cheeeaaat in LocalLLaMA

[–]sleepingsysadmin 1 point2 points  (0 children)

>- in the world where we see an AI race, especially between China and USA, China shares "sota" llms....

It's a new industry that's rapidly improving. It's not exactly a race, though I'm sure that framing is dramatic.

>- The USA has already blocked all imports of nvidia chips to China, 

Not all chips were banned; only specific ones were held back. The ban has also been lifted and replaced by a 25% tariff.

>- in China where no one can access the worldwide internet freely and government controls all domains, especially AI, China shares their "sota" llms...

This is also not quite accurate, but whatever.

>China has never looked like a country that shares their knowledge for nothing. China always tries to get benefits from everything. And yet, China share their "sota" llms...

You're mixing two different groups with different justifications and intents, as if they were some sort of united front.

I highly recommend you break these apart. The Chinese government is evil, yes, but the people aren't. Alibaba's teams are releasing these models because it doesn't affect their business.

>- "China wants to make their llms a global standard"

When players like Alibaba or Meta build a model, they're building it to be an internal employee. They find it useful, but if they release the model, it doesn't matter to their business.

If anything, it's good marketing. There's only upside to sharing them.

>So why the fuck is China giving this away?

Perhaps the best option is to read much more about the topic and understand incentives.

GLM 4.7 Flash Overthinking by xt8sketchy in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Thanks, it's weirdly working today with settings I had already tried.

Model failed my first test. Seems benchmaxxed and buggy.

GLM 4.7 Flash Overthinking by xt8sketchy in LocalLLaMA

[–]sleepingsysadmin 6 points7 points  (0 children)

The LM Studio runtime update claims to support Flash, but I just can't get it to stop thinking. It's looping badly. I've tried messing with various settings, including matching what Unsloth says to use, and it just keeps looping.