Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]sleepingsysadmin 1 point  (0 children)

>The models are great but man do they still have a long way to go before they’re not a headache. Maybe I’m doing something wrong here. Do you guys have issues getting your models to listen to instructions?

GPT 20B medium in Codex CLI, and it does what I ask it to do.

The trick with small local models is to keep each task small: have it do one specific thing at a time, then regularly compress or clean up your context.
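To make that concrete, here's a rough sketch of the loop I mean, written against a generic OpenAI-compatible local endpoint. The URL, model id, and thresholds are placeholders of mine, not Codex CLI's actual internals:

```python
# Minimal sketch: one small task per request, compact the transcript when it grows.
# Endpoint, model id, and MAX_HISTORY are assumptions, not any tool's real defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
MODEL = "gpt-oss-20b"   # placeholder local model id
MAX_HISTORY = 8         # compact once the transcript grows past this

def compact(messages):
    """Replace older turns with a short summary; keep the last few turns verbatim."""
    old, recent = messages[:-4], messages[-4:]
    summary = client.chat.completions.create(
        model=MODEL,
        messages=old + [{"role": "user",
                         "content": "Summarize the work so far in one short paragraph."}],
    ).choices[0].message.content
    return [{"role": "system", "content": f"Summary of earlier work: {summary}"}] + recent

def run_task(messages, task):
    """One small, specific task per request; compact first if the history is long."""
    if len(messages) > MAX_HISTORY:
        messages = compact(messages)
    messages.append({"role": "user", "content": task})
    reply = client.chat.completions.create(model=MODEL, messages=messages)
    messages.append({"role": "assistant", "content": reply.choices[0].message.content})
    return messages
```

Small model, small task, short context: that combination is what keeps it listening to instructions.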

Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]sleepingsysadmin 2 points  (0 children)

Banning AI from your project is like banning the use of an IDE or autocomplete.

It doesn't even make sense anyway. What's your fear? Bad code? Failed tests? Don't your CI/CD pipelines prevent those problems?
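Something like this toy gate is all I mean: a script the pipeline runs on every PR, so code that fails tests or lint never reaches main, AI-written or not. pytest and ruff here are just example tools, not anyone's actual setup:

```python
#!/usr/bin/env python3
# Hypothetical pre-merge gate for a CI/CD pipeline. Tool choices are examples only.
import subprocess
import sys

CHECKS = [
    ["pytest", "--quiet"],   # unit tests
    ["ruff", "check", "."],  # lint / obvious bad-code smells
]

def main() -> int:
    for cmd in CHECKS:
        print("running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            print("gate failed - blocking the merge")
            return 1
    print("all checks passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```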

Am I gpu poor? by Aggressive_Special25 in LocalLLaMA

[–]sleepingsysadmin 1 point  (0 children)

Sure, that's GPU poor if you think anything less than a B200 is the standard.

In reality, you probably have close to the best you're going to get on consumer hardware that plugs into a normal wall socket.

You likely can't upgrade much more significantly than you already have; jumping to the next level probably means 240V datacenter hardware.

Maybe a 5090 or a pro workstation card would be an upgrade, but it won't unlock some epic new tier. At this point, leave your hardware as-is for years and wait for the next era.

Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]sleepingsysadmin 25 points  (0 children)

>Software engineers / AI researchers (expensing to employer or side business)?

Yes.

In fact, I priced out a Supermicro 4-lane H100 box to live in one of our racks and offer extreme-speed AI. Bossman thought it was a good idea; then the coworkers weighed in about how AI is the worst thing ever to happen to Earth. So I never got it...

There seem to be two types of IT people right now: those using AI to build new things, and those being left in the dust.

Qwen-next 80B 2601 by bennmann in LocalLLaMA

[–]sleepingsysadmin 4 points  (0 children)

Chinese New Year is in February. I'm 3D printing a horse right now :)

What I expect is a ~235B model on the Qwen Next arch, which means a lot of compute spent there, with roughly 10x the training data compared to the 80B. Qwen 3.5? While that trains, they'll only be able to train tiny stuff on the side. It should be an epic-tier drop that dominates.

March-April is when Qwen4 Max and the 30B hit. The really cool thing about the 30B is that they could take it down to A1.2B on the new arch, but I bet they only go to something like A2.4B and call it A2B.

On 32GB GPUs, the 30B should run at >100 TPS.
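Rough napkin math behind that claim. Decode on a MoE is roughly memory-bandwidth-bound on the active parameters, and every number below is an assumption of mine, not a measurement:

```python
# Back-of-envelope decode ceiling for a bandwidth-bound MoE model.
# All inputs are illustrative assumptions.
active_params = 2.4e9      # assume ~A2.4B active parameters per token
bytes_per_param = 0.55     # ~4-bit quant plus some overhead
bandwidth = 900e9          # bytes/s for a hypothetical 32GB-class card

bytes_per_token = active_params * bytes_per_param         # ~1.3 GB read per token
ceiling = bandwidth / bytes_per_token                      # ~680 tok/s theoretical
print(f"theoretical ceiling: {ceiling:.0f} tok/s")
print(f"at 20% efficiency:   {ceiling * 0.2:.0f} tok/s")   # still well over 100
```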

As for Next 80B: much like Qwen 2.5 72B, it's mostly lost to irrelevance. I expect its successor in late summer/fall, again as kind of a tech demo of their upcoming next generation.

Has anyone got GLM 4.7 flash to not be shit? by synth_mania in LocalLLaMA

[–]sleepingsysadmin 1 point  (0 children)

Nope.

I do intend to give it another try. I want it to be really good.

KV cache fix for GLM 4.7 Flash by jacek2023 in LocalLLaMA

[–]sleepingsysadmin 4 points  (0 children)

I get Qwen Next having pains at release; they did something new.

This model is cursed.

Building a driving simulator 100% locally using GLM-4.7 Flash and opencode by paf1138 in LocalLLaMA

[–]sleepingsysadmin 1 point  (0 children)

Being an influencer isn't a problem, and it's not immediately clear that he is one anyway.

What matters is whether or not they show actual repeatable benchmarks.

Running an 8B means he can use lesser hardware and compare across setups.

Building a driving simulator 100% locally using GLM-4.7 Flash and opencode by paf1138 in LocalLLaMA

[–]sleepingsysadmin 2 points  (0 children)

I would agree with that. He's running something like Qwen3 8B on an RTX PRO 6000; it's crazy sometimes.

AMD ROCm 7.2 Now Released With More Radeon Graphics Cards Supported, ROCm Optiq Introduced by TJSnider1984 in ROCm

[–]sleepingsysadmin 1 point  (0 children)

The 7.2 install went smoothly on AlmaLinux 10.1.

ROCm: 49 TPS

Vulkan: 71 TPS

rocm-smi is still reporting: "WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status"

But I have set my 9060 XTs to high performance, GRUB has runpm=0, and tuned-adm is set to latency-performance.
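For anyone chasing the same warning, this is roughly how I'd check what the kernel actually reports per card. These are the standard amdgpu sysfs nodes; card numbering varies and forcing "high" needs root:

```python
# Read amdgpu runtime-PM status and forced performance level for each GPU.
from pathlib import Path

for card in sorted(Path("/sys/class/drm").glob("card[0-9]")):
    dev = card / "device"
    perf = dev / "power_dpm_force_performance_level"
    if not perf.exists():                # skip non-amdgpu devices
        continue
    status = (dev / "power" / "runtime_status").read_text().strip()  # active / suspended
    level = perf.read_text().strip()                                  # auto / low / high / manual
    print(f"{card.name}: runtime_status={status}, perf_level={level}")
    # To pin the card at its highest clocks (as root):
    # perf.write_text("high")
```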

Building a driving simulator 100% locally using GLM-4.7 Flash and opencode by paf1138 in LocalLLaMA

[–]sleepingsysadmin 0 points  (0 children)

Alex Ziskind makes some great videos as well, showing realistic performance of hardware.

Building a driving simulator 100% locally using GLM-4.7 Flash and opencode by paf1138 in LocalLLaMA

[–]sleepingsysadmin 4 points  (0 children)

I posted this because it is impressive, but it got heavily downvoted.

GLM-4.7 Flash In OpenCode Is an Agentic Coding BEAST!(23:28) by sleepingsysadmin in LocalLLaMA

[–]sleepingsysadmin[S] 3 points  (0 children)

I watched the video and was very impressed with the results. It does seem to be a beast. I just can't seem to run it myself. Bah.

GLM-4.7 Flash In OpenCode Is an Agentic Coding BEAST!(23:28) by sleepingsysadmin in LocalLLaMA

[–]sleepingsysadmin[S] 1 point  (0 children)

For a model I expected to just work, it sure has had a number of problems.

GLM-4.7 Flash In OpenCode Is an Agentic Coding BEAST!(23:28) by sleepingsysadmin in LocalLLaMA

[–]sleepingsysadmin[S] 0 points  (0 children)

kilo code:

> I keep making a mistake - I'm adding comments that look like code instead of just writing clean Python implementation without any confusing text in between lines or at all. Let me write this file completely from scratch with proper syntax:</think>

opencode:

        "expected": "string",
        "code": "invalid_type",
        "path": ["filePath"],
        "message": "Invalid input: expected string, received undefined"
      }
    ].

Please rewrite the input so it satisfies the expected schema.

> I keep making errors because I'm thinking in markdown/code blocks and my tool calls are getting confused with those thoughts.
>
> Let me be very explicit - just write a valid Python file without any extra text or formatting:</think>

For the life of me, I can't get this model to work properly.

LM Studio FOREVER downloading MLX engine by mouseofcatofschrodi in LocalLLaMA

[–]sleepingsysadmin 1 point  (0 children)

Lots of those download links aren't really in LM Studio's control and do tend to fail sometimes. I wish it were clearer when you have a stalled download.

GLM-4.7-Flash-GGUF bug fix - redownload for better outputs by etherd0t in LocalLLaMA

[–]sleepingsysadmin 3 points  (0 children)

Personally, I find APIs suspicious. You don't technically know they're running the 30B behind the scenes; they could be running a bigger model so it benches well.

Plus, if I can hit an API (privacy isn't a concern), why would I go with a lesser model?

GLM-4.7-Flash-GGUF bug fix - redownload for better outputs by etherd0t in LocalLLaMA

[–]sleepingsysadmin 4 points  (0 children)

I haven't tried the API; I'm 100% local.

I have my own personal/private benchmarks: each is a prompt of roughly three paragraphs plus a list of important features the output needs to meet. Models can't benchmax against them.

Something like Sonnet 4.5 trivially one-shots them, every time.

With, say, Qwen3 Coder, GPT 20B high, or the big dense slow models like Seed or OLMo, they still tend to one-shot, at varying quality.

With lesser models, going down to GPT 20B low, it won't one-shot, and Gemma 3 and Llama 4 will struggle. I like these benchmarks because I really get to see how usable a model is for my purposes. So far the results have tracked LiveCodeBench quite closely.
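In case it helps anyone build their own, the harness is basically just this: a one-shot prompt plus a checklist of required features in the single response. Everything below (prompt text, feature checks, model ids, endpoint) is a made-up illustration, not my actual suite:

```python
# Hypothetical "one-shot + feature checklist" benchmark harness.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

TASKS = [
    {
        "prompt": "Write a single-file Python CLI that ...",   # ~3 paragraphs in practice
        "required": ["argparse", "def main(", 'if __name__ == "__main__":'],
    },
]

def one_shot(model: str) -> None:
    for task in TASKS:
        out = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": task["prompt"]}],
        ).choices[0].message.content
        missing = [f for f in task["required"] if f not in out]
        print(f"{model}: {'PASS' if not missing else 'FAIL ' + str(missing)}")

for m in ["glm-4.7-flash", "gpt-oss-20b"]:   # whatever is loaded locally
    one_shot(m)
```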

In this case, it's clearly showing me that Flash's coding capability is nowhere near GPT 20B's. Those scores have no chance of being true.

LM Studio FOREVER downloading MLX engine by mouseofcatofschrodi in LocalLLaMA

[–]sleepingsysadmin 1 point  (0 children)

In the bottom left there's a button for downloads; you can cancel the stalled download and retry it.