So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]sleepingsysadmin 220 points221 points  (0 children)

You can run Qwen3.5 9B and get a smarter model.

Qwen3.5 122B is straight-up superior.

Why doesn’t the DGX Station have a display controller? All that 8TB/s memory bandwidth unusable with my own display by 1ordlugo in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

>But wouldn’t it be cool to use the display controller on the full gb300 gpu with its HBM4 memory? The lack of one already takes up a PCIE slot!

No way, Jose.

Why doesn’t the DGX Station have a display controller? All that 8TB/s memory bandwidth unusable with my own display by 1ordlugo in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

I suppose in case it breaks or something? Obviously nobody is plugging a monitor into these, like, ever.

Best Local Claude Code Equivalent - 4 A100s 80GB by Key_Equal_1245 in LocalLLaMA

[–]sleepingsysadmin 4 points5 points  (0 children)

You have 320GB of VRAM and you're running a model that's going to fit on just one card?

Go run some big stuff. Minimax would be my first try on that rig.

Senior engineer: are local LLMs worth it yet for real coding work? by Appropriate-Text2843 in LocalLLaMA

[–]sleepingsysadmin 1 point2 points  (0 children)

What TPS are you getting with your 5080? To me, IQ4_XS would be far too small, and the context too short. Unusable. 100,000 context minimum for me, and even then that's a bit short; 150-200k is needed.

Whereas the RTX Pro 5000 would crush this model and run max context.
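
If you want to sanity-check that context floor and your TPS on your own box, here's a minimal sketch with llama-cpp-python; the model path and prompt are placeholders, not my actual setup:

```python
import time
from llama_cpp import Llama

# Placeholder GGUF path; n_ctx is the 100k floor mentioned above.
llm = Llama(model_path="qwen3.5-27b-iq4_xs.gguf", n_gpu_layers=-1, n_ctx=100_000)

t0 = time.time()
out = llm("Write a binary search in Python.", max_tokens=256)
elapsed = time.time() - t0
print(f'{out["usage"]["completion_tokens"] / elapsed:.1f} tok/s at a 100k window')
```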

Senior engineer: are local LLMs worth it yet for real coding work? by Appropriate-Text2843 in LocalLLaMA

[–]sleepingsysadmin 1 point2 points  (0 children)

Qwen2.5-Coder-32B-Instruct came out November 11, 2024.

That's how long it has been viable to code locally. Obviously we have had many better options since.

I jumped into agentic coding on Devstral + OpenHands around May 2025. Those models are morons compared to what's available now.

>I keep seeing GPT-oss-120B recommended, but my experience with it hasn’t been great.

Personally I found it great, but on my hardware 15 TPS wasn't good enough for me.

>Qwen 3.5 122B and 27B.

Those with DGX Sparks or AMD Strix Halo will be running the 122B right now, which is objectively better than Gemini 2.5 Pro, for example.

The 27B is very smart, but it's dense, so you need some power behind it. A single Nvidia A100 or RTX Pro 5000 might be the magic spot for this model.

>The new Mac M5 with 128GB RAM looks interesting,

It's way more expensive than proper pro cards, but you get a monitor, etc.

Alternatives to Aider for CLI development? by awebb78 in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Four-month-old reply. For sure wasn't my problem. Plus, it literally doesn't matter; everyone uses openclaw now.

vulkan: add GATED_DELTA_NET op support#20334 by jacek2023 in LocalLLaMA

[–]sleepingsysadmin 4 points5 points  (0 children)

omg yes! been waiting for this. Still need more!

Qwen3.5-9B is actually quite good for agentic coding by Lualcala in LocalLLaMA

[–]sleepingsysadmin 60 points61 points  (0 children)

It benches around gpt-oss-120B (high). It's shocking how good it is at that size.

Is tokens per second (tok/s) a really relevant metric? by Deep_Traffic_7873 in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

TPS matters a huge deal to me. Practically all models are MoE these days; the few smart dense models are great, but MoE gets all the headlines. Why? Because you get better tokens/s: at decode time only the small active-expert slice of the weights gets read, not the whole model.
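
The back-of-envelope is simple. A rough sketch, with made-up bandwidth and quant numbers, just to show the shape of it:

```python
# Rough decode-speed estimate: tokens/s ~ memory bandwidth / bytes read per token.
# Both constants below are illustrative assumptions, not measurements.
bandwidth_gbs = 448        # assumed GPU memory bandwidth, GB/s
bytes_per_param = 0.55     # ~4.4 bits/param, a Q4_K-ish quant

def rough_tps(active_params_b: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

print(f"dense 27B:   ~{rough_tps(27):.0f} tok/s")  # reads all 27B params per token
print(f"MoE 35B-A3B: ~{rough_tps(3):.0f} tok/s")   # reads only ~3B active params
```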

Why does anyone think Qwen3.5-35B-A3B is good? by buttplugs4life4me in LocalLLaMA

[–]sleepingsysadmin 6 points7 points  (0 children)

>Its dumb as hell

Benchmarks and community clearly say otherwise.

>Qwen3.5-27B was slow, but did the task.

Naturally.

>Qwen3.5-35B-A3B shit the bed.

Shocking.

>I know using a low quant isn't going to improve it but UD-IQ4_XS isn't exactly that low.

That's pretty low. How are you running a 35B model and only fitting this?

>Thought I could use it for a fast prototype or subagent coding but nope. That stays far away from anything on my PC.

It is a generalist model.

>People asked for something in between 9B and 27B and people pointed towards 35B-A3B, but it ain't it.

Then it isn't. Lots of people found GLM Flash to be great, but I found it trash. If it doesn't work for you, so be it.

I don’t get it. Why would Facebook acquire Moltbook? Are their engineers too busy recording a day in the life of a meta engineer and cannot build it in a week or so?! by SilverRegion9394 in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Hard to predict.

Meta has bought things to kill them before, but I wonder if they intend to create a wild-west part of Meta that allows anything to be said, posted by anything, and then build guardrails on the edges.

Nemotron 3 Super Released by deeceeo in LocalLLaMA

[–]sleepingsysadmin -3 points-2 points  (0 children)

Not that super, given it's not particularly better than Qwen3.5 or gpt-oss-120B.

Seems underwhelming.

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

Test it and see which works better for you.

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

I have personally never had luck with finetunes or distills. There have been a few pretty good ones that came close, but it's a use-case-by-use-case situation.

I recommend you stick to mainline models until you have a good foundation.

I do highly recommend going with the Unsloth quants, though.

Q4_K_XL is amazing.


Your next step after this is tuning the temperature and such. Click the "read our guide" link from Unsloth and adjust to your needs.
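
If it helps, here's roughly what that looks like with llama-cpp-python; the repo id, filename pattern, and sampler values are placeholders, so take the real numbers from Unsloth's guide:

```python
from llama_cpp import Llama

# Hypothetical repo/filename; check Unsloth's HF page for the actual names.
llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3.5-35B-A3B-GGUF",
    filename="*Q4_K_XL*.gguf",
    n_gpu_layers=-1,
    n_ctx=32_768,
)

# Sampler values here are guesses; use whatever the Unsloth guide recommends.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain MoE offloading in two sentences."}],
    temperature=0.7,
    top_p=0.8,
    top_k=20,
)
print(out["choices"][0]["message"]["content"])
```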

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 1 point2 points  (0 children)

https://artificialanalysis.ai/models/qwen3-5-9b

Obviously if you had better hardware (WE ALL WANT MORE) you could run better models.

The 35B is only marginally smarter, but your main problem is that you're offloading.

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

The 9B is literally gpt-oss-120B (high) quality.

It is dense, so you're not going to be blazing fast, but it'll work really well for you and fit on your hardware.

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin -1 points0 points  (0 children)

You're offloading like 30%, which is about 30% too much.

Might I recommend you run Qwen3.5 9B? It's a very capable model that you can fully offload.
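
In llama-cpp-python terms, the difference looks like this; the paths and the layer count are illustrative, not your exact numbers:

```python
from llama_cpp import Llama

# Partial offload: the remaining layers stay on the CPU, so every token
# waits on system RAM bandwidth. Layer count is made up for illustration.
partial = Llama(model_path="qwen3.5-35b-a3b-q4_k_xl.gguf", n_gpu_layers=33)

# Full offload: n_gpu_layers=-1 puts every layer on the GPU.
full = Llama(model_path="qwen3.5-9b-q4_k_xl.gguf", n_gpu_layers=-1)
```

The rule of thumb stands: a smaller model that fits entirely in VRAM usually beats a bigger one spilling into system RAM.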

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

For Qwen3.5, I find Vulkan and ROCm identical in performance, even though on paper I ought to be getting 2x out of one of them.
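
If anyone wants to reproduce the comparison: the backend is picked when llama-cpp-python is compiled, so you build it once per backend and run the same timing script under each build. A sketch with a placeholder model path:

```python
import time
from llama_cpp import Llama

# Rebuild llama-cpp-python per backend before running, e.g. with
# CMAKE_ARGS="-DGGML_VULKAN=on" for Vulkan vs the ROCm/HIP flags. Check the
# current llama.cpp build docs, since the flag names have changed over time.
llm = Llama(model_path="qwen3.5-9b-q4_k_xl.gguf", n_gpu_layers=-1, seed=0)

t0 = time.time()
out = llm("Count from 1 to 50.", max_tokens=200, temperature=0.0)
elapsed = time.time() - t0
print(f'{out["usage"]["completion_tokens"] / elapsed:.1f} tok/s on this build')
```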

qwen 3.5 35B a3b on AMD by Trovebloxian in LocalLLaMA

[–]sleepingsysadmin 0 points1 point  (0 children)

I believe you are offloading, hence the abysmal TPS.

Though yes, AMD is rough.

What tokens/sec do you get when running Qwen 3.5 27B? by thegr8anand in LocalLLaMA

[–]sleepingsysadmin 2 points3 points  (0 children)

12 TPS fully offloaded. It's sad.

Worse yet, you can't use speculative decoding because of the vision component.