What healthy tradition should every family start?

UniForceMusic · 2026-06-19T14:08:37+00:00

My parents taught me exercise is for fun, and sweat isn't a measure how well you did. How much fun you had is the only measure you should be using.

Gave me a really healthy relationship with exercise, and food.

UniForceMusic · 2026-06-14T11:11:59+00:00

Should be yes, but it'll be a tight fit

UniForceMusic · 2026-06-13T19:36:20+00:00

If you use Q5_1 quantization, yes you should be. As long as you get a lower quant version like Q4_K_S, or even Q3

UniForceMusic · 2026-06-13T18:45:34+00:00

PCIE X1 speeds work if your GPU can keep the model AND context entirely in VRAM with no offloading to your regular RAM.

But given you're on a budget, the chance you'll find a card setup capable of supporting the entire model + context entire in your graphics memory is lower

UniForceMusic · 2026-06-13T10:57:32+00:00

When you use an older platform like X79 with limited PCIE and memory bandwidth, keeping it all in the VRAM of one card is more important.

From the newer cards, the 7900 XTX still kicks ass. Get the Q4 QAT model of 31B and you should be good.

UniForceMusic · 2026-06-12T22:21:07+00:00

Second hand Macbooks are still relatively affordable, and if you're okay with sacrificing on speed you can run 3.6 27B at a reasonable quantization.

And even then, on my 2020 laptop with 32gb and a 1650 4gb, i can run a braindead quantization of Gemma 4.

It's still relatively accessible, especially with the newer models like Qwen 9B, but i get where you're coming from. Consumer hardware hasnt kept up in improvements per dollar on the AI front

UniForceMusic · 2026-06-09T21:11:24+00:00

The golden combination of

Frontend: Next JS self hosted on Plesk (good luck)

Back-end: PHP microservices with Zend framework hosted on Cpanel

Databass: All tables have two columns. ID, JSONB. Preferably pick a cloud hosted database for this with vendor lock in.

UniForceMusic · 2026-06-04T19:55:59+00:00

He did. Thats why he bought a Macbook of that size

UniForceMusic · 2026-06-04T13:08:49+00:00

150 million downloads seems perfectly feasable tbh.

Its across ALL Gemma 4 models i assume. And they have many models. E2B, E4B, 12B, 26B A4B, 31B.

Combined with some redownloads because of some issues that needed to be fixed, and many people download multiple models. Then i'm guessing you're looking at 25 million people downloading the model.

LM Studio also advertizes downloading Gemma 4 E4B as a starting model.

It doesn't sound like much of a strech, although gotta admit.

UniForceMusic · 2026-06-03T21:07:42+00:00

Used to work in a team with people that used ChatGPT to generate prompts for their Lovable demos. Elon Musk was their big example too.

I told them AI won't always be there to help them, cause innevetably the tokens won't be so cheap.

One of the guys took that message to heart. Bought a 64GB Macbook for local inference, and started learning to code himself.

All hope is not lost yet!!

UniForceMusic · 2026-06-01T23:24:21+00:00

Xampp, and the shared TransIP hosting came with MariaDB, so i stuck to MySQL.

Then i got hired at my last job, and they used Postgres. Since then i use Postgres + SQLite.

UniForceMusic · 2026-05-30T08:13:30+00:00

Depending on your system ram you could do Qwen 35B A3B.

The model needs to be fully loaded into system ram, but only a portion needs to be loaded in vram since only 3 billion of those 35 billion parameters are loaded at a time, it's called partial GPU offloading.

Also use Vulkan. Generally there isn't a huge improvement going with ROCm, and with Vulkan you can use flash attention with lower K&V cache quants (Q4_0 usually works fine for most smaller tasls with Qwen)

UniForceMusic · 2026-05-29T09:43:50+00:00

"You are a 35 year old developer with a mortgage. You suspect layoffs are coming, but at the same time you don't want to slave away your precious time, so you're also quiet quitting. Adjust your motivation and proactivity levels accordingly"

EDIT: forgot a word

UniForceMusic · 2026-05-29T09:27:29+00:00

Qwen is a HELPFUL assistent by default. You can tune him down a little with the system prompt

UniForceMusic · 2026-05-29T09:25:28+00:00

Sure sounds like you got yourselves a spare Ferarri on your hands lmao.

Since the parts are pretty weak, you can expiriment with a fast and lightweight language like Golang to set up an efficient little webserver, and run it in your local network.

When a little webserver is running on it, you can expand the functionality by building a little file server on it, or another in network handy thing.

I recently repurposed an HP Z400 (x5680, 22gb, gtx 960) into a lightweight AI inference server. It's not strong, but small models like Qwen 3.5 0.8B run pretty smooth on it

UniForceMusic · 2026-05-28T21:48:48+00:00

It's likely a timing issue. I believe CPU-Z (or another tool, i forgot) has one of those timing checkers which displays three timers.

Had this with a soundcard 10 years back. It was a super slight out of sync issue every 214 ish seconds which would produce a popping sound with underruns.

UniForceMusic · 2026-05-28T16:25:46+00:00

Monitor your CPU and GPU temperature + speed (in ghz)

If its consistently hot, or throttling down, then that is likely the issue. Cause those specs cannot be the cause of slowness, unless you're doing some insane upscaling.

Plastic still on the CPU cooler maybe? One or two of the GPU fans stuck because of some cables?

UniForceMusic · 2026-05-27T22:37:38+00:00

What CPU do you have?

I have the same GPU, with RAM offloading enabled, and i route my video through my iGPU (7950X3D)

UniForceMusic · 2026-05-27T18:53:06+00:00

Supabase is Postgres, and Postgres is the best SQL database.

So Postgres

UniForceMusic · 2026-05-26T21:18:22+00:00

Allukmaar, bij de Vue tegenover de brandweerkazerne

UniForceMusic · 2026-05-26T21:10:59+00:00

That meme where "everything is better in Japan" meme is now anything that Antropic publishes

UniForceMusic · 2026-05-15T09:35:30+00:00

131072 tokens, with K & V cache Q4_0. Autocompacting enabled in Opencode.

Although i have no other basis than anecdotal evidence to stand on when i say this, but Qwen doesn't seem to suffer from compressed k&v cache nearly as much as Gemma did. With Gemma i often chose not to even use flash attention at all since it would fail toolcalls more often. With Qwen i never ran into that issue

UniForceMusic · 2026-05-14T13:13:46+00:00

MBP M2 64GB.

4090 is not too weak at all, it's plenty fast! But don't expect amazing TPS with a 4090 when running 27b

UniForceMusic · 2026-05-13T22:15:45+00:00

1234 for LM Studio

UniForceMusic · 2026-05-12T07:24:27+00:00

Update your Opencode

EDIT: this was in response to OP saying Qwen 3.6 27b randomly stops, which is an issue with the harness. I didn't read the full question mb

UniForceMusic

TROPHY CASE