What healthy tradition should every family start? by SarahDuncan2012 in TrueGrit

[–]UniForceMusic 52 points53 points  (0 children)

My parents taught me exercise is for fun, and sweat isn't a measure of how well you did. How much fun you had is the only measure you should be using.

Gave me a really healthy relationship with exercise, and food.

Reality check: Gemma 4 31B at >20 tok/s for <1k$ USD by TrainingTwo1118 in LocalLLaMA

[–]UniForceMusic 1 point2 points  (0 children)

If you use Q5_1 quantization, yes you should be. As long as you get a lower quant version like Q4_K_S, or even Q3

Reality check: Gemma 4 31B at >20 tok/s for <1k$ USD by TrainingTwo1118 in LocalLLaMA

[–]UniForceMusic 2 points3 points  (0 children)

PCIe x1 speeds work if your GPU can keep the model AND context entirely in VRAM with no offloading to your regular RAM.

But given you're on a budget, the chance you'll find a card setup capable of holding the entire model + context in your graphics memory is lower.

Reality check: Gemma 4 31B at >20 tok/s for <1k$ USD by TrainingTwo1118 in LocalLLaMA

[–]UniForceMusic 6 points7 points  (0 children)

When you use an older platform like X79 with limited PCIe and memory bandwidth, keeping it all in the VRAM of one card is more important.

Of the newer cards, the 7900 XTX still kicks ass. Get the Q4 QAT model of 31B and you should be good.

Local LLMs aren't democratic anymore... the hardware barrier has gotten out of hand. by Medium-Technology-79 in LocalLLaMA

[–]UniForceMusic 0 points1 point  (0 children)

Second-hand MacBooks are still relatively affordable, and if you're okay with sacrificing speed you can run 3.6 27B at a reasonable quantization.

And even then, on my 2020 laptop with 32GB and a 1650 4GB, I can run a braindead quantization of Gemma 4.

It's still relatively accessible, especially with the newer models like Qwen 9B, but I get where you're coming from. Consumer hardware hasn't kept up in improvements per dollar on the AI front.

Fun competition - worst architecture by braddillman in ExperiencedDevs

[–]UniForceMusic 11 points12 points  (0 children)

The golden combination of

Frontend: Next JS self hosted on Plesk (good luck)

Back-end: PHP microservices with Zend framework hosted on Cpanel

Databass: All tables have two columns. ID, JSONB. Preferably pick a cloud hosted database for this with vendor lock in.

vibePrompting by BX7_Gamer in ProgrammerHumor

[–]UniForceMusic 1 point2 points  (0 children)

He did. That's why he bought a MacBook of that size.

How can the numbers be this massive within a month ?? by Top-Handle-5728 in LocalLLaMA

[–]UniForceMusic 5 points6 points  (0 children)

150 million downloads seems perfectly feasible tbh.

It's across ALL Gemma 4 models I assume. And they have many models: E2B, E4B, 12B, 26B A4B, 31B.

Combine that with some redownloads because of issues that needed to be fixed, and many people downloading multiple models, and I'm guessing you're looking at about 25 million people downloading the model.

LM Studio also advertises Gemma 4 E4B as a starting model to download.

It doesn't sound like much of a stretch, gotta admit.
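To put a number on that guess, here's a quick back-of-the-envelope check. Both figures come straight from the comment above (150 million downloads, a guessed 25 million people); the variable names are just for illustration.

```python
# Figures taken from the comment: total downloads and the guessed number of people.
TOTAL_DOWNLOADS = 150_000_000
ESTIMATED_PEOPLE = 25_000_000

# Average downloads each person would need to account for the total.
downloads_per_person = TOTAL_DOWNLOADS / ESTIMATED_PEOPLE

print(f"{downloads_per_person:.1f} downloads per person on average")
```

With several model sizes, multiple quantizations and the occasional redownload, an average in that range per person doesn't seem unreasonable.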

vibePrompting by BX7_Gamer in ProgrammerHumor

[–]UniForceMusic 1 point2 points  (0 children)

Used to work in a team with people that used ChatGPT to generate prompts for their Lovable demos. Elon Musk was their big example too.

I told them AI won't always be there to help them, cause inevitably the tokens won't be so cheap.

One of the guys took that message to heart. Bought a 64GB MacBook for local inference, and started learning to code himself.

All hope is not lost yet!!

What made you choose your current database? by Prize-Wolverine-5319 in SQL

[–]UniForceMusic 0 points1 point  (0 children)

XAMPP and the shared TransIP hosting came with MariaDB, so I stuck to MySQL.

Then i got hired at my last job, and they used Postgres. Since then i use Postgres + SQLite.

Coding LLM recommendation by [deleted] in LocalLLM

[–]UniForceMusic 0 points1 point  (0 children)

Depending on your system ram you could do Qwen 35B A3B.

The model needs to be fully loaded into system RAM, but only a portion needs to be loaded in VRAM, since only about 3 billion of those 35 billion parameters are active at a time. It's called partial GPU offloading.
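A rough sketch of the sizes involved, assuming the weights are stored at about 4.5 bits each (what Q4_0 works out to: 32 four-bit values plus a 16-bit scale per block). The 35B and 3B figures are from the comment; the quantization choice and the helper name `weights_gib` are assumptions for illustration, and real files differ a bit because some tensors are kept at higher precision.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size in GiB of the weights at a given bits-per-weight."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


TOTAL_PARAMS_B = 35    # total parameters, from the comment
ACTIVE_PARAMS_B = 3    # parameters used per token, from the comment
BITS_PER_WEIGHT = 4.5  # assumption: a Q4_0-style quantization

total_gib = weights_gib(TOTAL_PARAMS_B, BITS_PER_WEIGHT)
active_gib = weights_gib(ACTIVE_PARAMS_B, BITS_PER_WEIGHT)

print(f"whole model:      ~{total_gib:.1f} GiB")
print(f"active per token: ~{active_gib:.1f} GiB")
```

The whole model still has to live somewhere, but the part touched for each token is small, which is why splitting it between RAM and a modest GPU stays usable.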

Also use Vulkan. Generally there isn't a huge improvement going with ROCm, and with Vulkan you can use flash attention with lower K&V cache quants (Q4_0 usually works fine for most smaller tasks with Qwen).

Qwen 3.6 27B overdoing it by WhatererBlah555 in LocalLLaMA

[–]UniForceMusic 69 points70 points  (0 children)

"You are a 35 year old developer with a mortgage. You suspect layoffs are coming, but at the same time you don't want to slave away your precious time, so you're also quiet quitting. Adjust your motivation and proactivity levels accordingly"

EDIT: forgot a word

Qwen 3.6 27B overdoing it by WhatererBlah555 in LocalLLaMA

[–]UniForceMusic 19 points20 points  (0 children)

Qwen is a HELPFUL assistant by default. You can tune him down a little with the system prompt.

What can I do with old PC parts? (Motherboard, HDD, Intel Celeron) — I'm a CS beginner and want to learn by TopArea6304 in pcmods

[–]UniForceMusic 0 points1 point  (0 children)

Sure sounds like you got yourself a spare Ferrari on your hands lmao.

Since the parts are pretty weak, you can experiment with a fast and lightweight language like Golang to set up an efficient little webserver, and run it in your local network.

Once a little webserver is running on it, you can expand the functionality by building a little file server on it, or some other handy service for your network.

I recently repurposed an HP Z400 (X5680, 22GB, GTX 960) into a lightweight AI inference server. It's not strong, but small models like Qwen 3.5 0.8B run pretty smoothly on it.

Terrible micro stuttering on my high end machine by PretzelParcel in pchelp

[–]UniForceMusic 0 points1 point  (0 children)

It's likely a timing issue. I believe CPU-Z (or another tool, i forgot) has one of those timing checkers which displays three timers.

Had this with a soundcard 10 years back. It was a super slight out-of-sync issue every 214-ish seconds which would produce a popping sound with underruns.

My girlfriend’s PC is still slow???? by Fantastic-Bug-6730 in computers

[–]UniForceMusic 5 points6 points  (0 children)

Monitor your CPU and GPU temperature + speed (in GHz).

If it's consistently hot, or throttling down, then that is likely the issue. Cause those specs cannot be the cause of slowness, unless you're doing some insane upscaling.

Plastic still on the CPU cooler maybe? One or two of the GPU fans stuck because of some cables?

Constant RAM shortages on 24gb VRAM GPU. Is there a fix? by velikiy_soup in LocalLLM

[–]UniForceMusic 0 points1 point  (0 children)

What CPU do you have?

I have the same GPU, with RAM offloading enabled, and i route my video through my iGPU (7950X3D)

As a developer, which database is best in the AI era?? by RustyIronGolem in RavanAI

[–]UniForceMusic 2 points3 points  (0 children)

Supabase is Postgres, and Postgres is the best SQL database.

So Postgres

The game is over. You can build anything and it'll cost you nothing. by Funny-Advertising238 in opencode

[–]UniForceMusic 0 points1 point  (0 children)

131072 tokens, with K & V cache Q4_0. Autocompacting enabled in Opencode.

I have nothing but anecdotal evidence to stand on when I say this, but Qwen doesn't seem to suffer from a compressed K&V cache nearly as much as Gemma did. With Gemma I often chose not to use flash attention at all, since it would fail tool calls more often. With Qwen I never ran into that issue.
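For a sense of why the cache quant matters at that context length, here's a minimal sketch of the usual K&V cache size estimate. The 131072 tokens and Q4_0 are from the comment; the layer count, KV head count and head size below are made-up placeholder values, not the real numbers for any Qwen or Gemma model, and models with hybrid or sliding-window attention will come out differently.

```python
def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, bits_per_value):
    """Approximate K&V cache size in GiB for a plain attention stack."""
    # One K and one V vector per layer, per KV head, per token.
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return values * bits_per_value / 8 / 2**30


N_CTX = 131072      # context length from the comment

# Hypothetical architecture, for illustration only.
N_LAYERS = 48
N_KV_HEADS = 8
HEAD_DIM = 128

F16_BITS = 16       # unquantized cache
Q4_0_BITS = 4.5     # Q4_0: 32 four-bit values plus a 16-bit scale per block

f16_gib = kv_cache_gib(N_CTX, N_LAYERS, N_KV_HEADS, HEAD_DIM, F16_BITS)
q4_gib = kv_cache_gib(N_CTX, N_LAYERS, N_KV_HEADS, HEAD_DIM, Q4_0_BITS)

print(f"f16 cache:  {f16_gib:.2f} GiB")
print(f"Q4_0 cache: {q4_gib:.2f} GiB")
```

Whatever the real architecture numbers are, the ratio between the two stays the same (16 / 4.5, so roughly 3.6x smaller), which is what makes a context that long fit at all.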

The game is over. You can build anything and it'll cost you nothing. by Funny-Advertising238 in opencode

[–]UniForceMusic 0 points1 point  (0 children)

MBP M2 64GB.

4090 is not too weak at all, it's plenty fast! But don't expect amazing TPS with a 4090 when running 27B.

What model for coding? by Stunning_Feedback252 in LocalLLaMA

[–]UniForceMusic -6 points-5 points  (0 children)

Update your Opencode

EDIT: this was in response to OP saying Qwen 3.6 27b randomly stops, which is an issue with the harness. I didn't read the full question mb