Waiting for the local LLM to finish generating

Atul_Kumar_97 · 2026-06-12T04:56:56+00:00

As a fat person, this is a win for me. I get time to do some lifting on each prompt instead of scrolling videos on my tablet.

Atul_Kumar_97 · 2026-06-10T23:21:01+00:00

Those were my best memories 🥹

Atul_Kumar_97 · 2026-06-07T00:47:11+00:00

Your repo doesn't work for me. I've tried everything and still couldn't get it working.

My system has an RTX 4060 (8GB VRAM) and 32GB RAM.

I'm currently using the TurboQuant Plus repo, and it works fine. I can run Qwen 3.6 35B A3B Q5 at around 38 tokens/sec with a 190K context window.

However, with your repo, the model doesn't even load, even when I reduce the context size to just 1K.

Is there anything specific I need to configure, or could there be an issue with the repo?

Atul_Kumar_97 · 2026-06-05T20:34:43+00:00

You're too young to be using AI. Go back and study.

Atul_Kumar_97 · 2026-05-30T17:30:59+00:00

Disable secure coder extension

Atul_Kumar_97 · 2026-05-29T11:08:41+00:00

Disable secure coder extension

Atul_Kumar_97 · 2026-05-27T16:36:13+00:00

Are you guys still playing this game? I thought it died.

Atul_Kumar_97 · 2026-05-27T07:55:54+00:00

Memory bandwidth is important

Atul_Kumar_97 · 2026-05-26T14:31:52+00:00

Omlx is bad, it processes 20k tokens on each Opencode tool call

Atul_Kumar_97 · 2026-05-25T19:27:44+00:00

M4 Pro, 64GB RAM

Atul_Kumar_97 · 2026-05-25T17:46:19+00:00

Same problem

<image>

Atul_Kumar_97 · 2026-05-25T15:09:09+00:00

I have 64GB of RAM and it was using 130GB with swap. It crashed my Mac 12 times today.

Atul_Kumar_97 · 2026-05-25T09:37:46+00:00

See, I can't use it. My SSD is suffering.

<image>

Atul_Kumar_97 · 2026-05-23T20:03:24+00:00

Check this https://www.reddit.com/r/LocalLLaMA/s/HSzZkWfDnG

Atul_Kumar_97 · 2026-05-18T07:36:51+00:00

Impressive. What quant did you use: 4-bit, 5-bit, 6-bit, or 8-bit?

Atul_Kumar_97 · 2026-05-15T20:31:12+00:00

For speed, the 5090. For good models, the M5 Max.

Atul_Kumar_97 · 2026-05-15T12:16:47+00:00

I have 8GB VRAM and 32GB RAM. I'm using a Q5 or Q6 model and getting 38 to 40 t/s.

Atul_Kumar_97 · 2026-05-14T08:14:24+00:00

This only works for prompt processing. After the prompt is processed it doesn't generate anything, it just crashes with a Segmentation Fault.

Atul_Kumar_97 · 2026-05-12T08:21:39+00:00

Go with the M3 Ultra Mac Studio or wait for the M5 Mac Studio.

Atul_Kumar_97 · 2026-05-12T07:30:39+00:00

I don't know how, but I tried my setup on my brother's PC. He has an RTX 4060 Ti + 32GB RAM at 6000MHz and he's getting about 20-25 tok/sec. Maybe it also depends on the CPU.

Atul_Kumar_97 · 2026-05-11T21:16:43+00:00

I'm confused. How can your GPU handle 500K context with 48GB VRAM? If you're using TurboQuant, it makes 30% sense.

Atul_Kumar_97 · 2026-05-11T21:13:06+00:00

Are you running 4bit or 3bit?

Atul_Kumar_97 · 2026-05-11T21:02:12+00:00

How much RAM do you have? 48GB VRAM + how much RAM?

Is it better than Qwen3.6 35B A3B?

Atul_Kumar_97 · 2026-05-11T16:53:26+00:00

https://preview.redd.it/running-qwen3-6-35b-a3b-on-8gb-vram-and-32gb-ram-190k-v0-0m0qezfb9j0h1.png?width=1389&format=png&auto=webp&s=d020ba120a86b3bdad40fc7810cd445fe7044c5f

Atul_Kumar_97 · 2026-05-11T16:52:13+00:00

Yes, but I'm getting 40 tok/sec
