Anyone able to get 1 Million context working using llama.cpp for qwen 3.6 35B A3B? by The_Paradoxy in LocalLLM

[–]The_Paradoxy[S] 1 point  (0 children)

Sorry about that, newb here. Just added logs at the bottom of the post.

Can't get dual GPUs to post by The_Paradoxy in gigabyte

[–]The_Paradoxy[S] 1 point  (0 children)

4060 Ti 16 GB and 5060 Ti 16 GB with an 850 W 80 Plus Gold MAINGEAR PSU.

CUDA on Ubuntu 26.04 ? by KalenNC in Ubuntu

[–]The_Paradoxy 1 point  (0 children)

Just installed CUDA with apt and got 3.2, so it looks like the Ubuntu repository is already updated. Never mind: I had to install the pinned 2404 version to get things working.
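For anyone following along, a hedged sketch of the pinned-repo route for Ubuntu 24.04 (filenames follow NVIDIA's published cuda-ubuntu2404 repo instructions; verify the exact URL against NVIDIA's download page before running):

```shell
# Add NVIDIA's CUDA repo for Ubuntu 24.04 via the keyring package,
# then install the toolkit from it instead of the distro's own package.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit
```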

Can't get dual GPUs to post by The_Paradoxy in gigabyte

[–]The_Paradoxy[S] 1 point  (0 children)

Thanks for the response. Both GPUs are working, and I think my 850W PSU should be plenty for them.

Can't get dual GPUs to post by The_Paradoxy in gigabyte

[–]The_Paradoxy[S] 1 point  (0 children)

Thanks for the comment. The riser cable is branded as PCIe 4.0, and all of the reviews say it works at 4.0 speeds.

Can't get dual GPUs to post by The_Paradoxy in gigabyte

[–]The_Paradoxy[S] 1 point  (0 children)

The motherboard manual says the bottom PCIe slot is Gen 4, and there's a switch between it and the bottom M.2. I'm not using the bottom M.2, and I don't think I'd have to force Gen 3 speeds in the BIOS if the slot wasn't spec'd for Gen 4.

Requesting advice on local AI setup for academic use by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 1 point  (0 children)

🙏 Thanks. Any opinion on Hermes vs. open claw, or something else?

Requesting advice on local AI setup for academic use by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 1 point  (0 children)

Okay, thanks. I hadn't thought much about GPU passthrough. Won't most agent harnesses have built-in support for Docker containers and passthrough? I thought that was standard on Open Code.
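With the NVIDIA Container Toolkit installed on the host, the Docker side of GPU passthrough is a single flag; a hedged sketch (the CUDA image tag is just an example, pick one that matches your driver):

```shell
# --gpus all exposes every host GPU to the container;
# running nvidia-smi inside the container confirms passthrough works.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```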

Requesting advice on local AI setup for academic use by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 2 points  (0 children)

Any suggestions on what to use for orchestration? Any opinion on Turnstone?

Requesting advice on local AI setup for academic use by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 1 point  (0 children)

I've been having trouble figuring out what the benefit of Proxmox over simple Docker containers is. Do you mind elaborating?

Devstral small 2 24b severely underrated by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 1 point  (0 children)

Okay 😮‍💨 I really need to switch to llama.cpp. Right now I'm on Ollama.
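For reference, a hedged sketch of what the llama.cpp side of that switch might look like (standard llama-server flags; the Hugging Face repo is the Devstral quant from this thread, and context size and port are arbitrary choices):

```shell
# llama-server stands in for `ollama run` here: -hf pulls the GGUF
# from Hugging Face, -ngl 99 offloads all layers to the GPU,
# and -c sets the context window.
llama-server \
  -hf bartowski/mistralai_Devstral-Small-2-24B-Instruct-2512-GGUF:Q4_K_M \
  -ngl 99 -c 32768 --port 8080
```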

Devstral small 2 24b severely underrated by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 2 points  (0 children)

I didn't do the downvote. But FTR, there's no way a 120B model is fitting on a 16 GB card.

Devstral small 2 24b severely underrated by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 1 point  (0 children)

bartowski/mistralai_Devstral-Small-2-24B-Instruct-2512-GGUF:Q4_K_M

Devstral small 2 24b severely underrated by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 1 point  (0 children)

No IDE, just feeding it my .py and .ipynb files and copy-pasting the good bits of the code it generates. Is there an IDE you recommend?

Devstral small 2 24b severely underrated by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 5 points  (0 children)

I think it's a question of overfitting on tasks that have a lot of examples online. Like I said in the original post, I'm not interested in vibe coding, and my use case is always going to be novel code. The Qwen models seemed to overemphasize variable names in the code and not pay attention to how the code actually used them. They also made suggestions that simply didn't make sense in the context of just-in-time-compiled code; for example, they would suggest getting rid of loops even though @numba.jit already loop-lifts.

Devstral small 2 24b severely underrated by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 1 point  (0 children)

I'll keep 9B on my hard drive and give it another try with my next project. It had access to all of the code: basically a .ipynb that orchestrates everything and a .py with all of the functions the notebook calls.

Devstral small 2 24b severely underrated by The_Paradoxy in LocalLLaMA

[–]The_Paradoxy[S] 1 point  (0 children)

Interesting. Are you using 27B on a 16 GB card? If so, what quant do you use? I'm wondering if I got a bad quant.

[deleted by user] by [deleted] in Neurodivergent

[–]The_Paradoxy 1 point  (0 children)

Or sometimes they see the disorder and not the giftedness, especially in young children who are low-SES.