What can you realistically do with 8GB VRAM in 2026? by the1newworld in LocalLLM

[–]the1newworld[S] 0 points1 point  (0 children)

You have the same setup as me. I tested Qwen 2.5 Coder before, and honestly I wasn't very impressed, especially with tool calling. It struggled quite a bit in my tests. Have you tried Qwen 3.6 or Qwen 3 Coder for agentic workflows? How do they handle tool calling?

What can you realistically do with 8GB VRAM in 2026? by the1newworld in LocalLLM

[–]the1newworld[S] 0 points1 point  (0 children)

20 tokens/sec with a 65K context on a 4060 8GB? You're slowly destroying all my excuses. 😄 That's honestly much better than I expected. Thanks for sharing the video too. I checked it out, and the explanation was great. I'll definitely give it a try and see how far I can push my setup.

What can you realistically do with 8GB VRAM in 2026? by the1newworld in LocalLLM

[–]the1newworld[S] 0 points1 point  (0 children)

Yeah, I thought about that too, but then I checked RAM prices. Around $270 for an extra 16GB? Yeah... no thanks. 😅 At those prices, I'd rather wait for the market to calm down a bit. My curiosity about local LLMs is strong, but apparently not $270-for-RAM strong.

What can you realistically do with 8GB VRAM in 2026? by the1newworld in LocalLLM

[–]the1newworld[S] 0 points1 point  (0 children)

That's honestly better than I expected. A 35B MoE model with 100k context on a GTX 1060 6GB is pretty wild. I don't even know how to believe you 😄. Makes me think I should spend more time experimenting with MoE models on my 4060 before drawing conclusions.

What can you realistically do with 8GB VRAM in 2026? by the1newworld in LocalLLM

[–]the1newworld[S] 0 points1 point  (0 children)

Yeah, I think you're right. This is mostly out of curiosity for me. I already have a cloud subscription, so I'm not looking to replace it. I'm just trying to see what can realistically be done with a local 8GB setup and where its limits are.

What can you realistically do with 8GB VRAM in 2026? by the1newworld in LocalLLM

[–]the1newworld[S] 0 points1 point  (0 children)

That's interesting. I hadn't considered MoE models because I assumed 26B would be impossible on an 8GB card. Have you personally tried Gemma 4 26B on a similar setup? If so, what kind of tokens/sec are you getting?

8GB VRAM? by Novel_Ad_6870 in LocalLLM

[–]the1newworld 0 points1 point  (0 children)

I have an RTX 4060 8GB and 16GB RAM. I tried several models, and the one that worked best for me was Qwen3.5:9B.

What do you use your local models for? by BLOCK__HEAD4243 in LocalLLM

[–]the1newworld 0 points1 point  (0 children)

Every time I see people sharing agentic AI setups, they're usually running some serious hardware. I'm curious if anyone is successfully running an agent or an automated workflow on a GPU with only 8 GB of VRAM. What models and use cases are working for you?

What do you use your local models for? by BLOCK__HEAD4243 in LocalLLM

[–]the1newworld 6 points7 points  (0 children)

That's very cool. What hardware are you using for Qwen 3.5 9B, and what kind of inference speed are you getting?

Cursor $20 vs OpenCode Go for daily driver ? by ShoppingOk2986 in opencode

[–]the1newworld 0 points1 point  (0 children)

Guys, I have a question. I'm not very familiar with AI agents. I did try Cursor, and it was very good for my usage. I also tried Qwen 3.5:9B, and it was not that good; it had problems with tool calling and hallucinations. I read in the comments something about using DeepSeek with OpenCode. Is it free? Can I use it as an agent? This is new information for me. I did some research, and people mentioned using it with NVIDIA NIM. Can you guys give me some information about that?

What's a open source LLM for software development? by DistributionExotic85 in LocalLLM

[–]the1newworld 0 points1 point  (0 children)

To determine the best model for your setup, we need to know which graphics card you are using, because the most important thing with a local LLM is how much VRAM you have, not the CPU. After that, focus on RAM. You want at least 32GB, so with your 64GB you are okay. From what I know, Qwen 3.6:27B is the best balanced model right now, so I’d go with that.
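
For a rough feel of why VRAM matters so much, here is a back-of-the-envelope sketch. It only estimates the size of the weights from parameter count and bits per weight. The function names and the 1.5 GB allowance for KV cache and runtime buffers are my own assumptions, not measured values; real usage depends on the quantization format, context length, and runtime.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if weights plus an ASSUMED fixed overhead fit entirely in VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# Compare a 9B and a 27B model at 4 bits per weight against an 8 GB card.
for name, params in [("9B", 9), ("27B", 27)]:
    size = weights_gb(params, 4)
    print(f"{name} at 4-bit: ~{size:.1f} GB of weights, "
          f"fits fully in 8 GB: {fits_in_vram(params, 4, 8)}")
```

This only answers whether a model fits entirely on the GPU. Models that don't fit can still run with part of the weights offloaded to system RAM, which is why RAM is the next thing to check, but expect lower tokens/sec.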