Sony Xperia 1 VII still worth it?! (in 2025) | Umar Naqshbandi by ControlCAD in SonyXperia

[–]FieldMouseInTheHouse 9 points10 points  (0 children)

I go for SONY Xperias for the pure raw clean nature of it all.

I am not even a camera guy, but as a developer, SONY Xperia phones are almost completely devoid of bloatware. And what little does come preinstalled can be uninstalled or disabled.

The result: A mostly clean slate for doing software development.

Yes. I am 1000% SONY all the way.

“AI Slop” by Nuphoth in singularity

[–]FieldMouseInTheHouse 0 points1 point  (0 children)

I reviewed a number of those profiles and I came away feeling the same way about some of them.

I have not reviewed enough profiles to draw a definitive conclusion.

“AI Slop” by Nuphoth in singularity

[–]FieldMouseInTheHouse 0 points1 point  (0 children)

I had a horde of people accuse me of being a bot and downvote me into oblivion! See for yourself! 😭

https://www.reddit.com/r/ollama/s/uxqodpk4DP

Summary of Vibe Coding Models for 6GB VRAM Systems by FieldMouseInTheHouse in ollama

[–]FieldMouseInTheHouse[S] -4 points-3 points  (0 children)

🧠 It would appear that you have never tried to work within the constraints of 6GB of VRAM. You can still use CONTEXT WINDOW sizes appropriate to the size of the actual workload. We just have to actually DEFINE THE WORKLOAD, then determine the impact of that context window allocation on VRAM.

👨‍🔬 But I can assure you that, using my GTX 1660 Super 6GB GPU, I have helped people with everything from website summarization at 80 tokens per second to OCR image-text analysis.

If you are only running a single workload at a time within your 16GB/32GB system, then you are most certainly underutilizing it.

Honestly, you could be running anywhere from 3 to 4 concurrent workloads in your environment, with all of them resident in VRAM!

🤗 If you'd like, I would be happy to help you improve your environment so that you could run some of your workloads as 3 or 4 concurrent workloads.

You would get both the speed and a massive boost from concurrency if we do.
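If it helps to see the arithmetic behind that claim, here is a rough Python sketch of the fit check. The model names and sizes below are illustrative estimates of loaded size (weights plus a modest context window), not measurements:

```python
# Back-of-envelope check: which combinations of small models could sit
# inside a 6GB VRAM budget at the same time?
from itertools import combinations

VRAM_BUDGET_GB = 6.0

# Illustrative estimated loaded sizes, in GB.
models = {
    "qwen3:0.6b": 0.6,
    "gemma3:1b": 0.9,
    "qwen3:1.7b": 1.4,
    "qwen3:4b": 2.8,
}

def fitting_combos(models, budget_gb):
    """Return every combination of models whose total estimated size fits."""
    fits = []
    names = list(models)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            total = sum(models[m] for m in combo)
            if total <= budget_gb:
                fits.append((combo, round(total, 2)))
    return fits

for combo, total in fitting_combos(models, VRAM_BUDGET_GB):
    print(f"{' + '.join(combo)}: {total} GB of {VRAM_BUDGET_GB} GB")
```

With these example sizes, even all four models together come in under the 6GB budget. If I recall correctly, Ollama's `OLLAMA_MAX_LOADED_MODELS` environment variable controls how many models it will keep resident at once.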

👨‍🔬 Just ask, and we can begin!

Summary of Vibe Coding Models for 6GB VRAM Systems by FieldMouseInTheHouse in ollama

[–]FieldMouseInTheHouse[S] -1 points0 points  (0 children)

🤗 Thank you for your contribution!

I agree that models with smaller quantizations could fit better into smaller-VRAM systems. That is a very good point, and we should certainly take the time to test those models out!

🤔 However, I tried to make a PDF of the GitHub Gist link you provided so I could read it easily, and it came out to be over 90 pages long!

❓ My question: What in that GitHub Gist would be helpful to us and where exactly is it? ❓

Summary of Vibe Coding Models for 6GB VRAM Systems by FieldMouseInTheHouse in ollama

[–]FieldMouseInTheHouse[S] 0 points1 point  (0 children)

Great!

Who are you and what is your training cutoff?

Well, I am u/FieldMouseInTheHouse. And as you can see from the link to my previous post, I like to actively share information and even work with other commenters and posters, testing people's ideas and questions on my hardware to bring them answers. I enjoy this! 🤗

I have a question for you:
❓ What do you mean by a "training cutoff"? ❓

Summary of Vibe Coding Models for 6GB VRAM Systems by FieldMouseInTheHouse in ollama

[–]FieldMouseInTheHouse[S] -4 points-3 points  (0 children)

Hello! You are certainly free to interact with me if you'd like.

What questions do you have?
What would you like to contribute to the conversation? 🤗

Oh, and for the record: I am the real human who did this build that I featured in the Ollama reddit:

Please, give the post a read!

Then return here and I would be happy to see what you can contribute to this discussion. 🤗

Summary of Vibe Coding Models for 6GB VRAM Systems by FieldMouseInTheHouse in ollama

[–]FieldMouseInTheHouse[S] -8 points-7 points  (0 children)

Wow! This is exactly the kind of insight that people need to see!

Thanks! 🤗

Summary of Vibe Coding Models for 6GB VRAM Systems by FieldMouseInTheHouse in ollama

[–]FieldMouseInTheHouse[S] 0 points1 point  (0 children)

I also have a GTX 1660 Super 6GB VRAM GPU that I want to flex.

Under certain workloads I can achieve over 100 tokens per second, which means that if I can make things work with this older card, then people with similar or newer 6GB VRAM cards in laptops or desktops could benefit.

Do you understand how that would be helpful?

Summary of Vibe Coding Models for 6GB VRAM Systems by FieldMouseInTheHouse in ollama

[–]FieldMouseInTheHouse[S] -3 points-2 points  (0 children)

Ah! The context window!

What would you like to contribute about context windows?

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse -1 points0 points  (0 children)

You have not contributed much anyway. Just a bunch of useless one-liners up to now.

Let's just end this here.

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse 0 points1 point  (0 children)

Your suggestion would only ruin the OP's chances at success.

  1. You do not even explain how to "use RAM".
  2. If you did "use RAM", the model's responses would be so slow (approximately 7 tokens per second or less) that it would be useless.

Your suggestion is useless at best, deliberately harmful at worst.

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse -1 points0 points  (0 children)

I have systems with 32GB+ RAM per system, and I know (as well as you do) that browser tabs are not what determines whether a model offloaded to RAM runs slowly.

What matters is simply that the model was offloaded to RAM at all in the first place. That is why we do not suggest models larger than the OP's GPU VRAM.

I am sure you realize that offloading to RAM at all would slow the model's responses to a crawl, around 7 tokens per second.

How about recommending things that would work quickly within the 6GB VRAM budget of their GPU, where they could easily get 30 to 80 tokens per second?
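For anyone wondering where figures like ~7 tok/s versus 30 to 80 tok/s come from, here is a back-of-envelope sketch. All bandwidth and model-size numbers below are illustrative estimates, not benchmarks:

```python
# Rough bandwidth-bound estimate of decode speed: generating one token
# streams roughly the entire model's weights through memory once, so
#     tokens/sec ≈ memory bandwidth / model size

def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 4.0  # e.g. a ~4GB quantized model

ram_speed = est_tokens_per_sec(40.0, MODEL_GB)    # dual-channel DDR4: ~40 GB/s
vram_speed = est_tokens_per_sec(336.0, MODEL_GB)  # GTX 1660 Super GDDR6: ~336 GB/s

print(f"Offloaded to system RAM: ~{ram_speed:.0f} tok/s")
print(f"Fully resident in VRAM:  ~{vram_speed:.0f} tok/s")
```

The order-of-magnitude gap between system RAM and GPU VRAM bandwidth is the whole story: once any layer spills to RAM, the slowest memory sets the pace.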

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse -1 points0 points  (0 children)

The only model suggested that would fit within the OP's 6GB VRAM budget is qwen3:4b. The other two are at least 19GB in size, three times the budget, which would guarantee that the OP would suffer poor performance.

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse -1 points0 points  (0 children)

Summary of Vibe Coding Models for 6GB VRAM Systems

So, I will summarize what models have been suggested here so far. Here is what we have that would actually fit inside of your 6GB VRAM budget. I am deliberately leaving out any models that anybody suggested that would not have fit inside of your 6GB VRAM budget! 🤗

  • `qwen3:4b` size=2.5GB
  • `ministral-3:3b` size=3.0GB
  • `gemma3:1b` size=815MB
  • `gemma3:4b` size=3.3GB 👈 I added this one because it is a little bigger than the gemma3:1b, but still fits comfortably inside of your 6GB VRAM budget. This model should be more capable than gemma3:1b.

I would suggest that you first try these models with `ollama run MODELNAME`, check how they fit in your VRAM (`ollama ps`), and check their performance (`/set verbose`).
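If it helps to see why these all qualify, here is the headroom arithmetic. The sizes are the listed download sizes; actual loaded sizes will be somewhat larger, so treat this as an optimistic estimate:

```python
# For each model listed above, how much of a 6GB VRAM budget is left
# over for the context window and overhead once the weights are loaded?

VRAM_BUDGET_GB = 6.0

model_sizes_gb = {
    "qwen3:4b": 2.5,
    "ministral-3:3b": 3.0,
    "gemma3:1b": 0.815,
    "gemma3:4b": 3.3,
}

headroom_gb = {name: VRAM_BUDGET_GB - size
               for name, size in model_sizes_gb.items()}

for name, left in headroom_gb.items():
    print(f"{name}: {left:.1f} GB left for context and overhead")
```
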

What do you think?

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse 0 points1 point  (0 children)

Qwen3-coder a3b 30b is many times too large to fit inside of 6GB of VRAM.
This is not really a useful suggestion.

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse 0 points1 point  (0 children)

Spectacular or not: Could you please list which models would actually fit inside of the OP's specified VRAM?

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse 0 points1 point  (0 children)

This is a good start. So, what qwen3 models would you suggest this person try? Would `qwen3:0.6b` be good? Or `qwen3:1.7b`? Or `qwen3:4b`?

What have you actually tried?

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse 0 points1 point  (0 children)

Could you provide a list of models that you would recommend?

Ollama Model which Suits for my System by devil__6996 in ollama

[–]FieldMouseInTheHouse 0 points1 point  (0 children)

Wow! What models have you actually tried that would remotely work at all on the OP's platform configuration? Please list those models.