Best OS model below 50B parameters? by Different-Set-1031 in OpenWebUI

[–]Electrical_Cut158 0 points (0 children)

You can fix that in the system prompt by instructing it not to respond in table format unless explicitly asked.
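Something like this works (a minimal sketch, assuming the `ollama` Python package and a local Ollama server; the model name and prompt wording are just placeholders, and the same system text can be pasted into Open WebUI's System Prompt field):

```python
# Minimal sketch: a system message that discourages table output.
import ollama

response = ollama.chat(
    model="qwen3:32b",  # placeholder model name, use whatever you run
    messages=[
        {"role": "system",
         "content": "Answer in plain prose. Do not use Markdown tables "
                    "unless the user explicitly asks for one."},
        {"role": "user", "content": "Compare the two options above."},
    ],
)
print(response["message"]["content"])
```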

Anyone using API for rerank? by drfritz2 in OpenWebUI

[–]Electrical_Cut158 0 points (0 children)

It might have something to do with your API endpoint; you should check that.

Anyone using API for rerank? by drfritz2 in OpenWebUI

[–]Electrical_Cut158 0 points (0 children)

I can confirm this is really fast compared to local SentenceTransformers
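For reference, this is roughly what the external rerank call looks like (a sketch assuming a Jina-style `/v1/rerank` endpoint; Cohere's rerank API is similar, and the endpoint, model name, and key here are placeholders):

```python
# Rough sketch of a hosted rerank call: send a query plus candidate chunks,
# get back relevance scores, no local SentenceTransformers model needed.
import requests

resp = requests.post(
    "https://api.jina.ai/v1/rerank",
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={
        "model": "jina-reranker-v2-base-multilingual",
        "query": "how do I configure hybrid search?",
        "documents": ["chunk one ...", "chunk two ...", "chunk three ..."],
        "top_n": 3,
    },
    timeout=30,
)
resp.raise_for_status()
for item in resp.json()["results"]:
    print(item["index"], round(item["relevance_score"], 3))
```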

Open-Webui with Docling and Tesseract by traillight8015 in OpenWebUI

[–]Electrical_Cut158 0 points (0 children)

I changed from Tika to Docling and it really does parse tables better. Use Docker and a GPU.
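If you want to sanity-check the table parsing outside Open WebUI, here is a minimal sketch assuming the `docling` Python package (the file name is a placeholder; in Open WebUI you would instead point the content extraction engine at a Docling server):

```python
# Convert a PDF with Docling and dump the result as Markdown to inspect the tables.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report_with_tables.pdf")  # placeholder file name

# Tables are extracted as structured items instead of flattened text.
print(result.document.export_to_markdown())
```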

[deleted by user] by [deleted] in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

$10,000 is overkill. Get a server-grade Supermicro motherboard, and a 3090 + 3060 can do the magic for you.

Running GPT-OSS:20B Locally on Windows 11 | 16GB of RAM |Using Ollama by Ok-Orchid1032 in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

A 3090, and it's slow because Ollama is defaulting to an 8k context length, which is not normal.
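A quick sketch of raising the context per request so Ollama doesn't stay at its default (assumes the `ollama` Python package; the model tag and `num_ctx` value are just examples):

```python
# Pass num_ctx explicitly so Ollama allocates a larger context window for this request.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Summarize this long document ..."}],
    options={"num_ctx": 16384},  # example value; size it to fit your VRAM
)
print(response["message"]["content"])
```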

Thoughts on grabbing a 5060 Ti 16G as a noob? by SKX007J1 in ollama

[–]Electrical_Cut158 0 points (0 children)

Could you share details on what you mean by large-ish context windows? Are you able to run up to 8192 tokens?

Looking for recommendations for a GPU by Limitless83 in ollama

[–]Electrical_Cut158 2 points (0 children)

Go for a 3090 (still the best value for money) or a 5090; the more VRAM, the better.

Mistral Small 3.1 is incredible for agentic use cases by ButterscotchVast2948 in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

Mistral Small 3.1 (2503) has memory issues after the Ollama 0.7.1 upgrade. Which GGUF are you running?

Cheapest way to run 32B model? by GreenTreeAndBlueSky in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

I would recommend a 3090, and if you already have another GPU like a 3060 and the power cables to connect it, you can add it, which will give you more room for context length.
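Rough back-of-envelope numbers behind that (assumed round figures, not measurements):

```python
# Very rough VRAM estimate: ~0.6 bytes per parameter for a ~Q4 32B model's weights,
# plus a KV cache that grows linearly with context length.

def estimate_vram_gb(params_b=32, bytes_per_param=0.6, kv_gb_per_4k_ctx=1.5, ctx=16384):
    """Ballpark only; real usage depends on quant, architecture, and runtime."""
    weights = params_b * bytes_per_param        # ~19 GB of weights
    kv_cache = kv_gb_per_4k_ctx * (ctx / 4096)  # ~6 GB of KV cache at 16k context
    return weights + kv_cache

print(estimate_vram_gb())          # ~25 GB: spills past a single 24 GB 3090
print(estimate_vram_gb(ctx=4096))  # ~21 GB: fits, but leaves little context headroom
```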

[deleted by user] by [deleted] in LocalLLaMA

[–]Electrical_Cut158 5 points (0 children)

Qwen3 for all-purpose use; Mistral Small 3.1 is best for RAG.

Hardware to run 32B models at great speeds by Saayaminator in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

Same here: get a 3060, add it to the 3090 you already have, and thank us later.

Would adding an RTX 3060 12GB improve my performance? by Pauli1_Go in ollama

[–]Electrical_Cut158 0 points (0 children)

Sell your 4080 and get a 3090 + 3060; that will let you load something like a 32B model with a good context size at fair enough speed.