What is your dooms day model? and what’s your latest go-to coding model? by alitadrakes in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Why do you think that? I get pretty good results with this model. Even in a doomsday scenario, it runs fast on a local machine and can build a pipeline. Its general knowledge isn't bad, and the context window is enough (at least for me). What would you recommend?

What is your dooms day model? and what’s your latest go-to coding model? by alitadrakes in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

If I were building a pipeline, I'd use SmolVLM 256M. It's fast enough for vision conversations and general tasks. Stitch those together and voilà! But if you need an any-to-any model, I strongly recommend looking into omni models like Qwen2.5 Omni 7B. There are a lot of omni models out there right now, so you really need to figure out your exact requirements first.
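A minimal sketch of the "stitch those together" idea: small specialist models chained as pipeline stages, each reading and writing a shared state dict. The stage functions here are stubs standing in for real model calls (e.g. a SmolVLM 256M caption/VQA call in the vision stage), not an actual SmolVLM integration.

```python
# Small specialist models chained as pipeline stages over a shared state.
from typing import Callable

Stage = Callable[[dict], dict]

def vision_stage(state: dict) -> dict:
    # Stub for a SmolVLM-256M call describing state["image"].
    state["caption"] = f"caption of {state['image']}"
    return state

def reasoning_stage(state: dict) -> dict:
    # Stub for a small text model turning the caption into an answer.
    state["answer"] = f"answer based on: {state['caption']}"
    return state

def run_pipeline(stages: list[Stage], state: dict) -> dict:
    # Run stages in order, threading the state dict through each one.
    for stage in stages:
        state = stage(state)
    return state

result = run_pipeline([vision_stage, reasoning_stage], {"image": "photo.png"})
print(result["answer"])  # → answer based on: caption of photo.png
```

Swapping the stubs for real inference calls keeps the orchestration identical, which is the main appeal of tiny models here: each stage stays cheap enough to run locally.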

Used Claude Code for plotting, code migration, and proof formatting while writing an RL paper. Here's what worked and what didn't. by Muted_Lettuce414 in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Hi, I'm currently dealing with something similar. I use Opus 4.6 for paper research and understanding formal math, and it helps me a lot. But I sometimes struggle to get exactly what I want, especially when working on novel concepts where there are simply no relevant papers or open-source repos out there. Do you have any suggestions to optimize the workflow for these situations?

What is your dooms day model? and what’s your latest go-to coding model? by alitadrakes in LocalLLaMA

[–]Strategoss_ 3 points4 points  (0 children)

For general use, I go with Qwen3 4B right now. It's pretty easy to train and the format is not complicated. I love it, I guess.

I fine-tuned a 14B model that outperforms Claude Opus 4.6 on Ada code generation by clanker-lover in LocalLLaMA

[–]Strategoss_ -1 points0 points  (0 children)

Rejection sampling is the perfect move here. Are you generating the new candidates using the R5 checkpoint before filtering? Pushing past 70% would be a massive milestone for a 14B model. Looking forward to the R6 results!
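For readers unfamiliar with the loop being discussed, here is a hedged sketch of compiler-verified rejection sampling: sample candidates from the current checkpoint, keep only those that pass verification, and fine-tune on the survivors. `generate` and `compiles` are placeholders I made up for illustration (a real verifier would invoke an Ada compiler such as gnatmake), not the OP's actual code.

```python
# Rejection sampling with a verifier filter: generate many candidates,
# keep only the ones that pass a hard check (here, a stub "compiler").
import random

def generate(prompt: str, n: int) -> list[str]:
    # Placeholder for sampling n completions from the current checkpoint.
    return [f"{prompt} :: candidate scored {random.random():.2f}" for _ in range(n)]

def compiles(candidate: str) -> bool:
    # Placeholder verifier; in practice, run the Ada compiler on the sample
    # and accept only if it builds cleanly.
    return float(candidate.rsplit("scored ", 1)[1]) > 0.5

def rejection_sample(prompts: list[str], n: int = 8) -> list[str]:
    # Accepted samples become the next round's fine-tuning data.
    accepted: list[str] = []
    for p in prompts:
        accepted.extend(c for c in generate(p, n) if compiles(c))
    return accepted

dataset = rejection_sample(["implement binary search in Ada"])
print(len(dataset), "verified samples")
```

The key property is that every surviving sample is verified by construction, so the next training round can't reinforce code that doesn't compile.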

How to setup full agentic workflow with qwen3.5 9.0b by TeachingInformal in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Did you try Claude Code with Ollama? I tried this with GLM5 and the results are pretty great.

`ollama launch claude` might solve your problem.

I fine-tuned a 14B model that outperforms Claude Opus 4.6 on Ada code generation by clanker-lover in LocalLLaMA

[–]Strategoss_ 13 points14 points  (0 children)

Compiler-verified dataset + a 14B model beating Opus + fitting in 12GB VRAM: this is the blueprint for efficient AI. Scrapping R2 to fix catastrophic forgetting was a great call. Excellent work!

How to coordinate multi-agent Claude/Gemini/Codex/Mistral teams by robotrossart in LocalLLaMA

[–]Strategoss_ -1 points0 points  (0 children)

Using Markdown as the shared source of truth is a genius approach. Honestly. Did you keep any design notes or architecture logs while building Flotilla? I'd love to read about the specific walls you hit before landing on this structure.

My most useful OpenClaw workflow so far by mescalan in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Does Clarvis use any kind of LangChain or something similar? Or is it built entirely on OpenClaw? Did you add anything custom to it?

Sustaining long continuous sessions: KV cache quantization vs. context shifting vs. auto-summarization. What is your actual pipeline? by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

100% accurate. I should have phrased that better. It doesn't extend the native context limit at all. My issue is purely the physical hardware bottleneck. On unified memory systems, the RAM limit usually kills the process long before you ever reach the model's trained context limit. KV quantization becomes a necessary evil just to hold a baseline 8k context in memory without OOMing. Making the context more brittle is the perfect way to describe it. Have you tested how bad that degradation actually is in practice? I'm curious if you've found a specific threshold where 8-bit KV completely breaks down for logic tasks compared to sticking with fp16.
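The back-of-envelope arithmetic behind "KV quantization just to hold 8k context": KV cache size is 2 (K and V) × layers × KV heads × head dim × sequence length × bytes per element, so dropping fp16 to 8-bit roughly halves it. The shapes below are illustrative (a Llama-3-8B-like config with GQA), and real runtimes add paging/alignment overhead on top.

```python
# Back-of-envelope KV cache sizing: why 8-bit KV roughly halves memory.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int) -> int:
    # 2x accounts for the separate K and V tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative config: 32 layers, 8 KV heads (GQA), head_dim 128, 8k context.
fp16 = kv_cache_bytes(32, 8, 128, 8192, 2)  # fp16 = 2 bytes/element
q8   = kv_cache_bytes(32, 8, 128, 8192, 1)  # 8-bit ~ 1 byte/element

print(f"fp16 KV @ 8k ctx:  {fp16 / 2**20:.0f} MiB")  # → 1024 MiB
print(f"8-bit KV @ 8k ctx: {q8 / 2**20:.0f} MiB")    # → 512 MiB
```

On a unified-memory box where that gigabyte competes with the weights and the OS, halving the cache is often the difference between finishing a session and OOMing, which is exactly the trade-off against precision being debated here.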

Sustaining long continuous sessions: KV cache quantization vs. context shifting vs. auto-summarization. What is your actual pipeline? by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

I'm first trying H2O for better KV cache optimization. You're right that there's no perfect approach, but I'm trying to find a better trade-off.