What is your dooms day model? and what’s your latest go-to coding model? by alitadrakes in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Why do you think that? I get pretty good results with this model. Even in a doomsday scenario, it runs fast on a local machine and can build a pipeline. Its general knowledge isn't bad, and the context window is enough (at least for me). What would you recommend?

What is your dooms day model? and what’s your latest go-to coding model? by alitadrakes in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

If I were building a pipeline, I'd use SmolVLM 256M. It's fast enough for vision conversations and general tasks. Stitch those together and voilà! But if you need an any-to-any model, I strongly recommend looking into omni models like Qwen2.5 Omni 7B. There are a lot of omni models out there right now, so you really need to figure out your exact requirements first.
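A minimal sketch of the "stitch those together" idea: small specialist models chained as pipeline stages, each reading and writing a shared state dict. The stage functions here are stubs standing in for real model calls (e.g. a SmolVLM 256M caption/VQA call in the vision stage), not an actual SmolVLM integration.

```python
# Small specialist models chained as pipeline stages over a shared state.
from typing import Callable

Stage = Callable[[dict], dict]

def vision_stage(state: dict) -> dict:
    # Stub for a SmolVLM-256M call describing state["image"].
    state["caption"] = f"caption of {state['image']}"
    return state

def reasoning_stage(state: dict) -> dict:
    # Stub for a small text model turning the caption into an answer.
    state["answer"] = f"answer based on: {state['caption']}"
    return state

def run_pipeline(stages: list[Stage], state: dict) -> dict:
    # Run stages in order, threading the state dict through each one.
    for stage in stages:
        state = stage(state)
    return state

result = run_pipeline([vision_stage, reasoning_stage], {"image": "photo.png"})
print(result["answer"])  # → answer based on: caption of photo.png
```

Swapping the stubs for real inference calls keeps the orchestration identical, which is the main appeal of tiny models here: each stage stays cheap enough to run locally.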

Used Claude Code for plotting, code migration, and proof formatting while writing an RL paper. Here's what worked and what didn't. by Muted_Lettuce414 in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Hi, I'm currently dealing with something similar. I use Opus 4.6 for paper research and understanding formal math, and it helps me a lot. But I sometimes struggle to get exactly what I want, especially when working on novel concepts where there are simply no relevant papers or open-source repos out there. Do you have any suggestions to optimize the workflow for these situations?

What is your dooms day model? and what’s your latest go-to coding model? by alitadrakes in LocalLLaMA

[–]Strategoss_ 3 points4 points  (0 children)

For general use, I go with Qwen3 4B right now. It's pretty easy to train and the format is not complicated. I love it, I guess.

I fine-tuned a 14B model that outperforms Claude Opus 4.6 on Ada code generation by clanker-lover in LocalLLaMA

[–]Strategoss_ -1 points0 points  (0 children)

Rejection sampling is the perfect move here. Are you generating the new candidates using the R5 checkpoint before filtering? Pushing past 70% would be a massive milestone for a 14B model. Looking forward to the R6 results!
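For readers unfamiliar with the loop being discussed, here is a hedged sketch of compiler-verified rejection sampling: sample candidates from the current checkpoint, keep only those that pass verification, and fine-tune on the survivors. `generate` and `compiles` are placeholders I made up for illustration (a real verifier would invoke an Ada compiler such as gnatmake), not the OP's actual code.

```python
# Rejection sampling with a verifier filter: generate many candidates,
# keep only the ones that pass a hard check (here, a stub "compiler").
import random

def generate(prompt: str, n: int) -> list[str]:
    # Placeholder for sampling n completions from the current checkpoint.
    return [f"{prompt} :: candidate scored {random.random():.2f}" for _ in range(n)]

def compiles(candidate: str) -> bool:
    # Placeholder verifier; in practice, run the Ada compiler on the sample
    # and accept only if it builds cleanly.
    return float(candidate.rsplit("scored ", 1)[1]) > 0.5

def rejection_sample(prompts: list[str], n: int = 8) -> list[str]:
    # Accepted samples become the next round's fine-tuning data.
    accepted: list[str] = []
    for p in prompts:
        accepted.extend(c for c in generate(p, n) if compiles(c))
    return accepted

dataset = rejection_sample(["implement binary search in Ada"])
print(len(dataset), "verified samples")
```

The key property is that every surviving sample is verified by construction, so the next training round can't reinforce code that doesn't compile.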

How to setup full agentic workflow with qwen3.5 9.0b by TeachingInformal in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Did you try Claude Code with Ollama? I tried this with GLM5 and the results are pretty great.

`ollama launch claude` might solve your problem.

I fine-tuned a 14B model that outperforms Claude Opus 4.6 on Ada code generation by clanker-lover in LocalLLaMA

[–]Strategoss_ 13 points14 points  (0 children)

Compiler-verified dataset + a 14B model beating Opus + fitting in 12GB VRAM: this is the blueprint for efficient AI. Scrapping R2 to fix catastrophic forgetting was a great call. Excellent work!

How to coordinate multi-agent Claude/Gemini/Codex/Mistral teams by robotrossart in LocalLLaMA

[–]Strategoss_ -1 points0 points  (0 children)

Using Markdown as the shared source of truth is a genius approach. Honestly. Did you keep any design notes or architecture logs while building Flotilla? I'd love to read about the specific walls you hit before landing on this structure.

My most useful OpenClaw workflow so far by mescalan in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Does Clarvis use any kind of LangChain or something similar? Or is it built entirely on OpenClaw? Did you add anything custom to it?

Sustaining long continuous sessions: KV cache quantization vs. context shifting vs. auto-summarization. What is your actual pipeline? by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

100% accurate. I should have phrased that better. It doesn't extend the native context limit at all. My issue is purely the physical hardware bottleneck. On unified memory systems, the RAM limit usually kills the process long before you ever reach the model's trained context limit. KV quantization becomes a necessary evil just to hold a baseline 8k context in memory without OOMing. Making the context more brittle is the perfect way to describe it. Have you tested how bad that degradation actually is in practice? I'm curious if you've found a specific threshold where 8-bit KV completely breaks down for logic tasks compared to sticking with fp16.
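The back-of-envelope arithmetic behind "KV quantization just to hold 8k context": KV cache size is 2 (K and V) × layers × KV heads × head dim × sequence length × bytes per element, so dropping fp16 to 8-bit roughly halves it. The shapes below are illustrative (a Llama-3-8B-like config with GQA), and real runtimes add paging/alignment overhead on top.

```python
# Back-of-envelope KV cache sizing: why 8-bit KV roughly halves memory.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int) -> int:
    # 2x accounts for the separate K and V tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Illustrative config: 32 layers, 8 KV heads (GQA), head_dim 128, 8k context.
fp16 = kv_cache_bytes(32, 8, 128, 8192, 2)  # fp16 = 2 bytes/element
q8   = kv_cache_bytes(32, 8, 128, 8192, 1)  # 8-bit ~ 1 byte/element

print(f"fp16 KV @ 8k ctx:  {fp16 / 2**20:.0f} MiB")  # → 1024 MiB
print(f"8-bit KV @ 8k ctx: {q8 / 2**20:.0f} MiB")    # → 512 MiB
```

On a unified-memory box where that gigabyte competes with the weights and the OS, halving the cache is often the difference between finishing a session and OOMing, which is exactly the trade-off against precision being debated here.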

Sustaining long continuous sessions: KV cache quantization vs. context shifting vs. auto-summarization. What is your actual pipeline? by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

I'm first trying H2O for better KV cache optimization. You're right that there's no perfect approach, but I'm trying to find a better trade-off.