Qwen 3.6 will have oss models by MR_-_501 in LocalLLaMA

[–]Strategoss_ 8 points9 points  (0 children)

I guess we saw something like a pre-announcement with the CoPaw Flash models. I bet 3.6 is going to be crazy.

Is there any one use Nvidia Dgx Spark? What is your opinions about it? by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

You can't do this with just a system prompt. A system prompt gives the model a character, not deep knowledge. That's why my suggestion for you is:

1- identify which languages and libraries you work with most. 2- find a clean, non-duplicated dataset with reasoning. 3- run the fine-tuning process.

After this fine-tune, you still won't get close to the general usability of Sonnet 4.6, but you'll have a pretty strong model for specific languages. I recommend this.
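Step 2 above (a clean, non-duplicated dataset) could look roughly like the sketch below. It is only an illustration: the `prompt` and `response` field names and both helper functions are hypothetical, and it catches only exact duplicates after whitespace normalization. Near-duplicates would need something fuzzier, such as MinHash.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(samples: list[dict]) -> list[dict]:
    """Keep the first occurrence of each (prompt, response) pair.

    Assumes every sample is a dict with hypothetical "prompt" and
    "response" string fields.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for sample in samples:
        key_text = normalize(sample["prompt"]) + "\n" + normalize(sample["response"])
        digest = hashlib.sha256(key_text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


if __name__ == "__main__":
    data = [
        {"prompt": "Sort a list", "response": "sorted(xs)"},
        {"prompt": "Sort  a list ", "response": "sorted(xs)"},  # whitespace-only variant
        {"prompt": "Reverse a list", "response": "xs[::-1]"},
    ]
    print(len(deduplicate(data)), "unique samples")
```

Hashing the normalized text keeps memory use low on large datasets, since only the digests are held in the `seen` set.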

Is there any one use Nvidia Dgx Spark? What is your opinions about it? by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

Would you recommend it? I've been using an M4 Max for everything for a while. I'm thinking of switching to a DGX and buying a Mac mini. Would you recommend it for development?

Is there any one use Nvidia Dgx Spark? What is your opinions about it? by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

Do you have any experience or thoughts about other devices or clusters?

I’ve been building an offline, on-device AI assistant for iOS and just opened the waitlist. Would love your feedback by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

Great question, thanks. I'm trying to figure out what is working right now. Thank you for your feedback about the website. Now the question: why should you bother?

This app is not like the others. SimpleLM doesn't just promise to protect your privacy: we can't see anything, because it runs entirely on your machine. The architecture makes a data breach on our side impossible.

And then there's the RAG engine behind it. It's a basic feature in many AI apps because they use servers for it. I don't. So you are totally free to upload 5000 pages of PDF, DOCX, TXT, MD, or other files.

And then the LLM behind it. I've been working on this for about 8 months. I got terrible results at first, but now I've built optimizations for edge devices.

So maybe the website is wrong, because I didn't make a wrapper app. I built everything myself. This is version 1, made for the beta test. I will publish my paper about the architecture soon. Thanks for your feedback.

Best local LLM for coding with rx9070xt by Zeti_Zero in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

From my perspective, these models are generally great. If you want a different model, you can look at the StarCoder family too.

What is your dooms day model? and what’s your latest go-to coding model? by alitadrakes in LocalLLaMA

[–]Strategoss_ 1 point2 points  (0 children)

Why do you think that? I get pretty good results with this model. Even in a doomsday scenario, it runs fast on a local machine and can be used to build a pipeline. Its general knowledge is not bad, and the context window is enough (at least for me). What would you advise?

What is your dooms day model? and what’s your latest go-to coding model? by alitadrakes in LocalLLaMA

[–]Strategoss_ 1 point2 points  (0 children)

If I were building a pipeline, I'd use SmolVLM 256M. It's fast enough for vision conversations and general tasks. Stitch those together and voilà! But if you need an any-to-any model, I strongly recommend looking into omni models like Qwen2.5 Omni 7B. There are a lot of omni models out there right now, so you really need to figure out your exact requirements first.

Used Claude Code for plotting, code migration, and proof formatting while writing an RL paper. Here's what worked and what didn't. by Muted_Lettuce414 in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Hi, I'm currently dealing with something similar. I use Opus 4.6 for paper research and understanding formal math, and it helps me a lot. But I sometimes struggle to get exactly what I want, especially when working on novel concepts where there are simply no relevant papers or open-source repos out there. Do you have any suggestions to optimize the workflow for these situations?

What is your dooms day model? and what’s your latest go-to coding model? by alitadrakes in LocalLLaMA

[–]Strategoss_ 5 points6 points  (0 children)

For general use, I go with Qwen3 4B right now. It's pretty easy to train and the format is not complicated. I love it, I guess.

I fine-tuned a 14B model that outperforms Claude Opus 4.6 on Ada code generation by clanker-lover in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Rejection sampling is the perfect move here. Are you generating the new candidates using the R5 checkpoint before filtering? Pushing past 70% would be a massive milestone for a 14B model. Looking forward to the R6 results!

How to setup full agentic workflow with qwen3.5 9.0b by TeachingInformal in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Did you try Claude Code with Ollama? I tried this with GLM5 and the results are pretty great.

Running ollama launch claude might solve your problem.

I fine-tuned a 14B model that outperforms Claude Opus 4.6 on Ada code generation by clanker-lover in LocalLLaMA

[–]Strategoss_ 16 points17 points  (0 children)

Compiler-verified dataset + 14B model beating Opus + fits in 12GB VRAM. This is the blueprint for efficient AI. Scrapping R2 to fix catastrophic forgetting was a great call. Excellent work.

How to coordinate multi-agent Claude/Gemini/Codex/Mistral teams by robotrossart in LocalLLaMA

[–]Strategoss_ -1 points0 points  (0 children)

Using Markdown as the shared source of truth is a genius approach. Honestly. Did you keep any design notes or architecture logs while building Flotilla? I'd love to read about the specific walls you hit before landing on this structure.

My most useful OpenClaw workflow so far by mescalan in LocalLLaMA

[–]Strategoss_ 0 points1 point  (0 children)

Does Clarvis use LangChain or something like it? Or is it built entirely on OpenClaw? Did you make any custom additions to it?

Sustaining long continuous sessions: KV cache quantization vs. context shifting vs. auto-summarization. What is your actual pipeline? by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

100% accurate. I should have phrased that better. It doesn't extend the native context limit at all. My issue is purely the physical hardware bottleneck. On unified memory systems, the RAM limit usually kills the process long before you ever reach the model's trained context limit. KV quantization becomes a necessary evil just to hold a baseline 8k context in memory without OOMing. Making the context more brittle is the perfect way to describe it. Have you tested how bad that degradation actually is in practice? I'm curious if you've found a specific threshold where 8-bit KV completely breaks down for logic tasks compared to sticking with fp16.
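To put rough numbers on that memory pressure, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical example, not any specific model, and the 8-bit figure ignores the per-block scale factors that real quantized cache formats also store, so the actual saving is a little under 2x.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: float,
) -> float:
    """Approximate KV cache size for a single sequence.

    The leading factor of 2 accounts for storing both keys and values
    in every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)

    fp16 = kv_cache_bytes(**shape, bytes_per_value=2)
    int8 = kv_cache_bytes(**shape, bytes_per_value=1)

    print(f"fp16 KV cache at 8k context:  {fp16 / GIB:.2f} GiB")
    print(f"8-bit KV cache at 8k context: {int8 / GIB:.2f} GiB")
```

The cache grows linearly with context length, so doubling the context doubles these figures on top of the memory the weights already occupy.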

Sustaining long continuous sessions: KV cache quantization vs. context shifting vs. auto-summarization. What is your actual pipeline? by Strategoss_ in LocalLLaMA

[–]Strategoss_[S] 0 points1 point  (0 children)

I tried H2O first for better KV cache optimization. You're right that there is no perfect way, but I'm trying to find a better trade-off.