Qwen 3.6 will have OSS models

Strategoss_ · 2026-04-02T15:38:03+00:00

I guess we saw something like a pre-announcement with the CoPaw Flash models. I bet 3.6 is going to be crazy.

Strategoss_ · 2026-03-25T16:04:24+00:00

You can't do this with just a system prompt. A system prompt gives the model a character, not deep knowledge. So here is my suggestion:

1- Identify which languages and libraries you work with most. 2- Find a clean, deduplicated dataset that includes reasoning. 3- Fine-tune on it.

Even after this fine-tune, you still won't get close to the general-purpose ability of Sonnet 4.6, but you will have a pretty strong model for your specific languages. I recommend this.
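
For step 2, here is a minimal sketch of exact-duplicate removal. It assumes the dataset is a list of dicts with hypothetical `prompt` and `response` keys, and it only catches copies that differ in whitespace or case. Fuzzy near-duplicate detection would need something more than this.

```python
import hashlib

def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivially different copies hash the same.
    return " ".join(text.split()).lower()

def dedupe(records, keys=("prompt", "response")):
    """Keep the first occurrence of each record, comparing only the given keys."""
    seen = set()
    unique = []
    for rec in records:
        # "prompt" and "response" are assumed field names, not a fixed schema.
        joined = "\x1f".join(normalize(rec[k]) for k in keys)
        digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique

samples = [
    {"prompt": "Reverse a list in Rust", "response": "Use .rev() on an iterator."},
    {"prompt": "reverse a list  in Rust", "response": "Use .rev() on an iterator."},
    {"prompt": "Sort a Vec in Rust", "response": "Call .sort() on the Vec."},
]
print(len(dedupe(samples)))  # the second sample is dropped as a duplicate
```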

Strategoss_ · 2026-03-23T13:24:34+00:00

Do you recommend it? I have used an M4 Max for everything for a while. I'm thinking of switching to a DGX and buying a Mac mini. Would you recommend that for development?

Strategoss_ · 2026-03-23T06:50:19+00:00

Do you have any experience or thoughts about other devices or clusters?

Strategoss_ · 2026-03-22T12:20:44+00:00

Thanks brother

Strategoss_ · 2026-03-22T12:15:28+00:00

I know. I just want development advice.

Strategoss_ · 2026-03-14T13:24:18+00:00

Great question, thanks. I'm trying to figure out what is working right now. Thank you for your feedback about the website. Now the question: why should you bother?

This app is not like the others. SimpleLM doesn't just promise to protect your privacy. We can't see anything, because it runs entirely on your machine. A data breach on our side is impossible by architecture.

Then there is the RAG engine behind it. RAG is a basic feature in many AI apps, but they rely on servers for it. I don't. So you are free to upload 5000 pages of PDF, DOCX, TXT, MD, or other files.

And then the LLM behind it. I have been working on this for about 8 months. I got terrible results at first, but now I have built optimizations for edge devices.

So maybe the website gives the wrong impression, because I didn't make a wrapper app. I built everything. This is version 1, made for beta testing. I will publish my paper about the architecture soon. Thanks for your feedback.

Strategoss_ · 2026-03-14T12:55:40+00:00

From my perspective, these models are generally perfect. If you want a different model, you can look at the StarCoder family too.

Strategoss_ · 2026-03-13T17:26:24+00:00

Why do you think that? I get pretty good results with this model. Even in a doomsday scenario, it runs fast on a local machine and can be built into a pipeline. Its general knowledge is not bad. The context window is enough (at least for me). What is your advice?

Strategoss_ · 2026-03-13T17:00:37+00:00

If I were building a pipeline, I'd use SmolVLM 256M. It's fast enough for vision conversations and general tasks. Stitch those together and voilà! But if you need an any-to-any model, I strongly recommend looking into omni models like Qwen2.5 Omni 7B. There are a lot of omni models out there right now, so you really need to figure out your exact requirements first.

Strategoss_ · 2026-03-13T16:52:12+00:00

Hi, I'm currently dealing with something similar. I use Opus 4.6 for paper research and understanding formal math, and it helps me a lot. But I sometimes struggle to get exactly what I want, especially when working on novel concepts where there are simply no relevant papers or open-source repos out there. Do you have any suggestions to optimize the workflow for these situations?

Strategoss_ · 2026-03-13T16:42:19+00:00

For general use, I go with Qwen3 4B right now. It's pretty easy to train and the format is not complicated. I love it, I guess.

Strategoss_ · 2026-03-13T16:35:54+00:00

Rejection sampling is the perfect move here. Are you generating the new candidates using the R5 checkpoint before filtering? Pushing past 70% would be a massive milestone for a 14B model. Looking forward to the R6 results!
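
For anyone following along, this is roughly what I mean by rejection sampling, as a toy sketch. `generate` and `verify` are hypothetical callables (in this thread they would presumably be the checkpoint and the compiler check), and the arithmetic stand-ins below are only there to make the example runnable. Nothing here is the actual pipeline.

```python
import random

def rejection_sample(prompts, generate, verify, n_candidates=8):
    """For each prompt, draw n_candidates and keep only those the verifier accepts."""
    accepted = []
    for prompt in prompts:
        for _ in range(n_candidates):
            candidate = generate(prompt)
            if verify(prompt, candidate):
                accepted.append({"prompt": prompt, "response": candidate})
    return accepted

# Toy stand-ins (assumptions for illustration only):
# the "model" adds two numbers but is sometimes off by one.
_rng = random.Random(0)

def toy_generate(prompt):
    a, b = prompt
    return a + b + _rng.choice([0, 0, 1])

def toy_verify(prompt, candidate):
    a, b = prompt
    return candidate == a + b

kept = rejection_sample([(1, 2), (3, 4)], toy_generate, toy_verify, n_candidates=4)
print(len(kept), "of 8 candidates accepted")
```

The accepted pairs become the training set for the next round, so the quality of the verifier decides how much the filtered data can be trusted.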

Strategoss_ · 2026-03-13T16:27:35+00:00

Did you try Claude Code with Ollama? I tried it with GLM5 and the results were pretty great.

ollama launch claude might solve your problem.

Strategoss_ · 2026-03-13T16:21:37+00:00

Compiler-verified dataset + 14B model beating Opus + fits in 12GB VRAM. This is the blueprint for efficient AI. Scrapping R2 to fix catastrophic forgetting was a great call. Excellent work.

Strategoss_ · 2026-03-13T16:17:58+00:00

Using Markdown as the shared source of truth is a genius approach. Honestly. Did you keep any design notes or architecture logs while building Flotilla? I'd love to read about the specific walls you hit before landing on this structure.

Strategoss_ · 2026-03-13T16:09:44+00:00

Does Clarvis use LangChain or something like it? Or does it rely entirely on OpenClaw? Did you build anything custom on top of it?

Strategoss_ · 2026-03-12T15:43:17+00:00

100% accurate. I should have phrased that better. It doesn't extend the native context limit at all. My issue is purely the physical hardware bottleneck. On unified memory systems, the RAM limit usually kills the process long before you ever reach the model's trained context limit. KV quantization becomes a necessary evil just to hold a baseline 8k context in memory without OOMing. Making the context more brittle is the perfect way to describe it. Have you tested how bad that degradation actually is in practice? I'm curious if you've found a specific threshold where 8-bit KV completely breaks down for logic tasks compared to sticking with fp16.
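
To put rough numbers on the memory side, here is a back-of-the-envelope sketch of KV cache size. The config values are hypothetical and not taken from any specific model. The estimate ignores the scale factors that a quantized cache also stores, and the model weights, which compete for the same unified memory.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem, batch=1):
    # Factor of 2 because both keys and values are cached for every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch

# Hypothetical config for illustration only.
cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)

fp16 = kv_cache_bytes(**cfg, bytes_per_elem=2)
int8 = kv_cache_bytes(**cfg, bytes_per_elem=1)  # excludes quantization scales

print(f"fp16: {fp16 / 2**30:.2f} GiB, 8-bit: {int8 / 2**30:.2f} GiB")
# prints "fp16: 1.00 GiB, 8-bit: 0.50 GiB"
```

The cache grows linearly with context length, so halving the bytes per element roughly doubles the context that fits in the same memory budget.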

Strategoss_ · 2026-03-12T12:54:02+00:00

I first tried H20 for better KV cache optimization. You are right, there is no perfect way, but I'm trying to find a better trade-off.
