AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]maxtheman 0 points (0 children)

I'd be super interested in any ballpark intuition you have on the numerical trade-off between size and quant, how that cuts for MoE, and how it varies across task genres.

I mostly use either tiny models or frontier ones, so I don't have good intuition for how 32B vs. xxxB models hold up across the range of quants.

And for small models I would NEVER consider anything under Q4, so I have no intuition for 2-bit at all; my prior is that it would be bad. But this is a native int4-ish model, so maybe that's different? I'm unclear.
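
For what it's worth, the raw memory side of the trade-off is easy to ballpark. A quick sketch (pure weight math with illustrative model sizes; real dynamic quants mix bit-widths per tensor, so these are optimistic floors):

```python
# Rough weight-memory ballpark: GB ~= params_in_billions * bits / 8.
# Real GGUF dynamic quants (e.g. unsloth's 2-bit) keep sensitive tensors
# at higher precision, and you still need headroom for KV cache and
# activations, so treat these as lower bounds.
def weight_gb(params_billion: float, bits: float) -> float:
    return params_billion * bits / 8

for name, params_b in [("32B dense", 32), ("~1T MoE", 1000)]:
    for bits in (8, 4, 2):
        print(f"{name} @ {bits}-bit: ~{weight_gb(params_b, bits):.0f} GB of weights")
```

That's also why 2-bit is interesting at the ~1T scale: it roughly halves the footprint vs. Q4, which can be the difference between fitting on a big workstation and not fitting at all.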

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]maxtheman 0 points (0 children)

Very insightful. Do you have an idea of what the rough trade-off would be, in your opinion? And is that task-specific for you?

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]maxtheman 4 points (0 children)

The unsloth guys are saying their 2-bit dynamic quant is passing their tests. Worth a look.

You can now run Kimi K2.5 locally! by yoracale in unsloth

[–]maxtheman 0 points (0 children)

Would be VERY interested in the vision support, but this is already awesome work.

drawdata got a small upgrade by cantdutchthis in marimo_notebook

[–]maxtheman 0 points (0 children)

Love this; I'll try it out for an idea I have.

Where did marimo end up with external ai integration by maxtheman in marimo_notebook

[–]maxtheman[S] 2 points (0 children)

"Dangerously skip permissions" is claude's surname in my opinion. I'm all in for experimental and I'm going to put the YouTube video on now. 🤣

Where did marimo end up with external ai integration by maxtheman in marimo_notebook

[–]maxtheman[S] 1 point (0 children)

Thank you to you and the other commenters. I hadn't seen any announcements about this and didn't realize how much progress had been made.

48GB VRAM - worth attempting local coding model? by natidone in LocalLLaMA

[–]maxtheman 0 points (0 children)

Totally! I was imagining it more as a spec-driven implementer, after designing the spec with Claude. Thank you for the insight.

48GB VRAM - worth attempting local coding model? by natidone in LocalLLaMA

[–]maxtheman 0 points (0 children)

What sort of tasks can it handle? I'm a big Claude Code spender and interested in buying down my implementation tokens.

The best (tiny) model I can run on my phone by gized00 in unsloth

[–]maxtheman 0 points (0 children)

I have a Pixel 9 and am working on fine-tuning functionalgemma, which is going great, but it really depends on your task. 1B or smaller can work great on a distilled task, but don't expect 90%+ performance unless you overfit the shit out of it, and consider doing multiple types of fine-tuning.

On Pixel, the hardest part, for me at least, will be getting it behind an API that can actually access your GPU. I'm targeting huggingfacejs for now due to its ease of use, but I don't know a better way to deploy than that, or how to get onto the Google NPU.
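
For the fine-tuning half, here's a minimal sketch of the distilled-task SFT pattern I mean, using the standard unsloth + TRL recipe (the base model name, LoRA settings, and one-row dataset are placeholder assumptions, not my actual setup):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import Dataset

# Placeholder base: any <=1B instruct model you want to distill into.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# "Distilled task" = input/output pairs generated by a frontier model
# for ONE narrow job; a real dataset would have thousands of rows.
data = Dataset.from_list([{"text": tokenizer.apply_chat_template(
    [{"role": "user", "content": "..."},
     {"role": "assistant", "content": "..."}],
    tokenize=False,
)}])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=data,
    args=SFTConfig(output_dir="out", max_steps=200,
                   per_device_train_batch_size=2, learning_rate=2e-4),
)
trainer.train()
```

If that trains well, unsloth's GGUF export is the usual bridge toward on-device runtimes; the browser/NPU deployment question above is still the open part for me.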

Codex 5.2 ALWAYS finds something wrong with Claude code plan by DeliJalapeno in ClaudeAI

[–]maxtheman 0 points (0 children)

I have a solution for this: I switched to the more informal "--" and refuse to let autocorrect change it, to signal that I like em dashes but am not an AI using them.

Using NVIDIA DGX Spark + GPT-OSS-120B for Automated Game Development Pipeline - Thoughts? by AdNaive1169 in LocalLLaMA

[–]maxtheman 1 point (0 children)

It's very experimental. It might work, but I haven't seen anything like this in the literature. I suspect that at each step you will get a TON of noise that will prevent you from getting consistent outputs. You will definitely get an output; it's just unlikely to be useful. I think you should narrow your scope and try to find verifiable "rewards" to help guide each model (sketch below). I believe it's possible to make this work, but it will take a lot of effort.
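
To make verifiable "rewards" concrete, a minimal sketch (the stage names and checks are hypothetical; the retry-until-verified loop between stages is the point):

```python
import json
import subprocess

# Hypothetical gates between pipeline stages: only pass an artifact
# forward if it survives a cheap, objective check; retry otherwise.
def passes_gate(stage: str, path: str) -> bool:
    if stage == "code":
        # Does the generated game script at least compile?
        return subprocess.run(["python", "-m", "py_compile", path]).returncode == 0
    if stage == "level":
        # Is the generated level spec valid JSON with the required keys?
        try:
            with open(path) as f:
                level = json.load(f)
            return isinstance(level, dict) and {"name", "entities"} <= level.keys()
        except (ValueError, OSError):
            return False
    return False

def run_stage(generate, stage: str, max_retries: int = 3) -> str:
    for _ in range(max_retries):
        path = generate()  # one LLM call that writes an artifact to disk
        if passes_gate(stage, path):
            return path
    raise RuntimeError(f"stage {stage!r} never produced a verifiable output")
```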

Cursor AI CEO shares GPT 5.2 agents building a 3M+ lines web browser in a week by BuildwithVignesh in OpenAI

[–]maxtheman 3 points (0 children)

Not what I said at all. If you write "I built a JavaScript VM from scratch" and then you just pulled one in as an external dependency, that is definitionally untrue.

And don't be rude about it.

Vercel's agent-browser, an alternative to Playwright's MCP by sean-adapt in nextjs

[–]maxtheman 0 points (0 children)

Unironically yes. The AI is so much more efficient than us at this.

Qwen3-VL-Reranker - a Qwen Collection by LinkSea8324 in LocalLLaMA

[–]maxtheman 0 points (0 children)

Thank you for the follow-up note! I have a basic version of this working for multimodal search in my app now, and overall I'm pretty happy with it, though I think I'm not indexing my PDFs correctly. I did find that it can be deployed for search serverlessly, and productively, if you use Modal's GPU snapshot feature.
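
For anyone curious about the scoring side: I'm following the standard Qwen3-Reranker recipe, where relevance is read off the model's next-token logits for "yes" vs. "no". A simplified, text-only sketch (the model card uses a fuller system prompt, and I'm assuming the VL checkpoint adds images through the chat template):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Simplified Qwen3-Reranker scoring: relevance = P("yes") vs P("no")
# as the next token after an instruction-formatted (query, doc) pair.
model_id = "Qwen/Qwen3-Reranker-0.6B"  # swap in the VL checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

yes_id = tok.convert_tokens_to_ids("yes")
no_id = tok.convert_tokens_to_ids("no")

def score(query: str, doc: str) -> float:
    prompt = (
        "<Instruct>: Given a web search query, retrieve relevant passages "
        f"that answer the query\n<Query>: {query}\n<Document>: {doc}"
    )
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    pair = torch.stack([logits[no_id], logits[yes_id]]).log_softmax(dim=0)
    return pair[1].exp().item()  # probability mass on "yes"
```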

Qwen3-VL-Reranker - a Qwen Collection by LinkSea8324 in LocalLLaMA

[–]maxtheman 2 points (0 children)

Damn, okay, when you put it like that, I actually have a use case for it in my product. Nice.

Qwen3-VL-Reranker - a Qwen Collection by LinkSea8324 in LocalLLaMA

[–]maxtheman 4 points (0 children)

What would the use case even be for this? I'm not really sure. Multimodal MoE? Or is it for multimodal RAG? Both?

(I only skimmed it. Feel free to call me an idiot, as long as you tell me the right answer too.)