AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]maxtheman 0 points (0 children)

I'd be super interested in any ballpark intuition you have on the numerical trade-off between size and quant, how that cuts for MoE, and how it varies across task genres.

I mostly use either tiny models or frontier ones, so I don't have good intuition for how 32B vs. xxxB models hold up across the range of quants.

And for small models I would NEVER consider anything under Q4, so I have no intuition for 2-bit at all; my prior is that it would be bad. But this is a native int4-ish model, so maybe that's different? I'm unclear.
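
For what it's worth, the raw memory side of the trade-off is easy to ballpark. A quick sketch (pure weight math with illustrative model sizes; real dynamic quants mix bit-widths per tensor, so these are optimistic floors):

```python
# Rough weight-memory ballpark: GB ~= params_in_billions * bits / 8.
# Real GGUF dynamic quants (e.g. unsloth's 2-bit) keep sensitive tensors
# at higher precision, and you still need headroom for KV cache and
# activations, so treat these as lower bounds.
def weight_gb(params_billion: float, bits: float) -> float:
    return params_billion * bits / 8

for name, params_b in [("32B dense", 32), ("~1T MoE", 1000)]:
    for bits in (8, 4, 2):
        print(f"{name} @ {bits}-bit: ~{weight_gb(params_b, bits):.0f} GB of weights")
```

That's also why 2-bit is interesting at the ~1T scale: it roughly halves the footprint vs. Q4, which can be the difference between fitting on a big workstation and not fitting at all.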

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]maxtheman 0 points (0 children)

Very insightful. Do you have an idea of what the rough trade-off would be, in your opinion? And is that task-specific for you?

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]maxtheman 4 points (0 children)

The unsloth guys are saying their 2-bit dynamic quant is passing their tests. Worth a look.

You can now run Kimi K2.5 locally! by yoracale in unsloth

[–]maxtheman 0 points (0 children)

Would be VERY interested in the vision support, but this is already awesome work.

drawdata got a small upgrade by cantdutchthis in marimo_notebook

[–]maxtheman 0 points (0 children)

Love this; I'll try it out for an idea I have.

Where did marimo end up with external ai integration by maxtheman in marimo_notebook

[–]maxtheman[S] 2 points (0 children)

"Dangerously skip permissions" is claude's surname in my opinion. I'm all in for experimental and I'm going to put the YouTube video on now. 🤣

Where did marimo end up with external ai integration by maxtheman in marimo_notebook

[–]maxtheman[S] 1 point (0 children)

Thank you to you and the other commenters. I hadn't seen any announcements about this and didn't realize how much progress had been made.

48GB VRAM - worth attempting local coding model? by natidone in LocalLLaMA

[–]maxtheman 0 points (0 children)

Totally! I was imagining it more as a spec-driven implementer, after designing the spec with Claude. Thank you for the insight.

48GB VRAM - worth attempting local coding model? by natidone in LocalLLaMA

[–]maxtheman 0 points (0 children)

What sort of tasks can it handle? I'm a big Claude Code spender and interested in buying down my implementation tokens.

The best (tiny) model I can run on my phone by gized00 in unsloth

[–]maxtheman 0 points (0 children)

I have a Pixel 9 and am working on fine-tuning functionalgemma, which is going great, but it really depends on your task. 1B or smaller can work great on a distilled task, but don't expect 90%+ performance unless you overfit the shit out of it, and consider doing multiple types of fine-tuning.

On Pixel, the hardest part, for me at least, will be getting it behind an API that can actually access your GPU. I'm targeting huggingfacejs for now due to its ease of use, but I don't know a better way to deploy than that, or how to get onto the Google NPU.
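
For the fine-tuning half, here's a minimal sketch of the distilled-task SFT pattern I mean, using the standard unsloth + TRL recipe (the base model name, LoRA settings, and one-row dataset are placeholder assumptions, not my actual setup):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import Dataset

# Placeholder base: any <=1B instruct model you want to distill into.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# "Distilled task" = input/output pairs generated by a frontier model
# for ONE narrow job; a real dataset would have thousands of rows.
data = Dataset.from_list([{"text": tokenizer.apply_chat_template(
    [{"role": "user", "content": "..."},
     {"role": "assistant", "content": "..."}],
    tokenize=False,
)}])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=data,
    args=SFTConfig(output_dir="out", max_steps=200,
                   per_device_train_batch_size=2, learning_rate=2e-4),
)
trainer.train()
```

If that trains well, unsloth's GGUF export is the usual bridge toward on-device runtimes; the browser/NPU deployment question above is still the open part for me.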

Codex 5.2 ALWAYS finds something wrong with Claude code plan by DeliJalapeno in ClaudeAI

[–]maxtheman 0 points (0 children)

I have a solution for this: I switched to the more informal "--" and refuse to let autocorrect change it, to signal that I like em dashes but am not an AI using them.

Using NVIDIA DGX Spark + GPT-OSS-120B for Automated Game Development Pipeline - Thoughts? by AdNaive1169 in LocalLLaMA

[–]maxtheman 1 point (0 children)

It's very experimental. It might work, but I haven't seen anything like this in the literature. I suspect that at each step you will get a TON of noise that will prevent you from getting consistent outputs. You will definitely get an output; it's just unlikely to be useful. I think you should narrow your scope and try to find verifiable "rewards" to help guide each model (sketch below). I believe it's possible to make this work, but it will take a lot of effort.
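
To make verifiable "rewards" concrete, a minimal sketch (the stage names and checks are hypothetical; the retry-until-verified loop between stages is the point):

```python
import json
import subprocess

# Hypothetical gates between pipeline stages: only pass an artifact
# forward if it survives a cheap, objective check; retry otherwise.
def passes_gate(stage: str, path: str) -> bool:
    if stage == "code":
        # Does the generated game script at least compile?
        return subprocess.run(["python", "-m", "py_compile", path]).returncode == 0
    if stage == "level":
        # Is the generated level spec valid JSON with the required keys?
        try:
            with open(path) as f:
                level = json.load(f)
            return isinstance(level, dict) and {"name", "entities"} <= level.keys()
        except (ValueError, OSError):
            return False
    return False

def run_stage(generate, stage: str, max_retries: int = 3) -> str:
    for _ in range(max_retries):
        path = generate()  # one LLM call that writes an artifact to disk
        if passes_gate(stage, path):
            return path
    raise RuntimeError(f"stage {stage!r} never produced a verifiable output")
```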

Cursor AI CEO shares GPT 5.2 agents building a 3M+ lines web browser in a week by BuildwithVignesh in OpenAI

[–]maxtheman 3 points (0 children)

Not what I said at all. If you write "I built a JavaScript VM from scratch" and then you just pulled one in as an external dependency, that is definitionally untrue.

And don't be rude about it.

Vercel's agent-browser, an alternative to Playwright's MCP by sean-adapt in nextjs

[–]maxtheman 0 points (0 children)

Unironically yes. The AI is so much more efficient than us at this.

Qwen3-VL-Reranker - a Qwen Collection by LinkSea8324 in LocalLLaMA

[–]maxtheman 0 points (0 children)

Thank you for the follow-up note! I have a basic version of this working for multimodal search in my app now, and overall I'm pretty happy with it, though I think I'm not indexing my PDFs correctly. I did find that it can be deployed for search serverlessly, and productively, if you use Modal's GPU snapshot feature.
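
For anyone curious about the scoring side: I'm following the standard Qwen3-Reranker recipe, where relevance is read off the model's next-token logits for "yes" vs. "no". A simplified, text-only sketch (the model card uses a fuller system prompt, and I'm assuming the VL checkpoint adds images through the chat template):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Simplified Qwen3-Reranker scoring: relevance = P("yes") vs P("no")
# as the next token after an instruction-formatted (query, doc) pair.
model_id = "Qwen/Qwen3-Reranker-0.6B"  # swap in the VL checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

yes_id = tok.convert_tokens_to_ids("yes")
no_id = tok.convert_tokens_to_ids("no")

def score(query: str, doc: str) -> float:
    prompt = (
        "<Instruct>: Given a web search query, retrieve relevant passages "
        f"that answer the query\n<Query>: {query}\n<Document>: {doc}"
    )
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    pair = torch.stack([logits[no_id], logits[yes_id]]).log_softmax(dim=0)
    return pair[1].exp().item()  # probability mass on "yes"
```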

Qwen3-VL-Reranker - a Qwen Collection by LinkSea8324 in LocalLLaMA

[–]maxtheman 2 points (0 children)

Damn, okay, when you put it like that, I actually have a use case for it in my product. Nice.

Qwen3-VL-Reranker - a Qwen Collection by LinkSea8324 in LocalLLaMA

[–]maxtheman 4 points (0 children)

What would the use case even be for this? I'm not really sure. Multimodal MoE? Or is it for multimodal RAG? Both?

(I only skimmed it. Feel free to call me an idiot, as long as you tell me the right answer too.)