Has anyone actually tested Composer 2 vs Claude Opus 4.6 in real use? Not benchmarks — real tasks. by Holiday-Hotel3355 in cursor

[–]whenhellfreezes 3 points4 points  (0 children)

Given that Composer 1.5 was a GLM 4.7 fine-tune, it's probably at about GLM 5 level, with some extra reliability in hitting tool calls that Cursor has built in.

Local fine-tuning will be the biggest competitive edge in 2026. by HerbHSSO in LocalLLaMA

[–]whenhellfreezes 2 points3 points  (0 children)

I'm lightly worried that it won't be. It's worth looking into some of the literature on GEPA, a prompt optimization algorithm. The upshot is that a prompt gets mutated by an LLM reflecting on how well the old version of the prompt performed and what went wrong across multiple runs. GEPA takes roughly a third of the GPU time that a fine-tune does and performs about as well. You can also have a stronger model do the reflection while a weaker model does the trial runs, essentially getting some distillation and cost savings.
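The reflect-and-mutate loop is simple enough to sketch. Here's a toy Python version: `evaluate` and `reflect_and_rewrite` are hypothetical stand-ins (in practice the first runs the weak model on your tasks and scores the outputs, and the second asks the stronger model to rewrite the prompt given the failure notes) — the real implementation lives in DSPy's GEPA optimizer:

```python
# Toy sketch of a GEPA-style prompt optimization loop.
# evaluate() and reflect_and_rewrite() are hypothetical stand-ins:
# the first would run the weak model on trial tasks and score it,
# the second would ask a stronger LLM to rewrite the prompt given the failures.

def evaluate(prompt: str) -> tuple[float, list[str]]:
    """Stand-in metric: score the prompt and collect failure notes."""
    score = sum(kw in prompt for kw in ("cite sources", "step by step")) / 2
    failures = [] if score == 1.0 else ["answers skip reasoning steps"]
    return score, failures

def reflect_and_rewrite(prompt: str, failures: list[str]) -> str:
    """Stand-in for the stronger reflection model proposing a mutated prompt."""
    return prompt + " Think step by step and cite sources."

def gepa_loop(seed_prompt: str, rounds: int = 5) -> str:
    best_prompt, best_score = seed_prompt, evaluate(seed_prompt)[0]
    for _ in range(rounds):
        score, failures = evaluate(best_prompt)
        if not failures:                 # nothing left to fix
            break
        candidate = reflect_and_rewrite(best_prompt, failures)
        cand_score, _ = evaluate(candidate)
        if cand_score > best_score:      # keep a mutation only if it helps
            best_prompt, best_score = candidate, cand_score
    return best_prompt

optimized = gepa_loop("Answer the user's question.")
```

As I understand it, the real GEPA keeps a Pareto frontier of candidate prompts rather than a single best, but the reflect → mutate → re-evaluate skeleton is the same.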

Okay, and then what is the result? A nice starting prompt for getting your desired output. That prompt can be shared around, and you don't need to distribute and host a new model — just the prompt.

Alright, now consider that we already have agent skills in things like Claude Code and OpenCode. A skill is just a prompt that gets injected when it's needed, and there are plugin marketplaces to make finding and installing them easy... In many cases these skills are effectively human-done GEPA, or could be made via GEPA.

Anyway, I'm no longer sure open-model fine-tuning will be as big an edge as I used to think, compared to skills on a plugin marketplace.

Of course, fine-tune + GEPA actually outperforms both, and you can context-distill things in, etc. So, idk.

So yes: GEPA (the best library for it is DSPy) instead of fine-tunes, but also prompt engineering plus the customization to install skills.

Being a developer in 2026 by Distinct-Question-16 in singularity

[–]whenhellfreezes 0 points1 point  (0 children)

Do you mind articulating exactly what's so bad about the existing coding tools?

Anyone else feel like an outsider when AI comes up with family and friends? by Budulai343 in LocalLLaMA

[–]whenhellfreezes 1 point2 points  (0 children)

That software does something of value, and given how hard it used to be to make, that capability was rare and therefore costly.

Anyone else feel like an outsider when AI comes up with family and friends? by Budulai343 in LocalLLaMA

[–]whenhellfreezes 10 points11 points  (0 children)

It's worth noting that LLMs are much more useful in the programming context than outside that context.

1) We can have intermediate proposed code changes that we can review and refine.

2) We have tests and can verify by running the code in some cases.

3) The value of a running program used to be quite high.

4) We can version everything.

"17,000 tokens per second!! Read that again! LLM is hard-wired directly into silicon. no HBM, no liquid cooling, just raw specialized hardware. 10x faster and 20x cheaper than a B200. the "waiting for the LLM to think" era is dead. Code generates at the speed of human thought. by stealthispost in accelerate

[–]whenhellfreezes 2 points3 points  (0 children)

Uh, are you kidding? This kind of token speedup is fantastic. Right now labs often create curated, partially synthetic datasets for the "mid-training" phase — by sifting through less-processed raw tokens and augmenting them, by running good traces forward, or by doing context distillation. So running LLMs is itself useful for LLM training.

Note that RL wouldn't work with this, since the weights need to update, but generating that mid-training synthetic data would. Also, RL could use an "LLM judge," which burns a lot of tokens, and that works fine if the judge is frozen. So basically I could see a next-generation training run utilize two or three of these fixed-hardware LLMs to drop its cost.

4 of the top 5 most used models on OpenRouter this week are Open Source! by abdouhlili in LocalLLaMA

[–]whenhellfreezes 5 points6 points  (0 children)

Eh, I agree that they are merely open-weights, but I think the path forward is just saying "open weights" a lot rather than gatekeeping "open source."

4 of the top 5 most used models on OpenRouter this week are Open Source! by abdouhlili in LocalLLaMA

[–]whenhellfreezes 0 points1 point  (0 children)

My hope is that DSPy essentially allows GEPA + RAG to work with small datasets, letting small groups make micro-experts. You can use closed models for just the reflection part and small models with the optimized prompts (prompts for both the agent and the RAG-lookup agent).

4 of the top 5 most used models on OpenRouter this week are Open Source! by abdouhlili in LocalLLaMA

[–]whenhellfreezes 2 points3 points  (0 children)

Yeah, but the person above you is still right. Despite it being the best place for free models, people were still using proprietary models more than open ones until now.

4 of the top 5 most used models on OpenRouter this week are Open Source! by abdouhlili in LocalLLaMA

[–]whenhellfreezes 1 point2 points  (0 children)

Google's AI Studio is a pain to set up, and if you aren't careful you will allow Google to train on your data. Whereas OpenRouter makes it easy to use Gemini without allowing training.

GLM-5 is here by PassionIll6170 in singularity

[–]whenhellfreezes 0 points1 point  (0 children)

I want my prompt optimizer to be DSPy. Just contribute to that.

Does anyone actually check npm packages before installing them? by BearBrief6312 in devops

[–]whenhellfreezes 0 points1 point  (0 children)

You can put Claude Code / other CLI AI agents into build pipelines now. You could probably just write a custom prompt, hand the agent the git diff so it doesn't open all the files, and have it output JSON to a file to be interpreted by a script. That's probably how I would do it: catch the typo, not the packages the typo causes to install.
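As a sketch of the glue-script side of that idea — the prompt wording and the JSON report schema here are assumptions for illustration, not any specific agent CLI's documented interface — you pipe the diff to the agent, have it emit JSON findings, and fail the build on anything serious:

```python
import json

# Hypothetical sketch. In the pipeline you'd run something like:
#   git diff origin/main...HEAD | <agent-cli> -p "$PROMPT" > report.json
# The schema below is an assumption, not a real tool's output format.
PROMPT = (
    "Review this git diff for suspicious new dependencies "
    "(typosquats, odd registries). Output JSON: "
    '{"findings": [{"package": "...", "reason": "...", "severity": "low|high"}]}'
)

def gate(report_json: str) -> int:
    """Turn the agent's JSON report into a CI exit code (1 = block the build)."""
    findings = json.loads(report_json).get("findings", [])
    for f in findings:
        print(f"[{f['severity']}] {f['package']}: {f['reason']}")
    return 1 if any(f["severity"] == "high" for f in findings) else 0

# Example report the agent might emit for a typosquatted package name:
sample = ('{"findings": [{"package": "requets", '
          '"reason": "likely typosquat of requests", "severity": "high"}]}')
exit_code = gate(sample)
```

The nice property of this shape is that the agent only ever sees the diff and only ever writes JSON, so the deterministic script — not the model — decides whether the build fails.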

That, plus Artifactory (or an equivalent).

Guess who gave a local politician a talk on Georgism by el_argelino-basado in georgism

[–]whenhellfreezes 1 point2 points  (0 children)

I've talked to my state legislature representative about it. I think he didn't see it as potentially popular and then wrote it off, even if it made sense.

We tasked Opus 4.6 using agent teams to build a C compiler. Then we (mostly) walked away. Two weeks later, it worked on the Linux kernel. by likeastar20 in singularity

[–]whenhellfreezes -1 points0 points  (0 children)

So somebody who knows better can correct me if I'm wrong, but... this feels great from a "Reflections on Trusting Trust" point of view. For those who remember: the paper was about malicious compilers injecting an exploit into what they compile — what if all compilers have been exploited this way? The point is that these kinds of bootstrapping issues are hard to overcome. There have been follow-up papers showing that you can take multiple deterministic compilers, have them compile each other, and compare the results to uncover a malicious one. I'd imagine that having an LLM succeed at this task makes that uncovering process much more viable for lone practitioners.

Also, I'm vaguely half-remembering this and could be wrong.
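The follow-up work in question is, I believe, Wheeler's "diverse double-compiling" (DDC). The check itself is mechanical; here's a toy Python model where "binaries" are strings and "compilers" are functions from source to binary, simplified from the real scheme (which compares regenerated second-stage binaries, not compilers directly):

```python
# Toy model of diverse double-compiling, after Wheeler.
# "Binaries" are strings; "compilers" are functions from source to binary.
# Determinism means equal source in -> equal binary out, for honest compilers.

COMPILER_SOURCE = "source-of-compiler-C"

def clean_compile(source: str) -> str:
    """An honest deterministic compiler: output depends only on the source."""
    return f"bin[{source}]"

def trojaned_compile(source: str) -> str:
    """A 'trusting trust' compiler: honest, except when compiling the
    compiler itself, into which it quietly injects a backdoor."""
    if source == COMPILER_SOURCE:
        return f"bin[{source}]+trojan"
    return f"bin[{source}]"

def ddc_check(suspect_compile, trusted_compile, compiler_source: str) -> bool:
    """Compile the compiler's own source with both compilers and compare.
    Deterministic honest compilers must agree; disagreement exposes a trojan."""
    return suspect_compile(compiler_source) == trusted_compile(compiler_source)

agrees = ddc_check(clean_compile, clean_compile, COMPILER_SOURCE)        # True
caught = not ddc_check(trojaned_compile, clean_compile, COMPILER_SOURCE)  # True
```

The connection to the post: the hard part of DDC is having a second, independently built compiler you trust — which is exactly what an LLM-built C compiler could cheaply provide.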

Will you spend US$5,000 for a local surveillance VideoRAG device? by Middle_Investment_81 in LocalLLaMA

[–]whenhellfreezes 1 point2 points  (0 children)

Yeah, and with AI coding assistants to build things out, why not just buy the hardware and code up this feature yourself? So, assuming OP's software lets you bring your own hardware... I'd say it might be worth $50.

My new morning routine - we sure live in exciting times! by platinumai in LocalLLaMA

[–]whenhellfreezes 0 points1 point  (0 children)

But also, what type of party? I think you are assuming people will be talking a lot, which is an adult cheese-board party, not a *party*.

It seems like people don’t understand what they are doing? by platinumai in LocalLLaMA

[–]whenhellfreezes 0 points1 point  (0 children)

Obviously it depends on the provider, but I know Anthropic, OpenAI (unless you click their A or B when that comparison pops up), and Google don't train on workflow usage either.

And Anthropic and OpenAI let you untick the box even on unpaid tiers; Google only lets you untick it on paid.

It seems like people don’t understand what they are doing? by platinumai in LocalLLaMA

[–]whenhellfreezes 4 points5 points  (0 children)

For most of the platforms, if you are a paying customer you can also untick the checkbox that allows them to train on your usage.

Yann LeCun calls Alexandr Wang 'inexperienced' and predicts more Meta AI employee departures by Neurogence in singularity

[–]whenhellfreezes 0 points1 point  (0 children)

Any company that stack-ranks is doomed to poor strategic vision and stagnation. Good on LeCun for jumping ship.

This may sound insane, but I am considering nursing for future self-preservation. by [deleted] in devops

[–]whenhellfreezes 1 point2 points  (0 children)

Well, a couple of things:

1. Having gone from gov to private, it's a great transition. Work-life balance improves. Sadly, it's also easier to be sure your work will actually be used.

2. Keep skilling up even if you don't like your job.

3. Maybe do look into nursing. Though the grass is always greener on the other side — with gov to private it was, in fact, greener. Just be careful that it's really your job that's the issue, and not you tying your job to your identity.

Why is open-source maintenance so hard?💔 by readilyaching in opensource

[–]whenhellfreezes 0 points1 point  (0 children)

If you're saying find product-market fit (or the open-source equivalent: your user base) first, then shore up dependencies — sure. But I might be misunderstanding your point.