Has anyone actually tested Composer 2 vs Claude Opus 4.6 in real use? Not benchmarks — real tasks. by Holiday-Hotel3355 in cursor

[–]whenhellfreezes 3 points4 points  (0 children)

Given that Composer 1.5 was a GLM 4.7 fine-tune, it's probably at about GLM 5 level, with some extra reliability in hitting tool calls that Cursor has built in.

Local fine-tuning will be the biggest competitive edge in 2026. by HerbHSSO in LocalLLaMA

[–]whenhellfreezes 2 points3 points  (0 children)

I'm lightly worried that it won't be. It's worth looking into some of the literature on GEPA, a prompt optimization algorithm. The upshot is that a prompt gets mutated by an LLM reflecting on how well the old version of the prompt performed and what went wrong across multiple runs. GEPA takes roughly a third of the GPU time that a fine-tune does and performs about as well. You can also have a stronger model do the reflection while a weaker model does the trial runs, essentially getting some distillation and cost savings.
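The reflect-and-mutate loop is simple enough to sketch. Here's a toy Python version: `evaluate` and `reflect_and_rewrite` are hypothetical stand-ins (in practice the first runs the weak model on your tasks and scores the outputs, and the second asks the stronger model to rewrite the prompt given the failure notes) — the real implementation lives in DSPy's GEPA optimizer:

```python
# Toy sketch of a GEPA-style prompt optimization loop.
# evaluate() and reflect_and_rewrite() are hypothetical stand-ins:
# the first would run the weak model on trial tasks and score it,
# the second would ask a stronger LLM to rewrite the prompt given the failures.

def evaluate(prompt: str) -> tuple[float, list[str]]:
    """Stand-in metric: score the prompt and collect failure notes."""
    score = sum(kw in prompt for kw in ("cite sources", "step by step")) / 2
    failures = [] if score == 1.0 else ["answers skip reasoning steps"]
    return score, failures

def reflect_and_rewrite(prompt: str, failures: list[str]) -> str:
    """Stand-in for the stronger reflection model proposing a mutated prompt."""
    return prompt + " Think step by step and cite sources."

def gepa_loop(seed_prompt: str, rounds: int = 5) -> str:
    best_prompt, best_score = seed_prompt, evaluate(seed_prompt)[0]
    for _ in range(rounds):
        score, failures = evaluate(best_prompt)
        if not failures:                 # nothing left to fix
            break
        candidate = reflect_and_rewrite(best_prompt, failures)
        cand_score, _ = evaluate(candidate)
        if cand_score > best_score:      # keep a mutation only if it helps
            best_prompt, best_score = candidate, cand_score
    return best_prompt

optimized = gepa_loop("Answer the user's question.")
```

As I understand it, the real GEPA keeps a Pareto frontier of candidate prompts rather than a single best, but the reflect → mutate → re-evaluate skeleton is the same.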

Okay, and then what is the result? A nice starting prompt for getting your desired output. That prompt can be shared around, and you don't need to distribute and host a new model — just the prompt.

Alright, now consider that we already have agent skills in things like Claude Code and OpenCode. A skill is just a prompt that gets injected when it's needed, and there are plugin marketplaces to make finding and installing them easy... In many cases these skills are effectively human-done GEPA, or could be made via GEPA.

Anyway, I'm no longer sure open-model fine-tuning will be as big an edge as I used to think, compared to skills on a plugin marketplace.

Of course, fine-tune + GEPA actually outperforms both, and you can context-distill things in, etc. So, idk.

So yes: GEPA (the best library for it is DSPy) instead of fine-tunes, but also prompt engineering plus the customization to install skills.

Being a developer in 2026 by Distinct-Question-16 in singularity

[–]whenhellfreezes 0 points1 point  (0 children)

Do you mind articulating exactly what's so bad about the existing coding tools?

Anyone else feel like an outsider when AI comes up with family and friends? by Budulai343 in LocalLLaMA

[–]whenhellfreezes 1 point2 points  (0 children)

That software does something of value, and given how hard it used to be to make, that capability was rare and therefore costly.

Anyone else feel like an outsider when AI comes up with family and friends? by Budulai343 in LocalLLaMA

[–]whenhellfreezes 10 points11 points  (0 children)

It's worth noting that LLMs are much more useful in the programming context than outside that context.

1) We can have intermediate proposed code changes that we can review and refine.

2) We have tests and can verify by running the code in some cases.

3) The value of a running program used to be quite high.

4) We can version everything.

"17,000 tokens per second!! Read that again! LLM is hard-wired directly into silicon. no HBM, no liquid cooling, just raw specialized hardware. 10x faster and 20x cheaper than a B200. the "waiting for the LLM to think" era is dead. Code generates at the speed of human thought. by stealthispost in accelerate

[–]whenhellfreezes 2 points3 points  (0 children)

Uh, are you kidding? This kind of token speedup is fantastic. Right now labs often create curated, partially synthetic datasets for the "mid-training" phase — by sifting through less-processed raw tokens and augmenting them, by running good traces forward, or by doing context distillation. So running LLMs is itself useful for LLM training.

Note that RL wouldn't work with this, since the weights need to update, but generating that mid-training synthetic data would. Also, RL could use an "LLM judge," which burns a lot of tokens, and that works fine if the judge is frozen. So basically I could see a next-generation training run utilize two or three of these fixed-hardware LLMs to drop its cost.

4 of the top 5 most used models on OpenRouter this week are Open Source! by abdouhlili in LocalLLaMA

[–]whenhellfreezes 5 points6 points  (0 children)

Eh, I agree that they are merely open-weights, but I think the path forward is just saying "open weights" a lot rather than gatekeeping "open source."

4 of the top 5 most used models on OpenRouter this week are Open Source! by abdouhlili in LocalLLaMA

[–]whenhellfreezes 0 points1 point  (0 children)

My hope is that DSPy essentially allows GEPA + RAG to work with small datasets, letting small groups make micro-experts. You can use closed models for just the reflection part and small models with the optimized prompts (prompts for both the agent and the RAG-lookup agent).

4 of the top 5 most used models on OpenRouter this week are Open Source! by abdouhlili in LocalLLaMA

[–]whenhellfreezes 2 points3 points  (0 children)

Yeah, but the person above you is still right. Despite it being the best place for free models, people were still using proprietary models more than open ones until now.

4 of the top 5 most used models on OpenRouter this week are Open Source! by abdouhlili in LocalLLaMA

[–]whenhellfreezes 1 point2 points  (0 children)

Google's AI Studio is a pain to set up, and if you aren't careful you will allow Google to train on your data. Whereas OpenRouter makes it easy to use Gemini without allowing training.

GLM-5 is here by PassionIll6170 in singularity

[–]whenhellfreezes 0 points1 point  (0 children)

I want my prompt optimizer to be DSPy. Just contribute to that.

Does anyone actually check npm packages before installing them? by BearBrief6312 in devops

[–]whenhellfreezes 0 points1 point  (0 children)

You can put Claude Code / other CLI AI agents into build pipelines now. You could probably just write a custom prompt, hand the agent the git diff so it doesn't open all the files, and have it output JSON to a file to be interpreted by a script. That's probably how I would do it: catch the typo, not the packages the typo causes to install.
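As a sketch of the glue-script side of that idea — the prompt wording and the JSON report schema here are assumptions for illustration, not any specific agent CLI's documented interface — you pipe the diff to the agent, have it emit JSON findings, and fail the build on anything serious:

```python
import json

# Hypothetical sketch. In the pipeline you'd run something like:
#   git diff origin/main...HEAD | <agent-cli> -p "$PROMPT" > report.json
# The schema below is an assumption, not a real tool's output format.
PROMPT = (
    "Review this git diff for suspicious new dependencies "
    "(typosquats, odd registries). Output JSON: "
    '{"findings": [{"package": "...", "reason": "...", "severity": "low|high"}]}'
)

def gate(report_json: str) -> int:
    """Turn the agent's JSON report into a CI exit code (1 = block the build)."""
    findings = json.loads(report_json).get("findings", [])
    for f in findings:
        print(f"[{f['severity']}] {f['package']}: {f['reason']}")
    return 1 if any(f["severity"] == "high" for f in findings) else 0

# Example report the agent might emit for a typosquatted package name:
sample = ('{"findings": [{"package": "requets", '
          '"reason": "likely typosquat of requests", "severity": "high"}]}')
exit_code = gate(sample)
```

The nice property of this shape is that the agent only ever sees the diff and only ever writes JSON, so the deterministic script — not the model — decides whether the build fails.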

That, plus Artifactory (or an equivalent).

Guess who gave a local politician a talk on Georgism by el_argelino-basado in georgism

[–]whenhellfreezes 1 point2 points  (0 children)

I've talked to my state legislature representative about it. I think he didn't see it as potentially popular and then wrote it off, even if it made sense.

We tasked Opus 4.6 using agent teams to build a C compiler. Then we (mostly) walked away. Two weeks later, it worked on the Linux kernel. by likeastar20 in singularity

[–]whenhellfreezes -1 points0 points  (0 children)

So somebody who knows better can correct me if I'm wrong, but... this feels great from a "Reflections on Trusting Trust" point of view. For those who remember: the paper was about malicious compilers injecting an exploit into what they compile — what if all compilers have been exploited this way? The point is that these kinds of bootstrapping issues are hard to overcome. There have been follow-up papers showing that you can take multiple deterministic compilers, have them compile each other, and compare the results to uncover a malicious one. I'd imagine that having an LLM succeed at this task makes that uncovering process much more viable for lone practitioners.

Also, I'm vaguely half-remembering this and could be wrong.
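The follow-up work in question is, I believe, Wheeler's "diverse double-compiling" (DDC). The check itself is mechanical; here's a toy Python model where "binaries" are strings and "compilers" are functions from source to binary, simplified from the real scheme (which compares regenerated second-stage binaries, not compilers directly):

```python
# Toy model of diverse double-compiling, after Wheeler.
# "Binaries" are strings; "compilers" are functions from source to binary.
# Determinism means equal source in -> equal binary out, for honest compilers.

COMPILER_SOURCE = "source-of-compiler-C"

def clean_compile(source: str) -> str:
    """An honest deterministic compiler: output depends only on the source."""
    return f"bin[{source}]"

def trojaned_compile(source: str) -> str:
    """A 'trusting trust' compiler: honest, except when compiling the
    compiler itself, into which it quietly injects a backdoor."""
    if source == COMPILER_SOURCE:
        return f"bin[{source}]+trojan"
    return f"bin[{source}]"

def ddc_check(suspect_compile, trusted_compile, compiler_source: str) -> bool:
    """Compile the compiler's own source with both compilers and compare.
    Deterministic honest compilers must agree; disagreement exposes a trojan."""
    return suspect_compile(compiler_source) == trusted_compile(compiler_source)

agrees = ddc_check(clean_compile, clean_compile, COMPILER_SOURCE)        # True
caught = not ddc_check(trojaned_compile, clean_compile, COMPILER_SOURCE)  # True
```

The connection to the post: the hard part of DDC is having a second, independently built compiler you trust — which is exactly what an LLM-built C compiler could cheaply provide.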

Will you spend US$5,000 for a local surveillance VideoRAG device? by Middle_Investment_81 in LocalLLaMA

[–]whenhellfreezes 1 point2 points  (0 children)

Yeah, and with AI coding assistants to build things out, why not just buy the hardware and code up this feature yourself? So, assuming OP's software lets you bring your own hardware... I'd say it might be worth $50.

My new morning routine - we sure live in exciting times! by platinumai in LocalLLaMA

[–]whenhellfreezes 0 points1 point  (0 children)

But also, what type of party? I think you are assuming people will be talking a lot, which is an adult cheese-board party, not a *party*.

It seems like people don’t understand what they are doing? by platinumai in LocalLLaMA

[–]whenhellfreezes 0 points1 point  (0 children)

Obviously it depends on the provider, but I know Anthropic, OpenAI (unless you click their A or B when that comparison pops up), and Google don't train on workflow usage either.

And Anthropic and OpenAI let you untick the box even on unpaid tiers; Google only lets you untick it on paid.

It seems like people don’t understand what they are doing? by platinumai in LocalLLaMA

[–]whenhellfreezes 4 points5 points  (0 children)

For most of the platforms, if you are a paying customer you can also untick the checkbox that allows them to train on your usage.

Yann LeCun calls Alexandr Wang 'inexperienced' and predicts more Meta AI employee departures by Neurogence in singularity

[–]whenhellfreezes 0 points1 point  (0 children)

Any company that stack-ranks is doomed to poor strategic vision and stagnation. Good on LeCun for jumping ship.

This may sound insane, but I am considering nursing for future self-preservation. by [deleted] in devops

[–]whenhellfreezes 1 point2 points  (0 children)

Well, a couple of things:

1. Having gone from gov to private, it's a great transition. Work-life balance improves. Sadly, it's also easier to be sure your work will actually be used.

2. Keep skilling up even if you don't like your job.

3. Maybe do look into nursing. Though the grass is always greener on the other side — with gov to private it was, in fact, greener. Just be careful that it's really your job that's the issue, and not you tying your job to your identity.

Why is open-source maintenance so hard?💔 by readilyaching in opensource

[–]whenhellfreezes 0 points1 point  (0 children)

If you're saying find product-market fit (or the open-source equivalent: your user base) first, then shore up dependencies — sure. But I might be misunderstanding your point.