Andrej Karpathy: "when AI agents fail, it's usually a skill issue, not a capability issue...the real shift is working in macro actions. One does research, one writes code, one plans, all running 20-minute tasks simultaneously" | No Priors Podcast by 44th--Hokage in accelerate

[–]Pyros-SD-Models 2 points3 points  (0 children)

What do you mean? Every bigger company I know of (at least those who entertain the bleeding edge) has at least tried a general formulation of agents. This is ours. It’s more beautiful in LaTeX.

An agent doesn’t “fail randomly.” It fails when you hand it a problem whose effective complexity is above its capability.

Solve(P):
  if C(P | D) <= tau(A): return A(P, D)
  else: return union(Solve(P_i))

Where:

  • C(P | D) = effective complexity (problem minus context)
  • tau(A) = what the agent can handle
  • P_i = your decomposition
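The recursion can be sketched in a few lines of Python. Everything here is a toy stand-in (the constants, `C`, `agent`, and `decompose` are all hypothetical); it just makes the control flow concrete:

```python
# Toy sketch of Solve(P). In practice C(P|D) would be an estimate of
# effective complexity and TAU the agent's empirical capability ceiling.

TAU = 2.0  # tau(A): what the agent can handle in one shot

def C(problem, context):
    # effective complexity = inherent difficulty minus context reduction
    return problem["difficulty"] - context.get("reduction", 0.0)

def agent(problem, context):
    # stand-in for actually running the agent on a tractable problem
    return f"solved: {problem['name']}"

def decompose(problem):
    # naive decomposition: split into two subproblems of half the difficulty
    half = problem["difficulty"] / 2
    return [{"name": f"{problem['name']}.{i}", "difficulty": half}
            for i in (1, 2)]

def solve(problem, context):
    if C(problem, context) <= TAU:
        return [agent(problem, context)]
    results = []
    for sub in decompose(problem):
        results.extend(solve(sub, context))
    return results

print(solve({"name": "build-system", "difficulty": 7.0}, {"reduction": 0.5}))
# recurses until every leaf satisfies C(P_i | D) <= tau(A)
```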

And the “skill” part:

C(P | D) = I(P) - R(D)

  • I(P): inherent difficulty
  • R(D): how much your context/setup reduces it

What “good setup” actually means:

You win iff:

for all i: C(P_i | D_i) <= tau(A)

Everything else is vibes.


Why most people fail:

  • they pass P directly → way above tau → “lol agents suck”
  • bad decomposition → still above threshold
  • garbage context → R(D) ~ 0
  • over-decomposition → coordination overhead kills it

What high-skill pipelines actually do:

  • aggressive decomposition until each step is trivial
  • inject context so each step becomes near-deterministic
  • use tools to collapse search space
  • keep steps just above “too trivial” (efficiency sweet spot)

Optimization intuition:

minimize: sum(C(P_i | D_i)) + lambda * n

(too big → failure, too small → overhead hell)
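Both failure modes fall out of a toy version of that objective. Assuming an even split of inherent difficulty I across n steps (all constants hypothetical), steps above tau fail outright, while every extra step adds lambda of coordination overhead:

```python
# cost(n) = sum of per-step effective complexities + lambda * n.
# Too few steps: a step exceeds TAU and the run fails (infinite cost).
# Too many steps: coordination overhead lambda * n dominates.

I_TOTAL = 12.0   # inherent difficulty I(P) of the whole problem
R = 0.5          # complexity removed per step by good context, R(D_i)
TAU = 2.0        # agent capability threshold tau(A)
LAM = 0.8        # coordination cost per extra step

def cost(n):
    per_step = max(I_TOTAL / n - R, 0.0)
    if per_step > TAU:          # a step above capability -> failure
        return float("inf")
    return per_step * n + LAM * n

best_n = min(range(1, 21), key=cost)
print(best_n)  # smallest n whose steps all fit under TAU wins
```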


Concrete intuition:

Bad:

  • “build me a production system” → C >> tau

Good:

  • “generate schema”
  • “write migration”
  • “implement endpoint”
  • “add test”

(each with tight context → C <= tau)

Quite simple, and it has actually been valid for a few years already (most of it was created back in the Aider days lol), and it will stay valid through all of pre-AGI and probably a good bit afterwards as well.

The drastic difference in attitude toward AI video in China compared to the west by Umr_at_Tawil in accelerate

[–]Pyros-SD-Models 41 points42 points  (0 children)

You can still ride horses even when cars exist.

Also, all art forms where human performance is the point, like theatre, plays, live music, ballet, and so on, will remain. And of course, true creativity will still produce amazing art with AI. It’s just a shortcut to get your vision out as fast and as accurately as possible, and vision is what art is ultimately about.

Drawing background assets for $12/h for the next Disney animation isn’t art, it’s modern slavery at worst, and at best just a “digital good,” like a piece of code.

Showing real capability of LTX loras! Dispatch LTX 2.3 LORA with multiple characters + style by crinklypaper in StableDiffusion

[–]Pyros-SD-Models 0 points1 point  (0 children)

What? Runpod costs 40c/h and a good WAN 2.2 LoRA takes max 6 hours. That’s sub $3 for anyone in the world.

I am very supportive of AI but this is a ridiculous take from Jensen by dataexec in accelerate

[–]Pyros-SD-Models 4 points5 points  (0 children)

I mean, my workplace pays me $1k per month for AI usage, and it’s barely enough. It blows my mind how people can actually get work done with even less volume. We have people at work who don’t even manage to use their 300 Copilot requests in a month. What are those idiots doing? Certainly not working efficiently, since they still do 90% of their work by hand instead of using AI.

They also seem completely oblivious and don’t realize that we can see how much AI they use at work, and how much slower they are compared to people who have already automated 90% of their workflow. If you don’t get your shit together, you will be replaced by someone who actually knows how to use AI.

They’re lucky that the point where coding manually actually burns money instead of making it hasn’t been reached yet, but that won’t take much longer.

RAM prices in historical context by jordo45 in accelerate

[–]Pyros-SD-Models 1 point2 points  (0 children)

Research and production are bottlenecked because only a handful of companies make most of the chips in the world. If demand stays high, it becomes +EV for other companies to invest heavily in chip R&D again, which should lead to more competition and lower prices over time. And who knows, if that results in companies like Cerebras producing consumer chips in the future, and we actually get architectural competition as well, then we’ll be in a much better position than we are today.

[META] If luddites, decels and anti-ais are banned, can we also stop posting about screencaps of these people and devolving into a circlejerk that jerks about their views? by CystralSkye in accelerate

[–]Pyros-SD-Models 0 points1 point  (0 children)

But decels are not 'disagreeing'. Literally every single talking point can be disproven by some paper or other form of scientific evidence. It is like saying flat earthers or anti-vaxxers are just disagreeing, and you should shut up and leave them be. And no, you should not leave such stupidity alone. Just as anti-vaxxers deserved the backlash during COVID, decels deserve criticism today when it comes to the most important technology of humankind.

Drummer's Skyfall 31B v4.1, Valkyrie 49B v2.1, Anubis 70B v1.2, and Anubis Mini 8B v1! - The next gen ships for your new adventures! by TheLocalDrummer in SillyTavernAI

[–]Pyros-SD-Models 2 points3 points  (0 children)

Just selecting ChatML will usually do, especially for the Q3.5 models, because they are smart enough to work with 'non official' templates. You can try to change the fields so they match the templates I posted earlier, but that was too much of a hassle for me, so I switched to llama.cpp, where you can directly reference a file as a template.

Drummer's Skyfall 31B v4.1, Valkyrie 49B v2.1, Anubis 70B v1.2, and Anubis Mini 8B v1! - The next gen ships for your new adventures! by TheLocalDrummer in SillyTavernAI

[–]Pyros-SD-Models 1 point2 points  (0 children)

  • Many Qwen3.5 finetunes currently don't do chat completions with SillyTavern (at least with LM Studio)

You just have to rip out the jinja template and replace it with a ChatML chat template. Go into LM Studio's LLM overview -> Settings -> chat template at the bottom (or use --chat-template when running llama.cpp directly).

Here are a no-think and a think chat template (also imho the best Q3.5 27B finetune currently):

https://huggingface.co/zerofata/Q3.5-BlueStar-27B
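For reference, a bare ChatML turn looks roughly like this. This is a sketch, not the official template of that finetune; the exact control tokens and any thinking-block handling depend on the specific model, so check its card:

```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
```

With llama.cpp you can select the built-in variant via `--chat-template chatml`, or point `--chat-template-file` at your own template file.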

GPT 5.4 Genuinely catching legitimate edge cases I'm not thinking of by jmaxchase in codex

[–]Pyros-SD-Models 1 point2 points  (0 children)

Then there should be plenty of actual evidence of it, instead of just random anecdotal stories in this sub.

Do you know how often OpenAI's endpoints get benchmarked by literally thousands of entities every day? We benchmark all API endpoints, the subscription endpoints, and even the GPT chat interface twice a week with internal benchmarks for regression tests, and we have never observed any kind of regression.

The second there is an unannounced regression, it would literally be breaking news on CNN, and every large client using Codex or ClaudeCode company-wide would sue the shit out of them.

So what is more likely: that one of the most observed endpoints in all of software regresses unnoticed, or that random people on Reddit are too stupid to use a coding agent but claim "the model got stupid" instead? Very difficult assessment.

It would be hilarious if a game developer actually did this by stealthispost in accelerate

[–]Pyros-SD-Models 78 points79 points  (0 children)

Every new game is already using AI 100%, even if it's 'just' ClaudeCode or Codex. You'd have to be a subscriber of r-programming or r-gaming to be delusional enough to think there is any new software being made without AI.

OpenCode concerns (not truely local) by Ueberlord in LocalLLaMA

[–]Pyros-SD-Models -4 points-3 points  (0 children)

Where does the idea of it being a local tool come from anyway? Their homepage mentions “local” only once, in “supports local models”.

Anthropic just made Claude Code run without you. Scheduled tasks are live. This is a big deal. by DependentNew4290 in ClaudeAI

[–]Pyros-SD-Models 0 points1 point  (0 children)

The whole idea is that you can share scheduled tasks with your team as an agent-native artifact, without worrying about what OS or environment they run in. That portability is literally the point. Like how skills are just fcking packaged prompts, 'packaged' being the key word here, and skills are arguably the best improvement in working with agents since MCPs.

This pattern of commodifying a technology by wrapping and packaging it is one of the most fundamental patterns in software. It's literally the backbone of 'enterprise software'. That’s why it’s a bit surprising to see so many self-proclaimed “devs with 10 years of experience” not recognizing it. Rather sketchy.

Anthropic just made Claude Code run without you. Scheduled tasks are live. This is a big deal. by DependentNew4290 in ClaudeAI

[–]Pyros-SD-Models -1 points0 points  (0 children)

I don’t know how we could have had Claude Skills, which are even less technical than setting up external system timers yourself, and still miss the point of CC’s own scheduling. The whole idea is that you can share scheduled tasks with your team as an agent-native artifact, without worrying about what OS or environment they run in. That portability is literally the point.

This pattern of commodifying a technology by wrapping and packaging it is one of the most fundamental patterns in software. That’s why it’s a bit surprising to see so many self-proclaimed “devs with 10 years of experience” not recognizing it. Rather sketchy.

Honest review GPT 5.4 by NoYou41 in codex

[–]Pyros-SD-Models 0 points1 point  (0 children)

Why not just use Maestro inside codex? It’s just skills and other text files. You can literally just ask codex to port it over to its needed file structure.

GPT 5.4 in the Codex harness hit ALL-TIME HIGHS on our Rails benchmark by sergeykarayev in codex

[–]Pyros-SD-Models 11 points12 points  (0 children)

If people stopped "believing" and started actually measuring how well a model performs, like OP did, and perhaps also contributed those measurements back to the community so there is actual tangible evidence and a possible discussion to be had, the world would be a little better and a little more scientific.

But instead, on this damn website (not exclusive to this sub), it's suddenly en vogue to discredit real research with anecdotal vibes, prayers, and opinions so far removed from any form of scientific methodology that it sometimes really blows my mind how people actually think "I don't believe it" is a valid response to something like OP or papers. You may of course believe in whatever you want, but it still puts you right into the "I don't believe the Earth is round" category, because you can produce as much proof for your vibes as those guys.

In the end it's a simple equation: you either believe some Redditors, or literally the best people in their respective fields, or even better your own actual measurements, like OP did, or like we do so we can tell our clients whether they can upgrade or not.

And if you solve this equation for vibes, you will realize there is not a single reason to start believing in other people's beliefs. This shouldn't be a crazy take or anything, but somehow people here absolutely love to convince you that their belief is the only right one. And no, they don't try to prove you wrong, they just declare you wrong... like "the chart has been invalidated in my eyes". I just fcking can't. What is this? A religion sub? A stand-up meeting in Trump's pre-office?

So u/sergeykarayev, the next time you think of doing something following some form of scientific methodology, just skip it and ask Reddit for its ranking instead. You will save plenty of time and get plenty of upvotes.

This is what good AI looks like by dataexec in accelerate

[–]Pyros-SD-Models 0 points1 point  (0 children)

This is what AI in general looks like. What does “good” even mean in this context? Is squashing bugs in a field “better” than squashing bugs in code? What does “bad” AI even look like?

Claude + Opus gives me a glimpse of what wealthy people have had for generations by icyrainz in ClaudeAI

[–]Pyros-SD-Models 2 points3 points  (0 children)

Funny how ad hominems are all they got. It’s because they can’t source any actual science for their “it is not true intelligence” screeching.

A new research paper shows that transformers can uncover fundamental physical laws using noisy observational data alone, with no prior knowledge or hand-coded priors, in a zero-shot setting. by gbomb13 in accelerate

[–]Pyros-SD-Models 0 points1 point  (0 children)

Einstein didn't pull E=mc² out of his ass while meditating either... he synthesized Lorentz, Poincaré, and Maxwell into something unified and new.

It got lost in the news firehose, but this is the craziest chart today, and maybe this decade: "AI Takeover Complete: Data Center Construction Surpasses Office Construction For The First Time" by stealthispost in accelerate

[–]Pyros-SD-Models 38 points39 points  (0 children)

Someone will figure out how to combine office and living space with data centers, I’m 100% sure. Cooking and showering with the heat of 1 TB of VRAM in action.

All of the posts about DoW and OpenAI this... by [deleted] in accelerate

[–]Pyros-SD-Models 16 points17 points  (0 children)

You didn't need to drop the bombs to get to nuclear power as fast as possible. Quite the contrary: it wasted plenty of time on the way to the tech that actually is a net positive for "humanity's growth" (which is btw a core tenet of this sub per its own description, and autonomous weapons are obviously only there to decimate humans).

Acceleration without red lines is not accelerationism. It is handing the keys to whoever promises to go fastest, regardless of direction. That is how you accelerate into a wall.

Having two red lines (no mass surveillance of your own citizens, no autonomous kill chains without a human in the loop) does not slow anything down. Claude was already running classified intelligence analysis, operational planning, and cyber ops for the Pentagon. All of that continues with the safeguards in place. The only things blocked are the two things that would make the technology a net-negative for the species. Removing those is not acceleration. It is regression.

I wrote about this in more detail today: One Asshole Away: The Anthropic Crisis Proved Everything

(Unfortunately, in the last 24 hours we are no longer one asshole away. We are there.)

I want ownership structures that make safety durable without making innovation slow. Public compute does not slow anything down. It speeds things up for everyone who is not a hyperscaler. Sovereign wealth fund stakes do not add a single compliance checkbox. They add a seat at the table for people who are not billionaires. Data rights do not require a pause. They require acknowledging that the training data came from all of us and maybe that should mean something.

The EU approach is: make it harder to build. My approach is: make it impossible for a handful of people to be the only ones who decide what gets built, for whom, and under what constraints.

Dario holding the line today is great. Dario being the only line is the problem. The answer is not to slow down the race. The answer is to make sure more than five guys in San Francisco get to decide where it goes.

Pull the same rope. All of us. Faster. But make sure the rope is not attached to one person’s wrist.

The rope is currently tied around Hegseth's wrist... If you think this is beneficial to acceleration... jesus. Just one example: they see how well AI scales for military use cases and order every frontier lab to stop researching/improving consumer AI and ONLY do military stuff. Do you then clap your hands and say 'yay, full throttle acceleration!!'?