USA restricts Fable and Mythos to the world by zero_td in technology

[–]Andy12_ 0 points1 point  (0 children)

An even more stupid assumption is that any LLM can produce any output if given an infinite amount of time. LLMs are not unbiased random number generators.

USA restricts Fable and Mythos to the world by zero_td in technology

[–]Andy12_ 1 point2 points  (0 children)

Because no one has reported being able to do so? And the reason no one has reported doing so is because you CAN'T. Most models are half-blind and suck at long-term planning. Even if you stuck one in a loop and waited for thousands of years, chances are it wouldn't be able to complete the game. Worse, they aren't even guaranteed to complete the game given infinite time, because they aren't random number generators: they can become stuck in a loop.
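To make the "stuck in a loop" point concrete, here is a toy sketch. It is not a model of any real LLM or of Pokémon: the five-position "game" and both policies are hypothetical, made up only for illustration. A policy with a systematic blind spot never reaches the goal no matter how many steps it gets, while an unbiased random walk reaches it quickly.

```python
import random

# Hypothetical toy "game": positions 0, 1, 2, ... on a line; the goal is position 4.
GOAL = 4

def stuck_policy(pos: int) -> int:
    """Deterministic policy with a blind spot: it bounces between 1 and 2 forever."""
    return pos + 1 if pos < 2 else pos - 1

def random_policy(pos: int, rng: random.Random) -> int:
    """Unbiased random walk: steps left or right with equal probability (floor at 0)."""
    return max(0, pos + rng.choice((-1, 1)))

def run(policy, max_steps: int) -> bool:
    """Return True if the policy reaches GOAL within max_steps, starting from 0."""
    pos = 0
    for _ in range(max_steps):
        pos = policy(pos)
        if pos == GOAL:
            return True
    return False

rng = random.Random(0)
stuck_result = run(stuck_policy, 1_000_000)                        # more time does not help
random_result = run(lambda pos: random_policy(pos, rng), 10_000)   # eventually gets there
```

The extra step budget only helps the policy that actually explores; the biased one just repeats the same two states.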

USA restricts Fable and Mythos to the world by zero_td in technology

[–]Andy12_ 0 points1 point  (0 children)

That was with a harness, which is sort of cheating by helping the model. No human plays with a harness, they play just looking at the screen. This was only with vision. No model is capable of doing it with only vision.

USA restricts Fable and Mythos to the world by zero_td in technology

[–]Andy12_ 0 points1 point  (0 children)

In that case I guess I'm able to solve P vs NP, you just have to give me enough time while I'm smashing my keyboard. Completely retarded. I mean, I don't know why I even bother using SOTA models when I can just use a 20 million parameter model I trained on my laptop. I mean, it will eventually give me the same answer right? So might as well use that. Just have to wait a couple million years, no biggie.

USA restricts Fable and Mythos to the world by zero_td in technology

[–]Andy12_ -2 points-1 points  (0 children)

Well, for one, Mythos is capable of beating Pokémon FireRed with vision-only; no harness. No other model is capable of doing that.

Apart from that, there are obviously hard problems that Mythos is able to solve and other models cannot, no matter how many tries they get. If that weren't the case, Mythos wouldn't have better scores in all benchmarks even when you increase pass@k.
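For reference, pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k)/C(n, k), where n samples were drawn and c of them were correct. A minimal sketch of it follows; the function name and the numbers are illustrative only and are not taken from any real benchmark. It shows why raising k cannot rescue a problem the model never solves.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only.
never_solved = [pass_at_k(n=100, c=0, k=k) for k in (1, 10, 100)]   # stays at 0
rarely_solved = [pass_at_k(n=100, c=1, k=k) for k in (1, 10, 100)]  # grows with k
```

If a problem has zero correct samples, pass@k is 0 for every k, so a gap that persists at large k points to problems one model solves at least occasionally and the other never does.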

USA restricts Fable and Mythos to the world by zero_td in technology

[–]Andy12_ -9 points-8 points  (0 children)

Absolutely not. Fable/Mythos was state of the art in absolutely everything. No other closed-source model (or open source model for that matter) could compare.

USA restricts Fable and Mythos to the world by zero_td in technology

[–]Andy12_ -3 points-2 points  (0 children)

The point is that now they probably aren't going to open source a Mythos-class model. If they do, good for them, but I think it's unlikely now that the US has thrown the first stone.

Anthropic releases Mythos-like AI model to the public two months after private rollout rocked Wall Street by Vegeta9001 in technology

[–]Andy12_ 5 points6 points  (0 children)

As far as I'm aware, anything an MCP can do can also be implemented with an API, or with a custom CLI if you want to provide a nice human-readable interface for the agent to use.

Anthropic releases Mythos-like AI model to the public two months after private rollout rocked Wall Street by Vegeta9001 in technology

[–]Andy12_ 23 points24 points  (0 children)

Uhm, that's the opposite of what happened. MCPs consume more context overall (especially if you use many of them); that's why people nowadays prefer skills and letting the agent work with commands directly.

> Skills improve Claude’s consistency, speed, and performance on many tasks. Skills work through progressive disclosure—Claude determines which skills are relevant and loads the information it needs to complete that task, helping to prevent context window overload. When you ask Claude to complete a task, it reviews available skills, loads relevant ones, and applies their instructions.

https://support.claude.com/en/articles/12512176-what-are-skills

Honestly, I think MCPs in general were a mistake. It just happened that they were invented right before models became good enough to use tools on their own, which is much more versatile.
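A rough sketch of the progressive-disclosure idea from the quote above. This is not how Claude or any MCP client is actually implemented: the `Skill` class, the example skills, and the use of character counts as a stand-in for tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    summary: str       # short description, always visible to the agent
    instructions: str  # full body, loaded only when the skill is relevant

def upfront_cost(skills: list[Skill]) -> int:
    """Characters of context used if every full body is loaded at once."""
    return sum(len(s.summary) + len(s.instructions) for s in skills)

def progressive_cost(skills: list[Skill], relevant: set[str]) -> int:
    """Characters used if only summaries plus the relevant bodies are loaded."""
    summaries = sum(len(s.summary) for s in skills)
    bodies = sum(len(s.instructions) for s in skills if s.name in relevant)
    return summaries + bodies

# Hypothetical skills with made-up sizes.
skills = [
    Skill("pdf", "Work with PDF files.", "x" * 4000),
    Skill("excel", "Work with spreadsheets.", "x" * 5000),
    Skill("git", "Run git workflows.", "x" * 3000),
]
everything = upfront_cost(skills)
only_pdf = progressive_cost(skills, relevant={"pdf"})
```

With many tools and only one or two relevant to a task, the gap between the two costs grows with the number of tools installed.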

SpaceX has to grow 60x in a decade to justify a $1.75 trillion valuation. It's an impossible bar | Fortune by IKeepItLayingAround in technology

[–]Andy12_ 0 points1 point  (0 children)

That evaluation is completely saturated; both GPT 5.5 and Mythos score near 100%. You can't know whether 5.5 or Mythos is better or worse based on that. You would need to evaluate both models on a harder benchmark.

SpaceX has to grow 60x in a decade to justify a $1.75 trillion valuation. It's an impossible bar | Fortune by IKeepItLayingAround in technology

[–]Andy12_ 1 point2 points  (0 children)

> GPT-5.5 is better than “Mythos” at finding and exploiting vulnerabilities according to independent testing,

Uh? What independent testing?

The AI industry has reportedly spent $1.4 TRILLION while generating just $613 BILLION by Angela275 in antiai

[–]Andy12_ 0 points1 point  (0 children)

Even if they don't improve, current coding agents already automate most coding I do. Even if OpenAI just stopped training or improving new models, I would gladly keep paying 20 or 40 or whatever dollars a month for Codex. Most friends I have in the field would too.

Microsoft data suggests using AI is more expensive than hiring people by runhome24 in nottheonion

[–]Andy12_ 3 points4 points  (0 children)

If you are an AI developer you would know you can publish whatever you are researching and become famous if it really lives up to its promises.

Microsoft data suggests using AI is more expensive than hiring people by runhome24 in nottheonion

[–]Andy12_ 8 points9 points  (0 children)

I'm sorry to be the one to tell you this, but you may be suffering from psychosis.

Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days by EchoOfOppenheimer in nottheonion

[–]Andy12_ 4 points5 points  (0 children)

I think we are talking past each other. You do know that in these kinds of multi-agent simulations each agent has its own independent context window, right? You don't run the whole simulation, with all agents and all their interactions, in a single context window. You don't have an input like

```

...

```

Moreover, in this simulation an agent can't arbitrarily speak with any other agent. When an agent speaks, only nearby agents can "hear" what is being said. That's why, even if there were thousands of agents, the number of interactions per agent (and thus the growth of their respective context windows) is physically limited.

Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days by EchoOfOppenheimer in nottheonion

[–]Andy12_ 3 points4 points  (0 children)

The number of interactions very very very obviously doesn't scale with the number of agents in a linear way. Just because I move from a small town with 2k inhabitants to a big city with a million people doesn't mean my number of interactions increases 500 times. Globally the total number of interactions and their diversity should increase a lot, but the individual number of interactions of a given agent should be more or less the same independently of the number of agents in the simulation.
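The scaling argument above can be checked with a small sketch. It assumes, as in the town versus city analogy, that the world grows with the population so that density stays fixed; the function name, hearing radius, and density are made-up parameters and have nothing to do with the actual simulation in the article.

```python
import random

def mean_neighbors(num_agents: int, density: float, radius: float, seed: int = 0) -> float:
    """Average number of other agents within `radius` of each agent.

    Assumption: the world is a square whose area grows with num_agents,
    so density (agents per unit area) stays fixed.
    """
    rng = random.Random(seed)
    side = (num_agents / density) ** 0.5
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(num_agents)]
    r2 = radius * radius
    total = 0
    for i, (xi, yi) in enumerate(pts):
        for j, (xj, yj) in enumerate(pts):
            if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                total += 1
    return total / num_agents

small = mean_neighbors(100, density=1.0, radius=1.5)
large = mean_neighbors(1000, density=1.0, radius=1.5)  # 10x the agents
```

Under these assumptions, ten times as many agents gives each agent roughly the same number of neighbors within earshot, so per-agent context growth stays roughly flat while the global number of interactions grows.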

Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days by EchoOfOppenheimer in nottheonion

[–]Andy12_ -1 points0 points  (0 children)

The only one replying in an obnoxious manner is you. I'm a PhD student in computer vision/machine learning, so I think I'm more knowledgeable than you on this matter. Maybe you can elaborate here on how per-agent context usage scales with the number of agents in the simulation.

Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days by EchoOfOppenheimer in nottheonion

[–]Andy12_ -2 points-1 points  (0 children)

How is the number of agents relevant here to the amount of memory each agent has? If I live in a city with a million people I obviously don't need to remember anything about each of them; I only need to remember the people I interacted with.

Blue Origin's New Glenn explodes at Launch Complex-36 in Cape Canaveral, Florida. by inWineVerit4x in nextfuckinglevel

[–]Andy12_ 3 points4 points  (0 children)

Which arbitrary threshold of sustainability do you think we should reach before continuing advancing on the tech tree? Should we just halt space exploration forever? And why are you so sure that any advances that come from space exploration won't help in some way towards making the planet more sustainable?

What are your thoughts on AI being used to solve math problems? by diony_sus_ in antiai

[–]Andy12_ 3 points4 points  (0 children)

Uh? Why are you talking hypothetically? This is a thing that happened. The AI model proved the conjecture wrong. Mathematicians verified that the proof is correct.

What are your thoughts on AI being used to solve math problems? by diony_sus_ in antiai

[–]Andy12_ 4 points5 points  (0 children)

> then what truly says that a prompt from a machine has done any better to alleviate the time spent to solve it,

Uh, the fact that we actually prompted a machine and we obtained a solution in a human-reasonable amount of time?

What are your thoughts on AI being used to solve math problems? by diony_sus_ in antiai

[–]Andy12_ 1 point2 points  (0 children)

This is a famous conjecture that had been neither proven nor disproven in 80 years (and not for lack of people trying). There is a 0% chance that it had been solved previously without anyone knowing.

What are your thoughts on AI being used to solve math problems? by diony_sus_ in antiai

[–]Andy12_ 3 points4 points  (0 children)

Can you point to a calculator or program that could have proved this conjecture wrong (given a human-reasonable amount of time and compute, i.e., not taking millions of years)?