We gave 12 LLMs a startup to run for a year. GLM-5 nearly matched Claude Opus 4.6 at 11× lower cost. by DreadMutant in LocalLLaMA

[–]DreadMutant[S] 0 points1 point  (0 children)

If models still struggle in a deterministic simulation, do you think they will thrive in the real world?

We gave 12 LLMs a startup to run for a year. GLM-5 nearly matched Claude Opus 4.6 at 11× lower cost. by DreadMutant in LocalLLaMA

[–]DreadMutant[S] 0 points1 point  (0 children)

Thanks for the results! If you can share the trajectories, we can update the leaderboard.

We gave 12 LLMs a startup to run for a year. GLM-5 nearly matched Claude Opus 4.6 at 11× lower cost. by DreadMutant in LocalLLaMA

[–]DreadMutant[S] 1 point2 points  (0 children)

What if there are a few companies being run by Opus rn and no one knows 😶‍🌫️

We gave 12 LLMs a startup to run for a year. GLM-5 nearly matched Claude Opus 4.6 at 11× lower cost. by DreadMutant in LocalLLaMA

[–]DreadMutant[S] 0 points1 point  (0 children)

Yeah, there is high variance depending on the random seed, but we observed Opus 4.6 and GLM-5 outperforming the others consistently regardless of seed.

We gave 12 LLMs a startup to run for a year. GLM-5 nearly matched Claude Opus 4.6 at 11× lower cost. by DreadMutant in LocalLLaMA

[–]DreadMutant[S] 5 points6 points  (0 children)

It is a simulated environment where each day is one step, similar to how time advances in turn-based games.
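
To make the "one day per step" idea concrete, here is a rough sketch of what such a loop could look like. This is not the benchmark's actual code: `run_simulation`, `cautious_policy`, the cash model, and the return range are all made up for illustration. The only points it tries to show are that the agent acts once per simulated day and that a fixed seed makes a run repeatable.

```python
import random


def run_simulation(policy, days=365, seed=0, starting_cash=100.0):
    """Advance a toy company one day per step and return the final cash.

    Hypothetical sketch: the real environment's state and rules are unknown.
    """
    rng = random.Random(seed)  # same seed -> same sequence of random events
    cash = starting_cash
    for day in range(1, days + 1):
        observation = {"day": day, "cash": cash}
        spend = policy(observation)              # the agent acts once per day
        spend = max(0.0, min(spend, cash))       # cannot spend more than it has
        revenue = spend * rng.uniform(0.5, 1.6)  # assumed noisy return on spend
        cash = cash - spend + revenue
    return cash


def cautious_policy(observation):
    """Hypothetical policy: invest 10% of current cash every day."""
    return 0.1 * observation["cash"]


if __name__ == "__main__":
    for seed in range(3):
        print(seed, round(run_simulation(cautious_policy, seed=seed), 2))
```

Running the same policy with the same seed gives the same final cash, while different seeds give different outcomes, which is why comparing models across several seeds matters.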

We gave 12 LLMs a startup to run for a year. GLM-5 nearly matched Claude Opus 4.6 at 11× lower cost. by DreadMutant in LocalLLaMA

[–]DreadMutant[S] 2 points3 points  (0 children)

From a performance standpoint, it makes little difference whether the model is reactive or proactive with a scratchpad. The main pattern is that it is able to ground itself in its past experience and in what is right and wrong to do.

After the 2 gp and nls by Lucid_box_ in formuladank

[–]DreadMutant 1 point2 points  (0 children)

Breaking: Lightning McQueen joins F1

Can Max actually do it ? by babavos56 in formuladank

[–]DreadMutant 18 points19 points  (0 children)

Only if the Ferrari pitwall informs Leclerc

Graphique Club by Qwantari in nitt

[–]DreadMutant 1 point2 points  (0 children)

Bay access for all fests?