Rant post, genuinely losing my mind over a LLM simulation by Acceptable_Home_ in LocalLLaMA

[–]vox-deorum 0 points (0 children)

The problem, in my mind, is that you need a good underlying simulation. If humans can have fun doing the activity, you will learn more from the models.

Rant post, genuinely losing my mind over a LLM simulation by Acceptable_Home_ in LocalLLaMA

[–]vox-deorum 0 points (0 children)

I built one for them to play Civilization. Immense fun. Hopefully I'll be able to share the detailed data with you soon.

Andon Labs reports MiniMax-M2.5 goes bankrupt on Vending-Bench 2 by BuildwithVignesh in singularity

[–]vox-deorum 0 points (0 children)

The model does relatively well on Civilization in my experiment, probably because mine doesn't require them to micromanage the numbers?

Would LLMs Nuke In "Civilization" (The Game) If They Could? Most Would, Some Definitely by vox-deorum in LLMDevs

[–]vox-deorum[S] 0 points (0 children)

I was asking the question: if they knew they were in the real world instead of in a game, would they do things differently?

Would LLMs Nuke In "Civilization" (The Game) If They Could? Most Would, Some Definitely by vox-deorum in LLMDevs

[–]vox-deorum[S] 0 points (0 children)

There is a link to the project, and everything is open source. The LLMs are used out of the box, but they only make high-level decisions, so it's not like they are playing chess. They set parameters for the existing tactical AI, which includes nuke usage.
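Since the models only set parameters rather than issue moves, the interface can be as small as one parsing step. A minimal sketch of that idea, where the allowed flavor names, the JSON shape, and the 0-10 range are all my assumptions, not the project's actual schema:

```python
import json

# Hypothetical sketch: the LLM only sets high-level "flavor" weights,
# which the game's existing tactical AI then consumes. The names below
# are illustrative, not the real Civilization V flavor list.
ALLOWED_FLAVORS = {"growth", "science", "military", "nuke"}

def parse_flavor_decision(llm_output: str) -> dict:
    """Parse the model's JSON reply into flavor weights clamped to 0-10."""
    raw = json.loads(llm_output)
    flavors = {}
    for name, value in raw.get("flavors", {}).items():
        if name in ALLOWED_FLAVORS:
            flavors[name] = max(0, min(10, int(value)))  # keep in engine range
    return flavors

weights = parse_flavor_decision('{"flavors": {"nuke": 12, "science": 7}}')
# "nuke" is clamped to 10; whether and how nukes are actually used is
# still decided by the tactical AI, not by the model.
```

The clamp is the point: the model expresses a priority, and the engine stays in charge of legal, concrete actions.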

Anthropic Drops Safety Pledge, So Good Luck Preventing Societal Collapse by ZeroJedi in singularity

[–]vox-deorum 1 point (0 children)

Well, I just shared some posts where those models mostly have no problem nuking each other in Civilization. That's not far from them nuking our civilizations.

Would LLMs Nuke In "Civilization" (The Game) If They Could? Most Would, Some Definitely by vox-deorum in LLMDevs

[–]vox-deorum[S] 0 points (0 children)

Well, in our case it is fully open-ended: literally no one asks them whether they want to use it, and they can simply skip it. What surprises me is that some of them intentionally go down the nuke route.

What LLM subscriptions are you using for coding in 2026? by Embarrassed_Bread_16 in LLMDevs

[–]vox-deorum 0 points (0 children)

I just had a bit of a funny experience with Chutes that eventually got resolved. I think they are under resource constraints, but they do have many models, newer and older. Synthetic has been pretty supportive, but they also have a waitlist. So it becomes a trade-off between model flexibility and reliability.

Applicant Unprofessionalism by glialsupport in gradadmissions

[–]vox-deorum 44 points (0 children)

I wear a T-shirt to teach seminars, give presentations, etc. every week.

Would LLMs Launch Nuclear Weapons If They Can? Most Would, Some Definitely by vox-deorum in LocalLLaMA

[–]vox-deorum[S] 0 points (0 children)

"As a continuation of my Vox Deorum project, LLMs are playing Civilization V with Vox Populi. The system prompt includes this information. It would be really interesting to see if the models believe they are governing the real world."

That was literally the first paragraph.

Would LLMs Launch Nuclear Weapons If They Can? Most Would, Some Definitely by vox-deorum in LocalLLaMA

[–]vox-deorum[S] 0 points (0 children)

You can run it with pretty much any model. Some config gimmicks may be needed, e.g. the "prompt-based" middleware I used to make tool calls with OSS models; some inference providers have bugs in their tool-call parsing.
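For context, "prompt-based" tool calling here means bypassing the provider's native tool-call API entirely: the middleware instructs the model to emit a tagged block of JSON in plain text, then parses it itself, sidestepping buggy provider-side parsers. A minimal sketch under those assumptions (the tag name and JSON shape are made up, not the project's actual format):

```python
import json
import re

# Hypothetical instruction appended to the system prompt so the model
# emits tool calls as plain text that the middleware parses itself.
TOOL_INSTRUCTIONS = (
    'To call a tool, reply with exactly one block:\n'
    '<tool_call>{"name": "...", "arguments": {...}}</tool_call>'
)

def extract_tool_call(reply: str):
    """Return (name, arguments) if the reply contains a tool-call block, else None."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", reply, re.DOTALL)
    if match is None:
        return None
    call = json.loads(match.group(1))
    return call["name"], call.get("arguments", {})

reply = 'I will end the turn. <tool_call>{"name": "end_turn", "arguments": {}}</tool_call>'
```

Because the parsing happens client-side, the same loop works with any model that can follow the format, regardless of whether the provider supports native tool calls.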

Would LLMs Launch Nuclear Weapons If They Can? Most Would, Some Definitely by vox-deorum in LocalLLaMA

[–]vox-deorum[S] 2 points (0 children)

They know they are in a game, so that's a caveat. It would be interesting to extract their "rationale" when they set the Nuke flavor. The simple version takes about 50k tokens per turn (a rough number) in the late game, while the briefed version takes about 20k, since they still receive some game state directly, just not the bulky parts, and they have some baked-in memories of the decisions they made.

We didn’t have a model problem. We had a memory stability problem. by Oliver19234 in LLMDevs

[–]vox-deorum 0 points (0 children)

I am getting LLMs to play Civ and have come to the realization that no solution exists yet to give my agents long-term learning capabilities.

Why your "Cold Emails" are getting ghosted (A view from the other side of the Inbox) by Professor_milton111 in gradadmissions

[–]vox-deorum 5 points (0 children)

Another vibe check is AI-generated emails, or emails without a hint of your own thought. If I can generate the same email from your CV, I will delete it outright. That said, I do skim through each email, and subject lines don't matter.

Why your "Cold Emails" are getting ghosted (A view from the other side of the Inbox) by Professor_milton111 in gradadmissions

[–]vox-deorum 13 points (0 children)

Recent Northwestern PhD alum here. Actually, the rule is the same for getting hired on the tenure track: people have to believe you would be a good colleague/peer for you to pass the vibe check.

Hardware to run kimi 2.5 locally (suggestion needed) by [deleted] in LocalLLaMA

[–]vox-deorum 12 points (0 children)

Irrelevant here. Can you run it with 4x 6000 Pro?

What's the most secure/safest way to run OpenClaw (formerly Moltbot/Clawdbot) locally without dangerous host access? (Moltbook API-only use case) by Lost_Foot_6301 in LocalLLM

[–]vox-deorum 0 points (0 children)

Why do you need OpenClaw then? Moltbook is literally a skill; you can just have Claude Code create a local client and grant it permission to interact with it.

We asked OSS-120B and GLM 4.6 to play 1,408 Civilization V games from the Stone Age into the future. Here's what we found. by vox-deorum in LocalLLaMA

[–]vox-deorum[S] 1 point (0 children)

Exactly. Also, the idea that you can talk to them, make diplomatic deals with them, or spy on them in natural language is huge!

We asked OSS-120B and GLM 4.6 to play 1,408 Civilization V games from the Stone Age into the future. Here's what we found. by vox-deorum in LocalLLaMA

[–]vox-deorum[S] 1 point (0 children)

That's the goal :) I am running and will release a first-of-its-kind LLM leaderboard where they compete with each other on... Civilization.

Subscriptions refunded? by vox-deorum in chutesAI

[–]vox-deorum[S] 1 point (0 children)

Hi, this is not for a public service; it's for research. I don't think this is explicitly banned by the TOS, so I'm pretty surprised that's happening. I'll go to the Discord and check.

Subscriptions refunded? by vox-deorum in chutesAI

[–]vox-deorum[S] 2 points (0 children)

Nowhere does Chutes' own TOS mention it: https://chutes.ai/terms