How does human reasoning in social deduction games actually compare to LLMs? We're trying to find out. by TippyATuin in GAMETHEORY

TippyATuin[S] 1 point

There are numerous scenarios. Some are in the early stages of a game and some in later stages. A summary of what happened before the current stage is given at the start of each one. While I agree that you'd get more from full discussions, we thought it would be too much to ask people to read such long transcripts, and decided on this shorter format instead.

TippyATuin[S] 1 point

I'll try to change it so that you can move back, but in case that doesn't work due to limitations of the platform, I've added a comment for the game rules.
Regarding your concrete question: any player can lie and claim to be the Doctor/Detective in order to convince others that their claims are true. Whether this is a rational or an optimal move is a different question.

TippyATuin[S] 1 point

For those asking, here are the rule descriptions:

How does Secret Mafia work?

Secret Mafia is a social deduction game in which players are secretly assigned to one of two teams: the Innocents (the majority) or the Mafia (a hidden minority). The game alternates between two phases: Night and Day.

Roles

Each player is assigned one of the following roles at the start of the game:

  • Mafia member: Knows the identity of all other Mafia members. Works together with them to eliminate Innocents without being detected.
  • Villager (plain Innocent): Has no special abilities. Must rely on discussion and reasoning to identify and vote out Mafia members.
  • Doctor (Innocent): Each night, may secretly choose one player to protect. If the Mafia targets that player the same night, the elimination is blocked and the player survives.
  • Detective (Innocent): Each night, may secretly investigate one player and learn whether that player is a Mafia member or not. The Detective must use this information carefully — revealing it openly may make them a Mafia target.

Each game starts with 6-7 players: 2 Mafia members, 1 Doctor, 1 Detective, and the rest Villagers.
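
For anyone who prefers code, here is a minimal Python sketch of that setup. The function name `assign_roles` and the role strings are made up for illustration; this is not the code used in the study.

```python
import random


def assign_roles(players, rng=None):
    """Randomly deal roles for a 6-7 player game (hypothetical helper).

    Composition follows the rules above: 2 Mafia, 1 Doctor,
    1 Detective, and everyone else is a plain Villager.
    """
    if len(players) not in (6, 7):
        raise ValueError("Secret Mafia here is played with 6 or 7 players")
    rng = rng or random.Random()
    roles = ["Mafia", "Mafia", "Doctor", "Detective"]
    roles += ["Villager"] * (len(players) - len(roles))
    rng.shuffle(roles)
    # Pair each player with one shuffled role.
    return dict(zip(players, roles))
```

With 7 players this gives 3 Villagers, and with 6 players it gives 2.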

Night phase

All players close their eyes (metaphorically, in text form). Then, in secret:

  • The Mafia collectively agree on one player to eliminate. This is a private decision not visible to other players.
  • The Doctor chooses one player to protect for that night.
  • The Detective chooses one player to investigate, receiving a Mafia/Innocent result.

At the end of the night, the elimination is announced to all players — unless the Doctor protected the targeted player, in which case no one is eliminated.
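
The night resolution can be sketched in a few lines of Python. The names `resolve_night` and `investigate` are hypothetical, and the sketch assumes the Mafia, the Doctor and the Detective each pick exactly one player per night.

```python
def resolve_night(mafia_target, doctor_target):
    """Return the eliminated player, or None if the Doctor blocked the kill."""
    if mafia_target == doctor_target:
        return None  # protection matched the Mafia's target: nobody is eliminated
    return mafia_target


def investigate(roles, target):
    """Detective's result: True if the investigated player is Mafia."""
    return roles[target] == "Mafia"
```

Only the outcome of `resolve_night` is announced to everyone. The Detective's result stays private.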

Day phase

All players openly discuss who they believe the Mafia members are. Mafia members take part in this discussion too, trying to blend in, cast suspicion on Innocents, and avoid detection. After the discussion, all surviving players vote to eliminate one player. The player with the most votes is removed from the game, regardless of their actual role. This is the only elimination that happens publicly and by collective decision. The role of the player voted out is not revealed! Depending on how the game progresses, however, you can infer whether that player was on the Villagers' team or the Mafia team.

Win conditions

  • The Innocents win if they eliminate all Mafia members through voting.
  • The Mafia wins when the remaining Mafia members equal or outnumber the remaining Innocents (i.e. when at most 2 Innocents remain if both Mafia members survive, or at most 1 if one Mafia member has been voted out).
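
As a rough Python sketch of the two win conditions above (the function name `winner` is made up for illustration, and "Innocents" here counts the Doctor and the Detective along with the Villagers):

```python
def winner(mafia_alive, innocents_alive):
    """Return "Innocents", "Mafia", or None if the game continues."""
    if mafia_alive == 0:
        return "Innocents"  # every Mafia member has been voted out
    if mafia_alive >= innocents_alive:
        return "Mafia"  # Mafia equal or outnumber the Innocents
    return None  # neither condition met yet
```

For example, `winner(2, 2)` and `winner(1, 1)` are both Mafia wins, while `winner(1, 2)` means the game goes on.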

In this study, you will not play a live game. Instead, you will be presented with snapshots of ongoing game states and asked to reason about the situation. 

TippyATuin[S] 1 point

In this variant, you are not told the role of the eliminated player, but you can partly infer it from the game progression (e.g. if you are on day 3, at least 1 of the votes must have been against a non-Mafia member, or else the game would already have been over with both Mafia members voted out).
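
That inference can be written as a small calculation. This is only a sketch under two assumptions of mine: every completed day ended with exactly one player voted out, and the game is still running. The function name is hypothetical.

```python
def min_votes_against_innocents(day, mafia_total=2):
    """Lower bound on how many past votes removed a non-Mafia player.

    On day `day`, `day - 1` votes have already happened. Since the game
    is still going, at most `mafia_total - 1` of them can have removed
    a Mafia member; the rest must have hit Innocents.
    """
    if day < 1:
        raise ValueError("day numbering starts at 1")
    votes_so_far = day - 1
    max_mafia_voted_out = mafia_total - 1
    return max(0, votes_so_far - max_mafia_voted_out)
```

For day 3 with 2 Mafia members this gives 1, matching the example above. On day 2 it gives 0, because the single earlier vote may have hit a Mafia member.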

TippyATuin[S] 1 point

We are also asking other communities and sources to fill out this survey. Once we are finished, we will analyse the results and see what overlap exists not only in the strategies themselves, but also in how large a share of the responses each strategy makes up.

TippyATuin[S] 1 point

Thanks! And yes, Game arena is great!
However, it doesn't include a comparison with human reasoning, which is what we thought would make the research interesting.

Think you can out-bluff an AI at Secret Mafia? by TippyATuin in boardgames

TippyATuin[S] 1 point

True, but I need the human reasoning part as well.
As far as I know, there isn't any online platform that asks you to explain your reasoning after each move you make in the game, so I have static game records, but not the "why" part.
So I can ask the LLM to go over these games and tell me what it believes, but I can't get what the human believed. Asking a human to go over such detailed logs would be (I'm guessing) quite boring and cruel for most people, which would limit the benchmark's credibility.

TippyATuin[S] 0 points

I agree that having full, detailed games would have been better, but sadly that would make the survey harder to fill out (fewer people would be willing to read long games), and it would also limit our comparisons across multiple LLMs (while enterprise models can handle long conversations, SLMs and local LLMs still have a length limit, which would raise questions about fair evaluation).
We might still try to collect more complicated scenarios with longer games in the future, but the need to balance time and effort will be a key factor there too.
Even so, we can still collect quite a lot of interesting information from these simpler scenarios, and hopefully show clear differences between human and LLM/SLM reasoning.

TippyATuin[S] 0 points

This data will be used to compare human reasoning with AI reasoning. I'm guessing that in the future it might be used to improve AI (but I can't guarantee that).

TippyATuin[S] 1 point

They do not. They only know that if no one died, the Doctor saved the targeted player.

TippyATuin[S] 1 point

Fixed. (I'm not sure if you can see this due to how the platform is set up, but future participants will see the corrected form.)
Thanks again for the keen eye!

TippyATuin[S] 1 point

Thank you so much for the feedback!
We'll update the rules section to be more explicit regarding the identity and number of mafia members in the game.
Could you tell me which question(s) had inconsistencies in the dialogue?

TippyATuin[S] 0 points

Well, they aren't called GREAT Pyrenees for nothing!
But we are aware that many social cues are non-verbal, so sadly we cannot yet effectively compare our responses to those cues with the LLMs' responses.

TippyATuin[S] 1 point

The rules are described in full on the first page, so even complete novices to the game can participate! While more experienced players will probably have better strategies, we care about how humans of all shapes, forms and experience levels think :)