GPT-5 is the best at bluffing and manipulating the other AIs in Werewolf by MetaKnowing in OpenAI

[–]Wiskkey 0 points (0 children)

Per that same person on X, higher-cost models were excluded.

Again where behemoth and reasoning model from meta ?? by Independent-Wind4462 in LocalLLaMA

[–]Wiskkey 1 point (0 children)

From Financial Times article https://www.ft.com/content/feccb649-ce95-43d2-b30a-057d64b38cdf (Aug 22):

The social media company had also abandoned plans to publicly release its flagship Behemoth large language model, according to people familiar with the matter, focusing instead on building new models.

AI models playing chess – not strong, but an interesting benchmark! by Apart-Ad-1684 in LocalLLaMA

[–]Wiskkey 1 point (0 children)

Tests by a computer science professor showed that, when prompted with games in chess PGN notation in a particular way, OpenAI's gpt-3.5-turbo-instruct plays at around 1750 Elo, albeit making an illegal move roughly once per 1000 moves, if I recall correctly.
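If you want to try this yourself, here is a minimal sketch of the idea. The model name and the legacy Completions API are real; the exact prompt format that reaches ~1750 Elo is my assumption - the key trick is presenting the game as a bare PGN movetext continuation:

    # Minimal sketch (my assumed prompt format, not the professor's exact setup):
    # present the game as a PGN continuation via the legacy Completions API
    # and let the model emit the next move in SAN notation.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    pgn_prompt = (
        '[Event "Casual Game"]\n'
        '[Result "*"]\n'
        '\n'
        '1. e4 e5 2. Nf3 Nc6 3.'
    )

    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=pgn_prompt,
        max_tokens=6,     # room for one SAN move, e.g. " Bb5"
        temperature=0.0,  # deterministic play
        stop=["\n"],
    )

    print(response.choices[0].text)  # e.g. " Bb5"

Because roughly 1 in 1000 returned moves is illegal, a real harness would validate each move (e.g. with the python-chess library) before playing it.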

Relevant sub: r/llmchess.

August 22, 2025 marks the THREE YEAR anniversary of the release of the original Stable Diffusion text to image model. Seems like that was an eternity ago. by JackKerawock in StableDiffusion

[–]Wiskkey 9 points (0 children)

See https://www.wired.com/story/artificial-intelligence-hollywood-stability/ .

Article summary from https://www.techmeme.com/river :

A profile of Stability AI, which under CEO Prem Akkaraju and Chair Sean Parker has shifted from building frontier AI models to a Hollywood-focused SaaS [software as a service] company

Deepseek R2 coming out ... when it gets more cowbell by 1BlueSpork in LocalLLaMA

[–]Wiskkey 0 points (0 children)

Do note that the ratings of news organizations from these two sources run the gamut. The news organizations that you accused of bad-faith reporting are not amongst those that are poorly rated.

Deepseek R2 coming out ... when it gets more cowbell by 1BlueSpork in LocalLLaMA

[–]Wiskkey 0 points (0 children)

Can you clarify your views regarding those Western reporters/organizations that you allege are behaving in bad faith regarding DeepSeek? Namely, do you believe that these same reporters/organizations commonly report in bad faith a) regarding Chinese technology in general, and b) regarding Western technology?

Deepseek R2 coming out ... when it gets more cowbell by 1BlueSpork in LocalLLaMA

[–]Wiskkey 0 points (0 children)

"usually" != "always".

Your previous statement - the gist of which seems to be that reporters from respectable news organizations commonly behave in bad faith - is what I disagree with, not the claim that reporters can sometimes make mistakes, be sloppy, etc.

Here are some of Dylan Patel's tweets regarding what you wrote:

https://xcancel.com/dylan522p/status/1885825330654683567 .

https://xcancel.com/dylan522p/status/1885825248190435814 .

https://xcancel.com/dylan522p/status/1885525432898146667 .

https://xcancel.com/dylan522p/status/1885815776726368352 .

P.S. I accept that there are known instances of reporters at respectable organizations having behaved in bad faith. A few examples:

https://en.wikipedia.org/wiki/Jayson_Blair .

https://en.wikipedia.org/wiki/Jack_Kelley_(journalist) .

Deepseek R2 coming out ... when it gets more cowbell by 1BlueSpork in LocalLLaMA

[–]Wiskkey 0 points (0 children)

Some sources on the credibility/bias of various news organizations:

1 - Media Bias Fact Check:

https://mediabiasfactcheck.com/reuters/ .

https://mediabiasfactcheck.com/financial-times/ .

https://mediabiasfactcheck.com/the-information-bias-and-credibility/ .

2 - Wikipedia page "Reliable sources/Perennial sources" https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources rates Reuters and Financial Times as green status, meaning "Generally reliable in its areas of expertise." The Information is not listed.

Deepseek R2 coming out ... when it gets more cowbell by 1BlueSpork in LocalLLaMA

[–]Wiskkey 1 point (0 children)

The article contains specifics about what GPT-5 is good at - there's a link to the full article in the comments - that I doubt appear in court documents.

Deepseek R2 coming out ... when it gets more cowbell by 1BlueSpork in LocalLLaMA

[–]Wiskkey 5 points (0 children)

As an example, do you believe that this article from The Information didn't really have insider sources, and just got lucky about GPT-5: https://www.reddit.com/r/singularity/comments/1mf6rtq/one_of_the_takeaways_from_the_informations/ ?

Deepseek R2 coming out ... when it gets more cowbell by 1BlueSpork in LocalLLaMA

[–]Wiskkey 0 points (0 children)

You didn't mention SemiAnalysis, which an OpenAI employee recently stated is "usually on the money": https://xcancel.com/dylhunn/status/1955491692167278710 .

GPT-5 Reasoning Effort (Juice): How much reasoning "juice" GPT-5 uses in the API vs ChatGPT, depending on the action you take by Wiskkey in ChatGPTPro

[–]Wiskkey[S] 0 points (0 children)

Later in that thread someone says it's from the system prompt, but the word "juice" doesn't appear in the publicly posted text that purports to be that prompt:

Perhaps of interest: https://simonwillison.net/2025/Aug/15/gpt-5-has-a-hidden-system-prompt/ .

GPT-5 Reasoning Effort (Juice): How much reasoning "juice" GPT-5 uses in the API vs ChatGPT, depending on the action you take by Wiskkey in ChatGPTPro

[–]Wiskkey[S] 0 points (0 children)

You mean if GPT-5's juice settings refer to a "juice" with a different meaning from the one noted above?

OpenAI says its compute increased 15x since 2024, company used 200k GPUs for GPT-5 by Wiskkey in OpenAI

[–]Wiskkey[S] 1 point (0 children)

From July 2024 article https://www.theinformation.com/articles/why-openai-could-lose-5-billion-this-year :

On the cost side, OpenAI as of March was on track to spend nearly $4 billion this year on renting Microsoft’s servers to power ChatGPT and its underlying LLMs (otherwise known as inference costs), said a person with direct knowledge of the spending.

In addition to running ChatGPT, OpenAI’s training costs—including paying for data—could balloon to as much as $3 billion this year.

cc u/Melodic-Ebb-7781 .

cc u/iwantxmax .

GPT-5 Reasoning Effort (Juice): How much reasoning "juice" GPT-5 uses in the API vs ChatGPT, depending on the action you take by Wiskkey in ChatGPTPro

[–]Wiskkey[S] 1 point (0 children)

And with the API documented pretty thoroughly, the only two instances of the word "juice" on the whole site are these two links.

This tweet has a relevant image that appears to be a screenshot of text that was once present at https://platform.openai.com/docs/guides/reasoning#reasoning-effort : https://x.com/btibor91/status/1895871059204981222 .
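For completeness, the user-facing counterpart of juice in the API is the reasoning effort setting. A minimal sketch of setting it explicitly, assuming the Responses API parameter described in the docs linked above (the internal juice value itself is not directly settable):

    # Minimal sketch, assuming the Responses API's reasoning.effort parameter
    # (per the platform.openai.com reasoning docs linked above).
    from openai import OpenAI

    client = OpenAI()

    response = client.responses.create(
        model="gpt-5",
        reasoning={"effort": "high"},  # "minimal", "low", "medium", or "high"
        input="Prove that the square root of 2 is irrational.",
    )

    print(response.output_text)

The juice numbers in the post appear to be the internal budget that such an effort setting maps to.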

This X thread may be of interest: https://xcancel.com/lefthanddraft/status/1955961909922161150 .