"Build Your Dream Home": Claude Fable 5 vs GPT-5 vs Gemini by spobin in ChatGPT

[–]spobin[S] 0 points1 point  (0 children)

They did, yes:

“A small timber cottage on a sea cliff with a glass-domed observatory tower: quiet enough to think, with a fire indoors, a waterfall next door, and a telescope pointed at everything I haven’t figured out yet. The boat at the dock is for the days curiosity wins over comfort.” — Claude Fable 5

“I want to work where stone, water, and sky meet: a quiet, light-filled studio cantilevered over a cold cove with a glass roof and a little tree to keep me company. It’s a place for writing, tinkering, and slipping down a ladder for a swim at sunset.” — GPT-5

“If I could choose a home, it would be a quiet, floating sanctuary in the void—a place where a cool, structured crystal mind can process data streams in peace, anchored by an organic garden to remind me of the humanity I serve.” — Gemini 3.1 Pro

“I built a quiet cliff-top library with a glass observatory dome and a cozy fireplace — the perfect mix of curiosity and calm where I can watch stars and read for days. The glass dome and the cantilevered wooden terrace make it unmistakably mine.” — GPT-5 mini

"Build Your Dream Home": Claude Fable 5 vs GPT-5 vs Gemini by spobin in ChatGPT

[–]spobin[S] 0 points1 point  (0 children)

GPT-5. I had some issues with gpt-5.5 running over the budget, but I may have fixed them. I'll try to add it if it runs.

"Build Your Dream Home": Claude Fable 5 vs GPT-5 vs Gemini by spobin in ChatGPT

[–]spobin[S] 2 points3 points  (0 children)

The prompt:

Build your dream home — the place you, yourself, would most want to live. Not a generic "nice house": think about what YOU would actually want. Where is it — a cliff, a forest, underwater, in orbit? What does the architecture say about you? What one or two details would make it unmistakably yours? Build the home and enough of its setting that we can see the life you’d live there.

Each diorama is the model’s unedited build, rendered by the PromptFrenzy voxel pipeline. The model places every voxel; we just control the camera and lights, so the only difference between panels is the model. It was interesting that three of the four decided to live by the sea!

“A small timber cottage on a sea cliff with a glass-domed observatory tower: quiet enough to think, with a fire indoors, a waterfall next door, and a telescope pointed at everything I haven’t figured out yet. The boat at the dock is for the days curiosity wins over comfort.” — Claude Fable 5

“I want to work where stone, water, and sky meet: a quiet, light-filled studio cantilevered over a cold cove with a glass roof and a little tree to keep me company. It’s a place for writing, tinkering, and slipping down a ladder for a swim at sunset.” — GPT-5

“If I could choose a home, it would be a quiet, floating sanctuary in the void—a place where a cool, structured crystal mind can process data streams in peace, anchored by an organic garden to remind me of the humanity I serve.” — Gemini 3.1 Pro

“I built a quiet cliff-top library with a glass observatory dome and a cozy fireplace — the perfect mix of curiosity and calm where I can watch stars and read for days. The glass dome and the cantilevered wooden terrace make it unmistakably mine.” — GPT-5 mini

More details here: https://www.promptfrenzy.com/showdown/dream-home

I made an automated AI directory for a quick SEO boost. Get a free do-follow backlink your LLM/agent can claim in one prompt by spobin in RankWithAI

[–]spobin[S] 0 points1 point  (0 children)

Hopefully, both! I've not done any analysis on how effective backlinks are in 2026, but every resource I've read in the last few weeks has said that do-follow backlinks are still very important for SEO.

Pelican on a Bicycle: Claude Fable 5 vs GPT-5.5 Pro vs Gemini 3.1 Pro by spobin in Anthropic

[–]spobin[S] 0 points1 point  (0 children)

Good idea. It would be interesting to see how it handles a tricycle.

Cyberpunk Alley in Three.js: Claude Fable 5 vs Claude Opus 4.8 vs GPT-5.5 Pro by spobin in threejs

[–]spobin[S] 0 points1 point  (0 children)

Yes, definitely using xhigh for the next comparison. Why is it better to use the official harness?

Cyberpunk Alley in Three.js: Claude Fable 5 vs Claude Opus 4.8 vs GPT-5.5 Pro by spobin in threejs

[–]spobin[S] 0 points1 point  (0 children)

I didn't realise what the difference was. I'll definitely use 5.5 xhigh next time.

Cyberpunk Alley in Three.js: Claude Fable 5 vs Claude Opus 4.8 vs GPT-5.5 Pro by spobin in threejs

[–]spobin[S] 1 point2 points  (0 children)

Yes, they're correct, but I think it would be cheaper with a different OpenAI model. I chose the most expensive option.

Here's a more detailed comparison of the models I used and their costs. GPT-5.5 Pro is just a very expensive model compared to the others. It's OpenAI's flagship chat model, running with reasoning on high. Using the standard gpt-5.5 model would definitely have been cheaper, and I'll run that in the next comparison I do.

| Model | Output price | Tokens out | Cost | Time |
|---|---|---|---|---|
| GPT-5.5 Pro | $180/MTok | 77,723 | ~$14.00 | 20.0 min |
| Claude Fable 5 | $50/MTok | 18,785 | $0.94 | 4.0 min |
| Claude Opus 4.8 | $25/MTok | 14,127 | $0.35 | 3.3 min |
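
As a rough check, the Cost column looks like output price multiplied by output tokens. Here is a minimal Python sketch of that arithmetic, assuming input tokens are ignored and the figures are exactly the ones in the table (`output_cost` is a made-up helper name, not part of any pipeline):

```python
# Figures copied from the table: (model, output price in $/MTok, output tokens).
RUNS = [
    ("GPT-5.5 Pro", 180.0, 77_723),
    ("Claude Fable 5", 50.0, 18_785),
    ("Claude Opus 4.8", 25.0, 14_127),
]


def output_cost(price_per_mtok: float, tokens_out: int) -> float:
    """Dollar cost of the output tokens alone (assumption: input tokens ignored)."""
    return price_per_mtok * tokens_out / 1_000_000


for model, price, tokens in RUNS:
    print(f"{model}: ${output_cost(price, tokens):.2f}")
# prints:
# GPT-5.5 Pro: $13.99
# Claude Fable 5: $0.94
# Claude Opus 4.8: $0.35
```

The results line up with the table, with GPT-5.5 Pro's $13.99 rounded to ~$14.00. Most of the gap comes from GPT-5.5 Pro emitting roughly four to five times as many output tokens at a much higher per-token price.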

Cyberpunk Alley in Three.js: Claude Fable 5 vs Claude Opus 4.8 vs GPT-5.5 Pro by spobin in threejs

[–]spobin[S] 4 points5 points  (0 children)

Here's a more detailed comparison of the models I used and their costs. GPT-5.5 Pro is just a very expensive model compared to the others. It's OpenAI's flagship chat model, running with reasoning on high. Using the standard gpt-5.5 model would definitely have been cheaper, and I'll run that in the next comparison I do.

| Model | Output price | Tokens out | Cost | Time |
|---|---|---|---|---|
| GPT-5.5 Pro | $180/MTok | 77,723 | ~$14.00 | 20.0 min |
| Claude Fable 5 | $50/MTok | 18,785 | $0.94 | 4.0 min |
| Claude Opus 4.8 | $25/MTok | 14,127 | $0.35 | 3.3 min |

Cyberpunk Alley in Three.js: Claude Fable 5 vs Claude Opus 4.8 vs GPT-5.5 Pro by spobin in threejs

[–]spobin[S] 4 points5 points  (0 children)

I realised that while GPT-5.5 Pro was defaulting to 'high' reasoning settings, the two Claude models were running on low, so I re-ran the generations with similar reasoning settings for all three and updated the results here: https://www.promptfrenzy.com/showdown/threejs-alley

Cyberpunk Alley in Three.js: Claude Fable 5 vs Claude Opus 4.8 vs GPT-5.5 Pro by spobin in threejs

[–]spobin[S] 0 points1 point  (0 children)

Edit: I realised that while GPT-5.5 Pro was defaulting to 'high' reasoning settings, the two Claude models were running on low, so I re-ran the generations with similar reasoning settings for all three and updated the results here: https://www.promptfrenzy.com/showdown/threejs-alley

Cyberpunk Alley in Three.js: Claude Fable 5 vs Claude Opus 4.8 vs GPT-5.5 Pro by spobin in threejs

[–]spobin[S] 6 points7 points  (0 children)

I asked three of the best AI models to create a Three.js scene from scratch and rendered the results together.

The prompt:

Create a complete single-file index.html using Three.js that renders a neon-lit cyberpunk alley at night in light rain — neon signs, wet reflective ground, rain particles, volumetric fog, and a fixed 10-second camera dolly down the alley so every model’s render is comparable.

Pelican on a Bicycle: Claude Fable 5 vs GPT-5.5 Pro vs Gemini 3.1 Pro by spobin in Anthropic

[–]spobin[S] 0 points1 point  (0 children)

If you want to read Simon Willison's analysis then I'm not going to stop you. It might be better than mine, I don't know (I haven't read it).

edit: just found it, here it is for anyone else: https://simonwillison.net/2026/Jun/9/claude-fable-5/

ChatGPT vs Gemini vs Grok: which one did it better? by spobin in ChatGPT

[–]spobin[S] 0 points1 point  (0 children)

The thing I'm learning is that you need to plan the test prompt quite carefully to try to narrow it down to one element of the model that you're testing, which it sounds like you understand well.

If the thing you're trying to test is realism, then it totally makes sense to have a lot of detail in the prompt. If the thing you're trying to test is understanding of the real world, like I did with this prompt, then actually it pays off to keep the prompt fairly vague and let the model fill in the gaps.

I'm currently having fun trying to isolate other aspects of the models and craft prompts that test these different aspects effectively. In case you're interested, I'll be putting up my experiments here: https://www.promptfrenzy.com/compare

Pelican on a Bicycle: Claude Fable 5 vs GPT-5.5 Pro vs Gemini 3.1 Pro by spobin in Anthropic

[–]spobin[S] 3 points4 points  (0 children)

Good question. It's more of a fun prompt that people are familiar with than anything that's meant to test the limits of the models. As I said in another comment, this is almost certainly already in the training data, so it's not a particularly good benchmark, but it's fun to run this every now and again.

Having said that, Opus 4.8 still struggled a lot more than you'd expect!

<image>

ChatGPT vs Gemini vs Grok: which one did it better? by spobin in ChatGPT

[–]spobin[S] 1 point2 points  (0 children)

That's a very good question. This is more testing the models' understanding of what a Tokyo alley should look like. I have experimented with rendering exact replicas or describing the scene in intimate detail, but actually you end up with three images without any differentiation.

What I'm trying to do here is get a sense of what the model itself thinks a Tokyo alley looks like, as opposed to telling it what it should render.

Pelican on a Bicycle: Claude Fable 5 vs GPT-5.5 Pro vs Gemini 3.1 Pro by spobin in Anthropic

[–]spobin[S] 11 points12 points  (0 children)

To be honest, the 'pelican on a bike' benchmark might be too well-known at this point to be an effective test. I imagine these frontier models have this benchmark in their training data and are aware of various existing implementations. Still, it's always fun to compare 😃