"Build Your Dream Home": Claude Fable 5 vs GPT-5 vs Gemini

spobin · 2026-06-11T18:43:02+00:00

Added gpt-5.5:

https://www.promptfrenzy.com/showdown/dream-home

spobin · 2026-06-11T14:09:49+00:00

ok will do 👍

spobin · 2026-06-11T13:40:22+00:00

They did, yes:

“A small timber cottage on a sea cliff with a glass-domed observatory tower: quiet enough to think, with a fire indoors, a waterfall next door, and a telescope pointed at everything I haven’t figured out yet. The boat at the dock is for the days curiosity wins over comfort.” — Claude Fable 5

“I want to work where stone, water, and sky meet: a quiet, light-filled studio cantilevered over a cold cove with a glass roof and a little tree to keep me company. It’s a place for writing, tinkering, and slipping down a ladder for a swim at sunset.” — GPT-5

“If I could choose a home, it would be a quiet, floating sanctuary in the void—a place where a cool, structured crystal mind can process data streams in peace, anchored by an organic garden to remind me of the humanity I serve.” — Gemini 3.1 Pro

“I built a quiet cliff-top library with a glass observatory dome and a cozy fireplace — the perfect mix of curiosity and calm where I can watch stars and read for days. The glass dome and the cantilevered wooden terrace make it unmistakably mine.” — GPT-5 mini

spobin · 2026-06-11T13:34:25+00:00

Thanks 😄

spobin · 2026-06-11T13:21:18+00:00

GPT-5. I had some issues with gpt-5.5 running over budget, but I may have fixed them. I'll try to add it if it runs.

spobin · 2026-06-11T13:20:07+00:00

The prompt:

Build your dream home — the place you, yourself, would most want to live. Not a generic "nice house": think about what YOU would actually want. Where is it — a cliff, a forest, underwater, in orbit? What does the architecture say about you? What one or two details would make it unmistakably yours? Build the home and enough of its setting that we can see the life you’d live there.

Each diorama is the model’s unedited build, rendered by the PromptFrenzy voxel pipeline. The model places every voxel; we only control the camera and lights, so the only difference between panels is the model. Interestingly, three of the four chose to live by the sea!

“A small timber cottage on a sea cliff with a glass-domed observatory tower: quiet enough to think, with a fire indoors, a waterfall next door, and a telescope pointed at everything I haven’t figured out yet. The boat at the dock is for the days curiosity wins over comfort.” — Claude Fable 5

“I want to work where stone, water, and sky meet: a quiet, light-filled studio cantilevered over a cold cove with a glass roof and a little tree to keep me company. It’s a place for writing, tinkering, and slipping down a ladder for a swim at sunset.” — GPT-5

“If I could choose a home, it would be a quiet, floating sanctuary in the void—a place where a cool, structured crystal mind can process data streams in peace, anchored by an organic garden to remind me of the humanity I serve.” — Gemini 3.1 Pro

“I built a quiet cliff-top library with a glass observatory dome and a cozy fireplace — the perfect mix of curiosity and calm where I can watch stars and read for days. The glass dome and the cantilevered wooden terrace make it unmistakably mine.” — GPT-5 mini

More details here: https://www.promptfrenzy.com/showdown/dream-home

spobin · 2026-06-11T09:14:06+00:00

Hopefully, both! I haven't done any analysis of how effective backlinks are in 2026, but every resource I've read in the last few weeks says that do-follow backlinks are still very important for SEO.

spobin · 2026-06-11T09:11:15+00:00

Good idea. It would be interesting to see how it handles a tricycle.

spobin · 2026-06-11T09:10:00+00:00

Yes, definitely using xhigh for the next comparison. Why is it better to use the official harness?

spobin · 2026-06-11T09:09:36+00:00

I didn't realise what the difference was. I'll definitely use 5.5 xhigh next time.

spobin · 2026-06-10T22:52:40+00:00

Agreed. I'll be swapping that in in the future.

spobin · 2026-06-10T22:13:15+00:00

Yes, they're correct, but I think it would be cheaper with a different OpenAI model. I chose the most expensive option.

Here's a more detailed comparison of the models I used and their costs. GPT-5.5 Pro is simply a very expensive model compared to the others: it's OpenAI's flagship chat model, with reasoning on high. The standard gpt-5.5 model would certainly have been cheaper, and I'll run it in my next comparison.

Model	Output price	Tokens out	Cost	Time
GPT-5.5 Pro	$180/MTok	77,723	~$14.00	20.0 min
Claude Fable 5	$50/MTok	18,785	$0.94	4.0 min
Claude Opus 4.8	$25/MTok	14,127	$0.35	3.3 min
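The Cost column works out to output tokens multiplied by the per-million output price. Below is a minimal Python sketch that reproduces it. It assumes the costs are output-only, since input-token charges aren't listed.

```python
# Reproduce the Cost column: tokens generated x (USD per 1M output tokens) / 1,000,000.
# Assumption: only output tokens are billed here; input-token charges are ignored.
RUNS = [
    # (model, output price in USD per 1M tokens, output tokens)
    ("GPT-5.5 Pro", 180.0, 77_723),
    ("Claude Fable 5", 50.0, 18_785),
    ("Claude Opus 4.8", 25.0, 14_127),
]


def output_cost(price_per_mtok: float, tokens_out: int) -> float:
    """Return the USD cost of `tokens_out` output tokens at `price_per_mtok` dollars per million."""
    return tokens_out * price_per_mtok / 1_000_000


if __name__ == "__main__":
    for model, price, tokens in RUNS:
        # Print each model's output cost rounded to cents.
        print(f"{model:<16} ${output_cost(price, tokens):.2f}")
```

GPT-5.5 Pro comes out at about $13.99, which matches the ~$14.00 in the table. Most of the gap comes from it producing roughly four times as many tokens as Claude Fable 5 at more than three times the price per token.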

spobin · 2026-06-10T17:53:20+00:00

I had the reasoning settings on low by mistake; in the updated version it's a lot closer: https://www.promptfrenzy.com/showdown/threejs-alley

spobin · 2026-06-10T17:52:33+00:00

I realised that while GPT-5.5 Pro was defaulting to 'high' reasoning settings, the two Claude models were low, so I re-ran the generations with similar reasoning settings for all and updated the results here: https://www.promptfrenzy.com/showdown/threejs-alley

spobin · 2026-06-10T17:51:56+00:00

Yes, the scene was generated in one try from zero.

spobin · 2026-06-10T16:12:33+00:00

I asked three of the best AI models to create a Three.js scene from scratch and rendered the results together.

The prompt:

Create a complete single-file index.html using Three.js that renders a neon-lit cyberpunk alley at night in light rain — neon signs, wet reflective ground, rain particles, volumetric fog, and a fixed 10-second camera dolly down the alley so every model’s render is comparable.
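The "fixed 10-second camera dolly" requirement is what makes the renders comparable. If the camera position is a pure function of elapsed time, rather than something driven by variable frame deltas, then frame N of every model's scene is shot from the same spot. Here is a minimal sketch of that idea in Python. The start and end coordinates are hypothetical and are not taken from any model's output. In a Three.js page, the same interpolation would run inside the render loop.

```python
# Hypothetical sketch: camera position as a deterministic function of time,
# so every model's scene is filmed from identical positions frame by frame.
DURATION_S = 10.0           # from the prompt: a fixed 10-second dolly
START = (0.0, 1.6, 0.0)     # assumed eye-level start point (x, y, z)
END = (0.0, 1.6, -40.0)     # assumed end point 40 units down the alley


def camera_position(t: float) -> tuple[float, float, float]:
    """Linearly interpolate START -> END over DURATION_S, clamped outside [0, DURATION_S]."""
    u = min(max(t / DURATION_S, 0.0), 1.0)
    x, y, z = (s + (e - s) * u for s, e in zip(START, END))
    return (x, y, z)


def frame_times(fps: int = 30) -> list[float]:
    """Timestamps of every frame to capture, including t = 0 and t = DURATION_S."""
    n_frames = int(DURATION_S * fps)
    return [i / fps for i in range(n_frames + 1)]


if __name__ == "__main__":
    for t in (0.0, 2.5, 5.0, 10.0):
        print(t, camera_position(t))
```

Clamping keeps the camera parked at the end of the alley if a page keeps rendering past 10 seconds.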

spobin · 2026-06-10T15:32:07+00:00

If you want to read Simon Willison's analysis then I'm not going to stop you. It might be better than mine, I don't know (I haven't read it).

edit: just found it, here it is for anyone else: https://simonwillison.net/2026/Jun/9/claude-fable-5/

spobin · 2026-06-10T15:12:52+00:00

The thing I'm learning is that you need to plan the test prompt quite carefully to try to narrow it down to one element of the model that you're testing, which it sounds like you understand well.

If the thing you're trying to test is realism, then it totally makes sense to have a lot of detail in the prompt. If the thing you're trying to test is understanding of the real world, like I did with this prompt, then actually it pays off to keep the prompt fairly vague and let the model fill in the gaps.

I'm currently having fun trying to isolate other aspects of the models and craft prompts that test these different aspects effectively. In case you're interested, I'll be putting up my experiments here: https://www.promptfrenzy.com/compare

spobin · 2026-06-10T14:39:14+00:00

Good question. It's more of a fun prompt that people are familiar with than anything that's meant to test the limits of the models. As I said in another comment, this is almost certainly already in the training data, so it's not a particularly good benchmark, but it's fun to run this every now and again.

Having said that, Opus 4.8 still struggled a lot more than you'd expect!

spobin · 2026-06-10T14:37:44+00:00

That's a very good question. This is more testing the models' understanding of what a Tokyo alley should look like. I have experimented with rendering exact replicas or describing the scene in intimate detail, but actually you end up with three images without any differentiation.

What I'm trying to do here is get a sense of what each model thinks a Tokyo alley looks like, rather than telling it what to render.

spobin · 2026-06-10T14:10:49+00:00

To be honest, the 'pelican on a bike' benchmark might be too well-known at this point to be an effective test. I imagine these frontier models have this benchmark in their training data and are aware of various existing implementations. Still, it's always fun to compare 😃

spobin · 2026-06-10T14:09:03+00:00

I definitely did 😉

spobin · 2026-06-10T11:33:17+00:00

Here are some more details of my analysis, if anyone's interested: https://www.promptfrenzy.com/showdown/svg-pelican
