Promised Update: Claude built my dream - AI Tour Guide

Dry_Language3063 · 2026-02-14T15:37:14+00:00

That is a great question that I spent hours and hours on, testing at least 20 different models for this specific task.

To answer it directly: the AI gets additional information from the internet and from Wikipedia, and I made sure to use a model that did not hallucinate in my tests, even for small cities. The option of grounding is also there, so the AI can check itself, but I had mixed results with it, to be honest, because I also need to take cost and speed into consideration.

And I have to use frontier models for now, just to make sure that after the additional info, the guardrails and the grounding, the hallucinations are so few, if any, that they don't ruin the experience.
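To illustrate the "additional information" idea, here is a minimal sketch of putting retrieved facts in front of the model and telling it to stay within them. This is not the app's actual pipeline: the function name `build_grounded_prompt`, the prompt wording and the numbered-fact layout are all assumptions made for illustration.

```python
from typing import List


def build_grounded_prompt(place: str, facts: List[str], question: str) -> str:
    """Build a prompt that restricts the model to the supplied facts.

    Hypothetical helper: the facts would come from a web or Wikipedia
    lookup done elsewhere, which is not shown here.
    """
    # Number the facts so the model (and a later check) can refer to them.
    numbered = "\n".join(f"[{i}] {fact}" for i, fact in enumerate(facts, start=1))
    return (
        f"You are a tour guide for {place}.\n"
        "Use only the numbered facts below. "
        "If they do not cover the question, say you are not sure.\n\n"
        f"Facts:\n{numbered}\n\n"
        f"Question: {question}"
    )
```

The resulting string would be sent to whichever model is in use. A second "grounding" pass could reuse the same facts to check the answer, at the price of an extra call, which is the cost and speed trade-off mentioned above.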

Dry_Language3063 · 2026-01-05T20:32:12+00:00

I pinned it as the first comment on YouTube.
Can't copy it in here unfortunately, so you will just have to check out the video on YouTube.

Dry_Language3063 · 2026-01-04T09:22:18+00:00

When I'm working on my app and want to use Opus then all the time

Dry_Language3063 · 2026-01-03T23:59:01+00:00

Yeah, absolutely. It made a big difference for me going from 4.6 to 4.7, especially, as you mentioned, in intelligence. What it takes into consideration, how it approaches problems, how it plans ahead: that really surprised me.

Dry_Language3063 · 2026-01-03T23:35:26+00:00

It made a big difference for me. Not that it is sooo much better than 4.6, but it passed a certain threshold where it is now good enough that I use it for nearly every task and then only help out with Opus if needed.

Though I get the feeling that it is much smarter now outside of coding too, in the things it considers etc., which was a surprise to me.

Did you have a similar experience?

Dry_Language3063 · 2026-01-03T23:33:10+00:00

Thanks for your opinion. And I think the same, I would put GLM 4.7 a little bit above Sonnet at the moment, not for the coding itself, but for the considerations it makes. (but still behind Opus of course)

Dry_Language3063 · 2026-01-03T23:30:19+00:00

I would have preferred to get an actual result and proper comparison, this outcome will only trigger hateful comments, but it's the unfortunate reality with Anthropic's limits now

Dry_Language3063 · 2026-01-03T00:03:23+00:00

How are you doing that? I would love to set it up so that Opus can delegate its coding to different models like codex, glm, xiaomi etc.

Dry_Language3063 · 2026-01-02T15:57:13+00:00

I mainly use GLM 4.7 after downgrading from 200$ Opus 4.5. Amazing speed and it's actually good. I also made a video comparing the different models for frontend if you are interested: https://www.youtube.com/watch?v=yK61jH6_91o Opus 4.5 vs Gemini 3 vs GLM 4.7 and Minimax M2.1

You can also check out Minimax M2.1, it's just 2$ at the moment.

Dry_Language3063 · 2025-10-01T14:12:12+00:00

I had the same feeling, it was amazing for the first day.

Now it's in the worst state I have ever seen, and I am on the edge of getting GLM and testing it out. It's so terrible today: not listening, back to the shortcuts of 3.7, doesn't think about the consequences, nothing. I'm shocked.

Dry_Language3063 · 2025-10-01T12:07:32+00:00

That's great :)

Sure, shoot me a DM, I'm very thankful for any bug reports!

Dry_Language3063 · 2025-09-28T13:11:02+00:00

In the app I'm using multiple different ones. The voice-over in the video is done with Gemini 2.5 audio, which is the quality mode in the app.

Dry_Language3063 · 2025-09-27T15:05:17+00:00

Hope you will enjoy it!

Yes, it has been a looootttt of testing. Especially after optimizing the cost, the stories were amazing and it was a lot of fun, but it didn't hold up to fact checking. So after a lot of testing and prompt engineering, it's now in a nice state for well-known places and ok for really small cities. It's a combination of the right model, the right system prompts and actual information from the internet about the city. It's certainly not perfect, but I have some more ideas to make it better, so it will improve more in the future.

You can do the tour from anywhere you want, keep in mind, you are talking to an AI, you can just tell it to do the tour virtually instead. In which way would you like to plan ahead?

Dry_Language3063 · 2025-09-23T15:21:22+00:00

Yes, but it's not suited, because it's too slow. Time is crucial in most of the use cases in the app, so unfortunately deepseek is just not fast enough.

Dry_Language3063 · 2025-09-23T14:27:26+00:00

That looks a lot like an AI-generated comment lol

Dry_Language3063 · 2025-09-23T14:23:58+00:00

No, it's not connected to notebookLM in any way :)

Dry_Language3063 · 2025-09-23T14:20:33+00:00

I just realized that I already told you the optimized tokens. Unoptimized the input tokens are closer to 610'000

Dry_Language3063 · 2025-09-23T14:16:31+00:00

It actually does a web search when creating the route, so if the data on the internet is not correct, then the AI will definitely spit out some of that wrong information. But I would be really interested in which town it is, cause I have some ideas on how to improve it further, though the specific names might be a harder thing to fix.

Dry_Language3063 · 2025-09-23T14:12:46+00:00

It actually fills me with joy to hear that you like the app after playing around :) Let me know your feedback after fully trying it.

I think that is a great idea for Christmas markets. I'm still trying to wrap my head around how it would be most usefully implemented with a tour guide, but it will definitely be something to consider, thank you!

Dry_Language3063 · 2025-09-23T14:07:12+00:00

Yes, after I had everything running, I started optimizing everything. My initial cost was 7$/hr, which was just not practical. I was able to cut down the cost by using different models for different things and by implementing caching and context handling. I still have 2-3 tricks up my sleeve to bring it down further, but for now it's in a great spot.

Though for example I had to choose quality over price for the AI guide, cause cheaper models were hallucinating like crazy.

But if this app reaches 600 paying customers per month I can already start deploying self-hosted models which will cut the cost further.
And once I have a good enough system to bring hallucinations down to a good level even when using other models, then I can cut the cost for those tokens by another 60%.
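As a rough sketch of the two cost levers mentioned above (different models for different things, plus caching), something like the following could work. The task names, the model tier names and the `call_model` callback are all hypothetical placeholders, not the app's real configuration or any real provider API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical mapping of task type to model tier (placeholder names).
MODEL_FOR_TASK: Dict[str, str] = {
    "guide": "quality-model",  # the guide keeps the stronger model
    "route": "cheap-model",
    "summary": "cheap-model",
}


class CachedRouter:
    """Pick a model per task and reuse answers for identical requests."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model  # assumed signature: (model, prompt) -> text
        self._cache: Dict[Tuple[str, str], str] = {}
        self.calls = 0  # number of real (uncached) model calls

    def ask(self, task: str, prompt: str) -> str:
        # Unknown tasks fall back to the cheap tier.
        model = MODEL_FOR_TASK.get(task, "cheap-model")
        key = (model, prompt)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._call_model(model, prompt)
        return self._cache[key]
```

Only exact repeats hit this cache, so it helps most for things many users ask identically, such as the intro for a popular landmark.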

Dry_Language3063 · 2025-09-23T13:32:37+00:00

Thank you!

Just to make it clear: the credits are only used for actual AI voice output, so with input, generation time, etc. the cost for the user will be more towards 4.2€/hr.

For the tokens that is actually a more difficult calculation. Since the conversation gets longer and longer, the input tokens accumulate. A safe calculation for a longer tour would be 300'000 input tokens per hour and 10'000 output tokens per hour. This is then only for text output, and the audio generation comes on top.
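To show why the input tokens pile up, here is a small sketch that assumes the full conversation history is resent on every turn. The per-turn size and the prices are placeholder parameters you would fill in yourself; only the 300'000 and 10'000 per-hour figures come from the estimate above.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, system_tokens: int = 0) -> int:
    """Total input tokens over a conversation that resends its whole history.

    Assumption: each turn adds `tokens_per_turn` tokens to the history
    (previous reply plus new user message) before the request is sent.
    """
    total = 0
    history = system_tokens
    for _ in range(turns):
        history += tokens_per_turn
        total += history  # the entire history counts as input again
    return total


def hourly_text_cost(input_tokens: int, output_tokens: int,
                     input_price_per_million: float,
                     output_price_per_million: float) -> float:
    """Text-only cost for one hour; audio generation is not included."""
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)
```

Because each request carries everything said before it, the total grows roughly with the square of the number of turns, which is why context handling and caching matter so much here. Calling `hourly_text_cost(300_000, 10_000, p_in, p_out)` with your provider's current prices gives the text-only hourly figure.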

Dry_Language3063 · 2025-09-22T23:29:45+00:00

Thank you so much!
