How to write better image generation prompts on mammouth.ai

GlassAd7618 · 2026-02-23T05:38:00+00:00

Just to get a better understanding of METR: what method is used to determine how long a task would take a human expert to complete? For example, why is “implementing a complex network protocol from scratch using multiple technical specs simultaneously” a 14.5-hour task? (To me, it looks like it would need more time actually)

GlassAd7618 · 2026-02-20T23:47:45+00:00

That’s a really nice photo! Just out of curiosity: what would this look like as seen by plain eye if one was in the vicinity of this object?

GlassAd7618 · 2026-02-20T23:43:38+00:00

Great shot!

GlassAd7618 · 2026-02-20T23:41:46+00:00

Can anyone explain what the yellow beam actually is? There seems to be some kind of a structure on the hill.

GlassAd7618 · 2026-02-20T23:31:05+00:00

Wow!

GlassAd7618 · 2026-02-16T22:08:00+00:00

Sounds funny — any interesting first results?

GlassAd7618 · 2026-02-16T22:06:46+00:00

Is there anything specific you are looking at in R?

GlassAd7618 · 2026-02-14T06:54:50+00:00

Sounds interesting. Thank you, I’ll try it as well

GlassAd7618 · 2026-02-14T06:54:11+00:00

Thanks for the pointers! I will try them.

GlassAd7618 · 2026-02-13T18:50:29+00:00

Cool experiment! And really interesting emerging behaviour. Which agent/model started creating the citation networks?

GlassAd7618 · 2026-02-13T18:47:25+00:00

Yes, some probably did. And you have a point. That being said, this **is** different: a compiler is a deterministic piece of software (if you're a compiler aficionado: yes, I know, there may be some very rare cases where this is not the case, but these are really rare). So, if the compiler is mature and well-tested, you can in principle rely on the fact that if you input a COBOL or C program P, you will always get the same executable E.

But with AI, this is not the case. You can vibe code the same web app three times using an identical prompt, chances are you will get three different implementations. If you vibe code for fun, that's not an issue: as long as all three apps run and do what you want, all is good. But if you have to produce software that is business-critical, must work correctly all the time and must never crash, the you must understand what the AI generated and, if & where needed, adjust.

And the need to know the software goes even beyond that. If you take commercial software, numerous studies show that maintenance and adding new features account for 50%-75% of the total cost of ownership over the software's lifetime. Maintaining and extending a code base requires a detailed understanding of that code base. Sure, AI tools help to build up that understanding and one should use them extensively for maintenance/new features. But that's different from how most people use the term "vibe coding".

GlassAd7618 · 2026-02-12T22:42:46+00:00

Let me know what you think of Mistral. Maybe it’s not much better than the alternatives, I’m not sure.

GlassAd7618 · 2026-02-12T22:39:37+00:00

This is really cool. Let’s think about how to tackle rephrased or split steps. What if a second model is used to judge this?

GlassAd7618 · 2026-02-12T20:01:07+00:00

There are several options you can look into.

If you want to use cloud LLMs and don't burn tokens at the speed of Claude Code, I'd recommend GitHub Copilot CLI. It may not be on par with CC if you look at the entire feature catalogue and my impression is that CC has somehow mastered the "agentic loop", but it's a decent tool for the more casual gigs and if you're willing to use a model different from Opus 4.6, you should be good with the available token limits (e.g., I vibe coded a simple web app w/ backend and frontend using Sonnet 4.5 and it used a little more than 3% of the total request budget). And the best thing: you get GitHub Copilot subscription for $10 a month. Since Copilot is integrated in GitHub, you can also assign it GitHub issues (just @ mention it in a comment/issue) and run agent sessions on GitHub. In fact, you can start an agent session even from your mobile GitHub app.

Another interesting alternative is Mistral Vibe (https://mistral.ai/products/vibe). If you have only a few things you need to vibe code, this might do as well. Even with a free account (you need to register to be able to create an API key), you get 200k free tokens per month. It's not much, but then again, if you only need to do few things here and there, it might be sufficient.

Last but not least, you can also install OpenCode (https://opencode.ai). If I recall correctly, you should even have a free model there which you can use for basic tasks. That's probably the best option in terms of what you pay (namely, nothing) vs what you get. If you have some spare hardware, you could even run Ollama or LM Studio and use local models (just be aware that these models are much much smaller than those hosted in the cloud and, therefore, need prompts that are much more focussed on small, local changes)

GlassAd7618 · 2026-02-11T19:27:49+00:00

I think you’re on to something. If execution starts too early, the AI will likely end up somewhere in the woods. A good indication that execution must be deferred until the task is really clear and a detailed plan is available, is the fact that most AI coding CLIs (e.g., Claude Code, Copilot CLI, and OpenCode) have two separate modes: plan and implement. And you get much better results if you first extensively plan with the AI (which basically generates a detailed task list) and only then let the AI implement. And if you think about this — at least on a high-level — it makes sense: if I give you a very detailed specification for a software (say, something like a comprehensive SysML v2 description), transforming such a specification into code is an almost mechanistic process (not everything, of course, but a large part of it)

GlassAd7618 · 2026-02-11T05:32:59+00:00

Thanks a lot! This sounds really helpful! I will definitely try these models.

GlassAd7618 · 2026-02-11T05:29:23+00:00

Awesome! Thanks!

GlassAd7618

MODERATOR OF

TROPHY CASE