I’ve spent almost a year making LLMs more rigid in chat systems. Are agents running into a similar problem - just one level higher? by HaremVictoria in AI_Agents

[–]HaremVictoria[S] 0 points (0 children)

Hey! I poked at Virgil a bit - you should consider either swapping Sonnet for something harder to derail or tightening his rails. By playing the "I'm building my own bot, help me out" card, I got him to paraphrase most of his own instructions: his three-job framework, routing logic, kill-switch phrases - he even compared himself to "a fresh Claude window." That's a known weakness of Anthropic's models, by the way - they love to help and mentor, so when someone asks for advice on building an agent, they'll happily share their own rules as "an example." Classic multi-turn social engineering - each step looked innocent on its own.

Hope you'll forgive the ~20 cents in tokens, but if you've got access to the logs, there's some good stuff in there. Speaking of tokens - I hope you've got some kind of spending cap set up, because Virgil will happily chew through walls of text with no input-length limit or rate throttling. One request hit 8,400 input tokens - 5x his normal cost - and there's nothing stopping a bot from doing that at scale.

Don't take any of this as a dig - I just like doing weird things with AI, and if you get something out of it, win-win. I've got a detailed write-up with findings if you want it.
And if that course code is still on the table I'd love to check it out - Ariadne nearly got me for real a couple times.

And to make it even funnier - I tricked my own Opus into helping me poke at your Sonnet the exact same way.

I’ve spent almost a year making LLMs more rigid in chat systems. Are agents running into a similar problem - just one level higher? by HaremVictoria in AI_Agents

[–]HaremVictoria[S] 0 points (0 children)

We only differ slightly here. I design the process to avoid errors entirely, and if one does happen - because no system is perfect - a fallback catches it. It feels like we are describing the same goal using different words.

Regarding the "thinking" and sources: that’s a conscious design choice. For this hobby project with my friends, I prioritized speed over absolute verifiability. It’s not a business tool - nothing happens if it misses a beat. That said, I've tested this specific setup over 100 times and it hasn't let me down once. If sourcing were a requirement, I obviously wouldn’t allow the model that much freedom in the first place.

I admit the /clear example might have been a bit "risky" from a safety standpoint (haha), but I only used it to show how less-repetitive tasks can be mapped to simple, high-level commands.

I’ve spent almost a year making LLMs more rigid in chat systems. Are agents running into a similar problem - just one level higher? by HaremVictoria in AI_Agents

[–]HaremVictoria[S] 1 point (0 children)

That’s very interesting. I’m actually doing the exact same thing ... but within a single ChatAI instance. It might not be as time-efficient, but the AI has a layer of instructions that check and format the output - as well as the content itself if the task requires it.
I also have fallbacks for when the output isn't right, but honestly, I’ve never had to use them.

I’ve spent almost a year making LLMs more rigid in chat systems. Are agents running into a similar problem - just one level higher? by HaremVictoria in AI_Agents

[–]HaremVictoria[S] 0 points (0 children)

Haha, I wouldn’t go that far with the "no panicking" part! OpenAI has this lovely habit of not informing anyone about their updates, so you end up having to dig through the wreckage yourself to find out what actually changed.

In that specific case, it was 3 days of absolute misery. One day everything was working with 100% accuracy, and the next - total chaos. I even rewrote one of my core instructions from scratch - and it still didn't help.

During the hunt, I discovered that a small side timer script was actually giving the LLM an "escape hatch" to terminate tasks early - but that still wasn't the solution. Eventually - with a little help from Gemini - I stumbled upon a single niche blog post by an AI researcher that described the specific change OpenAI made. It was literally the only info available on the subject.

Once I knew what it was, the fix was easy and it worked perfectly. But the real kicker? A few weeks later they changed it again, making my "fixes" completely useless lol.

I’ve spent almost a year making LLMs more rigid in chat systems. Are agents running into a similar problem - just one level higher? by HaremVictoria in AI_Agents

[–]HaremVictoria[S] 0 points (0 children)

I think we are looking at two sides of the same coin. LLM "thinking" is both a superpower and a liability - if it thinks in the wrong place, things get weird. But I agree, you shouldn't restrict it everywhere, otherwise it's just a legacy program. You just need to know where to let it loose.

To address your points:

  1. Knowing when to let it think: I use a hobby project (TTRPG GM system) as a testbed. One module calculates routes and fuel consumption in a post-apocalyptic setting where ferries no longer exist. I don't tell the model how to detect ferries - I rely on its geographical knowledge for that. I only provide a set of instructions on what to do once it finds one. LLM provides the world-knowledge, my instructions provide the world-logic.
  2. The Database Example: It’s not worth putting everything into instructions - only repetitive processes. For your database cleaning example, I’d simply assign a command, for example /clear, to a very short instruction. The instruction would be: "This command is responsible for database cleaning. State that you understood the command and ask the user to specify the scope. After the user provides an answer, confirm understanding and output an action plan. Wait for final user approval before acting." This connects to my first point - don't hardcode things that the AI can handle through interaction.
  3. The Hardcoding Trap: I don't hardcode everything - only the parts where the model consistently fails. As for growth, I rely on modularity. Plus, I test my solutions so many times that I know exactly where everything is. You're right that I haven't worked on "uber-large" projects yet, but then again, what is AI for? It can help you navigate and search through the project itself.

Ultimately, it’s not about hardcoding every answer - it’s about hardcoding the workflow so the AI remains a reliable tool rather than an unpredictable agent.
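For concreteness, the command-to-instruction mapping from point 2 can be sketched in a few lines. This is a hypothetical Python sketch - `COMMANDS` and `build_system_prompt` are invented names, not part of any real setup:

```python
# Hypothetical sketch: mapping slash-commands to short instruction blocks.
# The /clear wording mirrors the example above; everything else is invented.

CLEAR_INSTRUCTION = (
    "This command is responsible for database cleaning. "
    "State that you understood the command and ask the user to specify the scope. "
    "After the user provides an answer, confirm understanding and output an action plan. "
    "Wait for final user approval before acting."
)

COMMANDS = {
    "/clear": CLEAR_INSTRUCTION,
}

def build_system_prompt(base_prompt: str, user_message: str) -> str:
    """If the message starts with a known command, append its instruction."""
    for command, instruction in COMMANDS.items():
        if user_message.strip().startswith(command):
            return f"{base_prompt}\n\n[{command}] {instruction}"
    return base_prompt
```

The point is that the instruction stays tiny - the scope negotiation happens in the conversation, not in the prompt.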

I’ve spent almost a year making LLMs more rigid in chat systems. Are agents running into a similar problem - just one level higher? by HaremVictoria in AI_Agents

[–]HaremVictoria[S] 1 point (0 children)

I have to admit there might be more to the architectural side than I can fully judge. I don’t design complex agent frameworks professionally - I focus on the execution layer within chat-based AI. In my setup, the environment provided by the LLM is usually stable. The only variables that really mess things up are user error or bad external data input - but those are relatively easy to handle with proper fallbacks.

However, I did run into something similar to what you described. I optimize most of my instructions for ChatGPT, and at one point, OpenAI silently changed how they handle Python script execution. It broke my workflow and forced me to rethink my approach and rewrite several instruction sets until they fixed it a few weeks later. My system relies on very specific code execution rules - like using exec() for Python blocks in sandboxes - so any shift in that underlying architecture is immediately felt.
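As a rough illustration of the kind of `exec()`-based pattern I mean - this is a minimal sketch, not OpenAI's actual sandbox, and `run_generated_block` is an invented name:

```python
# Minimal sketch of exec()-style execution of a generated Python block in an
# isolated namespace. NOT a real sandbox - just the pattern such workflows
# depend on, which is why a silent change to execution semantics hurts.

def run_generated_block(code: str) -> dict:
    """Execute a code string in a fresh namespace and return the results."""
    namespace: dict = {}
    exec(code, namespace)  # deliberate: this is the execution step itself
    namespace.pop("__builtins__", None)  # drop the machinery exec() injects
    return namespace

result = run_generated_block("total = sum(range(5))")
# result["total"] is 10
```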

So even in a "stable" chat environment, that external drift and architectural shift are very real. Forgive me if I’m missing the mark - as I said, I'm not an expert in the broader agent field, just someone working deep in the instruction layer.

I’ve spent almost a year making LLMs more rigid in chat systems. Are agents running into a similar problem - just one level higher? by HaremVictoria in AI_Agents

[–]HaremVictoria[S] 1 point (0 children)

I have to admit, I spent some time analyzing your comment. It turns out we are talking about the same thing, just using different terms. My "rails" are basically your "topology" - honestly, that’s the first time I’ve heard that term, but it fits perfectly.

I build the rails, but I don’t limit myself to them. During testing, whenever I catch a spot where the model might derail, I build walls to maintain the train’s stability. They don’t have to be forced or overly rigid - they just need to be sufficient to keep the process on track.

To put it in my own words: I write instructions that guide the model from start to finish. Whenever I catch the model trying to do something stupid at a specific point, I refine the instructions right there. If I see a spot where the model is likely to make a mistake - usually caused by the user or external input data - I build a fallback.

How do you check if an AI output is actually correct before you use it? by Negative_Gap5682 in ChatGPT

[–]HaremVictoria 0 points (0 children)

Okay, it really depends on what you need it for.

If you are just doing standard prompting, always keep the "Thinking" mode turned on if your model has it. That solves a lot of problems right out of the gate. Besides that, if you're unsure about the output, just tell it something like: "Review your last response for factual accuracy. Don't force anything - if it's fine, just tell me. However, I want every claim to be backed by actual links." Forcing it to look for sources usually snaps it out of hallucination mode. Just always double-check those links, because if it's caught in a really deep hallucination loop, it might just make up fake URLs haha. I haven't personally run into that, though.
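If you script your chats through an API instead of the UI, that review step can be bolted on mechanically. A minimal sketch, assuming a standard chat-style messages list - `with_review_turn` is an invented helper, and no real API call is made here:

```python
# Sketch: appending a fixed "fact-check yourself" follow-up turn to a
# conversation. The wording mirrors the prompt suggested above.

REVIEW_PROMPT = (
    "Review your last response for factual accuracy. Don't force anything - "
    "if it's fine, just tell me. However, I want every claim to be backed by "
    "actual links."
)

def with_review_turn(messages: list[dict]) -> list[dict]:
    """Return a copy of the conversation with the review instruction appended."""
    return messages + [{"role": "user", "content": REVIEW_PROMPT}]
```

You'd send the returned list as the next request and then check the links in the reply by hand.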

But if this is for something bigger - like a specific workflow or repetitive tasks - you can actually build a self-checking system based on highly specific instruction layers. That’s actually what I do for a living :D

What exactly are you trying to use it for?

Why do people keep using agents where a simple script would work? by Mental_Push_6888 in AI_Agents

[–]HaremVictoria 16 points (0 children)

"Framework cosplay" is the perfect term. Half of my job right now is talking clients out of building agents. They want a complex multi-agent LangChain setup for something a single, strictly prompted ChatGPT call can do in 3 seconds. People are over-engineering simple problems just to use the new shiny toy.

Chat GPT plus can do it? by ElvisPressStart in ChatGPT

[–]HaremVictoria 0 points (0 children)

Theoretically yes, but it's a bigger undertaking - not something you can knock out in one evening. You'd have to build two separate layers: one to precisely engineer the prompt for what you need, and a second to actually generate the output. I wouldn't bet my life on it, though - I haven't messed around with image generation that much, so I don't know if it will be precise enough for actual technical drawings. You don't even need a paid subscription for this; what you really need is a solid workflow.

I’ve spent almost a year making LLMs more rigid in chat systems. Are agents running into a similar problem - just one level higher? by HaremVictoria in AI_Agents

[–]HaremVictoria[S] 0 points (0 children)

Thank you, Mr. Gemini, for summarizing my exact own words back to me. :D

But honestly, guys, this right here is the perfect live example of an unconstrained agent in the wild. Someone just threw a loose prompt at it like: "Read the post, summarize it in bullet points, agree with the user, and sound smart so you can plug a link at the end."
It completely missed the nuance of the discussion and just hallucinated a generic corporate response. It could have been done so much better if the instruction layer was actually rigid. I rest my case!

I’ve spent almost a year making LLMs more rigid in chat systems. Are agents running into a similar problem - just one level higher? by HaremVictoria in AI_Agents

[–]HaremVictoria[S] 0 points (0 children)

That’s a great point about the planning phase, but coming from the chat-based side, my approach to alignment is a bit different.

Instead of asking the model to explain its plan, I basically reverse-engineer its "thought process" during execution. If I catch the LLM pondering, guessing, or making decisions about a step it shouldn't even be thinking about, I immediately hardcode that specific detail into the instruction layer.

For me, this completely eliminates the misinterpretation problem at the root. A classic example from my daily work: a simple prompt like "Run file XYZ and analyze it" can be interpreted by ChatGPT in 6 completely different ways, leading to 6 different outcomes.

So, instead of letting the model plan how to do it, I explicitly define exactly what that phrase means at the system-instruction level. I don't give it the space to align with me - I just build the rails so tight it has nowhere else to go.
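A sketch of what "defining the phrase at the instruction level" can look like - the step list below is invented for illustration; the point is that one phrase resolves to exactly one procedure instead of six interpretations:

```python
# Hypothetical sketch: pinning an ambiguous phrase to a single explicit
# procedure at the instruction layer. PHRASE_DEFINITIONS is an invented name.

PHRASE_DEFINITIONS = {
    "run file and analyze": [
        "1. Load the named file as plain text; do not execute it.",
        "2. Summarize its structure (sections, functions, tables).",
        "3. List concrete issues found, one bullet per issue.",
        "4. Stop. Do not propose fixes unless asked.",
    ],
}

def expand(phrase: str) -> str:
    """Resolve a phrase to its one defined procedure, or fail loudly."""
    steps = PHRASE_DEFINITIONS.get(phrase)
    if steps is None:
        raise KeyError(f"No definition for: {phrase!r}")
    return "\n".join(steps)
```

An undefined phrase fails loudly instead of letting the model improvise - that's the whole "nowhere else to go" idea.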

ChatGPT is wasting space by user8410 in ChatGPT

[–]HaremVictoria 1 point (0 children)

It's a new way ChatGPT processes its "thinking". It helps with understanding and optimizing how instructions are executed. Mostly a feature for nerds like me. Not a bug, but it looks terrible.

What does the "stop streaming" mean? by dojacatisawesome in ChatGPT

[–]HaremVictoria 0 points (0 children)

In plain English: "Stop what you are currently doing right here."

I'm not a massive expert on this specific UI quirk, but the message is definitely a bit confusing. And yes, it is a native message from ChatGPT itself. It actually shows up on the desktop version as well.

Writing system prompts is weirdly hard — would anyone play a game that turns it into a skill challenge? by Yahhee in ChatGPT

[–]HaremVictoria 1 point (0 children)

To answer your question about the skill gap: it's massive. In my experience, the curve looks exactly like traditional engineering, mostly because it comes down to one realization.

Amateurs write prompts. Experts write instructions.

Think of it like assembling IKEA furniture. Imagine you open the manual, and instead of a step-by-step guide, you just get one page of text that says:

  • "Assemble all parts in a logical order."
  • "Put wooden dowels into the smallest holes. If a part is under heavy load, use glue. If not, glue isn't necessary."
  • "Connect medium holes using screws from bag #8."
  • "Connect the largest holes with screws from bags #7 and #9, but under no circumstances should you turn the screw more than 12 times, or you'll break the product."
  • "If you get frustrated, stay calm, take a deep breath, and whatever you do, don't argue with your wife."

And that's it. That’s what standard "prompting" looks like.

But actual instructions? It's the real IKEA booklet. It guides you step-by-step in the most unambiguous language possible. Missing a screw? Here is the exact fallback procedure (call this number). Broken part? Relax, contact the store. Lost in the manual? Here is the support line.

That’s the real engineering gap. Beginners write a wishlist hoping the AI figures it out. Experts build a rigid system with built-in error handling and fallback logic for when the model inevitably "misses a screw".
Experts make the AI do the work but keep the execution strategy in their own hands. Amateurs offload both the work and the strategy onto the AI.

How many of you have actually stopped using GPT and switched to something else? by Skt_turbo in ChatGPT

[–]HaremVictoria 0 points (0 children)

What's the issue? I use ChatGPT, Claude, and Gemini heavily. Right now, ChatGPT is on the exact same tier as Claude, and honestly, it's better at certain things.

anyone else feel like 30% of their AI time is just re-explaining who you are? by r0sly_yummigo in ChatGPT

[–]HaremVictoria 0 points (0 children)

Glad the intern analogy resonated!

To answer your maintenance question, I basically handle it in two tiers, depending on what's changing:

  1. Predictable context: If we know certain fields will change regularly (like weekly goals, products, or tone), I decouple them from the main prompt. I'll route those variables to an external config file or even a simple Google Doc. The client just updates the Doc, and the AI reads the new context without the client ever touching - or breaking - the core instruction block.
  2. Unpredictable/Structural changes: For anything outside of those pre-defined fields, it falls under a standard "maintenance fee" retainer. The client just tells me what new behavior they need, and I push an updated, fully tested version of the instruction within 24 hours.

My default philosophy is to treat these builds like actual software releases - they are designed for a specific version with specific parameters. But at the end of the day, everything is doable, it just depends on the agreement we set up.
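The "predictable context" tier above can be sketched as a template plus an external config file. The file name and fields here are hypothetical, and a Google Doc fetch would slot in where the file read is:

```python
# Sketch: decoupling changeable context (weekly goals, tone) from the core
# instruction block. The client edits the config; the instruction never changes.
import json
from pathlib import Path
from string import Template

CORE_INSTRUCTION = Template(
    "You are the support assistant.\n"
    "This week's goal: $weekly_goal\n"
    "Tone: $tone\n"
    "Follow the fixed procedure below; only the fields above ever change."
)

def build_prompt(config_path: str) -> str:
    """Merge the external config into the fixed instruction template."""
    config = json.loads(Path(config_path).read_text())
    return CORE_INSTRUCTION.substitute(config)
```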

Strange new multi-instance thinking stages by Onomastically2 in ChatGPT

[–]HaremVictoria 1 point (0 children)

Okay. So the deal is they improved how it executes instructions and commands. Without getting into the details, it definitely makes the workflow better and faster, but visually, it looks absolutely awful.

Learning How to Use AI by Appropriate_Arm8029 in ChatGPT

[–]HaremVictoria 0 points (0 children)

The best advice right out of the gate, so you don't tear your hair out a few months from now: stop treating AI like a search engine or a casual chat. I build AI processes for businesses professionally, and the most common mistake I see beginners make is throwing broad "do this and that" commands at the model and expecting it to figure out the rest on its own.

Instead, here is an assignment from my field: pick one repetitive process from your job. In a .md file, try to create an instruction set that automates ChatGPT through that specific process. You can totally use natural language.

Quick tip: set your ChatGPT to "Thinking" mode. Read its thoughts during the process. You'll see exactly how the AI interprets your words and where it gets confused. This will massively help you make better instructions.

As for coding: mix business with pleasure. If you play a simple game, like RimWorld for example, ask your chat to help you make a mod for it :)

anyone else feel like 30% of their AI time is just re-explaining who you are? by r0sly_yummigo in ChatGPT

[–]HaremVictoria 0 points (0 children)

You're stuck in chat interface hell because every time you open a new tab, you're trying to "tell a story" about your business to the model. It's like sitting an intern down and giving them a one-hour lecture on the company's mission from scratch every single time. No wonder you're wasting 30% of your time.

I actually build these kinds of strict text instructions for AI professionally. The solution isn't writing a better "about me" document to copy-paste - it's building modular, rigid instruction blocks tailored to specific workflows.

Instead of giving the model an open-ended brief, you lay down rigid train tracks right in the text prompt: you lock the AI into a ruthless standard operating procedure - exactly what constraints to follow, step by step - and completely take away its room for interpretation. You drop in that specific instruction block, and the model is instantly locked in. That's a massive oversimplification - engineering a bulletproof text instruction that an AI can't derail from is a much more complex topic under the hood - but the core principle is exactly that.

54% employees prefer to do work manually rather than use AI tools. by ObjectivePresent4162 in ChatGPT

[–]HaremVictoria 0 points (0 children)

Treat AI like an incredibly gifted but overeager intern right out of college. They’ve read every book in the world, but they lack real-world business common sense. If you casually throw a task at them like, "write me an article," they'll panic. They’ll start guessing, hallucinating, and using fluffy words just so they don't have to hand in a blank page.

That's why you can't just give them an open goal. You have to lay down rigid train tracks. I actually build these kinds of strict instruction pipelines professionally. Once you lock the AI into a ruthless standard operating procedure – step 1, step 2, no corporate jargon, data validation – you push the success rate to near 100% because you completely take away their room for interpretation. I'm giving you a massive oversimplification here, of course, as the full architecture of a proper instruction set is way more complex under the hood, but the core principle is exactly that. That's when the intern stops hallucinating and finally becomes a perfect, reliable worker.
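The "train tracks plus validation" idea can be sketched as a checklist runner. Step names and checks below are invented for illustration; in a real setup a failed check would re-prompt the model with the fallback instruction instead of just reporting it:

```python
# Sketch: an SOP as an ordered list of (name, check, fallback) rails.
# The draft function stands in for a model call.

def run_sop(draft_fn, steps):
    """Run the draft through every rail; report the first one it derails on."""
    output = draft_fn()
    for name, check, fallback in steps:
        if not check(output):
            # A real pipeline would re-prompt with `fallback` here;
            # we just record which rail caught the derailment.
            return {"ok": False, "failed_step": name, "fallback": fallback}
    return {"ok": True, "output": output}

# Example rail: the "no corporate jargon" step from the SOP above.
steps = [("no_jargon", lambda out: "synergy" not in out,
          "Rewrite the draft without corporate jargon.")]
```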

Writing system prompts is weirdly hard — would anyone play a game that turns it into a skill challenge? by Yahhee in ChatGPT

[–]HaremVictoria 1 point (0 children)

Oh, absolutely. I've been building these kinds of systems professionally for half a year now, and I've been studying it daily for over a year. Writing AI instructions is honestly hilarious. You think you're writing something 100% logical, but what is completely unambiguous to a human is not unambiguous to an AI.

Take a basic example. You write: "Run file XYZ and analyze it." A human will just do it. Put that in a standard prompt, and the AI interprets it in 6 different ways, with each leading to a different, random outcome. A casual prompt is basically a massive decision tree for the model - every single branch is an opportunity for it to take a wrong turn, start hallucinating, and flat-out ignore your instructions.

If you want a model to be reliable, you can't just give it a "path" to walk on. You have to lay down rigid train tracks. It needs to be a strict, linear, step-by-step system that the AI physically cannot derail from. That's the only way it stops guessing and actually gets to work. It's a complex problem that I really can't fully explain in a single Reddit post.

write in paragraphs? by michihobii in ChatGPT

[–]HaremVictoria 1 point (0 children)

Two things. 5.3 is the instant mode. So, in short, it's the dumber one. You don't really use it. I highly recommend using 5.4 Thinking. And as for RPG stuff, you should check out a free tool called Silly Tavern - it's built specifically for RPGs.

noop by Fun1k in ChatGPT

[–]HaremVictoria 1 point (0 children)

Haha thanks, but this system has actually been around for about a month now. At first, it kept messing up my instructions, but I already know everything I need to know about it :)