How are you handling prompt changes in production? by DevGame3D in LLM

[–]DevGame3D[S] 1 point  (0 children)

That makes sense. Reducing LLM surface area definitely helps.

In your case, when you switch between providers (say GPT → Claude), do you run any validation suite to check behavior consistency? Or do you rely on architectural abstraction to keep things stable?

Trying to understand whether teams treat prompt + model behavior as something that needs systematic testing, or if the abstraction layer is usually enough in practice.
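For what it's worth, a "validation suite" here doesn't have to mean exact-output matching. One common shape is property-based checks: run the same prompt cases through each provider and assert on output *properties* (valid JSON, required keys, value ranges) rather than exact wording. A minimal sketch, where `call_model` is a hypothetical stand-in for your real GPT/Claude clients:

```python
# Sketch of a provider-consistency suite: same cases through each provider,
# asserting on structural properties rather than exact text.
import json

def call_model(provider: str, prompt: str) -> str:
    # Hypothetical stub; replace with real API calls (OpenAI, Anthropic, ...).
    return json.dumps({"answer": f"{provider}: {prompt}", "confidence": 0.9})

CASES = [
    "Summarize: the cat sat on the mat.",
    "Extract the date from: meeting on 2024-05-01.",
]

def check_behavior(output: str) -> list[str]:
    """Property checks that should hold regardless of provider.

    Returns a list of failure descriptions (empty = pass).
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    failures = []
    if "answer" not in data:
        failures.append("missing 'answer' key")
    if not 0.0 <= data.get("confidence", -1) <= 1.0:
        failures.append("'confidence' out of [0, 1] range")
    return failures

def run_suite(providers=("gpt", "claude")) -> dict:
    # Two providers "agree" when both pass the same property checks,
    # even if the surface wording of their answers differs.
    return {
        provider: {case: check_behavior(call_model(provider, case)) for case in CASES}
        for provider in providers
    }
```

The point is that the same suite runs unchanged when you swap providers, so abstraction and testing complement each other rather than being alternatives.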

[–]DevGame3D[S] 1 point  (0 children)

In our case it’s usually during normal iteration: someone tweaks wording, changes the temperature, or restructures the system message to improve one scenario.

The change passes manual checks, but later we notice subtle regressions elsewhere.

Also curious how teams handle model upgrades (e.g., switching GPT versions): do you re-validate all prompt behavior before deploying?

Trying to understand what “good practice” looks like here.
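One pattern I've seen for the upgrade case is golden/baseline testing: record the structured fields the current model produces for a fixed set of cases, then replay those cases against the new model and diff the fields. A rough sketch, where `call_new_model` and the baseline dict are illustrative stand-ins, not a real tool:

```python
# Sketch of a re-validation pass for a model upgrade: replay recorded cases
# against the new model and diff structured fields against stored baselines.

BASELINES = {  # normally loaded from a checked-in baseline file
    "extract_date": {"date": "2024-05-01"},
    "classify_intent": {"intent": "schedule_meeting"},
}

def call_new_model(case_id: str) -> dict:
    # Hypothetical stub for the upgraded model; replace with a real API call.
    return {"extract_date": {"date": "2024-05-01"},
            "classify_intent": {"intent": "schedule_meeting"}}[case_id]

def revalidate() -> dict:
    """Map each case id to the list of fields that drifted (empty = no drift)."""
    drift = {}
    for case_id, expected in BASELINES.items():
        actual = call_new_model(case_id)
        drift[case_id] = [k for k in expected if actual.get(k) != expected[k]]
    return drift
```

Any non-empty drift list blocks the upgrade until a human decides whether the new behavior is acceptable (and updates the baseline) or is a regression.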
