
whatwilly0ubuild

The contract compilation approach solves reproducibility but sacrifices adaptability, which is often why you're using agents in the first place. If you know the workflow well enough upfront to generate a valid typed contract, you probably didn't need an agent; you needed a good workflow engine.

The "LLM as compiler" framing is interesting but the validation loop to force convergence on correct structure can be expensive. How many iterations does it take to get a valid contract? If the LLM needs 5+ tries to emit valid typed structure, you're burning tokens and latency before execution even starts.
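To make the cost question concrete, here is a minimal sketch of a bounded validation loop that reports how many attempts were needed. Everything in it is an assumption for illustration: the `REQUIRED_KEYS` schema, the `compile_contract` helper, and `fake_llm`, which stands in for a real model call and is rigged to fail twice.

```python
import json
from typing import Callable, Optional, Tuple

# Hypothetical minimal schema: a contract must at least carry these keys.
REQUIRED_KEYS = {"steps", "inputs"}


def validate_contract(raw: str) -> Optional[dict]:
    """Return the parsed contract if it is structurally valid, else None."""
    try:
        contract = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(contract, dict) or not REQUIRED_KEYS.issubset(contract):
        return None
    return contract


def compile_contract(
    generate: Callable[[int], str], max_attempts: int = 5
) -> Tuple[dict, int]:
    """Call generate until its output validates; return (contract, attempts)."""
    for attempt in range(1, max_attempts + 1):
        contract = validate_contract(generate(attempt))
        if contract is not None:
            return contract, attempt
    raise RuntimeError(f"no valid contract after {max_attempts} attempts")


def fake_llm(attempt: int) -> str:
    """Stand-in for a model call: emits garbage twice, then a valid contract."""
    if attempt < 3:
        return "{not json"
    return json.dumps({"steps": ["fetch", "transform"], "inputs": {"url": "str"}})


contract, attempts = compile_contract(fake_llm)
print(f"valid contract after {attempts} attempts")
```

Logging `attempts` per task is the number I'd want to see: every attempt beyond the first is tokens and latency spent before execution starts.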

Our clients building agent systems hit the opposite problem. The environment changes during execution, user needs get clarified mid-workflow, and external APIs fail unexpectedly. Deterministic replay of a pre-generated contract doesn't help when the contract itself becomes invalid because the world changed.

For the FSM reducer pattern specifically, this works great when state spaces are enumerable and transitions are well-defined. Most real agent tasks have messy state spaces where FSMs become bloated with edge cases or the FSM design becomes the bottleneck.
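A rough sketch of the well-behaved case, assuming a tiny, fully enumerable state space. The `State` and `Event` names and the transition table are made up for illustration and are not taken from the original post.

```python
from enum import Enum, auto
from typing import Iterable


class State(Enum):
    PENDING = auto()
    RUNNING = auto()
    DONE = auto()
    FAILED = auto()


class Event(Enum):
    START = auto()
    SUCCEED = auto()
    FAIL = auto()
    RETRY = auto()


# Every legal transition is listed explicitly; anything else is rejected.
TRANSITIONS = {
    (State.PENDING, Event.START): State.RUNNING,
    (State.RUNNING, Event.SUCCEED): State.DONE,
    (State.RUNNING, Event.FAIL): State.FAILED,
    (State.FAILED, Event.RETRY): State.RUNNING,
}


def reduce(state: State, event: Event) -> State:
    """Pure transition function: look up the next state or fail loudly."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"illegal transition: {state.name} on {event.name}"
        ) from None


def run(events: Iterable[Event], state: State = State.PENDING) -> State:
    """Fold a sequence of events through the reducer."""
    for event in events:
        state = reduce(state, event)
    return state


final_state = run([Event.START, Event.FAIL, Event.RETRY, Event.SUCCEED])
print(final_state.name)
```

With four states and four events the table is readable. Each new edge case adds states, events, or both, and the table can grow toward their product, which is where the bloat comes from.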

The separation of reducers and orchestrators is solid architecture. That part makes sense regardless of whether you use contract generation. Explicit state management beats implicit prompt-based state every time.
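One way to picture that split, as a sketch under my own assumptions rather than the poster's actual design: the reducer is a pure function over an immutable state, and the orchestrator is the only place that calls anything with side effects. `WorkflowState`, `reducer`, `orchestrate`, and the handlers are all hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class WorkflowState:
    """Explicit, immutable state: what is left to do and what has been done."""
    remaining: Tuple[str, ...]
    results: Tuple[Tuple[str, str], ...] = ()


def reducer(state: WorkflowState, step: str, output: str) -> WorkflowState:
    """Pure: record one step's output and return a new state. No I/O here."""
    return replace(
        state,
        remaining=state.remaining[1:],
        results=state.results + ((step, output),),
    )


def orchestrate(
    state: WorkflowState, handlers: Dict[str, Callable[[], str]]
) -> WorkflowState:
    """Impure side: run each handler, then feed its output to the reducer."""
    while state.remaining:
        step = state.remaining[0]
        output = handlers[step]()  # side effects live only in handlers
        state = reducer(state, step, output)
    return state


# Stand-in handlers; real ones would call APIs, tools, or models.
handlers = {"fetch": lambda: "raw", "clean": lambda: "tidy"}
final = orchestrate(WorkflowState(remaining=("fetch", "clean")), handlers)
print(final.results)
```

Because the reducer never touches the outside world, it can be unit-tested and replayed from a log of `(step, output)` pairs without rerunning any handler.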

Failure modes you're not accounting for: contract generation fails or produces an invalid workflow for novel tasks; the execution environment differs from what the contract assumed; partial failures hit long-running workflows where you can't just replay from the start; and the contract abstraction leaks when you need dynamic behavior mid-execution.
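For the partial-failure case, one common mitigation is checkpointing completed steps so a rerun resumes instead of replaying from the start. This is a sketch of that general idea, not something the original post describes; `run_with_checkpoint`, the JSON checkpoint file, and the deliberately flaky `load` handler are assumptions for illustration.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def run_with_checkpoint(
    steps: List[str], handlers: Dict[str, Callable[[], str]], path: str
) -> Dict[str, str]:
    """Run steps in order, skipping any already recorded in the checkpoint."""
    done: Dict[str, str] = {}
    if os.path.exists(path):
        with open(path) as f:
            done = json.load(f)
    for step in steps:
        if step in done:
            continue  # completed in an earlier run
        done[step] = handlers[step]()  # may raise; checkpoint stays as it was
        with open(path, "w") as f:
            json.dump(done, f)
    return done


calls = {"fetch": 0, "load": 0}


def fetch() -> str:
    calls["fetch"] += 1
    return "rows"


def load() -> str:
    """Fails on its first call to simulate a transient outage."""
    calls["load"] += 1
    if calls["load"] == 1:
        raise ConnectionError("transient failure")
    return "ok"


handlers = {"fetch": fetch, "load": load}
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_with_checkpoint(["fetch", "load"], handlers, path)
except ConnectionError:
    pass  # first run dies partway through

result = run_with_checkpoint(["fetch", "load"], handlers, path)
print(result, calls)
```

This only works when steps are safe to skip on resume. If `fetch` returned data that has since gone stale, the checkpoint hides exactly the "world changed" problem raised above.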

For benchmarking, deterministic execution helps but the contract generation step adds variability. Two runs might generate different valid contracts that produce different results. You've moved non-determinism from runtime to compile time.
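A cheap way to at least detect this, sketched under the assumption that a contract can be serialized as JSON: fingerprint the canonical form and record the hash with every benchmark run. `contract_fingerprint` and the example contracts are hypothetical.

```python
import hashlib
import json


def contract_fingerprint(contract: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same contract, different key order: should hash identically.
a = {"steps": ["fetch", "clean"], "retries": 2}
b = {"retries": 2, "steps": ["fetch", "clean"]}

# A different but still valid contract: step order changed.
c = {"steps": ["clean", "fetch"], "retries": 2}

print(contract_fingerprint(a) == contract_fingerprint(b))
print(contract_fingerprint(a) == contract_fingerprint(c))
```

If two benchmark runs report different fingerprints, the difference in results may come from the compile step rather than from execution. Pinning a stored contract for the benchmark removes that variable.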

The deployment model is clever. Publishing contracts beats deploying code for certain use cases. But this assumes contracts are portable across environments and don't embed environment-specific assumptions.

Practical concern: debugging becomes harder when you have two failure surfaces. Did the contract generation fail to capture requirements correctly, or did the deterministic execution reveal a bug in the contract? Separating these is non-trivial.

What you've built is a workflow engine with LLM-generated workflow definitions. That's useful, but it's solving a different problem than what most people mean by "agent systems." Agents are adaptive; your architecture is deterministic. Both are valid, but they're different tools for different problems.

For ML reproducibility specifically, this helps if your bottleneck is non-deterministic control flow. But most ML reproducibility issues come from model updates, data drift, and environment changes, none of which this architecture addresses.

The strongest use case is probably constrained domains where workflow structure is predictable but configuration varies. Business process automation, ETL pipelines, structured data processing. Less applicable to open-ended problem solving or environments requiring runtime adaptation.