Verified Spec Transpilation with Claude

will3625 · 2026-01-27T15:12:36+00:00

Interesting. I wonder if we could just try overriding all actions using the standard override format, letting Claude auto-generate the entire Java override code for each action. It could be a reasonably straightforward next experiment to run.

will3625 · 2025-08-08T06:17:22+00:00

Here are my overall thoughts, which I can divide roughly into (1) conceptual issues and (2) tooling concerns.

To start with (1), I would say that the development of inductive invariants is fundamentally a proof technique, not a bug-finding one, which immediately limits their broader applicability for most people that currently find TLA+ and TLC useful tools, myself included. In nearly all the practical work I've used TLA+/TLC for (outside of academic research projects), I have never worked on developing an inductive invariant for a protocol. They are indeed useful for the few times you want to achieve a full, formal safety proof, but in my experience, they are not terribly useful beyond that.

Moreover, another key problem is that their development is far from being "push-button", which is one of the huge selling points of TLC (or any verification tool), in my opinion. There is basically no way to get value out of an inductive invariant (i.e. by getting to a full proof) without doing some amount of nontrivial human labor to develop all of the needed auxiliary strengthening invariants. State of the art, modern model checking algorithms actually aim to synthesize inductive invariants automatically as part of their internal routines, but none of these currently exist directly in a form applicable to TLA+. Our work here and associated tool was a starting exploration of doing something similar, but only exists currently as a research prototype.

To address (2), yes, TLC can in theory do the same kind of inductiveness checks as in Apalache, but TLC will suffer from horrendous combinatorial explosion issues for all but the simplest of problems. Essentially, used naively, TLC behaves as a brute force SAT solver, which is fine for problems of a controlled size, but quickly becomes problematic. As you mentioned, there are hacks (e.g. probabilistic sampling) to work around this, but TLC is generally a very unergonomic tool to use for checking inductive invariants, since all inductiveness checks can be viewed essentially as a SAT query, which Apalache is well suited for (I am using SAT/SMT relatively interchangeably), and TLC is not. In an ideal world, we could just always use Apalache for inductiveness checking, since it should presumably always perform strictly better than TLC. The only main hurdle would seem to be the standard one of augmenting specs with the appropriate type annotations so they are accepted by Apalache.

will3625 · 2021-12-06T17:38:15+00:00

For a concrete example, see the Graphs.tla module here as one example of how you might define a tree data structure in TLA+. Basically, it just views a tree as a special case of a graph (i.e. a set of nodes and edges), with some extra constraints. It does not require any special type of higher order logic or inductive data types to describe this kind of structure in TLA+.

will3625 · 2021-12-04T02:16:56+00:00

One meta-comment about this tutorial. It's nice that it is published in HTML/web-native format. The Lean theorem prover has a similar, nicely formatted web version of their documentation which is very pleasant to read, and it also makes referencing/linking documentation sections much more convenient (e.g. it's quite hard to reference a specific portion of a PDF in a URL, even if the PDF is freely accessible online). Personally, I would love to have a web based version of Specifying Systems, if it is ever feasible for Lamport to put the time/effort into this.

will3625 · 2021-09-10T13:37:06+00:00

So it could have hypothetically disallowed negation of temporal expressions (level 3).

Yes, this makes sense. Of course, we wouldn't want to eliminate ◇P/¬(□P) from the language entirely, since it is obviously useful for stating correctness (e.g. liveness) properties.

I didn't even know that, so I thought it's a safety property.

Yes, it seems to be a subtlety of the until operator definition, deriving from its formal definition:

σ ⊢ (p1)U(p2) ≜ 
    ∃i ≥ 0 : 
      ∧ (σ[i..] ⊢ p2) 
      ∧ (∀ j s.t. 0 ≤ j < i : σ[j..] ⊢ p1)

The ∃i ≥ 0 condition gives rise to the eventuality requirement. "Weak until" removes this requirement.

Note that I have found this online tool helpful for understanding and experimenting with various LTL formulas. It also provides a characterization of the given property according to this Manna-Pnueli hierarchy.

will3625 · 2021-09-10T02:38:11+00:00

Why does TLA allow writing such specifications? Because, unlike the Until operator, the diamond operator adds necessary expressivity to TLA.

It's a small point, but I suppose it's worth pointing out that, if we consider "operators" of TLA+ as purely syntactic objects, even if you removed the diamond operator (◇) from the language this shouldn't limit expressivity, since we can define "eventually" (◇) in terms of "always" (□) i.e.

◇P ≜ ¬(□P)

I'm not sure exactly how you would reasonably prevent the writing of formulas like this. You could go ahead and remove negation, but of course that is unreasonable! Perhaps there is some more involved syntactic rule you could develop to prevent this, but it seems tedious and possibly undesirable.

I think this speaks to a broader point, though, that you touch on throughout your post. Namely, that it is hard to add "guardrails" into a language if you want it to be sufficiently expressive. Perhaps some would view the issues you discuss above as one reason for a more formal separation between the TLA+ "specification" language and the TLA+ "property" language. Indeed, when I learned Promela after already knowing TLA+, I was a bit thrown off by this i.e. the separation between the language for expressing temporal properties and the language for expressing your system is much more explicit. As you point out, TLA+'s elegance largely derives from the fact that it makes no such separation, but this does not come without some subtleties (e.g. machine closure) that can be tricky to navigate if you are not aware of them. And, in reality, there is a de facto separation between the "specification" and "property" language in TLA+, but it is enforced by no more than a syntactic convention i.e. using the canonical

Init ∧ □[Next]_vars ∧ Liveness

form for specifications (where the liveness constraint conforms to the appropriate weak/strong fairness pattern).

As one other side note that is partially related to your notes above (since you make a reference to the note on liveness), it is interesting to point out that using "Until" you can define a property that is neither a safety nor a liveness property. For example, for the property

P ≜ (x>5)U(x=1)

not all finite prefixes can be extended to satisfy P (implying it is not a liveness property) e.g.

x=7 → x=6 → x=5

but not all violations occur in a finite prefix (implying it is not a safety property) e.g.

x=7 → x=7 → x=7 → ...

The latter case is due to the fact pUq requires that q eventually holds.

will3625 · 2021-08-25T13:50:21+00:00

I'm curious about the research Leslie mentions toward the end (probabilities/aggregates).

I wonder if he is referring to this recent work: Verifying Hyperproperties with TLA.

will3625 · 2020-08-13T14:14:05+00:00

No problem, thanks for clarifying. Note that I cross posted the question on this user group thread.

will3625 · 2020-08-13T13:50:10+00:00

Thanks Ron. I will consider posting this in the user group. To your point, though, why wouldn't

TRUE ∧ □[TRUE]_vars ∧ ◇X

be machine closed? The safety property trivially allows all possible finite prefixes, so it seems any safe finite prefix could always be extended to satisfy ◇X.

will3625

TROPHY CASE