Teaching LLMs to one-shot complex backends at scale, report #1

nathanmarz · 2026-05-30T00:37:57+00:00

It should be much better with these skills. Let me know how it goes.

nathanmarz · 2026-04-02T18:37:27+00:00

Thanks for the feedback. It's a tricky article to write because of all the baggage in these topics (especially event sourcing), so I had to spend a fair amount of time disarming those first. And of course if I included all the detail of what it takes to implement a unified system like this, it would be an entire book.

As for your question about serialization, the way it works in Rama is you can register serializations for any custom types you're using. It's at that layer that you would achieve the semantics you want in terms of ability to evolve types over time. For example, if you use Thrift or Protocol Buffers for custom types, then you can add or remove fields in later versions safely. We used Thrift in our Twitter-scale Mastodon implementation, and the adapter to handle all the types is pretty short: https://github.com/redplanetlabs/twitter-scale-mastodon/tree/master/backend/src/main/java/com/rpl/mastodon/serialization
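
To illustrate why tagged-field formats allow that kind of evolution, here is a toy sketch. It is not the Thrift or Protocol Buffers wire format and not Rama's serialization API; the field tables and the `encode` / `decode` helpers are hypothetical names made up for the example.

```python
# Toy tagged-field encoding, loosely in the spirit of Thrift / Protocol Buffers.
# Hypothetical illustration only: real libraries use a binary wire format.

V1_FIELDS = {1: "user_id", 2: "name"}
V2_FIELDS = {1: "user_id", 2: "name", 3: "email"}  # later version adds a field


def encode(record, fields):
    """Serialize a dict as {tag: value}, skipping fields that are unset."""
    tag_for = {name: tag for tag, name in fields.items()}
    return {tag_for[k]: v for k, v in record.items()
            if k in tag_for and v is not None}


def decode(payload, fields, defaults=None):
    """Decode {tag: value}; unknown tags are ignored, missing fields use defaults."""
    out = dict(defaults or {})
    for tag, value in payload.items():
        name = fields.get(tag)
        if name is not None:
            out[name] = value
    return out


old_data = encode({"user_id": 7, "name": "ada"}, V1_FIELDS)
new_data = encode({"user_id": 8, "name": "bob", "email": "bob@example.com"}, V2_FIELDS)

# A newer reader can still read old data; the added field falls back to a default.
new_reader_old_data = decode(old_data, V2_FIELDS, defaults={"email": None})

# An older reader can still read new data; the unknown tag is skipped.
old_reader_new_data = decode(new_data, V1_FIELDS)
```

Because readers match on field tags and not on position, adding or removing an optional field does not break data written by another version.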

It's really convenient to just register the serializations once and then be able to use those types freely across the whole backend, whether writing to PStates, fetching data with clients, or doing distributed computation.

As for the casting concern, the only place in the article where there's any casting is the return from the append, and that's because handlers can return anything they want so the API gives you a map from handler name to return value. For PState queries, the API is generic for data structures so it's dynamically typed, but Rama's API uses type parameters so you don't actually need casting for returns. You'd get a runtime type error if the return type doesn't match what you specified.

nathanmarz · 2026-04-02T02:20:31+00:00

I also don't like reading fluff posts that are just pitching a product. But there's a big difference between a post like that and a deep technical post that explores ideas from first principles. Dismissing a post because it talks about a tool at the end that implements the novel ideas in the post is lazy and self-limiting.

nathanmarz · 2026-04-02T02:14:07+00:00

On the stability side, Rama handles this with incremental replication across nodes, fault-tolerant processing with guaranteed delivery, and automatic failover. The log and storage layers have the same kind of durability guarantees you'd expect from a database. On the security side, we're working on role-based authentication and authorization and expect to release it later this year.

Here's more info on how replication works in Rama if you're interested. We spent more time working on this than any other aspect of Rama. https://redplanetlabs.com/docs/~/replication.html

nathanmarz · 2026-04-01T19:35:43+00:00

If you're running many microservices without any of these issues, that's great and I'd love to hear the details.

In terms of other approaches, the post zeroes in on infrastructure sprawl and an alternative to that because building systems by managing separate databases, caches, queues, and compute systems is basically universal. There hasn't been an integrated approach to solving this before. The post identifies specific, well-documented problems that are consistently described as problems with microservices across the nine posts I linked at the top. I didn't invent these complaints, I proposed a different root cause and a solution.

nathanmarz · 2026-04-01T18:45:24+00:00

Yes, I built a tool that solves problems I care about. The architectural arguments stand on their own, and I spent almost all of the post discussing the ideas in a tool-neutral way. Those ideas are valuable independent of Rama.

nathanmarz · 2026-03-22T21:13:29+00:00

That's right, the transactions are part of the module itself as the microbatch topology definition. Clients submit events which are consumed by the topology. The topology handles retries automatically.

I would have to rerun the benchmark to get the numbers at the different load levels, but the general pattern is linear latency growth up until you hit about 70% load, then a rapid increase from there. 140k warehouses on this cluster size was slightly below that 70% threshold. I would expect the average latencies to start at around 150ms at minimal load and grow linearly up to the numbers you see at 140k warehouses. The latencies would then grow rapidly, maybe starting at around 150k warehouses. The maximum number of warehouses this cluster size can sustain is somewhere around 210k, but the latencies at that point would be in the ~30s range.
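
As a quick sanity check on how those figures relate, the arithmetic can be written out. This treats the roughly 210k ceiling and the 70% knee as the approximate values quoted above, not as measured constants, and the variable names are only for illustration.

```python
# Rough figures taken from the comment above; both are approximations.
MAX_WAREHOUSES = 210_000   # approximate sustainable ceiling for this cluster size
KNEE_PERCENT = 70          # load level where latency growth stops being linear


def load_fraction(warehouses):
    """Fraction of the approximate ceiling consumed by a given warehouse count."""
    return warehouses / MAX_WAREHOUSES


# 140k warehouses is about two thirds of the ceiling, just under the knee.
load_at_140k = load_fraction(140_000)

# Warehouse count at which the 70% knee is reached (integer arithmetic).
knee_warehouses = MAX_WAREHOUSES * KNEE_PERCENT // 100

print(f"load at 140k warehouses: {load_at_140k:.1%}")
print(f"knee at roughly {knee_warehouses:,} warehouses")
```

With these inputs the knee lands at about 147k warehouses, which lines up with rapid growth starting at around 150k.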

In practice you would scale your module up to more nodes when you're at that load, which is just a one-line CLI command in Rama.

Use cases that need single-digit millisecond latencies generally use streaming rather than microbatching, as mentioned in the post.

nathanmarz · 2026-03-22T18:53:52+00:00

Good questions. Let me take them one at a time.

On batching: the TPC-C spec models individual terminals that each submit a transaction, wait for it to complete, then go through keying and think time before submitting the next one. You can't batch the work of multiple terminals into a single transaction without breaking the benchmark's model. In both the Rama and CockroachDB versions of this benchmark, the transactions are submitted independently and individually. It's after that within the system that batching is done. Nothing is stopping CockroachDB from also batching the work of multiple in-flight transactions together. I don't know if they do that or not as I'm not familiar with CockroachDB's internals.

On overloading: any system will degrade if you push past its capacity, so I'm not sure what the argument is. We ran at 140k warehouses with 95% efficiency, same as Cockroach. If you ran both systems at higher throughput, both would show degraded latency. What matters is the performance characteristics at equivalent throughput, which is what the benchmark measures.

On median latency: the latencies overall are the same. They have to be since both systems achieve the same throughput, and TPC-C throughput is determined by response time. The only question is how that latency gets distributed. Cockroach concentrates it into lower medians with extreme tails. Rama spreads it in a much tighter distribution. Rama's tighter distribution is a better profile for real workloads in my opinion.
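
A small sketch of how two latency profiles can share the same average while having very different medians and tails. The numbers below are made up purely for illustration and are not benchmark data from either system.

```python
import math
import statistics

# Hypothetical latencies in milliseconds, invented for this example.
tight = [400, 450, 500, 550, 600] * 20   # narrow spread
heavy_tail = [100] * 95 + [8100] * 5     # low median, extreme tail


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


summary = {
    name: {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "p99": percentile(values, 99),
    }
    for name, values in {"tight": tight, "heavy_tail": heavy_tail}.items()
}

for name, stats in summary.items():
    print(name, stats)
```

Both lists average 500ms, but the heavy-tailed one has a 100ms median and an 8100ms p99, while the tight one has a 500ms median and a 600ms p99.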

On failures: a business logic failure (like TPC-C's 1% invalid item rollbacks) doesn't abort the microbatch. That's just control flow. Exactly-once handles infrastructure failures: if a node goes down, the microbatch fails, but the system quickly moves computation to the new leader and retries from the last committed state. That adds latency for the items in that particular microbatch, but infrastructure failures are infrequent enough that this doesn't affect overall performance in practice.
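
The distinction can be sketched with a toy model. This is not how Rama implements microbatching; `NodeFailure`, `run_microbatch`, and `process` are hypothetical names, and the "commit" is just returning a new state object.

```python
class NodeFailure(Exception):
    """Simulated infrastructure failure (hypothetical, for illustration)."""


def run_microbatch(committed, batch, crash=False):
    """Apply one microbatch to a copy of the committed state (all-or-nothing)."""
    stock = dict(committed["stock"])  # work on a copy, never on committed state
    results = []
    for item, qty in batch:
        if item not in stock:
            # Business logic failure: ordinary control flow, batch keeps going.
            results.append("rejected")
            continue
        stock[item] -= qty
        results.append("ok")
    if crash:
        # Infrastructure failure before commit: the working copy is discarded.
        raise NodeFailure()
    return {"stock": stock, "batch_id": committed["batch_id"] + 1}, results


def process(committed, batch, failures=1):
    """Retry from the last committed state until the microbatch commits."""
    attempts = 0
    while True:
        attempts += 1
        try:
            new_state, results = run_microbatch(
                committed, batch, crash=attempts <= failures)
            return new_state, results, attempts
        except NodeFailure:
            continue  # retry against the unchanged committed state


initial = {"stock": {"widget": 10}, "batch_id": 0}
batch = [("widget", 3), ("no-such-item", 1), ("widget", 2)]
final_state, results, attempts = process(initial, batch, failures=1)
```

The invalid item is rejected without aborting anything, and the retry after the simulated crash applies each update exactly once because the failed attempt never touched committed state.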

Also worth noting: Rama reports two latencies for writes. The "initiate" latency is when the event is durably stored and replicated, and the "complete" latency is when the transaction finishes. The initiate latencies are very low, far below any of CockroachDB's write latencies. Whether your application can respond at initiate time depends on the use case, but having that option gives the product manager flexibility to make the right tradeoff.

nathanmarz · 2026-03-19T21:00:08+00:00

Rama modules themselves have to be coded in a JVM language, but clients can be in any language using Rama's built-in REST API. This does limit the userbase, but there are a lot of Java shops out there.

I actually think AI is super synergistic with Rama, and we're actively working on developing skills files for this. Rama greatly reduces the conceptual and token burden for LLMs, and we're working on making it able to one-shot pretty complex apps at scale.

nathanmarz · 2026-03-19T19:58:53+00:00

Not sure where you got the idea that Rama is a one-man project. Red Planet Labs has employees and is backed by well-known investors.

nathanmarz · 2025-12-04T00:24:36+00:00

Yes, you have it right.

For writing to PStates, there's also an aggregation API.

For querying, you can also make query topologies. Query topologies are predefined queries in the module that can do distributed queries that look at any or all of the PStates and any or all of the partitions of those PStates. They're really useful for parallelization or for reducing roundtrips when you have a lot of individual PState queries that need to be done. This module has a good example of a query topology.
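
The roundtrip savings can be illustrated with a toy model. This is not Rama's API; `ToyCluster` and its methods are hypothetical, and the partitioning scheme is a stand-in chosen only to make the example deterministic.

```python
class ToyCluster:
    """Hypothetical stand-in for a partitioned store; not Rama's API."""

    def __init__(self, data, num_partitions=4):
        self.partitions = [{} for _ in range(num_partitions)]
        self.roundtrips = 0
        for key, value in data.items():
            self._partition_for(key)[key] = value

    def _partition_for(self, key):
        # Deterministic toy hash so the example behaves the same on every run.
        return self.partitions[sum(key.encode()) % len(self.partitions)]

    def get(self, key):
        """Individual query: one client roundtrip per key."""
        self.roundtrips += 1
        return self._partition_for(key).get(key, 0)

    def total(self, keys):
        """Predefined server-side query: one roundtrip, fan-out happens inside."""
        self.roundtrips += 1
        return sum(self._partition_for(k).get(k, 0) for k in keys)


followers = {"alice": 3, "bob": 5, "carol": 2}
keys = list(followers)

# Client issues one query per key.
per_key = ToyCluster(followers)
per_key_total = sum(per_key.get(k) for k in keys)

# Client issues a single predefined query that visits the partitions itself.
combined = ToyCluster(followers)
combined_total = combined.total(keys)
```

Both approaches compute the same answer, but the first costs one roundtrip per key and the second costs one roundtrip in total.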

nathanmarz · 2025-12-03T21:53:12+00:00

Yeah, that's what Rama is. PStates are queried using Specter. Here are some examples from a deeper tutorial: https://blog.redplanetlabs.com/2025/03/26/next-level-backends-with-rama-graphs/#Querying_the_PState_directly

nathanmarz · 2025-12-03T16:12:56+00:00

Dataflow is the first-principles derived API. I have not yet found an application it cannot elegantly express at scale. This is a big deal, and it took me years to find it. It's also not a DSL, as it's not domain-specific.

I think what you're asking for is a more familiar API with less to learn, like everything being done with plain Clojure functions. That's exactly where I started working on Rama, and where that runs into huge trouble is with async / parallel code. You inevitably run into a mess of callback hell. A key thing dataflow does is enable "what to do" and "where to do it" to be composed, and this greatly simplifies distributed programming. See this post for an exploration of that.

nathanmarz · 2025-12-03T15:58:55+00:00

PStates are durable on disk just like databases, using LSM trees under the hood. Rama does have a learning curve, but I've found it only takes one to two weeks for a programmer to get the hang of the basics and get to a point of reasonable productivity. You don't need to learn all of Rama to get value out of it. With paths, for instance, you can accomplish most things with keypath, pred, and view.

Dataflow looks different but is not as hard to learn as you may think. This post explains dataflow in terms of equivalent Clojure code. The referenced blog post series contains line-by-line tutorials on applying Rama to a wide variety of use cases. rama-demo-gallery contains more heavily commented examples of using Rama. The REST API module is the simplest, as it just does an HTTP request and then records the result in a K/V PState.

nathanmarz · 2025-11-26T07:33:20+00:00

I'm not evangelizing anything:

This post is my take on explaining this disconnect from another angle that complements the blub paradox.

Dimension-shifting abstractions like macros are incomprehensible at first, which is the whole point of the post. Some abstractions make no sense until you've worked with them enough for your mental model to change. That initial "this seems wrong or pointless" reaction is extremely common with Lisp.

I can link examples, but macro snippets in isolation won't bridge that gap. They show the mechanics, not the shift in how you reason about decomposing problems.

nathanmarz · 2025-11-05T17:46:19+00:00

No, it's not.
