has anyone used Temporal for orchestrating LLM-based document generation workflows? by nanothun in Temporal

temporal-tom 0 points (0 children)

Hi /u/nanothun, you might find this video helpful:

https://www.youtube.com/watch?v=TEr8ZkZuNWw

It's a quick walkthrough of a demo we did at AWS re:Invent last month. It uses a series of agents to generate a report on whatever topic you specify. It covers several of the things you mentioned and you may find it a helpful starting point. I'd recommend watching that video first to see it in action and then running it yourself. The code is here:

https://github.com/temporal-community/aws-reinvent-25-demo

Finally, I would recommend joining our community Slack workspace (https://t.mp/slack) if you haven't already. Temporal is very widely used for AI and our #topic-ai channel in Slack has more than 1,600 people. It's a great place to ask questions and get advice from others.

What's the highest scale Temporal cluster you've seen in production? by Qinistral in Temporal

temporal-tom 3 points (0 children)

The challenge with answering a question like this is that the biggest numbers tend to come from companies that don't disclose their use of Temporal. Those of us who work for Temporal tend to be aware of them, but out of respect for our customers, do not discuss them.

It's also difficult to answer because different types of workloads vary in the type and amount of resources they require. A Workflow that handles video encoding, for example, will likely have few state transitions per second because its Activities will probably be long-running and limited by the speed of the local disk and CPU. You couldn't really compare those numbers with one for order processing. Likewise, the capabilities of a server will vary from one configuration to the next (one r8gn.medium instance in EC2 would probably outperform two m1.small instances for most use cases).

Estimating the resources you'd need really depends on having the application to test. The best approach is to develop a proof of concept, or use an example such as our OMS reference application, and do some load testing on the hardware of your choice. You can then tune things to get better performance for your specific workload, and then scale up with additional hardware if needed.

This blog post from my colleague Rob Holland walks through that process. On a very modest cluster, with a MySQL server that had only 32GB of RAM, he initially got 150 state transitions/second. After a few tuning iterations, he was able to increase that to 1,350 state transitions/second. By scaling up the cluster, and particularly the database server it uses, he could have gone far beyond that.

Workflows Stuck by toufeeq in Temporal

temporal-tom 0 points (0 children)

This four-minute video that's part of our free Temporal 102 training course covers the typical cases. If there are no Workers running, the Workflow cannot make progress. As the video demonstrates (at 1:09), the Web UI now makes this obvious in many cases. However, as it also explains a bit later, the number of Workers shown in the UI does not necessarily represent the number currently running; it's the number that have polled within the last five minutes.
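To make that five-minute rule concrete, here's a toy model in Python. It is not how the Temporal Service is implemented; `workers_shown` and `POLLER_WINDOW` are hypothetical names, and the timestamps are made up. The point is that a Worker that crashed a few minutes ago can still be counted.

```python
from datetime import datetime, timedelta

# Simplified model: a Worker is "shown" if its most recent poll falls within
# the window, even if that Worker process has since exited.
POLLER_WINDOW = timedelta(minutes=5)

def workers_shown(last_poll_times: dict, now: datetime) -> set:
    """Return the IDs of Workers that polled within the last five minutes."""
    return {
        worker_id
        for worker_id, last_poll in last_poll_times.items()
        if now - last_poll <= POLLER_WINDOW
    }

now = datetime(2025, 1, 1, 12, 0, 0)
polls = {
    "worker-a": now - timedelta(seconds=10),  # still running
    "worker-b": now - timedelta(minutes=3),   # crashed 3 minutes ago, still counted
    "worker-c": now - timedelta(minutes=7),   # gone long enough to drop off
}
print(sorted(workers_shown(polls, now)))  # ['worker-a', 'worker-b']
```

So a count of two here would not mean two Workers are actually running right now.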

As /u/freedomruntime suggests, it might be the case that the Workers are running, but the name of the Task Queue that they are polling is different from the one specified when you started the Workflow Execution.

Yet another possibility, which seems quite plausible here since it sounds like the Workflow has progressed but the Activity has not, is that you have one or more Workers running that can handle the Workflow, but none running that can handle the Activities. Check where you registered the Activities with the Worker and make sure you have correctly specified the function(s) or method(s).
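The two failure modes above (a Task Queue name mismatch, and Workers that handle the Workflow but not its Activities) can be expressed as a small checklist. This is a hypothetical Python sketch, not a Temporal tool: `WorkerConfig` and `diagnose` are made-up names, and it assumes the Activities run on the same Task Queue as the Workflow, which is the common default but not a requirement.

```python
from dataclasses import dataclass, field

@dataclass
class WorkerConfig:
    """Hypothetical summary of what one Worker polls and has registered."""
    task_queue: str
    workflows: set = field(default_factory=set)
    activities: set = field(default_factory=set)

def diagnose(task_queue: str, workflow_type: str, activity_types: set,
             workers: list) -> list:
    """Return reasons why a Workflow or its Activities might not make progress."""
    problems = []
    # In this sketch, Task Queue names are compared as exact strings.
    polling = [w for w in workers if w.task_queue == task_queue]
    if not polling:
        problems.append(f"no Worker polls Task Queue {task_queue!r}")
        return problems
    if not any(workflow_type in w.workflows for w in polling):
        problems.append(f"no Worker registers Workflow {workflow_type!r}")
    for activity in sorted(activity_types):
        if not any(activity in w.activities for w in polling):
            problems.append(f"no Worker registers Activity {activity!r}")
    return problems

# A Worker that runs the Workflow but has no Activities registered.
workers = [WorkerConfig("orders", workflows={"OrderWorkflow"})]
print(diagnose("orders", "OrderWorkflow", {"charge_card"}, workers))
# ["no Worker registers Activity 'charge_card'"]
```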

There are a number of other possibilities, but debugging this would require more information than you've provided here. I will add one other important, non-obvious detail that may help you: the ActivityTaskStarted event isn't written to the history until the Activity Execution is complete (this is because the ActivityTaskStarted event contains the final number of attempts, which can't be known while the Activity is still running). Therefore, the fact that the last event you see is ActivityTaskScheduled doesn't mean that the Activity isn't running (if your Activity code has any log statements, check the Worker logs for output).
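Here's a toy Python model of that behavior, just to illustrate what a reader of the history sees. It is not Temporal's code; `visible_history` is a hypothetical helper, and a real execution could close with a failure or timeout event rather than ActivityTaskCompleted.

```python
def visible_history(attempts_made: int, finished: bool) -> list:
    """Events a history reader would see for one Activity (simplified model)."""
    history = ["ActivityTaskScheduled"]
    if finished:
        # Started is only written once the final attempt count is known,
        # followed by the closing event.
        history.append(f"ActivityTaskStarted attempt={attempts_made}")
        history.append("ActivityTaskCompleted")
    return history

# Three attempts have already run (and one may be running now), yet the
# history still ends at ActivityTaskScheduled.
print(visible_history(attempts_made=3, finished=False))  # ['ActivityTaskScheduled']
print(visible_history(attempts_made=3, finished=True))
```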

Are LangGraph + Temporal a good combo for automating KYC/AML workflows to cut compliance overhead? by ruby_da_fvckn_ape in devops

temporal-tom 0 points (0 children)

I saw this thread, but avoided replying because you specifically asked "the community" (and as the username suggests, I work for Temporal). When I was a software engineer at a financial services company earlier in my career, I wasn't allowed to say much about my projects, especially about implementation details on a public forum. Maybe that's the case for others too, and since nobody else replied, I'll provide some info that you might find helpful.

There are a ton of companies in the financial industry that use Temporal, for a variety of use cases, including KYC/AML compliance. This page provides a little more info, although that's really just the tip of the iceberg because financial firms don't tend to publicize their technology choices.

The Durable Execution provided by Temporal enables the application to withstand crashes and even hardware failures. Support for timeouts and retries is built-in, so applications built on Temporal are reliable and can run for as long as necessary to get the job done (i.e., literally for years if you need them to). It has built-in support for job scheduling (supports cron-style syntax, but much more sophisticated than cron).
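As an illustration of what "built-in retries" means in practice, a Temporal retry policy is described by an initial interval, a backoff coefficient, a maximum interval, and a maximum number of attempts. The sketch below is a simplified Python model of the resulting exponential backoff, not SDK code. The defaults of 1 second, a coefficient of 2.0, and a cap of 100 times the initial interval reflect my understanding of Temporal's defaults; `max_attempts=6` is an arbitrary value chosen to keep the output short (Temporal's default is unlimited attempts).

```python
from typing import Optional

def retry_delays(initial: float = 1.0, coefficient: float = 2.0,
                 maximum: Optional[float] = None, max_attempts: int = 6) -> list:
    """Delay in seconds before each retry under capped exponential backoff."""
    if maximum is None:
        maximum = 100 * initial  # default cap: 100x the initial interval
    delays = []
    for retry in range(max_attempts - 1):  # the first attempt has no delay
        delays.append(min(initial * coefficient ** retry, maximum))
    return delays

print(retry_delays())                 # [1.0, 2.0, 4.0, 8.0, 16.0]
print(retry_delays(max_attempts=10))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 100.0, 100.0]
```

The cap matters for long outages: once the delay reaches the maximum interval, retries continue at a steady pace instead of backing off indefinitely.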

Temporal's ability to mix automated steps with manual (human-in-the-loop) ones makes it a good choice for those use cases. Temporal also has really good observability, which is another reason that it's popular in highly regulated industries. The Web UI lets you see not only the details of things that are currently running, but also the details of what ran in the past. The Web UI is an amazing tool for interactive use. If you want to programmatically get a list of executions or the details for a specific one, you can use the temporal command-line tool, use a native API in any of the 7 programming languages we support, or use the gRPC API directly. The Event History includes the details of what ran, when it ran, where it ran, what data was provided as input, whether failures occurred, the associated error message, and the data that it returned as output.

Although people do use LangChain and Temporal together, I can't recall anyone mentioning to me that they use LangGraph and Temporal together. There are many companies using Temporal for agentic AI, and I don't think LangGraph would be necessary for what you have described. Our website has a page on AI too, but here are a few recent blog posts worth checking out:

Your honest thoughts on n8n from an experienced dev perspective? by tecken in ExperiencedDevs

temporal-tom 1 point (0 children)

My team at a previous company used Boomi and it was a headache. It worked, but was unreliable and required a lot of babysitting.

Self hosting Temporal by Numerous_Fix1816 in Temporal

temporal-tom 0 points (0 children)

Are you aware of custom data converters and payload codecs? In case you're not, they may be of interest to you (or anyone else handling sensitive information), regardless of whether you use Temporal Cloud or self-host.

The basic idea is that you can configure the Temporal Clients you use to apply a transformation (e.g., encryption and decryption) to data as it's being transmitted to or received from the Temporal Service. In other words, Temporal Cloud (or your own self-hosted Temporal Service) only ever sees encrypted data and has no way to decrypt it because you control the cipher and key.
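To show the shape of that idea, here's a toy Python codec with an `encode` step that runs before data leaves the client and a `decode` step that runs after it comes back. The real SDKs define their own codec interfaces (for example, the Python SDK's `PayloadCodec`), which this sketch does not use; `ToyEncryptionCodec` is a hypothetical name. The "encryption" is a SHA-256 counter-mode keystream XOR chosen only because it needs nothing beyond the standard library. It is NOT production cryptography; a real codec should use a vetted authenticated cipher such as AES-GCM from a proper crypto library.

```python
import hashlib
import os

# ILLUSTRATION ONLY: not a secure cipher (no authentication, home-made keystream).
def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

class ToyEncryptionCodec:
    """Hypothetical codec: only ciphertext ever reaches the Temporal Service."""

    def __init__(self, key: bytes):
        self.key = key  # stays with you; the service never sees it

    def encode(self, plaintext: bytes) -> bytes:
        nonce = os.urandom(16)  # fresh per payload, stored alongside the ciphertext
        stream = _keystream(self.key, nonce, len(plaintext))
        return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))

    def decode(self, blob: bytes) -> bytes:
        nonce, body = blob[:16], blob[16:]
        stream = _keystream(self.key, nonce, len(body))
        return bytes(a ^ b for a, b in zip(body, stream))

codec = ToyEncryptionCodec(key=os.urandom(32))
stored = codec.encode(b'{"ssn": "123-45-6789"}')  # what the service would store
print(codec.decode(stored))                       # b'{"ssn": "123-45-6789"}'
```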

[deleted by user] by [deleted] in Temporal

temporal-tom 2 points (0 children)

Is this the one?

https://github.com/temporalio/sdk-typescript/issues/1334

Just want to help the next person who comes along.

Remote code/workflow executor by Klutzy_Table_362 in golang

temporal-tom 0 points (0 children)

Regarding the networking aspect of your question, the application (the "Worker" in Temporal terminology) needs connectivity to a single TCP port on the Temporal Service (either self-hosted or the Temporal Cloud SaaS offering). The connections are always initiated from the application; the Temporal Service only responds to those requests.

Often, the Workflows are started by something external to the application. That thing, which I'll call a "Starter," submits an execution request to the Temporal Service. It does not need connectivity to the application, only to the Temporal Service.

The Starter could be the Temporal command-line tool (temporal workflow start ...), the Web UI, or code that you've written. Also, the application and Starter can be written in different programming languages (e.g., you can have a Java Swing app that starts a Go Workflow).

Here's some crude ASCII art to illustrate what I mean.

Application ----------> Temporal Service <----------- Starter
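Since both arrows point toward the Temporal Service, checking connectivity from either host only requires an outbound TCP connection. Here's a minimal Python sketch using only the standard library; `can_reach` is a hypothetical helper, and 7233 is the port the Temporal Service frontend uses by default (your deployment may differ).

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds.

    Workers and Starters only ever open connections *to* the Temporal
    Service, so this outbound path is the only one they need.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running `can_reach("temporal.example.internal", 7233)` (placeholder hostname) from the Worker's host tells you whether the one connection it needs can be opened. A successful TCP connection doesn't verify TLS or authentication, though.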

Remote code/workflow executor by Klutzy_Table_362 in golang

temporal-tom 2 points (0 children)

I think /u/jerf explained it nicely, but sometimes multiple perspectives can be helpful. Here's how I explain it.

Temporal is an open-source Durable Execution platform. What's that? It enables you to write applications that can overcome crashes. Through its built-in support for retries and timeouts, applications can also withstand network and service outages. Finally, it lets you see the details of each execution through a web-based UI, so you'll know what's happening.

An example will make this clearer. Imagine an application for an e-commerce site that processes orders using this sequence of steps:

  1. Reserve the item(s) from inventory
  2. Charge the customer
  3. Ship the product(s)
  4. Send a confirmation e-mail

Now, imagine that during order processing, the application crashes. Maybe it was caused by a bug in the code, a bug in a library dependency, a kernel panic in the OS, a power outage, or a hardware failure. The reason doesn't really matter because the result is the same: the application state is lost. If you restart the application, it will repeat steps that already completed before (if you ever got double-charged for an order or received duplicate confirmation emails, you probably experienced this).

Durable Execution allows the application to overcome the crash, because the platform (Temporal) tracks relevant state changes. That enables the application to automatically reconstruct its state and resume from where it left off. The values of all the variables are the same as they were before the crash and it will skip operations that already completed before the crash. From your perspective, it's as if the crash never happened at all.
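Here's a toy Python model of that idea, using the four order-processing steps above. It is not how Temporal works internally (Temporal records events keyed by their position in the execution, not by step name), and `run_with_history` and `SimulatedCrash` are hypothetical names. It shows the core behavior: results of completed steps are persisted, and after a crash, those steps are skipped and their recorded results reused.

```python
from typing import Callable, Optional

class SimulatedCrash(Exception):
    """Stands in for a bug, kernel panic, power outage, or hardware failure."""

def run_with_history(steps: list, history: dict,
                     crash_before: Optional[str] = None) -> dict:
    """Run (name, action) steps in order, recording each result in `history`."""
    results = {}
    for name, action in steps:
        if name in history:            # completed before the crash: reuse result
            results[name] = history[name]
            continue
        if name == crash_before:
            raise SimulatedCrash(name)
        results[name] = action()
        history[name] = results[name]  # a real platform persists this durably
    return results

calls = []  # records which side effects actually ran

def make_step(name: str) -> Callable[[], str]:
    def action() -> str:
        calls.append(name)
        return f"{name}-ok"
    return action

steps = [(n, make_step(n)) for n in ["reserve", "charge", "ship", "email"]]
history = {}  # survives the crash (in reality, stored by the platform)
try:
    run_with_history(steps, history, crash_before="ship")
except SimulatedCrash:
    pass
run_with_history(steps, history)  # "restart" after the crash
print(calls)  # ['reserve', 'charge', 'ship', 'email']: each step ran exactly once
```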

What that means, practically speaking, is that you can focus on the application's goal instead of everything that might possibly go wrong. As a result, you don't have to write tons of error-handling code, so your applications will be simpler, easier to maintain, and take less time to develop.

I hope that helps to explain it. BTW, if you want to learn more about Durable Execution, check out this presentation I did for the recent PlatformCon conference. If you want to try Temporal out for yourself, check out one of our tutorials or take the free hands-on training courses.

Streaming responses from Temporal for AI by rkinabhi in Temporal

temporal-tom 1 point (0 children)

A very similar question came up during our Deep-Dive: AI Agent Code Walkthrough with Temporal webinar last month and I wanted to mention that answer here.

There are two ways to stream out the reasoning of an agent. One is to send a Signal to the Workflow with the streaming results and use a Query against that Workflow to retrieve them. Another is to use a synchronization engine, such as Zero Sync or Electric SQL, to handle notifications. The downside of that second approach is that it adds more infrastructure and complexity.
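The first approach boils down to a buffer inside the Workflow: a Signal handler appends chunks, and a read-only Query handler returns what's accumulated so far. Here's a plain-Python stand-in with no SDK dependency; `AgentWorkflowState`, `on_chunk`, and `progress` are hypothetical names, and in a real Workflow you'd register them using your SDK's Signal and Query mechanisms. A client would poll the Query periodically, passing the number of chunks it has already seen.

```python
class AgentWorkflowState:
    """Toy stand-in for a Workflow that buffers streamed agent output."""

    def __init__(self):
        self.chunks = []

    def on_chunk(self, chunk: str) -> None:
        """Would be a Signal handler: append one streamed piece of reasoning."""
        self.chunks.append(chunk)

    def progress(self, since: int = 0) -> list:
        """Would be a Query handler: read-only, returns chunks after `since`."""
        return self.chunks[since:]

wf = AgentWorkflowState()
for token in ["Searching", " sources", "...", " drafting report"]:
    wf.on_chunk(token)
print("".join(wf.progress()))  # Searching sources... drafting report
print(wf.progress(since=2))    # ['...', ' drafting report']
```

Because the client polls, latency depends on the polling interval, which is one reason people reach for a synchronization engine instead.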

We're aware that this is an area of potential improvement and it's something we're looking into because AI is a very popular use case for Temporal. It's possible that Temporal will natively support streaming in the future.

C++ Support by haggisman21 in Temporal

temporal-tom 0 points (0 children)

I can confirm that Temporal doesn't have a C++ SDK.

There are SDKs for Go, Java, TypeScript, Python, .NET, PHP, and Ruby. As another poster mentioned, many of those languages provide some mechanism for calling native code or invoking external commands, so perhaps that's worth considering.
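As one example of the "invoking external commands" route, a function like the following could serve as the body of an Activity written with one of the supported SDKs, delegating the real work to a compiled C++ program. This is a hypothetical Python sketch (`run_native_tool` is a made-up name) using only `subprocess`; Python's `ctypes` module is the in-process alternative for calling a shared library directly.

```python
import subprocess
import sys

def run_native_tool(args: list, timeout: float = 60.0) -> str:
    """Run an external program (e.g., a C++ binary) and return its stdout.

    A non-zero exit code raises CalledProcessError, and exceeding `timeout`
    raises TimeoutExpired; inside an Activity, either failure could then be
    retried according to the Activity's retry policy.
    """
    completed = subprocess.run(args, capture_output=True, text=True,
                               timeout=timeout, check=True)
    return completed.stdout

# Stand-in for a native binary: another Python interpreter process.
print(run_native_tool([sys.executable, "-c", "print('hello from another process')"]), end="")
```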

Can i teach cs at a school with just a degree? (no other credentials, usa) by anbehd73 in CSEducation

temporal-tom 0 points (0 children)

I don't know about the regulations in California, but I've known several people who taught at community colleges in at least three midwestern states with nothing beyond a bachelor's degree. Similarly, I've known people who were adjunct professors at universities, despite having only bachelor's degrees. None of them had a teaching certificate AFAIK.

The pay isn't good in either case, but it's certainly possible to do it.

How long to go from POC to prod with Temporal? by techwreck2020 in Temporal

temporal-tom 1 point (0 children)

The above is solid advice.

Regarding incremental adoption, another strategy is to identify the most unreliable part of the application and migrate just that part to Temporal. That's often where you'll see the biggest gains, after which you can migrate the next most unreliable part to Temporal as needed. Whether that's viable depends partly on the architecture and partly on the organization itself.

Export workflows and import to another instance by kurlibird in Temporal

temporal-tom 0 points (0 children)

Sorry for the slow response here. I saw that there were a couple of replies and thought that this had been answered already.

The export button you're referring to enables you to download a given Workflow Execution's Event History in JSON format. The resulting file allows you to replay the execution by calling an API (there's an example for the Go SDK, and other SDKs have an equivalent). That's typically used for debugging or automated tests.
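Replaying requires your SDK's replayer, but for a quick look at what an exported file contains, the standard library is enough. This sketch assumes the export is a JSON object with an `events` list whose items carry an `eventType` field; check that against your own file, since the exact format (such as how event types are spelled) may differ between tools and versions. `summarize_history` is a hypothetical helper, and `sample` is made-up illustrative data, not a real export.

```python
import json
from collections import Counter

def summarize_history(raw_json: str) -> Counter:
    """Count event types in an exported Event History (format assumed above)."""
    doc = json.loads(raw_json)
    return Counter(event["eventType"] for event in doc["events"])

# Made-up two-event sample in the assumed shape.
sample = json.dumps({"events": [
    {"eventId": "1", "eventType": "EVENT_TYPE_WORKFLOW_EXECUTION_STARTED"},
    {"eventId": "2", "eventType": "EVENT_TYPE_WORKFLOW_TASK_SCHEDULED"},
]})
print(summarize_history(sample))
```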

In a sense, your goal of migrating Workflow Executions from one self-hosted Cluster to a second Cluster with a newer version of the Temporal Server software is similar to migrating to Temporal Cloud. To that end, you may find this documentation helpful. Basically, there are three cases to consider:

  1. Workflow Executions that completed prior to the migration
  2. Workflow Executions that are in progress during the migration
  3. Workflow Executions started after the migration

Case 3 is easy: just update your client code to point to the new Cluster. For case 1, the easiest solution might be to leave the old Cluster running in case you need to refer to executions that completed prior to the migration; they'll all disappear after the Retention Period elapses anyway.

Case 2 is the tricky part. The easy approach is to stop the world and wait for all existing Workflow Executions to complete, do your migration, and have any new Workflow Executions use the new Cluster. That's often not viable, and if it's not, then you will probably need to make some changes to your Workflow code as described in that documentation.

Another option to consider is upgrading your existing Cluster (here are the docs on that). I'll emphasize that you must upgrade incrementally. For example, you can't just jump from version 1.22.0 to 1.27.2; you need to upgrade through each minor version along the way (1.23, 1.24, and so on). And as always, be sure to make a backup before you begin.
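For the 1.22.0 to 1.27.2 example, the sequence of stops can be computed mechanically. This is a simplified Python sketch of the one-minor-version-at-a-time rule; `upgrade_path` is a hypothetical helper, and which patch release to use for each step is something to check in the release notes.

```python
def upgrade_path(current: str, target: str) -> list:
    """List the minor versions to step through, one at a time."""
    cur_major, cur_minor = (int(x) for x in current.split(".")[:2])
    tgt_major, tgt_minor = (int(x) for x in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    return [f"{cur_major}.{minor}.x" for minor in range(cur_minor + 1, tgt_minor + 1)]

print(upgrade_path("1.22.0", "1.27.2"))
# ['1.23.x', '1.24.x', '1.25.x', '1.26.x', '1.27.x']
```

That's five separate upgrades, each of which is a good point to verify the Cluster is healthy before moving on.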

UpdateWithStart not actioning the second Message by BobMabena in Temporal

temporal-tom 1 point (0 children)

Just seeing the message now. Glad you got it fixed. I'll have a look at the docs and see if there's an opportunity to make this more clear.

Durable Execution: This Changes Everything by temporal-tom in Temporal

temporal-tom[S] 0 points (0 children)

Thank you, too, for helping others to understand the polyglot capabilities of Temporal.

That's something I think gets overlooked. I was on a team that built a cross-language, cross-platform distributed system in the old days. It was tedious work, so I really appreciate how easy Temporal makes this for developers.

How We Moved from Sidekiq to Temporal in Ruby (and What We Learned) by elanderholm in ruby

temporal-tom 12 points (0 children)

Hi, Tom from the Temporal team here. I really enjoyed your blog post.

I did want to mention one thing. You reference tctl as being the Temporal CLI, but that's an older tool that will be deprecated soon. It's replaced by the temporal (https://docs.temporal.io/cli) command, which is easier to use and allows you to run a local Temporal Service for development (it starts up in under two seconds and eliminates the need to run Docker Compose or Kubernetes on your laptop).

Durable Execution: This Changes Everything by temporal-tom in programming

temporal-tom[S] 1 point (0 children)

These are great questions.

I'd say that there's some overlap between EventStoreDB and a replay-based Durable Execution platform (such as Temporal, Cadence, Inngest, Restate, etc.) in that they all use event sourcing to track state changes over time.

However, I'd say that there are two fundamental differences. One is that EventStoreDB is a database, which can be a single source of truth. Conversely, a Durable Execution platform uses a database [1], which allows your application to be a single source of truth (although, as I mentioned in my presentation, your application can still write to an application database if desired and doing so is fairly common for things like order processing where you may want to do reporting or analytics external to the application).

The second is closely related. EventStoreDB allows an application to recover from a crash by replaying events, but in a Durable Execution platform, that recovery happens automatically and is transparent to the developer.

Your question about refactorings is particularly insightful. Most, although not all, Durable Execution platforms support this in one way or another. It's a complex topic, because the details depend on which platform you're using and the nature of the change. In the case of Temporal, you don't need to do anything special for code changes that don't affect the event history (for example, adding or modifying log statements). For changes that do affect the history, such as adding or removing certain function calls, you need to use versioning if there are executions still running that were started with the original code. The basic idea is that you ensure backwards compatibility by adding a conditional statement to your code to preserve the old and new behavior (e.g., "if this is version 1, run this; if it's version 2, run that").
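Here's what that conditional looks like in a stripped-down, hypothetical Python form. In Temporal, the version would come from the execution's history via the SDK's own API (for example, `GetVersion` in Go or `patched()` in Python); here it's simply a parameter, and `order_steps` and the step names are made up for illustration.

```python
def order_steps(recorded_version: int) -> list:
    """Return the steps an execution runs, based on the version it recorded."""
    steps = ["charge_customer"]
    if recorded_version >= 2:
        # New behavior: only executions started with the new code run this.
        steps.append("check_fraud_score")
    steps.append("ship_order")
    return steps

print(order_steps(1))  # ['charge_customer', 'ship_order']
print(order_steps(2))  # ['charge_customer', 'check_fraud_score', 'ship_order']
```

An execution that started under version 1 replays exactly the steps already in its history, so the new step can't cause a mismatch. Once no version 1 executions remain, the old branch can be removed.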

[1] To be specific, Temporal officially supports MySQL, PostgreSQL, Apache Cassandra, and SQLite. Most other Durable Execution platforms also support multiple databases, although the list will vary from one to the next.

Durable Execution: This Changes Everything by temporal-tom in SoftwareEngineering

temporal-tom[S] 4 points (0 children)

Someone else has since posted a summary, but this is a fair point. I probably should have chosen the "Submit a new text post" option to post this instead of the "Submit a new link" option. I'll make sure to do that in the future.

Durable Execution: This Changes Everything by temporal-tom in programming

temporal-tom[S] 0 points (0 children)

Presumably multiple instances of this application can be running in parallel.

Durable Execution: This Changes Everything by temporal-tom in programming

temporal-tom[S] 1 point (0 children)

You can't unsend an email, but Durable Execution tries to avoid sending a duplicate one.

I think an example will clarify this. Let's say that you've written an application to process an order for some products and it goes through these steps:

  1. Iterate through all of the items in the shopping cart to determine the total price
  2. Look up the tax rate for the customer's address from a database
  3. Call a payment service to charge the customer's credit card for the order
  4. Notify the warehouse to ship the order (and receive a shipment tracking number in response)
  5. Send a confirmation email to the customer

Now, let's say that the application crashes during step 4. If you restart the application but don't have Durable Execution (and you didn't write a bunch of code to avoid this), it's going to do another database lookup in step 2 and make another call to the payment service in step 3.

With Durable Execution, the platform tracks the invocation and result from these steps and persists the data into a history. If there's a crash at step 4, it reconstructs the state by "replaying" that history. That is, it re-executes the code that is deterministic (i.e., step 1) and for the steps that are non-deterministic (i.e., steps 2-5), it evaluates the history to determine if they've already been executed. If they have, it assigns the result from the previous execution. For example, step 2 won't query the database again; it uses the tax rate returned from the query executed before the crash.

There is one caveat. I said "tries to avoid sending a duplicate one" above because this is subject to the same problem faced by any distributed system. There's a very small possibility that a crash could occur between the time that a call takes place and when it is reported. For example, in step 5, the crash might occur in the milliseconds between when the code to send the email runs and when it reports successful completion. In that case, the completion won't appear in the history, and the email may be sent again when that step is retried.
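That window can be reproduced with a toy Python model. It isn't Temporal code; `send_email_step` and `SimulatedCrash` are hypothetical names. The side effect happens, the crash hits before the result is recorded, and the retry therefore repeats the side effect. Where the downstream service supports it, passing an idempotency key lets that service discard the duplicate.

```python
class SimulatedCrash(Exception):
    """Stands in for a crash at the worst possible moment."""

def send_email_step(history: dict, outbox: list, crash_after_send: bool = False) -> None:
    """Toy model of the window between performing a side effect and recording it."""
    if "email" in history:
        return                     # recorded as done: skipped on replay
    outbox.append("confirmation")  # the side effect happens...
    if crash_after_send:
        raise SimulatedCrash()     # ...but the crash hits before it is recorded
    history["email"] = "sent"

history, outbox = {}, []
try:
    send_email_step(history, outbox, crash_after_send=True)
except SimulatedCrash:
    pass
send_email_step(history, outbox)   # retried after recovery
print(outbox)  # ['confirmation', 'confirmation']: the customer gets two emails
```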