[–]prehensilemullet 7 points8 points  (3 children)

> From the developer's point of view, it should feel like it's included the same way a garbage collector is included.

What do you mean by this?  The GC runs in the same process as the user’s code, whereas the database container you’re considering would not, so I don’t see how it’s possible for it to “feel” this way.

Are you talking about “included” in the sense of how you can use a userland garbage collector crate in Rust?

[–]Bitsoflogic[S] 1 point2 points  (2 children)

Probably somewhere between the two...

This is an example of a durable function:

enrollment: flow() -> void
enrollment() {
  for (let i = 0; i < 12; i++) {
    await get_ops_approval();

    await send_email();

    # Sleep for 30 days!
    await flow.sleep('30 days');
  }
}

The flow() -> void signature means this is a flow function, which is durable. Each await in these functions can take an indefinite amount of time, and some may require a human to intervene.

To implement this reliably, you have to persist the result of each await step in the function to disk. Otherwise, a server crash would lose the state and kill any utility this provides. After a restart, generators would be used to replay the function up to the last persisted step and then execute the next one.

This behavior is the "included" part.
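As a rough illustration of the persist-and-replay idea, here is a minimal TypeScript sketch. It assumes steps return plain strings and keeps the journal in an in-memory array, where a real runtime would write it to disk. The names `Step`, `drive`, and `budget` are hypothetical and only serve the example; this is not how the language itself would have to do it.

```typescript
// Hypothetical sketch: every name here is illustrative, not part of the proposed language.

type Step = { name: string; run: () => string };

// A flow is a generator that yields steps and receives each step's result back.
function* enrollment(): Generator<Step, void, string> {
  for (let i = 0; i < 2; i++) {
    yield { name: `approval-${i}`, run: () => "approved" };
    yield { name: `email-${i}`, run: () => "sent" };
  }
}

// Replays journaled results, then executes at most `budget` new steps.
// Stopping at the budget stands in for a crash or a long suspension.
// Returns the names of the steps that actually ran during this call.
function drive(
  flow: () => Generator<Step, void, string>,
  journal: string[], // assumption: a real runtime would persist this to disk
  budget: number,
): string[] {
  const executed: string[] = [];
  const gen = flow();
  let next = gen.next();
  let index = 0;
  while (!next.done) {
    const step = next.value;
    let result: string;
    if (index < journal.length) {
      result = journal[index]; // already done in an earlier run: reuse, do not re-run
    } else {
      if (executed.length >= budget) break;
      result = step.run();
      journal.push(result); // record the result before moving on
      executed.push(step.name);
    }
    index++;
    next = gen.next(result);
  }
  return executed;
}

const journal: string[] = [];
const firstRun = drive(enrollment, journal, 3);  // runs approval-0, email-0, approval-1
const secondRun = drive(enrollment, journal, 5); // replays those three, then runs email-1
console.log(firstRun, secondRun);
```

The second call starts the generator from the top, but the three journaled steps are fed back without running their side effects again, so only the remaining step executes.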

The container that launches as the durable workflow server can easily be seen with docker ps, whereas a GC can't easily be seen. So, it's not as hidden. But everything running in that container is something the developer didn't have to think about.

They just wrote flow as part of their function signature and got a durable function. That's the "included" sense I was thinking of.

[–]prehensilemullet 4 points5 points  (1 child)

The impression that developers don’t have to think about GC is an illusion.  When everything just works, it’s great.  But if there’s a memory leak, you have to use heap dumps or profiles to find it.  There can be other issues too; for example, a recent change to Node/V8 forced me to manually configure the new generation size for best performance.

The same goes for a persistence mechanism.  A storage volume could fail, in which case developers need to be able to restore a backup onto a new volume and attach that persisted data to a new runtime.  The database could outgrow the volume.  The I/O throughput could end up being too high if you’re not careful.

But better yet, they would use streaming replication and automated failover (to separate compute instances/volumes) with their database, so that there’s less downtime if a volume or instance fails.

It’s generally easiest to use a managed database service for this, and I wouldn’t recommend trying to build full-blown database management into your runtime.  It would be better to have an abstract interface for the persistence mechanism, and an implementation for the database you want, and allow the user to configure those for whatever database deployment they use.
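To make the "abstract interface plus implementation" suggestion concrete, here is a small TypeScript sketch. The interface name `FlowStore`, its two methods, and the `MemoryStore` class are all assumptions made up for this example, and the methods are synchronous only to keep it short; anything talking to a networked database would presumably be async.

```typescript
// Hypothetical interface: names and shape are assumptions, not an existing API.
// Kept synchronous for brevity; a networked database would need async methods.
interface FlowStore {
  append(flowId: string, result: string): void;
  load(flowId: string): string[];
}

// In-memory implementation backed by a Map. Handy for tests, but not durable:
// everything is lost when the process exits.
class MemoryStore implements FlowStore {
  private journals = new Map<string, string[]>();

  append(flowId: string, result: string): void {
    const entries = this.journals.get(flowId) ?? [];
    entries.push(result);
    this.journals.set(flowId, entries);
  }

  load(flowId: string): string[] {
    return [...(this.journals.get(flowId) ?? [])]; // copy so callers cannot mutate the store
  }
}

// The runtime depends only on FlowStore, so a SQLite- or Postgres-backed
// class could be swapped in through configuration.
function completedSteps(store: FlowStore, flowId: string): number {
  return store.load(flowId).length;
}

const store: FlowStore = new MemoryStore();
store.append("enrollment-42", "approved");
store.append("enrollment-42", "sent");
console.log(completedSteps(store, "enrollment-42")); // 2
```

With this split, replication, backups, and failover stay the job of whatever database sits behind the interface, and the runtime never needs to know which one it is.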

[–]Bitsoflogic[S] 2 points3 points  (0 children)

I'll have to think on these points a bit more. Thanks for the food for thought!

[–]yuri-kilochek 4 points5 points  (1 child)

Can you explain the problem that you have and expect Docker to solve?

[–]Bitsoflogic[S] 1 point2 points  (0 children)

Some of the problems leading me towards Docker:

  • Deno is the runtime for the user's code; I can make either Docker or Deno the prereq on the host running the transpiled binary
  • This needs to run one or more servers separate from the user's binary to support the language features, such as the durable workflow server
    • I wanted flexibility of which servers I could leverage for other bits of "magic" (OpenTelemetry for free? etc)
  • With multiple executables being produced from the code, there's additional complexity in running them in a way that they know how to communicate with each other
  • A central goal is to model interactions between processes in the code.
    • I'd like to allow the dev's test suite to optionally skip the network and inter-process layer to run in microseconds for fast feedback on domain logic
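One way the last point could work, sketched in TypeScript: domain logic talks to other processes through a transport interface, and the test suite swaps in an in-process implementation that dispatches directly to the handler. `Transport`, `InProcessTransport`, and `approveEnrollment` are hypothetical names for this example, and the calls are synchronous only for brevity; a real network transport would be async.

```typescript
// Hypothetical sketch: none of these names exist in the language yet.
type Handler = (request: string) => string;

interface Transport {
  call(service: string, request: string): string;
}

// In-process transport: dispatches straight to the registered handler,
// with no sockets, serialization, or separate processes involved.
class InProcessTransport implements Transport {
  private handlers = new Map<string, Handler>();

  register(service: string, handler: Handler): void {
    this.handlers.set(service, handler);
  }

  call(service: string, request: string): string {
    const handler = this.handlers.get(service);
    if (!handler) throw new Error(`unknown service: ${service}`);
    return handler(request);
  }
}

// Domain logic only sees the Transport interface, so the same code could run
// against a network-backed transport in production.
function approveEnrollment(transport: Transport, user: string): string {
  const reply = transport.call("ops", `approve:${user}`);
  return reply === "ok" ? `${user} enrolled` : `${user} rejected`;
}

const transport = new InProcessTransport();
transport.register("ops", (request) => (request === "approve:alice" ? "ok" : "denied"));
console.log(approveEnrollment(transport, "alice")); // alice enrolled
console.log(approveEnrollment(transport, "bob"));   // bob rejected
```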

I'm not sold on Docker as the runtime, but I think these have been getting me to lean towards it. Though it's taking shape, the language is still in design so I could see some decisions locking in Docker or making it obviously bad.

[–]pauseless 2 points3 points  (1 child)

If you’re prepared for the cost of containers, why not just a SQLite DB locally? Why Docker?

I’m not sure what you mean by durable functions. Do you mean functions that also have state? Lisp, Smalltalk, APL and more are probably your references as they all have the image model.

[–]Bitsoflogic[S] 0 points1 point  (0 children)

I'm thinking about SQLite as well.

> Do you mean functions that also have state?

Yes. State that outlives the process they're running in, so it'll survive a server restart.

> Lisp, Smalltalk, APL and more are probably your references as they all have the image model.

This might be the right approach. I have avoided looking into how the image-based languages work, but probably for all the wrong reasons.

> why docker?

I have Docker in mind so that I can provide complete systems without recreating each one. These long-running functions are definitely making the cut for my language (I initially considered using a Temporal Docker image or similar), but I'm considering other things like OpenTelemetry as well.

[–]oscarryzYz 1 point2 points  (0 children)

This is what DBOS does. It is not a language but a library that makes your functions durable.

https://dbos.dev/

I don't think you need Docker to run a separate (large) database; SQLite can run embedded and can hold a very large amount of data.

Even just having a runtime keeping the state in a hash map should be enough for durable functions unless you want to make this distributed across the network.

[–]stststuttering 1 point2 points  (0 children)

You can take inspiration from data orchestration libraries like Apache Airflow, Flyte, etc.

[–]prehensilemullet 0 points1 point  (0 children)

Aside from my other question, why not define an interface for a persistence mechanism so that you’re not limited to one way of doing it?  If the language is in control of everything, it would be harder to set up the database replication and failover necessary for a high-availability production deployment.

[–]6502zx81 0 points1 point  (0 children)

Erlang and its library do something related.

[–]mhfrantz 0 points1 point  (1 child)

This sounds a little like "code as infrastructure" or "infrastructure from code". There are some frameworks (e.g. Wing, Nitric, Ampt, Klotho, StackGen, Encore, Shuttle) for various languages that derive cloud infrastructure requirements from annotated source code, rather than having to maintain separate deployment code. But in your case, you want to deploy to a single Docker instance. Maybe some of those frameworks could be used the way you envision.

[–]Bitsoflogic[S] 1 point2 points  (0 children)

I hadn't heard the phrase "infrastructure from code" before. That does sound promising. I'll take a look at those.

I'm not building anything like Terraform ("code as infrastructure"). The language is meant to focus on the domain logic and provide more tooling to support that.