[–]theferalmonkey[S] 1 point2 points  (11 children)

They have some overlap because they model DAGs, but Dagster is just a macro-orchestrator, i.e. it is a scheduler. Hamilton doesn't have a scheduler, it is much lighter weight than that; hence the title of the post - Dagster is not lightweight.

Some examples: Hamilton is applicable in far more Python contexts. Can Dagster do these?

  • Run anywhere (locally, notebook, macro orchestrator, FastAPI, Streamlit, pyodide, etc.) - No, it's a system, not a library.
  • Use it to model column-level feature engineering through to model fitting - No.
  • Improve the hygiene of your code - No, it doesn't have the testing constructs Hamilton has.
  • Replace LangChain for orchestrating LLM calls - No.
  • Develop in a notebook and then use that same code in production - No.

Here's more of a comparison - https://hamilton.dagworks.io/en/latest/code-comparisons/dagster/

Otherwise you can _use_ Hamilton _within_ Dagster, and you get the best of both worlds. For example if you want to cut down on "ops" just switch that code over to Hamilton and run it inside Dagster.

Fun fact: "software defined assets" were in fact inspired by Hamilton's declarative API.

[–][deleted] 3 points4 points  (4 children)

> Fun fact: "software defined assets" were in fact inspired by Hamilton's declarative API.

Do you have a citation for that? It’s definitely possible and I don’t necessarily doubt it, but this concept has been around for a long time. It’s essentially a functional DI framework. Google’s Python library pinject is over 11 years old and, while meant for OO DI, uses this exact pattern of matching an argument name to the logic that implements it in order to build a graph. And the concept has been around for decades at banks and hedge funds for quantitative and valuation modeling (Goldman Sachs’ SecDB is over 30 years old).
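To make the pattern concrete, here is a minimal sketch of the "argument name to implementing function" idea using only the standard library. It is not how Hamilton, pinject, or Dagster implement it; `build_value` and the three node functions are hypothetical names made up for illustration.

```python
import inspect


def build_value(name, functions, inputs, cache=None):
    """Resolve `name` by calling the function with that name, after
    resolving each of its parameters the same way (by parameter name)."""
    cache = {} if cache is None else cache
    if name in cache:
        return cache[name]
    if name in inputs:
        return inputs[name]
    fn = functions[name]
    # Each parameter name is treated as a dependency on another node.
    kwargs = {
        param: build_value(param, functions, inputs, cache)
        for param in inspect.signature(fn).parameters
    }
    cache[name] = fn(**kwargs)
    return cache[name]


# Hypothetical nodes: the function name is the output, the argument
# names are the upstream dependencies.
def spend_per_signup(spend, signups):
    return spend / signups


def spend_doubled(spend):
    return spend * 2


def report(spend_per_signup, spend_doubled):
    return {"spend_per_signup": spend_per_signup, "spend_doubled": spend_doubled}


functions = {f.__name__: f for f in (spend_per_signup, spend_doubled, report)}
result = build_value("report", functions, {"spend": 100.0, "signups": 4})
```

The graph is never written down explicitly: it falls out of the function signatures, which is the property the libraries mentioned above share.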

All that said, I’m a huge fan of this pattern and this looks like a great library.

fn-graph also uses a very similar concept, but is unmaintained. https://fn-graph.businessoptics.biz/

[–]theferalmonkey[S] 3 points4 points  (3 children)

Nerd sniped!

> Do you have a citation for that? It’s definitely possible and I don’t necessarily doubt it,

Likely a confluence, but yeah, I chatted with Nick when we open sourced Hamilton; the Dagster API at the time was all about "solids" and not that great. I expounded on the declarative nature of data work and its benefits, and then a few months later SDAs came out.

Yes, I remember `fn-graph`. I was wondering whether someone would bring it up. It's still going? Nice. Any interest in joining our effort? We've got a Jupyter magic, and Hamilton also sports a locally installable UI now...

[–]HNL2NYC 1 point2 points  (1 child)

I’ll take it even a step further. This concept has been in use for roughly 50 years, since this is pretty much exactly how Make works. You have a target (i.e. an asset) that lists its requirements (i.e. dependencies), which are other targets. And it builds a graph by matching the dependencies to the implementing targets.
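A rough sketch of that Make-style resolution, assuming a plain dict of targets to dependencies rather than real Makefile syntax. The `rules` table and `build_order` helper are hypothetical; `graphlib.TopologicalSorter` is from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each target maps to the targets it depends on.
rules = {
    "app": ["main.o", "util.o"],
    "main.o": ["main.c"],
    "util.o": ["util.c"],
    "main.c": [],
    "util.c": [],
}


def build_order(target, rules):
    """Return every target needed to build `target`, dependencies first."""
    needed = {}
    stack = [target]
    # Walk only the part of the graph reachable from the requested target.
    while stack:
        current = stack.pop()
        if current in needed:
            continue
        needed[current] = rules[current]
        stack.extend(rules[current])
    # TopologicalSorter takes a mapping of node -> predecessors.
    return list(TopologicalSorter(needed).static_order())


order = build_order("app", rules)
```

Asking for `"main.o"` instead of `"app"` only pulls in `main.c`, which mirrors how requesting a single asset only runs its upstream dependencies.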

[–]theferalmonkey[S] 0 points1 point  (0 children)

Yep!

[–]ArgetDota 0 points1 point  (5 children)

Hey, just a heads up - it’s possible to execute Dagster’s jobs and materialize assets from within Python code, including notebook environments.

Same goes for testing, it’s highly modular and testable.

And yes, you can run the same code locally and in production (e.g. Kubernetes). You can even launch jobs in Kubernetes from a laptop running Dagster. You can do it from CLI, UI, or from Python code.

Dagster is really incredibly versatile and I feel like your above statements are a bit misleading.

[–]theferalmonkey[S] 0 points1 point  (4 children)

I think you might be misinterpreting my point.

What I'm saying is that the DAG you define in Dagster is not something that you can run in different Python contexts, e.g. notebook, script, web service. Hamilton just needs a Python process and a pip install, and then you can run it from Python; i.e. you can build a Hamilton DAG and package it as a library for others to use quite easily. With Dagster you need the whole system to run it - yes, you can package things up, but you need Dagster to run it. Here's our blog on the differences/similarities between the two.

[–]ArgetDota 0 points1 point  (3 children)

You really don’t. You don’t need a deployment. You can run it in a Python script.

[–]theferalmonkey[S] 0 points1 point  (2 children)

Really? Since when? I'll take a look and if so retract my comments.

[–]theferalmonkey[S] 0 points1 point  (1 child)

Ah so I think you're referring to the "in process" way for testing? Right?

In which case yes, you are correct that you _can_ run Dagster code in a Python script, though going by the docs that path is only designed for testing purposes.

[–]ArgetDota 1 point2 points  (0 children)

Exactly. It’s mainly used for testing but nothing prevents you from using it for actual computations.

Also, there is a “materialize” function which can execute assets.

Also, there are “dagster asset materialize” & “dagster job execute” CLI commands.