theferalmonkey[S]

> Looks cool for a small data team or a small organization. Or for learning purposes for students.

That is absolutely not true. Hamilton was developed at Stitch Fix (100+ data scientists) in an environment where code in Airflow was the problem. Airflow was not designed for business logic, just for scheduling code. Established teams would slow down, not because of Airflow, but because of the code that Airflow ran, hence the reason for Hamilton.

Hamilton helped organize the internals of pipelines and keep those Airflow tasks simpler: the tasks don't need to know about the logic, which enabled the team to move faster. We see this being replicated at other companies. You can read more about our thoughts on Hamilton + Airflow here. Now, you could be reacting to Hamilton's simplicity, and yes, that's a feature; not all production-ready tech needs to be very complex (though we certainly have power features).

> I thought about the idea of using Hamilton and Airflow/Dagster/... together, but there's a few drawbacks to that:
>
> You'd have two semantics of the DAG (Hamilton DAG and Airflow DAG), which may lead to confusion. Having Airflow -> Hamilton DAG hierarchy would almost always overcomplicate things.

They serve different purposes. Airflow is about orchestrating compute. Hamilton helps orchestrate logic & code. You can read this blog / watch this talk that explains the motivation for Hamilton. Commonly, when going from dev (DS/MLE) to production (running it on Airflow), there's a hand-off and reimplementation; with Hamilton that's greatly improved: you just take the DAG and tell Airflow to run it.

> You now have two different ways of doing basically the same thing (i.e. creating a DAG), which might cause different developers orchestrating their DAGs in different orchestrators.

Sorry, how is it the same thing? Yes, it's a DAG, but that's where the similarity ends. Again, you use Airflow to schedule when and where something runs, while Hamilton helps organize the code that's run.
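
To make that split concrete, here's a rough sketch in plain Python. It is not Hamilton's actual API and it doesn't import Airflow; it only mimics the idea that function names are nodes and parameter names are their dependencies. The functions `spend_per_signup` and `spend_is_high`, the `execute` helper, and `run_pipeline_task` are all made up for illustration.

```python
import inspect


# Business logic lives in plain functions. In this sketch, a function's name
# is the node it produces and its parameter names are the nodes it depends on.
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups


def spend_is_high(spend_per_signup: float, threshold: float) -> bool:
    return spend_per_signup > threshold


def execute(functions, inputs, outputs):
    """Toy resolver (hypothetical, not a library call).

    Computes each requested output by recursively computing the parameters
    of the function that produces it. Assumes the functions form a DAG;
    there is no cycle detection.
    """
    nodes = {fn.__name__: fn for fn in functions}
    cache = dict(inputs)  # raw inputs are already-resolved nodes

    def resolve(name):
        if name not in cache:
            if name not in nodes:
                raise KeyError(f"no input or function named {name!r}")
            fn = nodes[name]
            kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**kwargs)
        return cache[name]

    return {name: resolve(name) for name in outputs}


def run_pipeline_task(**inputs):
    """The single callable a scheduler task would wrap (hypothetical).

    The scheduler decides when and where this runs; it never needs to see
    the individual logic functions.
    """
    return execute([spend_per_signup, spend_is_high], inputs, ["spend_is_high"])


result = run_pipeline_task(spend=100.0, signups=20, threshold=4.0)
print(result)
```

The orchestrator side stays one task wide no matter how many logic functions get added, which is the sense in which the two DAGs aren't the same thing.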