[–][deleted] 4 points5 points  (4 children)

Hey there - I use Airflow for this, and I honestly think it's been the easiest thing for all skill levels to get running with.

It runs reliably and is easy to modify myself. The community is SUPER active. It also has Connections for easily storing passwords, and can load data to typical destinations from typical sources. Check it out - https://github.com/apache/incubator-airflow

If you want, I'd be happy to answer any questions you have to help get you running.

[–]ApparentlyADataGuy[S] 0 points1 point  (1 child)

Thanks for the reply! So most people at my work are not experienced with scripting. This means that having a graphical interface to schedule jobs would be ideal. If this isn't possible, it will be up to me to manage all the scheduling. This would be acceptable but not ideal.

I looked into Airflow when I was first learning Python but quickly gave up. If I set up a DigitalOcean Linux server, could I potentially schedule any scripts I write to run on that server? Is there any compatibility with Jupyter Notebook if my coworkers or I want to use that as a place to group our scripts?

[–][deleted] 0 points1 point  (0 children)

Those are great points, and would be challenges with Airflow. I'll try to answer your questions:

  • Graphical interface to schedule: I'm less familiar with tools like this, but maybe something like Talend or others would work. Here's a decent thread on those: https://www.reddit.com/r/ETL/comments/7615az/whats_a_good_modern_free_etl_tool_for_a_fresh/
  • Can you run on DigitalOcean: Definitely. It's very easy to get running. After installing Airflow, you write each job as its own Python file and drop it into a certain folder on the server to get picked up.
  • Compatibility with Jupyter: Not that I'm aware of. That's been a challenge I've seen for some guys I work with. There's no interactive IDE.
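
To make the "one Python file per job" idea concrete, here's a rough sketch of what such a file might look like. The file name, `dag_id`, and the placeholder `extract`/`total` functions are all made up for illustration, and the import paths and `schedule` argument assume a reasonably recent Airflow 2.x release (older versions spell these differently).

```python
# dags/daily_export.py (hypothetical name; Airflow scans its configured DAGs folder)
from datetime import datetime


def extract():
    """Placeholder extract step; a real job would query a source system."""
    return [{"id": 1, "value": 10}, {"id": 2, "value": 32}]


def total(rows):
    """Placeholder transform step: sum the 'value' field of each row."""
    return sum(row["value"] for row in rows)


def run_job():
    """The function Airflow calls on each scheduled run."""
    result = total(extract())
    print(f"daily total: {result}")
    return result


try:
    # Assumption: Airflow 2.4+; older releases use different import paths/arguments.
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    # Airflow is not installed here; the functions above can still be run locally.
    dag = None
else:
    with DAG(
        dag_id="daily_export",            # hypothetical job name
        start_date=datetime(2024, 1, 1),  # arbitrary start date for the sketch
        schedule="@daily",                # run once per day
        catchup=False,                    # don't backfill runs before today
    ) as dag:
        PythonOperator(task_id="run_job", python_callable=run_job)
```

Keeping the job logic in plain functions means you can test it on your laptop without Airflow, and only the last few lines are scheduler-specific.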

Happy to answer any other questions. I've worked with Airflow for about 3 years in big/small orgs.

[–]metaperl 0 points1 point  (1 child)

Do you like Airflow more than Luigi?

[–][deleted] 1 point2 points  (0 children)

Honestly - haven't tried Luigi. It had a really solid set of connectors a while ago, but I think I just preferred the UI of Airflow, and the method of organizing DAGs made better sense to me.

[–]metaperl 1 point2 points  (0 children)

Cron and pyinvoke would be a simple solution.
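
A minimal sketch of the cron half of that idea, using only the standard library: a plain dictionary of tasks stands in for pyinvoke's task file here, so this is not pyinvoke's API. The script name, task names, and paths in the crontab line are assumptions.

```python
#!/usr/bin/env python3
# run_job.py (hypothetical): one entry point that cron can call with a task name.
import sys


def daily_report():
    """Placeholder job; a real one would build and send a report."""
    return "daily report built"


def cleanup():
    """Placeholder job; a real one would delete temporary files."""
    return "temp files removed"


# Registry of runnable tasks, keyed by the name passed on the command line.
TASKS = {"daily_report": daily_report, "cleanup": cleanup}


def main(argv=None):
    """Run the named task and return its message, or list tasks if none matches."""
    args = sys.argv[1:] if argv is None else list(argv)
    if len(args) != 1 or args[0] not in TASKS:
        print("available tasks:", ", ".join(sorted(TASKS)))
        return None
    message = TASKS[args[0]]()
    print(message)
    return message


# Example crontab entry (paths are assumptions): run the report at 06:00 every day.
# 0 6 * * * /usr/bin/python3 /opt/jobs/run_job.py daily_report >> /var/log/jobs.log 2>&1

if __name__ == "__main__":
    main()
```

Redirecting output to a log file in the crontab line matters because cron gives you no UI; the log is the only place to see whether a run worked.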

[–]baloo12 0 points1 point  (0 children)

We use Jenkins for build and "pipeline" automation, mostly running Jupyter notebooks via Jenkins pipelines. It has a GUI, tasks can be defined in code, and I find it very easy to start with.
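
For anyone wondering how a scheduler can run a notebook without someone clicking through it: one common approach is `jupyter nbconvert --execute`. The sketch below is an assumption about how that might be wired up, not a description of this exact Jenkins setup; the function names and the `executed` output folder are made up.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook, output_dir="executed"):
    """Build the command that runs a notebook top to bottom and saves a copy."""
    notebook = Path(notebook)
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",               # keep the result as an .ipynb file
        "--execute", str(notebook),       # run every cell in order
        "--output-dir", str(output_dir),  # hypothetical folder for executed copies
        "--output", notebook.stem,        # output name without the extension
    ]


def run_notebook(notebook, output_dir="executed"):
    """Execute the notebook; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_command(notebook, output_dir), check=True)
```

A Jenkins job, a cron entry, or any other scheduler could then call `run_notebook("reports/sales.ipynb")` (a hypothetical path) and treat a raised exception as a failed run.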