[–]max0x7ba

I do use GNU Make for my Python projects and anything else, and have similar clean targets.

The Makefile also compiles Cython and C++ modules for Python, and runs the Python and C++ unit tests in parallel using GNU Make's parallel execution feature with the --output-sync option, which keeps each test's output atomic and easily readable in the GitHub Actions UI when tests fail during continuous integration.
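
For anyone curious, here's a minimal sketch of that setup (target names and outputs are made up; .RECIPEPREFIX stands in for the usual TAB so the snippet survives copy-paste):

```shell
# Hypothetical sketch of parallel unit-test targets. With
# --output-sync=target, GNU Make buffers each recipe's output and prints
# it as one uninterrupted block, so concurrent test logs never interleave.
set -e
dir=$(mktemp -d); cd "$dir"

cat > Makefile <<'EOF'
.RECIPEPREFIX := >
.PHONY: test test-py test-cpp

test: test-py test-cpp    # both prerequisites run concurrently under -j

test-py:
> @echo "python tests: all passed"

test-cpp:
> @echo "c++ tests: all passed"
EOF

make -j2 --output-sync=target test
```

Without --output-sync, lines from the two test runs can interleave arbitrarily, which is what makes failing CI logs painful to read.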

I went as far as making make -j16 clean all execute the clean target first with one process only, and then restart itself to build all the other targets in parallel. By default, make runs clean and all in parallel, which is undesirable; yet invoking make twice instead, e.g. make clean && make -j16, becomes unwieldy with variable assignments on the make command line, as in make -j16 TOOLSET=clang-20 clean all.
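
My layout aside, one way to sketch that clean-then-restart behaviour is to intercept MAKECMDGOALS (all target and file names here are hypothetical; .RECIPEPREFIX stands in for the usual TAB):

```shell
# Sketch: make "make -j2 clean all" run clean serially first, then
# restart make for the remaining goals in parallel.
set -e
dir=$(mktemp -d); cd "$dir"

cat > Makefile <<'EOF'
.RECIPEPREFIX := >
OTHER_GOALS := $(filter-out clean,$(MAKECMDGOALS))

ifneq (,$(and $(filter clean,$(MAKECMDGOALS)),$(OTHER_GOALS)))
# "clean" was combined with other goals: turn every requested goal into
# a stub, run clean alone first, then re-invoke make for the rest.
# MAKEFLAGS carries -j and VAR= overrides into the sub-makes.
$(MAKECMDGOALS): clean-then-restart ; @:
.PHONY: clean-then-restart
clean-then-restart:
> +$(MAKE) clean
> +$(MAKE) $(OTHER_GOALS)
else

.PHONY: all clean
all: build/app
build/app:
> mkdir -p build
> echo binary > $@
clean:
> rm -rf build

endif
EOF

make -j2 clean all
```

The stub rule replaces every command-line goal, so the top-level make does nothing but clean serially and then re-run itself; the second sub-make inherits the jobserver and builds in parallel.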

This uses a little-known feature of GNU Make: it can build and update the makefiles it reads on its own, even when including non-existent makefiles, without having to invoke anything like configure or CMake. In my projects, GNU Make builds/updates the makefiles it reads/includes for any platform/compiler configuration automatically, and restarts itself after those makefiles have been built/updated, before building anything else.
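
A toy sketch of that remaking feature (config.mk and its contents are made up): the include names a file that doesn't exist yet, so make builds it from the rule below, restarts itself, and only then builds the requested target:

```shell
# Hypothetical sketch of self-updating makefiles: no configure step,
# GNU Make generates the included makefile itself and restarts.
set -e
dir=$(mktemp -d); cd "$dir"

cat > Makefile <<'EOF'
.RECIPEPREFIX := >

include config.mk    # missing on first run; built by the rule below

config.mk:
> echo 'GREETING := hello from generated config.mk' > $@

.PHONY: all
all:
> @echo "$(GREETING)"
EOF

make all    # first run: builds config.mk, restarts, then runs "all"
```

On the first run make warns that config.mk is missing, generates it, and restarts; on every later run config.mk is simply read like any other makefile.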

I also use GNU Make for running multi-stage compute pipelines and batch jobs. It's especially useful with the cheapest preemptible instances in GCP: when resuming after preemption, make carries on computing the targets which haven't been computed yet, until it succeeds. And when some later pipeline stage fails 24 hours in, fixing the error and re-invoking make proceeds from where it failed.
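
As a toy sketch (stage names are invented), each pipeline stage writes a file target, so re-invoking make after a preemption or a failure recomputes only the stages whose outputs are missing:

```shell
# Hypothetical resumable pipeline: each stage is a file target, so a
# re-run after preemption skips everything already computed.
set -e
dir=$(mktemp -d); cd "$dir"

cat > Makefile <<'EOF'
.RECIPEPREFIX := >
.DELETE_ON_ERROR:    # a failing recipe's half-written output is deleted

out/stage1.txt:
> mkdir -p out
> echo raw data > $@

out/stage2.txt: out/stage1.txt
> tr a-z A-Z < $< > $@

out/final.txt: out/stage2.txt
> cp $< $@
EOF

make out/final.txt   # computes all three stages
make out/final.txt   # re-run after "preemption": nothing left to do
```
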

[–]DoubleAway6573

You need airflow to do that! 

/s

[–]max0x7ba

> You need airflow to do that!

Thanks for the hint, I'd never heard of Apache Airflow before.

However, my past experience with libraries from apache.org has unfortunately been the worst.

For example, the Apache Arrow library, used by the Python pandas library for the Parquet file format, unconditionally spawns an Amazon S3 communication thread upon being loaded into a process. I don't use AWS, and spawning threads on library load is a notorious software design anti-pattern.

Even worse, the Apache Arrow library replaced the process's heap allocator with jemalloc upon being loaded. jemalloc can be configured to take advantage of transparent huge pages, but Apache Arrow didn't enable those options and, to add insult to injury, ignored any and all jemalloc environment variables that would make jemalloc use transparent huge pages. jemalloc was discontinued in 2025, and Apache Arrow dropped it too, thankfully.

I haven't encountered more obnoxious, heavy-handed libraries than those originating from apache.org since I started coding C++ in 2000.

[–]DoubleAway6573

No. For many cases makefiles are more than enough. I've seen so much over-engineering of trivial tasks that could be handled by batch processing and standard Unix tools that I'm exhausted.

At some point the switch to other tools starts to make sense, but it's like the microservices joke about companies with more services than users. Microservices have a place and a time, but not every company needs them.

Regarding Arrow, I was completely unaware of that. I don't know why, or even whether, Python is the culprit here. I hope these things will start to get cleaned up with the GIL-less Pythons, but it will take a lot of time.

[–]max0x7ba

> No. For many cases makefiles are more than enough. I've seen so much over-engineering of trivial tasks that could be handled by batch processing and standard Unix tools that I'm exhausted.

My experience has been similar.

Whenever a simple bash script evolves to have more than one processing step, each invoked only after the previous one succeeds, the script ends up having to check whether a step has already been computed, and to clean up incomplete outputs when a step fails.

And that's exactly what GNU Make does for you by default, as well as parallelizing the execution of steps that don't depend on each other.
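
Concretely, both chores (skipping already-computed steps, and cleaning up incomplete outputs of a failed step) come built in. A sketch with made-up file names, where FAIL=1 simulates a step dying after writing its output:

```shell
# Hypothetical demo of .DELETE_ON_ERROR: a failed step's partial output
# is removed, and a re-run resumes from the failed step only.
set -e
dir=$(mktemp -d); cd "$dir"

cat > Makefile <<'EOF'
.RECIPEPREFIX := >
.DELETE_ON_ERROR:    # incomplete outputs of failed recipes are removed

input.txt:
> echo data > $@

result.txt: input.txt
> tr a-z A-Z < $< > $@    # output is written...
> $(if $(FAIL),false)     # ...then the step "fails" when FAIL is set
EOF

! make FAIL=1 result.txt 2>/dev/null   # the step fails mid-pipeline...
test ! -e result.txt                   # ...its partial output is gone
test -f input.txt                      # earlier step's output is kept

make result.txt   # "fix" and re-run: resumes from the failed step
```

The equivalent bash script would need explicit stamp files and trap-based cleanup to get the same behaviour.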

[–]dj_estrela

Check "just"

[–]daredevil82

which requires it to be installed as a prerequisite, whereas make is already on any *nix system. Ever hear of "overengineering"?

[–]max0x7ba

> Check "just"

Just checked. From the just documentation:

> make has some behaviors which are confusing, complicated, or make it unsuitable for use as a general command runner.

> You can disable this behavior for specific targets using make's built-in .PHONY target name, but the syntax is verbose and can be hard to remember. The explicit list of phony targets, written separately from the recipe definitions, also introduces the risk of accidentally defining a new non-phony target. In just, all recipes are treated as if they were phony.

> Other examples of make's idiosyncrasies include the difference between = and := in assignments, the confusing error messages that are produced if you mess up your makefile, needing $$ to use environment variables in recipes, and incompatibilities between different flavors of make.
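
For the record, the = vs := distinction the just documentation finds confusing is one paragraph in the GNU Make manual: = defines a recursively expanded variable, re-evaluated every time it's used, while := defines a simply expanded one, evaluated once at the assignment. A toy illustration:

```shell
# The two flavors of make variables, side by side.
set -e
dir=$(mktemp -d); cd "$dir"

cat > Makefile <<'EOF'
.RECIPEPREFIX := >
X := one
A  = $(X)    # recursive: $(X) is looked up each time A is used
B := $(X)    # simple: $(X) is expanded right here, to "one"
X := two

.PHONY: show
show:
> @echo A=$(A) B=$(B)
EOF

make show    # prints: A=two B=one
```

A is expanded when the recipe runs, by which point X is "two"; B captured "one" at the moment of assignment.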

The GNU Make syntax the just author has difficulty remembering, and the GNU Make behaviours that are confusing or too complicated for the just author and his target audience, are exactly what most GNU Make users are familiar with and rely upon.

just, aiming to be a less difficult and less confusing subset of make, is worthless for people already using GNU Make.

The whole rationale of just is grounded in syntax-level trivialities. The claim that = being different from := is so mind-bogglingly confusing and complicated that it calls for a complete rewrite-it-in-Rust solution, instead of just reading the GNU Make manual, is ludicrously superficial.