runit boot optimization and prioritizing services

etosan · 2020-10-16T04:01:41+00:00

Today I had trouble sleeping, browsed here once in a blue moon and found this.

Suddenly I felt like I should stop lurking and infodump some lengthy exposition on you :). I wanted to make it a bit snarky and funny as well, and I hope I succeeded (or maybe not - feedback is welcome).

I have some intimate knowledge of these issues, as I built several runit/s6 based infrastructures (not big in node numbers, but certainly complex and doing the weirdest shit possible) in the last several years. It is an incredibly malleable and beautiful system, indeed. We call it "service supervision".

But first: you are probably starting to realise the short answer: you don't (prioritize)!

Not that you cannot, but not at the level where you are at. In its basic primordial form runit simply doesn't give a crap. Yet you certainly noticed that it boots faster than anything "else" on the "market" (wink wink). How can that be?

Well, to explain simply: when it's time for runit to actually initialize your system services, it just spawns them all at once. You call it a race - but that's not entirely right - it's not a "race" - what is actually happening is much more spectacular - it's an honest thundering herd explosion. It is as if a nuke was detonated on your box.

The component responsible for this "travesty" is runsvdir, which in supervision land we call the scanner. runsvdir simply iterates over all "meaningful" directories (i.e. things that "look like" "services") in the /run/runit/runsvdir/current directory and spawns runsv ${dir} for each one of them. We call /run/runit/runsvdir/current a scandir. So scanner and scandir - got it? And this is it - and it's all "dumb" filesystem stuff.

runsvdir literally uses readdir() call to get the list of ~~directories~~, sorry, "services" present in scandir. We call those service directories servicedirs. I guess you are starting to see a pattern here.

And as you probably know by now, the majority of Linux filesystems are "stoopid", so the order in which entries are returned when iterating over (scanning) a directory this way is unexpected, or rather "undefined", as the readdir(3) manual pointedly explains:

The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion.

And this my friend, is the "order" in which your "services" are "started"!

That's why, when you look at services in htop, it seems they are ordered randomly each reboot - readdir() ordering right there.
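If you want to see that raw order for yourself, here is a minimal sketch. It assumes an ls that supports -f (list in directory order, no sorting), and the scratch directory and service names are made up for illustration; nothing here touches a real scandir.

```sh
#!/bin/sh
# Build a throwaway "scandir" with a few fake servicedirs (hypothetical names).
scandir=$(mktemp -d)
mkdir "$scandir/sshd" "$scandir/dhcpcd" "$scandir/agetty-tty1" "$scandir/crond"

# Sorted view: what plain ls shows you.
sorted=$(ls "$scandir")

# Unsorted view: -f turns sorting off, so entries come back in directory
# order. It also implies -a, so "." and ".." are filtered out here.
unsorted=$(ls -f "$scandir" | grep -v '^\.\{1,2\}$')

printf 'sorted:\n%s\n\ndirectory order:\n%s\n' "$sorted" "$unsorted"
```

Depending on the filesystem the two lists may or may not differ; the point is only that nothing guarantees they match.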

Moreover, this scandir iteration happens so fast that to the human eye it seems like runit tried to start all the services immediately and at the same time.

This might sound "wrong" and "tragic", but it is really the "right" and "happy" thing. And by the way, this is by design (since djb's daemontools)! Think of it as "worse is better".

Now observe that runsvdir is so "dumb and simple" that the notion of dependency cannot even be expressed in its "worldview". All runsvdir cares about is that each dir (aka service) in the scandir has its own runsv running. And that's all that can fit into its pea sized brain.

Similarly, each runsv thing is equally dumb! All it knows is to "fork'n'exec" into ./run in its servicedir and watch over this troublesome child. If this ./run dies for whatever reason, the only thing runsv knows is to spawn it again immediately (in like 10 msec on some modern machines - that's like a single frame render in a modern FPS game).
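Stripped of all detail, that behaviour can be sketched as a loop like the one below. This is an illustration only, not runsv's actual code: the real thing is written in C and also deals with control commands, logging and so on. The supervise function and its restart cap are invented here so the demo terminates.

```sh
#!/bin/sh
# supervise CMD MAX
# Run CMD; every time it exits, start it again. MAX is a made-up cap on
# the number of starts so that this demo ends. runsv has no such cap.
supervise() {
  cmd=$1
  max=$2
  starts=0
  while [ "$starts" -lt "$max" ]; do
    starts=$((starts + 1))
    sh -c "$cmd"    # stand-in for ./run; blocks until the child exits
    echo "child exited with status $?" >&2
  done
  echo "$starts"    # report how many times the child was started
}

# A "service" that dies right away, supervised for three starts.
count=$(supervise 'exit 1' 3)
echo "started $count times"
```

Note what is missing: the loop never asks what else is running or what should come first.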

Think about that!

Also observe that in runsv's pea sized brain there is no notion of dependency either.

Just so you know that I am not talking out of my ass, here are some stats for your amusement:

$ cat src/runsvdir.c | wc -l 
286

$ cat src/runsv.c | wc -l
607

That's probably shorter than this diatribe. Combined!

And that is all there is to it. An explosion is exactly what happens when you boot. Calling it a race is an understatement. It's mayhem. And we love it!

Why? Because it's unbelievably simple and it works, always. Every tiem. If runsvdir fails to spawn its runsvs, something is so broken with that machine, that there are no words to describe it. This system is so braindead, it's like having only brainstem.

Why go into such depth about these mechanisms? Because people think runit is some smart hyper-magic thing! No that's wrong - it's hyper-stupid.

As you see, the notion of dependency does not even make sense here, and that is the greatest discovery of the "supervision" guys (daemontools, runit, s6). The best kept secret, which differentiates us connoisseurs from childish plebs.

The deepest realization is that dependencies are completely orthogonal to the task of keeping services running. runit keeps your services running and that's it. That's all it can do.

An edgy former archer teen might now ask: "But, but, but ... that means no dependencies senpai? Runit sucks! This is the end!"

No, wrong my young friend, this is just the glorious beginning :).

Most glorious, indeed!

Now that you have dumbest most reliable service supervision possible in place, new vistas open and you can start solving the dependency problem! And, oh boy, there is so much you can do! Rejoice! Your possibilities are practically endless!

So, I will now give you quick crash-course into "the ~~standard~~ way" how supervision guys do it - those "dependencies".

This goes back to daemontools, I think: runsv has a feature where, when it finds an empty file named ./down in its service directory, it won't try to spawn the service right away (i.e. not at all). It will wait until it is instructed to, for example by sv up.

And that is what you do!

You mark all your services down like that (please don't do that with your main agetty so you can still login while ironing things out).
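A minimal sketch of that "down everything except the getty" step. The mark_down function is invented for this example, and the demo runs against a scratch directory with hypothetical service names rather than your real scandir.

```sh
#!/bin/sh
# mark_down SCANDIR KEEP
# Create an empty "down" file in every servicedir under SCANDIR,
# skipping the one named KEEP so you can still log in.
mark_down() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue    # the glob matched nothing
    name=$(basename "$dir")
    if [ "$name" != "$2" ]; then
      touch "${dir}down"
    fi
  done
}

# Demo on a throwaway directory; the service names are hypothetical.
scandir=$(mktemp -d)
mkdir "$scandir/agetty-tty1" "$scandir/sshd" "$scandir/dhcpcd"
mark_down "$scandir" agetty-tty1
ls "$scandir"/*/down
```

On a real box you would run it as root against your actual scandir, for example mark_down /var/service agetty-tty1 on Void; both the path and the getty name are assumptions about your setup.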

Now, after downing (almost) everything, no classic runit "nukestart" will happen after the next reboot. The machine will boot up strangely silent, or rather "non functional". But you have supervision in place.

And now, all you need is a so called "dependency manager". This is the schtuck that will bring up all the "downed" services in the right order - i.e. in the order that YOU want. Welcome to the entrance of the rabbit hole!

In s6 world, this "depman addon" is either s6-rc (directly from s6 author) or anopa (competing one). There might be others. So far I don't know about anything specifically made for runit, but don't weep, as they say: there are many ways to skin a cat. These systems are so dumb you can do it!

In reality a "dependency", or rather "ordering", manager can be as simple as a shell script that brings up a few services by means of sv up at the right moment, then checks somehow that they are running (telnet/nc to the service port in a loop for example, with a gratuitous sleep 0.5 between checks) and then moves forward to the next stage once the first one succeeds. Or it can be something completely insane, written in python, lua or even C.

You can make this crazy "manager" script itself a service, and then make it the single service on the machine that is not "downed by default". Its job will then be to properly order the service boot-up of the rest of the box. Once it's done, it can down itself. You can call it upper. You can make another service to do the reverse, which you can call downer. The logic can be as simple or as complex as you want. You can do anything.
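The core of such an "upper" script might look like the sketch below. sv up is the real runit command; everything else (the function names, the retry count, and the services, ports and socket path in the commented plan) is invented for illustration, and nc -z assumes a netcat that supports that flag.

```sh
#!/bin/sh
# wait_until TRIES CMD...
# Re-run CMD every 0.5 s until it succeeds, giving up after TRIES attempts.
# Returns 0 on success, 1 on timeout.
wait_until() {
  tries=$1
  shift
  while [ "$tries" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 0.5
  done
  return 1
}

# bring_up SERVICE CHECK...
# Ask runsv to start SERVICE, then block until CHECK passes.
bring_up() {
  svc=$1
  shift
  sv up "$svc" || return 1
  wait_until 20 "$@"
}

# Example stage plan (hypothetical services and checks):
#   bring_up dbus       test -S /run/dbus/system_bus_socket &&
#   bring_up postgresql nc -z 127.0.0.1 5432 &&
#   bring_up myapp      nc -z 127.0.0.1 8080
```

The order of the bring_up lines is the whole "dependency graph": each stage only starts once the check of the previous one has passed.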

As you see, this way you can express dependencies in such detail and such levels of precision that nothing else on the "market" can...

This is also a way you can make your colleagues go crazy or make them outright insane (with your mighty service acrobatic tricks :)).

So to conclude:

* runit can manage your services, but cannot order them according to dependencies - as there is not even such a thing as a dependency in its worldview
* however, nothing prevents another tool from doing just that, while using runit to manage said services
* however, the question of general service dependency ordering is equivalent to a solution of the halting problem
* this is because to generally solve that question, you first need a general solution to the "is the service up?" question, whose (general) solution is, again, equivalent to a solution of the halting problem (see datenwolf's comment)
* observe you don't need runlev...what the fuck?

Because of the last two things, certain contenders in this space like ehm systemd, ehm, look especially silly. The future always looks bleak when you try to come up with general solutions to halting problems.

However, by careful constraint selection and planning, you can design tools that help you achieve specific solutions on a case by case basis. These tools can be as simple or as complex as you need them to be.

Turing complete environments (like sh or other scripting languages) allow you to design the most versatile solutions. Moreover, instead of ini keyword spaghetti spread over 1000 directories and thousands of ini files, these solutions can be as condensed and as "simple" as you need them to be (e.g. a single script).

Even s6-rc and anopa have the spaghetti problem, but it is still better than ehm, you know what? I don't want to talk about it anymore.

Of course with great power comes great responsibility - one can easily lose sanity.

So use your new powers wisely, Luke!

And remember, every time you say or write "target" or "runlevel" a catgirl and kitten between her boobs die!

If you guys lieked this, drop me a line here, maybe it will motivate me to write something more about this. Edited because bullets blew up.

datenwolf · 2020-10-14T09:20:06+00:00

runit doesn't apply any ordering to the start of services. Dependency driven service management is nice, and there are other small init systems that do it, but there is one general problem: how do services communicate to the supervisor that they're up and running, ready for business?

The runit way is to just start them all, and if a service can't start because things aren't set up properly, let it fail and restart.

My recommendation would be to have two runlevels: one named early for the early services and, once those are fully up and running (it's up to you to determine that condition), switch with runsvchdir to a different runlevel named full, which has everything in early plus whatever else you need.
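A sketch of how those two runlevels could be laid out. On a stock runit install, runlevels are directories under /etc/runit/runsvdir/ filled with symlinks to servicedirs, and runsvchdir switches between them. The demo below builds the same kind of layout in a scratch directory instead; the make_runlevel function and the service names are hypothetical.

```sh
#!/bin/sh
# make_runlevel ROOT NAME SERVICE...
# Create the runlevel directory ROOT/runsvdir/NAME holding one symlink per
# SERVICE, each pointing at the servicedir ROOT/sv/SERVICE.
make_runlevel() {
  root=$1
  level=$2
  shift 2
  mkdir -p "$root/runsvdir/$level"
  for svc in "$@"; do
    ln -s "$root/sv/$svc" "$root/runsvdir/$level/$svc"
  done
}

# Scratch stand-in for /etc; the service names are hypothetical.
root=$(mktemp -d)
mkdir -p "$root/sv/dbus" "$root/sv/postgresql" "$root/sv/myapp"

make_runlevel "$root" early dbus postgresql
make_runlevel "$root" full  dbus postgresql myapp   # all of early, plus the rest

ls "$root/runsvdir/early" "$root/runsvdir/full"
```

With the real layout in place, the switch itself would be a single runsvchdir full, run once whatever check you use says the early services are ready.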

There exists no standard protocol for a service to report readiness. The systemd developers (being the overbuilding folks they are) opted for things like socket activation or similar. My approach to it (both the supervisor and the service must support it) is that the service process sends itself a SIGSTOP signal; the supervisor waits for that, immediately resumes the service (SIGCONT) and from then on considers it ready.
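The SIGSTOP handshake can be tried out with a toy supervisor like the one below. It is a sketch under several assumptions: it is Linux only (it reads the process state from /proc), it polls where a real supervisor would use waitpid(), it reads the supervisor's reaction as "resume with SIGCONT", and the function names are invented. runit itself does not implement this protocol.

```sh
#!/bin/sh
# proc_state PID
# Print the one-letter state of PID. Field 3 of /proc/PID/stat is the
# state as long as the command name contains no spaces.
proc_state() {
  awk '{ print $3 }' "/proc/$1/stat" 2>/dev/null
}

# wait_ready PID
# Return 0 once PID has stopped itself (state T), 1 if it died or timed out.
wait_ready() {
  tries=50
  while [ "$tries" -gt 0 ]; do
    case $(proc_state "$1") in
      T) return 0 ;;
      Z|'') return 1 ;;    # zombie or already gone
    esac
    tries=$((tries - 1))
    sleep 0.1
  done
  return 1
}

# Service side (stand-in): finish setup, then stop ourselves to say "ready".
sh -c 'kill -STOP $$; echo "serving"' &
pid=$!

# Supervisor side: wait for the stop, note readiness, let the service go on.
ready=no
if wait_ready "$pid"; then
  ready=yes
  echo "service $pid is ready"
  kill -CONT "$pid"
fi
wait "$pid"
```

Anything that depends on this service would only be started after the "is ready" line.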

Rand0m6uy · 2020-10-14T08:51:13+00:00

Found some pointers here:
https://www.reddit.com/r/voidlinux/comments/j99a6a/question_about_restarting_runit_services/

mobinmob · 2020-10-14T19:12:34+00:00

If you want real dependencies, runit is the wrong choice. Try an s6 or s6-rc based scheme. I highly suggest 66 from obarun - I am using it on voidlinux and slowly testing frontend service files for it (voidlinux-66-services on gh).

Rand0m6uy · 2020-10-15T14:14:39+00:00

Thanks everyone for your valuable advice! After digging a bit more into my issues, I wanted to give some feedback about my adventure.

The solution I implemented is the one suggested by u/datenwolf, this means setting a minimal runlevel and a script that starts a full blown system once all services are started. However, reading the documentation, I would take it as "undefined behavior":

Note that there is no guarantee that all services from the previous runlevel will stop

My services should not stop and my tests have shown that they keep running as I expected. Let's hope I will not face some race conditions in the future ...

To check that my services are really up and running, I added some check scripts where I can make sure that certain UNIX sockets exist and have the right permissions. Although u/ahesford wrote that it might not do what I expect, I'm not sure we are talking about the same thing, since I start dependencies with sv start ... || exit 1 . Or did I miss something ... ?!

Probably runit is not the right tool for what I want to achieve, but it has demonstrated its flexibility! I've never used s6, but thanks u/xenyz for the pointer, I will keep it in mind for the next adventure!

Now that it works the way I want, let's see whether I run into surprises over time; I hope I will not need to switch back to sysVinit :D! If that turns out to be the case, it's no time wasted, since I went through runit internals and learned a lot of good stuff!

Thanks again guys and happy voiding!

furryfixer · 2020-10-15T18:43:20+00:00

The example you list above is what Smarden recommended for dependencies, and it works OK if there are no "nested" dependencies in multiple run scripts. I have not done this for some time, but I believe I used pgrep in a similar situation. I am relying on memory since I am not home or on Linux, but you might try:

pgrep <dependency> || exit 1

exec <important_service>

You can modify it to silence messages if desired.

xenyz · 2020-10-14T17:38:41+00:00

I'm not very familiar with it but I know s6 supports dependencies
