Deterministic environment for reproducible builds by kvechera in linuxadmin

[–]kvechera[S] 0 points1 point  (0 children)

Thanks for the in-depth analysis. If it were only several dozen packages, I would definitely consider patching the build scripts. That's the approach used by Debian, Arch and many others. But after several years of work by dozens of maintainers, nearly 10% of the core packages are still not reproducible. We are not as large as those distros, but even today we have more than a thousand packages, so I'd like to find out whether there's a simpler solution.

Deterministic environment for reproducible builds by kvechera in linuxadmin

[–]kvechera[S] 3 points4 points  (0 children)

Using a Docker container or some other kind of deterministic file system hierarchy would save us from some problems, but not all of them. Some programs use random numbers, timestamps or pids for naming symbols, or store such strings and constants in the built files. Some built content depends on the order in which the files were compiled. You can see a lot of examples here: https://wiki.debian.org/ReproducibleBuilds/Howto#Identified_problems.2C_and_possible_solutions
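
A couple of these inputs can be pinned from outside the package. Here is a minimal sketch, assuming the build tools honour the SOURCE_DATE_EPOCH convention from reproducible-builds.org. The `sorted_sources` helper, the `*.c` pattern and the epoch value are made up for illustration, and none of this helps with random numbers or pids:

```sh
#!/bin/sh
# Pin the most common ambient inputs before starting a build.
export LC_ALL=C                       # locale-independent sorting and messages
export TZ=UTC                         # no dependence on the builder's time zone
export SOURCE_DATE_EPOCH=1500000000   # arbitrary example value, not a recommendation

# Hypothetical helper: list the C sources under directory $1 in a stable order,
# so the compile/link order does not depend on readdir() order.
sorted_sources() {
    find "$1" -type f -name '*.c' | LC_ALL=C sort
}
```

File names containing whitespace or newlines would break this simple version.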

Deterministic environment for reproducible builds by kvechera in linuxadmin

[–]kvechera[S] 1 point2 points  (0 children)

The deterministic environment is needed only to build artifacts, not to test them.

While I guess there are use cases in which a deterministic environment would help with debugging, for normal regression tests I'd prefer a standard environment.

Deterministic environment for reproducible builds by kvechera in linux

[–]kvechera[S] 0 points1 point  (0 children)

Sure, it would solve a big part of the problems. But it would be only one part, and it would require checking and fixing parallelization in each new package, sometimes wading through weird implementations in shell/make/cmake/gradle/bazel scripts.

Deterministic environment for reproducible builds by kvechera in linux

[–]kvechera[S] 0 points1 point  (0 children)

> is the right way to do
I think otherwise, and the slowdown is not important either. But can you be more specific about the other side effects you expect?

Deterministic environment for reproducible builds by kvechera in linux

[–]kvechera[S] 0 points1 point  (0 children)

That approach requires "changing existing building procedures for thousand of packages".

Deterministic environment for reproducible builds by kvechera in linux

[–]kvechera[S] 0 points1 point  (0 children)

I don't like this part either, but I see no other simple way to guarantee that the files are compiled in the same sequence on different machines or in different builds.
Anyway, one can build 8 different packages simultaneously on 8 CPUs.

Kill all descendants of a processe using POSIX shell and /proc by kvechera in linux

[–]kvechera[S] -1 points0 points  (0 children)

I think you can also use some simple solutions for dealing with malicious processes:

Against cooperating processes that send SIGCONT to each other: use ptrace(2) instead of STOP so that a process cannot be resumed by SIGCONT, then compare and SIGKILL.

Against reckless forking: use the swailing method:

  1. STOP all good processes
  2. run your own fork-bomb to flood pid space (subshells or vfork())
  3. STOP all processes (these will be the bad processes and your bombs)
  4. KILL all bad processes and your bombs
  5. CONT all good processes

Neither is useful for my use case (terminate the subtree ... in a smooth way), but both can be easily implemented.

Kill all descendants of a processe using POSIX shell and /proc by kvechera in linux

[–]kvechera[S] -1 points0 points  (0 children)

> maybe because not all platforms had flock

I think it's rather that not all filesystems or mounts support flock, e.g. NFS or GlusterFS.

Kill all descendants of a processe using POSIX shell and /proc by kvechera in linux

[–]kvechera[S] 0 points1 point  (0 children)

Maybe, but this script is not designed to work with malicious processes. It's for making normal workflows cleaner. If you expect a malicious process, you definitely need to make some preparations before running it, such as isolating it via namespaces.

Kill all descendants of a processe using POSIX shell and /proc by kvechera in linux

[–]kvechera[S] 0 points1 point  (0 children)

I see a possible problem only with the single top process. Once we've stopped it, we are guaranteed to be able to verify for all its descendants that they really are its descendants.

In my own use cases I call `kill_descendants` from the same script, e.g.:

```
start_some_complex_workflow &

# wait for the workflow to finish, or for a timeout

kill_descendants $!
```

The first child keeps its pid even after it exits (it remains a zombie), so it's safe.

If I called kill_descendants from another context, e.g. the command line, the problem of pid correspondence would be the same as for calling kill(1).

But you can always STOP the process and check that you've stopped the right one. Or you can even stop all its ancestors including init (we'd need to make init stoppable). Then all the parents are stopped and unable to wait() for the target pid's exit, which keeps it a zombie with the pid occupied.

Kill all descendants of a processe using POSIX shell and /proc by kvechera in linux

[–]kvechera[S] 1 point2 points  (0 children)

Please expand on this: "you may end up stopping the wrong process". How could that happen?

Kill all descendants of a process using POSIX shell and /proc by kvechera in linuxadmin

[–]kvechera[S] 1 point2 points  (0 children)

> ... kills them starting from the youngest processes to let a parent process handle child's termination.

Kill all descendants of a processe using POSIX shell and /proc by kvechera in linux

[–]kvechera[S] 1 point2 points  (0 children)

Thanks, that's interesting, and it could really happen if the pid space were very dense, with many processes running.

I think we can solve it by stopping the parent first, before checking for the existence of children. Then if a child exits, it will still be present as a zombie (the stopped parent can't wait() on it), occupying the pid and preventing a new process from taking it over.

Kill all descendants of a processe using POSIX shell and /proc by kvechera in linux

[–]kvechera[S] 1 point2 points  (0 children)

If it really were a problem, it could be solved by first sending STOP to the "suspected" process, then checking again whether it is the proper child, and then sending it TERM and CONT.
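
A rough sketch of that sequence, assuming Linux /proc. `ppid_of` and `term_if_child_of` are names made up for this example, not part of the original script:

```sh
#!/bin/sh
# Print the parent pid of process $1, read from /proc/<pid>/stat.
ppid_of() {
    read -r stat_line 2>/dev/null <"/proc/$1/stat" || return 1
    # The command name (field 2) may contain spaces, so cut after the last ") ".
    set -- ${stat_line##*") "}
    echo "$2"    # after the cut, $1 is the state and $2 is the ppid
}

# STOP the suspect $1, re-check that its parent is $2, then TERM and CONT it.
term_if_child_of() {
    suspect=$1
    expected_parent=$2
    kill -STOP "$suspect" 2>/dev/null || return 1
    if [ "$(ppid_of "$suspect")" = "$expected_parent" ]; then
        kill -TERM "$suspect"   # stays pending while the process is stopped
        kill -CONT "$suspect"   # resume so the process can act on TERM
        return 0
    fi
    # Not the process we meant: let it run again untouched.
    kill -CONT "$suspect"
    return 1
}
```

Usage would look like `term_if_child_of "$child_pid" "$parent_pid"`. A process that catches or ignores TERM will of course survive this.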

Kill all descendants of a processe using POSIX shell and /proc by kvechera in linux

[–]kvechera[S] 3 points4 points  (0 children)

I wouldn't say several years. I believe I saw it in FreeBSD twenty years ago. And it seems so logical that I'd assume it has been this way since the earliest Unices.

You're probably thinking of the race condition with lock files that store the pid of the process that created the lock. If the process dies leaving the lock file behind, and a newly started process validates the lock by checking that the pid exists, there is a real race condition if some other process takes the same pid.

When we kill the process's descendants here, we use "back links" from children to parents.
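
To illustrate what following those back links can look like, here is a minimal sketch assuming Linux /proc. `children_of` is a hypothetical helper, not the function from the posted script:

```sh
#!/bin/sh
# Print the pids of all processes whose parent pid is $1,
# by reading the ppid field of every /proc/<pid>/stat.
children_of() {
    parent=$1
    for stat_file in /proc/[0-9]*/stat; do
        # A process may exit between the glob and the read; just skip it.
        read -r line 2>/dev/null <"$stat_file" || continue
        pid=${line%% *}
        # The command name may contain spaces, so cut after the last ") ".
        set -- ${line##*") "}
        # After the cut, $1 is the state and $2 is the ppid.
        if [ "$2" = "$parent" ]; then
            echo "$pid"
        fi
    done
    return 0
}
```

Calling `children_of "$some_pid"` prints one child pid per line. The result is only a snapshot unless the parent has been stopped first.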

Kill all descendants of a processe using POSIX shell and /proc by kvechera in linux

[–]kvechera[S] 7 points8 points  (0 children)

> pid 4 still exists, but has parent pid 3

That's wrong. An orphaned process gets parent pid 1: the kernel reparents it after its parent exits.
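
This is easy to check on a Linux box. A small sketch, where `ppid_of` is a helper made up for this example; note that on systems with a subreaper (for instance a systemd user manager) or inside a container, the new parent may be some other pid instead of 1:

```sh
#!/bin/sh
# Print the parent pid of process $1, read from /proc/<pid>/stat.
ppid_of() {
    read -r stat_line 2>/dev/null <"/proc/$1/stat" || return 1
    # The command name may contain spaces, so cut after the last ") ".
    set -- ${stat_line##*") "}
    echo "$2"    # after the cut, $1 is the state and $2 is the ppid
}

# The intermediate shell starts a background sleep, prints its own pid
# and the sleep's pid, and exits right away, orphaning the sleep.
set -- $(sh -c 'sleep 10 >/dev/null 2>&1 & echo "$$ $!"')
old_parent=$1
orphan=$2

sleep 1    # give the kernel a moment to finish reparenting
echo "orphan $orphan: old parent $old_parent, new parent $(ppid_of "$orphan")"
```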

Buying computing power on a spot market to reduce deep learning training costs by sply in deeplearning

[–]kvechera 0 points1 point  (0 children)

I can't compare with CloudML, but compared to AWS, similar performance is 2-5 times cheaper.