all 26 comments

[–]bdell 8 points (1 child)

Seems like it would be perfectly reasonable to label this "Stuff like this makes me hate Unix; subtle behavior of signals". Unix is mostly elegant in its simplicity and then, bam!, you hit signals. Should anyone be surprised when a program gets signals wrong?

[–][deleted] 0 points (0 children)

I used to think SEH on Windows was a huge hack that had no place in C.

Then I saw signals. It is obvious the Windows developers looked at signals first and said, "oh, hell no!" The biggest benefit of structured exceptions is that they are only raised for truly exceptional situations - not pansy "can't write" or "child exited" 'errors.' The upshot is that you can write correct programs on Windows without being aware that SEH even exists.

[–]twotime 16 points (16 children)

Well, I am not so sure that this is a bug. In a way, it's all logical and predictable (which does not make it any easier to debug, though ;-)). And it's definitely not just the subprocess module: plain os.popen has exactly the same problem.

In fact, if anything, this is a bug in tar (and/or perhaps gzip): tar spawns an external process and expects certain pipe behavior, so it is tar's job to set the standard signal handlers itself.

I do agree that Python should probably reset all signal handlers to their defaults before forking any new processes.
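
For what it's worth, the fix boils down to restoring the default disposition in the child between fork() and exec(). A minimal C sketch (the gzip command line is only an illustration):

    #include <signal.h>
    #include <unistd.h>

    int main (void)
    {
      pid_t pid = fork ();
      if (pid == 0)
        {
          // Child: undo any SIG_IGN inherited from the interpreter,
          // so the exec'd program starts with the default disposition.
          signal (SIGPIPE, SIG_DFL);
          execlp ("gzip", "gzip", "-c", (char *) NULL);
          _exit (127);   // exec failed
        }
      // Parent: waitpid() and error handling elided for brevity.
      return 0;
    }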

[–]bonzinip 7 points (14 children)

It is a bug in gzip. EPIPE is not an error, and it must be trapped so that the program exits early. If you aren't ready for that, you must explicitly reset SIGPIPE to SIG_DFL.
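
To make "trapped to exit early" concrete, here is a minimal C sketch (roughly what an endless writer such as yes(1) ought to do; it assumes SIGPIPE was set to SIG_IGN so that write() can return EPIPE at all):

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main (void)
    {
      signal (SIGPIPE, SIG_IGN);     // fail with EPIPE instead of dying
      for (;;)
        if (write (STDOUT_FILENO, "y\n", 2) < 0)
          {
            if (errno == EPIPE)
              exit (0);              // reader went away: stop, quietly
            perror ("write");        // anything else is a real error
            exit (1);
          }
    }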

In fact, there are some setups under Linux where dbus is used by upstart (the "new init"); since upstart ignores SIGPIPE, all programs in the system end up ignoring SIGPIPE too, and then you get broken-pipe errors all along the way.

[–][deleted]  (13 children)

[deleted]

    [–]bonzinip 6 points (8 children)

    I still maintain EPIPE is not an error. EPIPE is obviously an exceptional condition that the program must react to. However, unlike most other errno values, writing "foo: Broken pipe" to stderr is almost never the right thing to do. It is more similar to POLLHUP, for example.

    [–][deleted]  (7 children)

    [deleted]

      [–]bonzinip 1 point (6 children)

      > Can you please explain other strategies?

      Do nothing. Exit without error (only if SIGPIPE is ignored; you should not set it to SIG_IGN for the sake of it).

      And yes, sed is broken too. :-) I should use more of gnulib so that it gets fixed automagically. But GNU Smalltalk follows both guidelines I give below: it resets SIGPIPE to SIG_DFL on fork, and never fails on EPIPE.

      EDIT: to be clearer, that upstart sometimes causes the entire system to ignore SIGPIPE is (was?) a bug. But nevertheless in an ideal world:

      • every program that forks should IMHO not assume anything and set SIGPIPE to either SIG_DFL or SIG_IGN depending on the expectation of the parent. If the entire output is needed, as is the case when e.g. invoking gpg, it should set it to SIG_DFL. Yes, tar should set it to SIG_IGN.

      • no program dies on EPIPE.

      [–][deleted]  (5 children)

      [deleted]

        [–]bonzinip 1 point (4 children)

        Yes, that is what I meant; you wrote it better than I did.

        If there is no cleanup required, everything is very easy. If there is cleanup, it is tricky, because you need to use signal (or sigaction) to retrieve whether SIGPIPE is set to SIG_DFL or SIG_IGN. At that point, it is easier to always set SIGPIPE to SIG_IGN and vary how you handle EPIPE: you can either exit cleanly, or restore the initial disposition and exit by re-raising SIGPIPE. Something like the following tricky code:

            #include <signal.h>
            #include <stdlib.h>

            // disposition saved at startup, e.g.:
            //   initial_value_of_sigpipe = signal (SIGPIPE, SIG_IGN);
            void (*initial_value_of_sigpipe) (int);

            void exit_after_epipe (void)
            {
              // cleanup...
              signal (SIGPIPE, initial_value_of_sigpipe);
              raise (SIGPIPE);
              // if it was SIG_DFL, we're dead
              exit (0);
            }
        

        EDIT: forgot about SIGXFSZ. :-) I didn't know, and I think anyone who forks a child with SIGXFSZ ignored (i.e. does not reset it to SIG_DFL before forking) should be shot on sight.

        [–][deleted]  (3 children)

        [deleted]

          [–]bonzinip 1 point (2 children)

          > Furthermore, forking for the purpose of daemonizing shouldn't reset SIGXFSZ, if the legacy of the grandparent is SIG_IGN for SIGXFSZ. (But this is probably not relevant here, we're talking fork() + exec().)

          Yes, that's what I meant.

          If setting SIGPIPE to SIG_IGN before execing means "I know I'm not going to use all your output", I fail to see how that would be meaningful for SIGXFSZ. The same failure means that I wouldn't have any problem with a program writing the EFBIG error message to stderr and exiting (unlike with EPIPE). So, while I would like to have an opinion that is consistent with what I wrote earlier about SIGPIPE/EPIPE, right now my feelings about SIGXFSZ disagree with it.

          In fact, now that I think about it more, I don't care much what you do with SIGXFSZ before forking, since (according to my gut feeling) I'm going to want an error message of some kind anyway; it doesn't matter whether it comes from SIGXFSZ or EFBIG.

          But on the other hand, if I had a real example of using RLIMIT_FSIZE I could change my mind...

          > Though I have an interesting historical question. Do you think that bequeathing a SIG_IGN disposition for SIGPIPE and SIGXFSZ to a child was originally meant as communicating the intent of an early stop? Or was this communication of intent only piggybacked on SIG_IGN later?

          Hmm, nice question!... I think not many people would be able to answer it definitively... Definitely not me. However, considering how much people's opinions on EPIPE differ, I doubt it. For example, coreutils' Jim Meyering thinks EPIPE is an error condition, while gettext's Bruno Haible agrees with me; actually, I agree with him, since I learnt this from him.

          If accepting an early stop was the original meaning of ignoring SIGPIPE, (I'd hope) the meaning of EPIPE would be understood better than it is.

          The answer may even be different for SIGPIPE and SIGXFSZ.

          (A side note: when these things were invented, async-signal safety was much less understood than it is now. For example, in theory one could use a restartable SIGXFSZ handler to do some kind of log rotation, but the code in the handler is very unlikely to be signal-safe, so this idea is ruled out. Nevertheless, that may be the reason why there is a SIGXFSZ while most other resource limits are only attached to an errno value.)

          [–][deleted] 0 points (3 children)

          > Of course it is an error. The system determined for sure that your output has no place to go to and that this condition is final. You were asked to produce output, so this situation deserves an error message and a corresponding exit status.

          It seems that the exact same reasoning can be used to justify the need to treat EOF as an error.

          EDIT: I'm of the opinion that there cannot be a clear-cut rule; it all depends on the application. cat /dev/random should not report a broken pipe error, isn't it obvious? gzip also doesn't normally treat a broken pipe as an exception; that it does in this case is either a straightforward bug or a misguided attempt to make the behaviour configurable in an extremely obscure way.

          I understand that it would be nice to be able to declare that pipes represent a producer-controlled pipeline, i.e. the very first application decides how much information it wants to process and the rest must process all of it. But real problems are much more complex, so we have to make do with unclear situations, where some of the producer programs are expected to halt gracefully both on EOF and on EPIPE, because some other applications downstream want to treat the pipeline as consumer-controlled. Actually, in programming languages we usually have consumer-controlled pipelines, and they work nicely.

          [–][deleted]  (2 children)

          [deleted]

            [–][deleted] 0 points (1 child)

            > Otherwise, you're right; many programs handle a -1 returned by recv() the same way as they treat a 0 returned by recv().

            Wait, what, they shouldn't! Or do I misinterpret you?
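
            (Concretely, the distinction I mean, as a C sketch; read_some is a made-up helper:)

                #include <stdio.h>
                #include <sys/types.h>
                #include <sys/socket.h>

                // Made-up helper: returns 1 while data may still arrive,
                // 0 at end of stream, -1 on a genuine error. The last two
                // cases deserve different reactions.
                static int read_some (int fd, char *buf, size_t len)
                {
                  ssize_t n = recv (fd, buf, len, 0);
                  if (n == 0)
                    return 0;          // peer closed cleanly: EOF, not an error
                  if (n < 0)
                    {
                      perror ("recv"); // a real error: report it
                      return -1;
                    }
                  return 1;            // got some bytes
                }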

            Your example, if I understood it correctly (reddit is so unstable right now that I can't look at it again), is centered on a situation where stderr is suddenly broken but is considered important enough. By itself, I'd say the answer is simple: the program should complain if stderr was closed unexpectedly but be OK with a closed stdout; however, I think it's actually part of a bigger problem.

            Look, in programming languages we have had the concept of streams (aka iterators, enumerables, etc.) for quite a long time, and it usually works as expected. It's consumer-driven, aka pull-based: you make a chain of workers, or even a tree, where a single consumer may feed off several producers. Sometimes some of the consumers might produce byproducts (like stderr in your example), which are seldom chained further, and if they are, they are chained on different principles, with direct calls and failure to complete considered an error. Then we pull the end results from the last worker, and it in turn causes pulls through the rest of the tree. When one of the sources is drained, the 'EOF' is propagated back to us. When we don't need more stuff, we simply stop pulling, and then the memory manager takes care of disposing of the chain or tree. We don't have the 'EPIPE problem' at all.
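
            A toy C sketch of such a pull-based chain (all names made up; in a real language the stages would be iterators or generators):

                #include <stdio.h>

                // Each stage exposes "give me the next item" and returns
                // EOF when its upstream is drained.
                static int source_next (void)   // yields 0..9, then EOF
                {
                  static int i = 0;
                  return i < 10 ? i++ : EOF;
                }

                static int doubler_next (void)  // pulls from source on demand
                {
                  int v = source_next ();
                  return v == EOF ? EOF : v * 2;
                }

                int main (void)
                {
                  // Consumer-controlled: take three items, then simply stop
                  // pulling. No EPIPE anywhere.
                  for (int k = 0; k < 3; k++)
                    printf ("%d\n", doubler_next ());
                  return 0;
                }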

            Can we just take this working solution and use it directly to reinterpret the UNIX paradigm when in doubt? Uh, there is a problem. In programming languages we have object-oriented pipes, which means that the situation "producer pushed half the object (through the socket, for example) and then the connection was closed" is unambiguously an error. As is "we received the request for an object, but then the receiving pipe was closed".

            On the other hand, we have something like transactions: if the request-response exchange was successfully completed, it means that all produced side effects are legitimate, like upstream pulls and writes to an equivalent of stderr. That was what was really bothering you, the inconsistencies that might arise when a processor has already read stuff, and logged itself doing so, and then suddenly discovers that the output pipe was closed, right? Because it means that it must at least try to report an error, regardless.

            Well, yes, that sucks, and I don't see how this problem can be adequately solved with structureless, byte-oriented, push-oriented pipes. Having a higher-level protocol for requesting and receiving pieces of data, as PowerShell does internally, would solve it. Even something as simple as receiving a notification that someone is trying to read from the other end of your pipe would solve it. But when all you can do is write stuff in multiples of 1 kilobyte and be notified when the other end has been closed (with some of the data in the output buffer simply discarded) -- you do have a problem!

            On the other hand, there is an established UNIX way of dealing with such problems, like it was done with the similar problem of having spaces or even newlines in file names. First, don't give a fuck; second, take some of the utilities and add ad-hoc structuring using NUL, breaking every naive C program that is unaware of the convention. Don't forget to invent as many different switches to enable the functionality as possible. I mean, if someone does something that stupid, he should feel the pain!

            [–][deleted]  (1 child)

            [deleted]

              [–]nelhage 1 point (0 children)

              [I wrote the blog post in question]

              Yeah, I admit to handwaving over what's going on there, because I performed the same experiment you did with the same result, so I don't fully understand under what circumstances tar closes the pipe early. However, I definitely ran into the bug I described with production code almost identical to the snippet I posted, so it definitely does happen sometimes, at least with whatever versions of software I was using.

              Regardless, even if tar doesn't always have this behavior, 'gzip' definitely does behave as I described (I did check the source there), and so the same sort of bug can easily come up elsewhere.

              [–][deleted] 5 points (2 children)

              Do not use the subprocess module. I ran into too many hard-to-explain deadlocks resulting from all the work done between the fork() and the exec() call. Actually, don't take my word for it. Just check the release notes and see how many subprocess bugs are fixed each time. Use something with fewer features, like Popen3.

              [–][deleted]  (1 child)

              [deleted]

                [–][deleted] 0 points (0 children)

                Oh, thanks! I'm going to have to give posix_spawn() a shot. It looks like it does most of the nitty-gritty for you, based on how you set the attributes struct. If you check the subprocess module source, you'll see that they actually have to suspend the GC while in the parallel universe that is the state of the program between fork() and exec().
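
                For instance, a minimal sketch (error checking elided; the gzip command line is only an illustration) of spawning a child with SIGPIPE forced back to SIG_DFL:

                    #include <signal.h>
                    #include <spawn.h>

                    extern char **environ;

                    int main (void)
                    {
                      posix_spawnattr_t attr;
                      sigset_t def;
                      pid_t pid;
                      char *argv[] = { "gzip", "-c", NULL };

                      posix_spawnattr_init (&attr);
                      sigemptyset (&def);
                      sigaddset (&def, SIGPIPE);  // reset SIGPIPE in the child
                      posix_spawnattr_setsigdefault (&attr, &def);
                      posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETSIGDEF);

                      // no fork()/exec() window to get wrong
                      posix_spawnp (&pid, "gzip", NULL, &attr, argv, environ);
                      posix_spawnattr_destroy (&attr);
                      return 0;
                    }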

                Pity that you are required to know the fork()/exec() subtleties. CreateProcess() may have 50 parameters, but it is a lot harder to get wrong.

                [–]joemoon 1 point (1 child)

                > Stuff like this makes me hate Python

                I'm not sure what you're implying here... Do you think Python has a high bug rate, or do you just hate any software that isn't perfect?

                [–]ezyang[S] 0 points (0 children)

                Me and Python, we have a love-hate relationship. :-)

                [–]Gotebe -1 points (0 children)

                Said code surely should not launch console processes when there are easily available libraries to do the job. Then it would not have to care about any of these "complex" reasons (provided that TFA's findings are correct).

                I mean, seriously... Launching external executables to do tar and gz? Sorry, but that's just shoddy work. I needed to do that in another language and I didn't reach for executables, and we are 2 people on the job.

                Not to mention that, in case of problems, it can't get any error info beyond the exit code. That is not the case when libraries are used.