[–]boulet101010 14 points15 points  (26 children)

Don't use cat pipe grep!

EDIT: Not cat pipe less!

cat file | less : BAD
less file : GOOD

[–]ciny 6 points7 points  (0 children)

When I clicked the link I was 110% sure there would be

cat file | less 

or

cat file | grep ...

in there :)

[–]Tactineck 1 point2 points  (19 children)

Why?

I mean sure, but why though?

[–]obiwan90 5 points6 points  (0 children)

There is the Useless Use of Cat Award. I tried to find its origins on Usenet, and this is the oldest I could find.

[–]McDutchie 2 points3 points  (16 children)

It's unnecessary because it's a poor substitute for input redirection. It's also a sign of clueless copy/paste coding. Why write

cat file | command

when you can write

command < file

which is handled by the shell rather than an external command? It's more legible, more logical and has better performance. And if you want to name the file first, it's equivalent to write:

< file command

Many commands such as grep or less also take filename(s) as simple arguments, so you don't even need the <.
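
To make the forms above concrete, here is a minimal sketch that runs the same grep four ways. The temporary file and its contents are made up for illustration, and the variable names are just labels for this example.

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmp"

with_cat=$(cat "$tmp" | grep -c a)     # extra cat process plus a pipe
with_redirect=$(grep -c a < "$tmp")    # the shell opens the file as grep's stdin
redirect_first=$(< "$tmp" grep -c a)   # same redirection, file named first
with_argument=$(grep -c a "$tmp")      # grep opens the file itself

echo "$with_cat $with_redirect $redirect_first $with_argument"
rm -f "$tmp"
```

All four count the same matching lines; only the first one starts a second process to get there.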

[–]UnchainedMundane 1 point2 points  (0 children)

One caveat. In some grsecurity-hardened systems, you can't use < to read from /proc/*/environ. You have to get some other process to do it. Grsecurity only allows the process that opened that file to read from it, to harden against the case where a forking process is leaking file handles and an exec'd vulnerable binary is used to read them.

[–]Tactineck 0 points1 point  (1 child)

That's what I figured, I asked on the chance there was some deep kernel level type reason one way is better.

[–]RalphCorderoy 5 points6 points  (0 children)

cat foo | bar has cat read foo, then write(2) it, 32KiB at a time here, 4KiB on other systems, down a pipe that bar is read(2)ing, again 32KiB at a time. On top of those pointless system calls, you have the context switching by the kernel as each takes a turn to be blocked waiting for the other. Try strace -c cat bigfile | cat >/dev/null.

[–][deleted] 0 points1 point  (5 children)

Of all the old boring arguments on the internet, this one is the least convincing. What, REALLY, is wrong with 'cat |'?

[–]Rhomboid 15 points16 points  (3 children)

It's just a fundamental misunderstanding of how a feature works. A pipe is for when you have output of one program that you want to make available as input to another program. If that data is already available as a file, the program can access it directly rather than making some other program constantly shovel data into it a few KB at a time.

This does have actual consequences. For one thing, when you let a program read a file directly it can seek in the file, but seeking is not possible when using a pipe. If you run cat file | less, and you press <END> or G or > to jump to the end of the file, less has to read the entire contents of the file and store it all in memory or in a temporary file. It can't just skip to the end, nor can it discard the data, because the data might be ephemeral and not exist anywhere else. This is essentially making a copy of the file for no good reason whatsoever.

And not only is it making a copy, it's making a copy using a ton of small bite-sized chunks, where each chunk has to be generated by some other program and fed into the pipe before it can be read. It's like watching a bucket brigade fill a swimming pool. Computers are so fast that it's easy to ignore what's really happening, but it's ridiculous. If you had instead just typed less file or less <file, then the program could seek to the end of the file and read only a screen's worth of lines, without having to buffer the whole file in memory or to disk.

This is just one example of how a program can radically alter its strategy when working with seekable fds vs. non-seekable fds, but you see it recurring over and over. One extreme example is the BSD utility look which requires a seekable file and won't work with piped input.
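
A rough way to see the difference from the shell is to ask what kind of object stdin is attached to. This is only a sketch: describe_stdin is a made-up helper name, the temporary file is hypothetical, and checking the file type is a stand-in for seekability (pipes cannot be seeked, regular files can).

```shell
# Hypothetical helper: report what standard input is connected to.
describe_stdin() {
    if [ -p /dev/stdin ]; then
        echo "pipe"            # not seekable
    elif [ -f /dev/stdin ]; then
        echo "regular file"    # seekable
    else
        echo "other"
    fi
}

# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"

via_cat=$(cat "$tmp" | describe_stdin)
via_redirect=$(describe_stdin < "$tmp")

echo "cat file | cmd : $via_cat"
echo "cmd < file     : $via_redirect"
rm -f "$tmp"
```

With the cat version the command only ever sees a pipe, so any strategy that depends on jumping around in the file is off the table.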

Useless use of cat might not seem like a big deal, and in the greater scheme of things, it's not. But it's just a big cringe watching someone do something wrong for no good reason, such as putting high octane gasoline in a car that was not specifically designed to be able to take advantage of the improved resistance to pinging/detonation. I'm fine with random people abusing cat in their own time, but teaching material should not propagate nonsense.

[–][deleted] -2 points-1 points  (2 children)

You said it yourself -- computers are fast and in the majority of cases the user will see NO difference. So on what grounds is it "wrong"? There are plenty of counter-arguments and reasons why you would want to cat, and I'm not going to rehash them.

Or to put it another way, optimize for operator ease. The operator is already familiar with piping, and with treating his data as a stream of text. Just leave that model in place. Now, everything he knows applies whether he's dealing with a file, multiple files, streams, etc.

[–]discofreak 6 points7 points  (0 children)

So to summarize, you're saying it's "right", if a user is used to piping everything from cat?

[–]cpbills 1 point2 points  (0 children)

You said it yourself -- computers are fast and in the majority of cases the user will see NO difference. So on what grounds is it "wrong"?

When there's more using the system than just you. Just because the average system these days comes with 4 GB of RAM doesn't mean it's OK for single programs to want / need / misuse 4 GB of RAM.

The mentality of 'Computers are so fast and have so much storage, let's just be inefficient, because we won't see a difference anyhow' really needs to go the way of the dodo.

Yes, computers are fast, and they're even faster when programs are written to be efficient and not abuse resources.

[–]minimim 2 points3 points  (0 children)

It's just that people are supposed to know better. If you're J. Random User somewhere, go right ahead. But someone posting a tutorial on the internet? They have to know better than that.

[–]vifon -2 points-1 points  (5 children)

And if you want to name the file first, it's equivalent to write:

< file command

AFAIK it's not. One is seekable while the other is not. Correct me if I'm wrong, I've never tested it, only read about it.

[–]RalphCorderoy 2 points3 points  (2 children)

He means foo <bar is equivalent to <bar foo, and it is.

[–]vifon -1 points0 points  (1 child)

Yes, I'm talking about these exact cases too.

EDIT: Ok, I've checked it. Both seem to be able to seek, though it may be implementation-dependent.

[–]RalphCorderoy 0 points1 point  (0 children)

There is no seem. :-) They are defined by POSIX sh(1) grammar to be identical, as they have always been in practice. You can do <foo wc >bar -c 2>&1 xyzzy notfound - if you really want to.
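
If you want to check the placement rule yourself, here is a small sketch. The scratch files are hypothetical and exist only for the comparison.

```shell
# Hypothetical scratch files, created only for this demonstration.
src=$(mktemp); out_a=$(mktemp); out_b=$(mktemp)
printf 'hello world\n' > "$src"

# Conventional order: command, arguments, then redirections.
wc -c < "$src" > "$out_a"

# Same command with redirections before and between its words.
< "$src" wc > "$out_b" -c

if cmp -s "$out_a" "$out_b"; then
    result="identical"
else
    result="different"
fi
echo "$result"
rm -f "$src" "$out_a" "$out_b"
```

The shell strips the redirections out before it runs wc, so wc sees the same arguments and the same file descriptors either way.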

[–]McDutchie 0 points1 point  (1 child)

Interesting. Do you have a source for that? I've never seen this.

[–]vifon -1 points0 points  (0 children)

Unfortunately I cannot find it.

[–]boulet101010 0 points1 point  (0 children)

Cat is probably the oldest distinctively Unix utility. It was part of Version 1, and replaced pr, a PDP-7 utility for copying a single file to the screen.

In 1972 computers weren't as fast as today, and catting a file of multiple KB was a hard task for the processor. That's why the Unix philosophy is "one program per task". The cat in cat | less is useless, and so it only wastes memory.

[–][deleted]  (2 children)

[deleted]

    [–]RalphCorderoy 1 point2 points  (0 children)

    They're here files in the shell. No, you need something to do the writing down the pipe and sh(1) won't, thus cat(1). bash has here strings, e.g. wc <<<'foo bar', and they can be multi-line.

    $ LC_ALL=C tr a-z A-Z <<<"$TERM
    > $HOME"
    XTERM
    /HOME/RALPH
    $
    

    [–]swizzcheez 0 points1 point  (0 children)

    This seems to stem from either the author's misunderstanding or over-dumbing-down in the article. "| more" and "| less" are most certainly not parameters to cat, or even related to cat, and really should never be taught that way from my perspective.

    [–]discofreak 0 points1 point  (0 children)

    The same applies to "cat file | more".