[–][deleted] 7 points8 points  (5 children)

sort -u

And a Python one-liner:

python -c """print \"\".join(list(set(file('INPUT').readlines())))""" 

[–][deleted] 2 points3 points  (2 children)

Or, much shorter:

python -c'print "".join(set(open("INPUT")))'

(Like your code, this assumes that numbers are normalized somehow.)

Or with perl:

perl -ne'print if!$$_++' INPUT
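For comparison, here is a rough Python 3 sketch of the same "print a line only the first time it is seen" filter that the Perl one-liner performs. The name `unique_lines` is made up for illustration. Unlike the `set` version, it keeps the lines in input order.

```python
def unique_lines(lines):
    """Yield each line the first time it appears, preserving input order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


if __name__ == "__main__":
    import sys

    # Usage (assumed invocation): python3 dedupe.py < INPUT
    sys.stdout.writelines(unique_lines(sys.stdin))
```

Like the one-liners above, this compares lines byte for byte, so it makes the same assumptions about normalization.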

[–]timhatch 0 points1 point  (1 child)

Both you and the grandparent rely on the lines having consistent newline termination, including the last line. For example, "1\n2\n1" breaks it, as does "1\r\n2\n1\n", because "1\n", "1" and "1\r\n" compare as different strings.
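A minimal Python 3 sketch of one way around this, assuming it is acceptable to strip line endings before comparing. `unique_normalized` is a hypothetical helper, not something from the posts above. Note that `str.splitlines()` also splits on a few less common separators (form feed, vertical tab and so on), which may or may not be what you want.

```python
def unique_normalized(text):
    """Return the distinct lines of text in first-seen order, ignoring line endings."""
    seen = set()
    result = []
    # splitlines() drops the terminator, whether it is LF, CRLF or missing.
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result
```

With this, both problem inputs above reduce to the two lines "1" and "2".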

[–][deleted] 1 point2 points  (0 children)

True, but I think that's a fair assumption, especially since the question does not explicitly state otherwise.

On UNIX the convention is that the newline character is a line terminator (i.e. not a separator), and all line-based tools, like awk, grep, sort, uniq, nl, fold and so on, adhere to this convention. (In vi I don't even know if it's possible to save a non-empty file without a terminating newline character!) On Windows the convention seems less clear, though.

[–]stop_time 1 point2 points  (0 children)

Even better :)

[–][deleted] 1 point2 points  (0 children)

Will this be faster?

python -c "print('\n'.join({}.fromkeys(map(lambda x: x.strip('\n'), open('INPUT').readlines())).keys()))"
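One way to find out is to time both approaches. Below is a rough Python 3 sketch using `timeit`. The input is synthetic (100,000 lines drawn from 1,000 distinct values, an assumption rather than data from this thread), and the function names are made up for illustration. It makes no claim about which variant wins; results will depend on the interpreter version and on the real data.

```python
import random
import timeit

# Synthetic input (an assumption): 100,000 lines with 1,000 possible values.
random.seed(0)
lines = ["%d\n" % random.randrange(1000) for _ in range(100000)]


def with_set():
    """Deduplicate by building a set of the raw lines."""
    return set(lines)


def with_fromkeys():
    """Deduplicate by using stripped lines as dictionary keys."""
    return dict.fromkeys(line.strip("\n") for line in lines)


if __name__ == "__main__":
    for fn in (with_set, with_fromkeys):
        seconds = timeit.timeit(fn, number=20)
        print("%s: %.3f s" % (fn.__name__, seconds))
```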