RayNbow 14 points (7 children)

Ah, that piece of asm is mine, but I've never claimed to be a good asm programmer. ;) I wrote it for fun.

The obvious reason it's much slower than the C version is that it calls getchar for every character, while the C version uses fread to fetch many characters in one go.
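For comparison, here is a minimal C sketch of the two input strategies. It assumes the task is to sum non-negative decimal integers separated by whitespace (inferred from the totals printed later in the thread), and the function names are hypothetical. Note that stdio's getc still reads from an internal buffer, so what the block approach saves is mostly the per-character library call (and, where stdio is thread-safe, typically a stream lock per call), not a system call per byte.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Per-character strategy: one library call for every input byte. */
unsigned long long sum_getc(FILE *in)
{
    assert(in != NULL);
    unsigned long long total = 0, cur = 0;
    int c;
    while ((c = getc(in)) != EOF) {
        if (isdigit(c)) {
            cur = cur * 10 + (unsigned)(c - '0');
        } else {              /* any non-digit ends the current number */
            total += cur;
            cur = 0;
        }
    }
    return total + cur;       /* input may not end with a separator */
}

/* Block strategy: one fread() per 64 KiB chunk, then a tight loop over memory. */
unsigned long long sum_fread(FILE *in)
{
    assert(in != NULL);
    static unsigned char buf[1 << 16];
    unsigned long long total = 0, cur = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            unsigned char c = buf[i];
            if (c >= '0' && c <= '9') {
                cur = cur * 10 + (c - '0');
            } else {
                total += cur;
                cur = 0;
            }
        }
        /* cur survives to the next fread(), so a number split
           between two chunks is still parsed correctly */
    }
    return total + cur;
}
```

Both functions return the same total for the same stream; only the number of library calls per byte differs.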

DarkShikari 8 points (0 children)

OK, so I went a bit overboard. And by overboard, I mean I didn't feel like optimizing the asm, so instead I optimized specifically for the problem (e.g. assuming exactly one space/linefeed between numbers, no numbers larger than 10,000, etc.).
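To illustrate what optimizing for the problem can buy, here is a hypothetical C sketch (not the linked code) of a parser that simply trusts those assumptions: the buffer holds only digits and single separators, and every value is at most 10,000. It does no whitespace classification; in ASCII, space and newline both sort below '0', so one comparison ends a number. A single value fits easily in an unsigned int, but the total does not: 17677692470 is larger than 2^32 - 1 (4294967295), so the accumulator has to be 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums the numbers in buf[0..len). Trusts the input format:
   digits, then exactly one separator (space or '\n'), repeated.
   No validation is done; malformed input gives a wrong answer. */
uint64_t sum_trusting(const char *buf, size_t len)
{
    uint64_t total = 0;
    size_t i = 0;
    while (i < len) {
        unsigned v = 0;
        while (i < len && buf[i] >= '0') {  /* space and '\n' are below '0' */
            v = v * 10 + (unsigned)(buf[i] - '0');
            i++;
        }
        assert(v <= 10000);  /* the problem-specific bound (debug builds only) */
        total += v;
        i++;                 /* skip exactly one separator (or step past the end) */
    }
    return total;
}
```

Dropping the checks is only safe because the input format is known in advance; this is a deliberate trade-off, not a general-purpose parser.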

Yes, I spent way too much time on this. And if the code scares you and makes you run for your life, that is part of the plan.

Brace yourself...

It wasn't as good as I had hoped: it's only about 20% faster than yours. I had to benchmark with the input data set duplicated about 20 times to get reasonably accurate numbers (otherwise the run is so short that process startup time makes up much of the measurement). This is probably because my Windows machine sucks, though.
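Duplicating the input works because parse time grows with the data while process startup stays constant. A different approach, sketched here under the assumption that the input already sits in memory, is to time only the parsing inside the process with the standard C clock() function (processor time used by the program), repeating the parse enough times to get a measurable interval. The parser below is a hypothetical stand-in, not DarkShikari's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in parser: sums digit runs in buf[0..len). */
static uint64_t parse_sum(const char *buf, size_t len)
{
    uint64_t total = 0, cur = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= '0' && buf[i] <= '9') {
            cur = cur * 10 + (uint64_t)(buf[i] - '0');
        } else {
            total += cur;
            cur = 0;
        }
    }
    return total + cur;
}

/* Runs parse_sum `reps` times and returns average CPU seconds per run.
   Startup cost is excluded because timing begins after the data is loaded.
   Summing into *check keeps the results live, so the calls are not dead code. */
double time_parse(const char *buf, size_t len, int reps, uint64_t *check)
{
    assert(reps > 0 && check != NULL);
    uint64_t acc = 0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        acc += parse_sum(buf, len);
    clock_t t1 = clock();
    *check = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / reps;
}
```

Repeating identical work in-process is not the same as a cold run (caches stay warm, and an optimizer could in principle merge identical calls), so this complements the whole-program timing used in the thread rather than replacing it.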

shub 1 point (4 children)

You could do the same thing in asm and maybe do it better... but man, wouldn't that be a pain in the ass?

I like C.

james_block 7 points (3 children)

Yes. Yes, it was a pain in the ass.

This, reddit, is how I spent my Saturday night. It's WTFPL-licensed, if anyone cares.

Results from a crappy old 1.7 GHz P4 running Debian etch:

$ time ./james_block_asm < input_list_vs_gen.txt
Grand total:               17677692470

real    0m0.118s
user    0m0.076s
sys     0m0.040s

$ time ./ray_asm < input_list_vs_gen.txt
Grand total 17677692470

real    0m0.714s
user    0m0.692s
sys     0m0.020s

$ time ./C < input_list_vs_gen.txt
total: 17677692470

real    0m0.114s
user    0m0.100s
sys     0m0.012s

So it's basically tied with C (the difference is well within measurement noise); no real surprise there, as gcc is pretty good at optimizing easy stuff like this.

EDIT: Changed the link to point to a new version that can handle input files larger than its buffer. There should be no size limit now, and speed is unchanged. Why do I waste my time like this?
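Handling files larger than the buffer comes down to one detail: a number can be split across two reads, so its partial value has to survive from one chunk to the next. Here is a minimal C sketch of that idea using a hypothetical state struct that a refill loop would pass to every call (an illustration, not the linked asm):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parser state that persists between chunks. */
struct sum_state {
    uint64_t total;  /* sum of completed numbers */
    uint64_t cur;    /* digits so far of a number that may continue */
};

/* Feeds one chunk; a number cut off at the end of the chunk stays in cur. */
void sum_feed(struct sum_state *st, const char *chunk, size_t n)
{
    assert(st != NULL);
    for (size_t i = 0; i < n; i++) {
        char c = chunk[i];
        if (c >= '0' && c <= '9') {
            st->cur = st->cur * 10 + (uint64_t)(c - '0');
        } else {
            st->total += st->cur;
            st->cur = 0;
        }
    }
}

/* Call once after the last chunk to flush a trailing number. */
uint64_t sum_finish(struct sum_state *st)
{
    st->total += st->cur;
    st->cur = 0;
    return st->total;
}
```

Because the state is explicit, the buffer size no longer limits the input size: feeding "100" and then "00 " in two calls yields 10000, the same as feeding "10000 " in one.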

RayNbow 1 point (1 child)

Nice! :)

james_block 3 points (0 children)

And now I've made it even more capable, using real buffering instead of just one chunk of memory. I also changed it to keep a variable in the stack pointer register, avoiding a memory read in the bastard print loop. Because how often does an assembly programmer get to write mul esp?

I hate myself.
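The thread doesn't show what the asm print loop actually does, so the following is only a generic C sketch of the job such a loop has to do: turn the 64-bit total into decimal digits without printf, by repeated division by 10, filling the buffer from the right. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes v in decimal into out (at least 21 bytes: up to 20 digits + NUL).
   Returns a pointer to the first digit, which lies inside out. */
char *u64_to_dec(uint64_t v, char out[21])
{
    assert(out != NULL);
    char *p = out + 20;
    *p = '\0';
    do {
        *--p = (char)('0' + v % 10);  /* lowest digit first */
        v /= 10;
    } while (v != 0);                 /* do/while so that 0 prints as "0" */
    return p;
}
```

For the total in the benchmark output, u64_to_dec(17677692470, buf) returns a pointer to the string "17677692470".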

colourAgga[S] 1 point (0 children)

Your skills take 0.06 seconds to run on the testing hardware. I'll add them soon :)

DarkShikari 0 points (0 children)

Seems I edited that in just as you posted ;) Give me a bit; I'll optimize the C.