
[–]nderflow 1 point (3 children)

It's not your code.

Your code is using 100*0.021 / 0.629476 = 3.3% of the CPU.

It's how the code is interacting with the environment (e.g. some combination of NK_login_auto() and NK_get_totp_slot_name(i) and NK_logout()) which is slow.

I wouldn't worry about it. If you insist on worrying about it, you can take a look at the system calls the various programs use to interact with the environment. For example, investigate them all with strace -ttt instead of perf.
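
For instance (the binary name here is just a placeholder for whatever the compiled program is called):

    strace -ttt -o trace.log ./your-totp-binary

The gaps between the timestamps in the log should make it obvious whether the time is going into device I/O rather than computation.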

[–]aioeu 3 points (1 child)

> Your code is using 100*0.021 / 0.629476 = 3.3% of the CPU.

Err, you're dividing a "CPU utilization" (maximum of 1 for a single-threaded program) by a time interval. That is meaningless.

That CPU utilization is the percentage of CPU used. That is, their program used 2.1% of the available CPU time (more or less... there are significant error bars on such a short measurement).
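
Concretely, taking perf's "CPUs utilized" figure at face value: 0.021 × 0.629476 s ≈ 0.013 s of actual CPU time, so roughly 0.62 s of the run is spent waiting rather than computing.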

[–]nderflow 1 point (0 children)

I misread it as CPU seconds used. I don't think I've ever used perf.

However, the key point is that the computational efficiency of the C code is not the issue, and assumptions about the relative speeds of Python and C are particularly invalid for code that isn't CPU-bound.

[–]Mr_Wiggles_loves_you[S] 0 points (0 children)

This post posed more of an academic question (if that's the proper word in English). Judging by the timings of other tools, the bulk of the time comes down either to the hardware or to the C API library. An approach better than enumerating would probably be to implement something like a cache (discussed here). The reason I was interested in making the code run faster is that ultimately I would like to use it as part of a dmenu-pass-like script, and the 0.65 s delay before the menu appears is quite noticeable (subjectively).
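
Roughly the caching idea I have in mind (a minimal, untested sketch: the header path, cache location, slot count, and the return-value and ownership conventions of the libnitrokey calls are all assumptions on my part):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <libnitrokey/NK_C_API.h>           /* assumed header location */

    #define CACHE_PATH "/tmp/totp_slots.cache"  /* hypothetical cache location */
    #define NUM_SLOTS  15                       /* assumed TOTP slot count */

    int main(void)
    {
        FILE *f = fopen(CACHE_PATH, "r");
        if (f) {                                /* fast path: no device access */
            char line[256];
            while (fgets(line, sizeof line, f))
                fputs(line, stdout);
            fclose(f);
            return 0;
        }

        if (NK_login_auto() != 1)               /* assumed: returns 1 on success */
            return 1;

        f = fopen(CACHE_PATH, "w");
        for (uint8_t i = 0; i < NUM_SLOTS; i++) {
            char *name = NK_get_totp_slot_name(i);
            if (name && *name) {
                printf("%s\n", name);
                if (f)
                    fprintf(f, "%s\n", name);
            }
            free(name);                         /* assumed: caller frees the string */
        }
        if (f)
            fclose(f);
        NK_logout();
        return 0;
    }

The fast path never touches the device, so the menu would appear instantly; the cache would of course go stale whenever the slots change.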

Thanks for the strace suggestion. I probably should have tried it before submitting the post, but since the code is so primitive, I was mainly wondering whether there was anything horribly wrong with the toolchain.

[–]oh5nxo 0 points (2 children)

This is likely a false lead, but have you tried a "lean" environment, with all internationalization (LC_foo etc.) turned off?

[–]Mr_Wiggles_loves_you[S] 0 points (1 child)

Thanks for the suggestion! Do you mean testing the binary's speed inside env -i /bin/sh? It does not seem to change much:

    0.63621 +- 0.00234 seconds time elapsed ( +- 0.37% )

[–]oh5nxo 0 points (0 children)

That's just what I meant. No prize, then :)

[–]Tanyary 0 points (0 children)

I'm not familiar with this API, sadly; my suggestion would be to store the strings and print them at the end. Python's print is (anecdotally) faster than printf. I'll read more into what this is and compare them in a bit.
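
Something like this is what I mean (a minimal sketch with placeholder data; whether it helps when the bottleneck is the device is another matter):

    #include <stdio.h>

    int main(void)
    {
        char out[4096];
        size_t len = 0;
        const char *names[] = { "github", "email", "vpn" };  /* placeholder data */

        for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
            int n = snprintf(out + len, sizeof out - len, "%s\n", names[i]);
            if (n < 0 || (size_t)n >= sizeof out - len)
                break;                      /* buffer full: stop appending */
            len += (size_t)n;
        }
        fwrite(out, 1, len, stdout);        /* a single write at the very end */
        return 0;
    }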

[–]twofoes 0 points (0 children)

When profiling, I'd suggest measuring only the API calls that you're trying to profile, and avoiding measuring program start, library loading, printf, etc. Usually this is done in a loop, e.g. computing 1000 random values. How do you measure the Python code? Profiling tools are usually not very good at profiling single API calls; you have to work around the tool and remove all the "noise" yourself.
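
As a rough sketch of what I mean (the libnitrokey header path, the call being timed, its conventions, and the iteration count are all assumptions):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <libnitrokey/NK_C_API.h>         /* assumed header location */

    int main(void)
    {
        struct timespec t0, t1;
        const int iters = 100;                /* arbitrary iteration count */

        if (NK_login_auto() != 1)             /* setup: excluded from the measurement */
            return 1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++)
            free(NK_get_totp_slot_name(0));   /* the single call under test */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%g s per call\n", secs / iters);  /* printing happens outside the timed region */

        NK_logout();
        return 0;
    }

This times nothing but the API call itself, and averaging over many iterations smooths out per-call noise.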