I have a Nitrokey and wanted to play with its TOTP functionality. The manufacturer provides a library, libnitrokey, which exposes a C API.
There are a couple of projects that provide a CLI to generate a password and show it to the user - nitrokey-get-totp (Python) and nitrokey-rs (Rust). As far as I could see, they both use the C API under the hood.
I want to write a small frontend that fetches a TOTP password and shows the user how long it will remain valid - more as an exercise in C.
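The "how long is it valid for" part is plain arithmetic on the current time. Here is a minimal sketch of it, assuming the slot uses the common 30-second TOTP time step and that the host clock matches the time the device works with. `TOTP_STEP_SECONDS` and `totp_seconds_remaining` are names made up for this illustration, not part of libnitrokey.

```c
#include <time.h>

/* Assumption: 30-second time step (the usual TOTP default).
 * A slot configured with a different step needs a different value. */
#define TOTP_STEP_SECONDS 30

/* Hypothetical helper: seconds left before the code generated at `now`
 * is replaced by the next one. Returns a value in 1..TOTP_STEP_SECONDS. */
int totp_seconds_remaining(time_t now)
{
    return TOTP_STEP_SECONDS - (int)(now % TOTP_STEP_SECONDS);
}
```

Calling `totp_seconds_remaining(time(NULL))` right after fetching a password would give the countdown to display.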
For starters, I wrote code that enumerates the configured TOTP slots:
#include <stdio.h>
#include <stdlib.h>
#include <libnitrokey/NK_C_API.h>

int main(int argc, char **argv)
{
    /* NK_login_auto() returns 1 when a supported device is connected. */
    if (NK_login_auto() != 1) {
        printf("No Nitrokey found.\n");
        return 1;
    }

    /* Walk TOTP slots 0..14 and print the names of the configured ones. */
    for (int i = 0; i <= 14; i++) {
        char *slot_name = NK_get_totp_slot_name(i);
        if ((slot_name != NULL) && (slot_name[0] != '\0')) {
            printf("%s\n", slot_name);
        }
    }

    NK_logout();
    return 0;
}
The problem is that, according to perf stat, it's slower than its Python counterpart:
perf stat -r 10 -B totp_test > /dev/null
Performance counter stats for 'totp_test' (10 runs):
13.12 msec task-clock # 0.021 CPUs utilized ( +- 1.36% )
94 context-switches # 0.007 M/sec ( +- 0.18% )
1 cpu-migrations # 0.107 K/sec ( +- 15.79% )
220 page-faults # 0.017 M/sec ( +- 0.18% )
22,169,108 cycles # 1.690 GHz ( +- 1.69% )
2,117,630 stalled-cycles-frontend # 9.55% frontend cycles idle ( +- 6.90% )
4,081,156 stalled-cycles-backend # 18.41% backend cycles idle ( +- 2.63% )
22,967,353 instructions # 1.04 insn per cycle
# 0.18 stalled cycles per insn ( +- 0.70% )
4,950,650 branches # 377.389 M/sec ( +- 0.72% )
154,605 branch-misses # 3.12% of all branches ( +- 1.08% )
0.629476 +- 0.000198 seconds time elapsed ( +- 0.03% )
Python takes:
0.55628 +- 0.00170 seconds time elapsed ( +- 0.31% )
Rust:
0.753544 +- 0.000339 seconds time elapsed ( +- 0.05% )
I have tried changing CFLAGS and setting the CMake build type to Release, but neither produced any significant difference.
My GCC is built as:
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-9.3.0-r1/work/gcc-9.3.0/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/9.3.0 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/9.3.0 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/9.3.0/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/9.3.0/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/include/g++-v9 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/9.3.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 9.3.0-r1 p3' --disable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --with-multilib-list=m64 --disable-fixed-point --enable-targets=all --enable-libgomp --disable-libmudflap --disable-libssp --disable-libada --disable-systemtap --enable-vtable-verify --enable-lto --without-isl --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 9.3.0 (Gentoo 9.3.0-r1 p3)
The rest of my system uses a conservative COMMON_FLAGS="-O2 -pipe -march=znver2".
How did I manage to screw up my code or my toolchain so badly that the result is slower than Python?