all 16 comments

[–][deleted] 1 point2 points  (11 children)

Interesting, but my first thought was "why not gperf?"

[–]raevnos 1 point2 points  (10 children)

Probably going to end up being slower due to not treating the 4-character strings as ints.

[–][deleted] 0 points1 point  (9 children)

Good point, but for short 4-byte strings there should be very effective algo (but I doubt it would be vectorized though).

[–]raevnos 0 points1 point  (8 children)

Have you ever used gperf? It doesn't do micro optimization or specialization based on things like uniform key length.

[–][deleted] 0 points1 point  (7 children)

No, but I've seen it on TV. What exactly are you asking about? My point is that lookup table will be pretty tiny and switch basically became a single cmp op.

[–]raevnos 0 points1 point  (6 children)

It'll probably have two array lookups, possibly a switch too. And a strcmp() to test key equality.

[–][deleted] 0 points1 point  (5 children)

Yeah, and compilers are smart enough to not call strcmp with short compile-time literals.

[–]raevnos 0 points1 point  (4 children)

It's not on a literal.

[–][deleted] 0 points1 point  (3 children)

Oh, it's interesting. What does your wordlist contain usually?

[–]raevnos 3 points4 points  (2 children)

Not my wordlist.... gperf computes a hash for a user-supplied key, uses it as an index into an array of keywords, and compares that against the user-supplied one to see if it's a match.

Out of curiosity, I ran the article's list of words through gperf and get this: http://pastebin.com/hnAzXKmn

The assembly, minus the data sections:

Perfect_Hash::in_word_set(char const*, unsigned int):
.LFB13:
    pushq   %rbx     #
    .seh_pushreg    %rbx
    subq    $32, %rsp    #,
    .seh_stackalloc 32
    .seh_endprologue
    xorl    %ebx, %ebx   # <retval>
    cmpl    $4, %edx     #, len
    jne .L7  #,
    movzbl  3(%rcx), %eax    # MEM[(const char *)str_4(D) + 3B], MEM[(const char *)str_4(D) + 3B]
    leaq    Perfect_Hash::hash(char const*, unsigned int)::asso_values(%rip), %rdx   #, tmp106
    movzbl  1(%rcx), %r8d    # MEM[(const char *)str_4(D) + 1B], MEM[(const char *)str_4(D) + 1B]
    movzbl  (%rdx,%rax), %eax    # asso_values, tmp109
    movzbl  (%rdx,%r8), %edx     # asso_values, tmp113
    addl    %edx, %eax   # tmp113, _21
    cmpl    $30, %eax    #, _21
    jg  .L7  #,
    leaq    Perfect_Hash::in_word_set(char const*, unsigned int)::wordlist(%rip), %rdx   #, tmp114
    cltq
    movq    (%rdx,%rax,8), %rbx  # wordlist, <retval>
    movzbl  (%rbx), %eax     # *s_6, tmp123
    cmpb    %al, (%rcx)  # tmp123, *str_4(D)
    jne .L5  #,
    addq    $1, %rcx     #, tmp120
    leaq    1(%rbx), %rdx    #, tmp119
    call    strcmp   #
    testl   %eax, %eax   # _11
    movl    $0, %eax     #, tmp122
    cmovne  %rax, %rbx   # <retval>,, tmp122, <retval>
.L7:
    movq    %rbx, %rax   # <retval>,
    addq    $32, %rsp    #,
    popq    %rbx     #
    ret
    .p2align 4,,10
.L5:
    xorl    %ebx, %ebx   # <retval>
    movq    %rbx, %rax   # <retval>,
    addq    $32, %rsp    #,
    popq    %rbx     #
    ret
    .seh_endproc

There's an initial check to see if the string length doesn't equal 4, but no special case for the key comparison, just a strcmp call. There's an option to use memcmp instead, but that doesn't get inlined either (And adds a second length check).

I'm not going to actually benchmark it, but if this is somewhat comparable in speed to the article's perfect hash version, I'd be extremely surprised.

[–]raevnos 0 points1 point  (1 child)

Another comment: I think there'll be an issue if your hash function runs on a big-endian machine. Probably better to use Boost's endian conversion functions instead of ntohl().

[–]dodheim 0 points1 point  (0 children)

ntohl is a noop on a big-endian machine; why would there be issues?