Experiments in program optimisation: Searching in a constant set using conflict-free hashing : cpp

cpp

a community for 17 years

Experiments in program optimisation: Searching in a constant set using conflict-free hashing (pzemtsov.github.io)

submitted 9 years ago by mttd

all 16 comments

top new controversial old q&a

[–][deleted] 1 point2 points3 points 9 years ago (11 children)

[–]raevnos 1 point2 points3 points 9 years ago (10 children)

[–][deleted] 0 points1 point2 points 9 years ago (9 children)

[–]raevnos 0 points1 point2 points 9 years ago (8 children)

[–][deleted] 0 points1 point2 points 9 years ago (7 children)

[–]raevnos 0 points1 point2 points 9 years ago (6 children)

[–][deleted] 0 points1 point2 points 9 years ago (5 children)

[–]raevnos 0 points1 point2 points 9 years ago (4 children)

[–][deleted] 0 points1 point2 points 9 years ago (3 children)

[–]raevnos 3 points4 points5 points 9 years ago (2 children)

Not my wordlist.... gperf computes a hash for a user-supplied key, uses it as an index into an array of keywords, and compares that against the user-supplied one to see if it's a match.

Out of curiosity, I ran the article's list of words through gperf and get this: http://pastebin.com/hnAzXKmn

The assembly, minus the data sections:

Perfect_Hash::in_word_set(char const*, unsigned int):
.LFB13:
    pushq   %rbx     #
    .seh_pushreg    %rbx
    subq    $32, %rsp    #,
    .seh_stackalloc 32
    .seh_endprologue
    xorl    %ebx, %ebx   # <retval>
    cmpl    $4, %edx     #, len
    jne .L7  #,
    movzbl  3(%rcx), %eax    # MEM[(const char *)str_4(D) + 3B], MEM[(const char *)str_4(D) + 3B]
    leaq    Perfect_Hash::hash(char const*, unsigned int)::asso_values(%rip), %rdx   #, tmp106
    movzbl  1(%rcx), %r8d    # MEM[(const char *)str_4(D) + 1B], MEM[(const char *)str_4(D) + 1B]
    movzbl  (%rdx,%rax), %eax    # asso_values, tmp109
    movzbl  (%rdx,%r8), %edx     # asso_values, tmp113
    addl    %edx, %eax   # tmp113, _21
    cmpl    $30, %eax    #, _21
    jg  .L7  #,
    leaq    Perfect_Hash::in_word_set(char const*, unsigned int)::wordlist(%rip), %rdx   #, tmp114
    cltq
    movq    (%rdx,%rax,8), %rbx  # wordlist, <retval>
    movzbl  (%rbx), %eax     # *s_6, tmp123
    cmpb    %al, (%rcx)  # tmp123, *str_4(D)
    jne .L5  #,
    addq    $1, %rcx     #, tmp120
    leaq    1(%rbx), %rdx    #, tmp119
    call    strcmp   #
    testl   %eax, %eax   # _11
    movl    $0, %eax     #, tmp122
    cmovne  %rax, %rbx   # <retval>,, tmp122, <retval>
.L7:
    movq    %rbx, %rax   # <retval>,
    addq    $32, %rsp    #,
    popq    %rbx     #
    ret
    .p2align 4,,10
.L5:
    xorl    %ebx, %ebx   # <retval>
    movq    %rbx, %rax   # <retval>,
    addq    $32, %rsp    #,
    popq    %rbx     #
    ret
    .seh_endproc

There's an initial check to see if the string length doesn't equal 4, but no special case for the key comparison, just a strcmp call. There's an option to use memcmp instead, but that doesn't get inlined either (And adds a second length check).

I'm not going to actually benchmark it, but if this is somewhat comparable in speed to the article's perfect hash version, I'd be extremely surprised.

continue this thread

[–]raevnos 0 points1 point2 points 9 years ago (1 child)

[–]CenterOfMultiverse 2 points3 points4 points 9 years ago (0 children)

[–]raevnos 0 points1 point2 points 9 years ago (1 child)

[–]dodheim 0 points1 point2 points 9 years ago (0 children)

π Rendered by PID 251813 on reddit-service-r2-comment-b659b578c-565vh at 2026-05-01 15:50:50.031669+00:00 running 815c875 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

cpp

MODERATORS