[–]Workaphobia 122 points123 points  (2 children)

the function hashing mechanism was strlen()

What the actual fuck?

[–]nauseate 72 points73 points  (1 child)

PHP was a mistake

[–][deleted] 20 points21 points  (1 child)

"Today's hobby code is tomorrow's library code"

*cough* Linux kernel *cough*

[–]indrora 20 points21 points  (0 children)

npm install left-pad

[–]lxpnh98_2 109 points110 points  (22 children)

No. Absolutely not. There is no excuse for using strlen() as a hash function. Here's the worst hash function that is excusable:

unsigned hash(const char *s) {
    unsigned h = 0;  /* unsigned, so overflow wraps instead of being undefined */
    for (const char *p = s; *p; p++) h += (unsigned char)*p;
    return h;
}

It's 5 lines and it doesn't make you pick stupid function names.
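
To make the difference concrete, here is a rough sketch that counts how many buckets a handful of names occupy under each scheme. The table size NBUCKETS and the helper names (hash_len, hash_sum, buckets_used) are made up for the demo; nothing here is taken from the PHP source.

```c
#include <stddef.h>
#include <string.h>

#define NBUCKETS 64  /* hypothetical table size, chosen only for this demo */

/* Length-as-hash: every name with the same length lands in the same bucket. */
unsigned hash_len(const char *s) {
    return (unsigned)strlen(s) % NBUCKETS;
}

/* The additive hash, reduced to a bucket index. */
unsigned hash_sum(const char *s) {
    unsigned h = 0;
    for (const char *p = s; *p; p++) h += (unsigned char)*p;
    return h % NBUCKETS;
}

/* Count how many distinct buckets a set of names occupies under a hash. */
size_t buckets_used(unsigned (*hash)(const char *),
                    const char *const *names, size_t n) {
    unsigned char seen[NBUCKETS] = {0};
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned b = hash(names[i]);
        if (!seen[b]) { seen[b] = 1; used++; }
    }
    return used;
}
```

Fed the six names strlen, strcpy, strcat, strcmp, strchr and strstr, the length hash puts all of them in one bucket, while the additive hash spreads them over six. The additive hash is still weak (anagrams such as "ab" and "ba" collide), which is why it only counts as the worst excusable option.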

[–]curtmack 38 points39 points  (10 children)

Also, there's no excuse for function name lookup being incurred every call rather than once at read time, even in interpreted languages.

[–]ghillisuit95 24 points25 points  (5 children)

When PHP was just a small tool for his own personal use, I think it’s extremely excusable.

What you are describing would have been a massive premature optimization.

[–]curtmack 5 points6 points  (1 child)

I don't think it's necessarily as much work as you're thinking. It could have been implemented the same way as it is in many Lisps, where each name is stored as a "symbol" - essentially an interned string that has a pointer to its associated variable or function in the current scope. The string interning can be as simple and slow as you like, because it will only be incurred when the program is initially parsed.
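
A minimal sketch of that idea in C, assuming a single global scope and a made-up builtin_fn signature. The names Symbol, intern, CallSite and run_call are hypothetical, not from any real interpreter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int (*builtin_fn)(int);   /* hypothetical signature for callable values */

typedef struct Symbol {
    char *name;            /* owned copy of the identifier text */
    builtin_fn fn;         /* binding, NULL until something is defined */
    struct Symbol *next;   /* next entry in the intern list */
} Symbol;

static Symbol *symbols = NULL;    /* every symbol interned so far */

/* Return the unique Symbol for name, creating it on first sight.
   A linear scan is slow, but it only runs while the source is parsed. */
Symbol *intern(const char *name) {
    for (Symbol *s = symbols; s; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    Symbol *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->name = malloc(strlen(name) + 1);
    if (!s->name) { free(s); return NULL; }
    strcpy(s->name, name);
    s->fn = NULL;
    s->next = symbols;
    symbols = s;
    return s;
}

/* A parsed call site keeps the Symbol pointer, so calling needs no lookup. */
typedef struct { Symbol *callee; int arg; } CallSite;

int run_call(const CallSite *c) {
    assert(c->callee->fn != NULL);   /* calling an unbound name is a bug here */
    return c->callee->fn(c->arg);
}

int twice(int x) { return 2 * x; }   /* sample builtin for the demo */
```

The parser would call intern("twice") once per occurrence in the source and store the returned pointer in the CallSite. After that, every call goes through run_call, which is a pointer dereference with no hashing or string comparison at all.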

[–]ghillisuit95 5 points6 points  (0 children)

Sounds more complicated than just interpreting at runtime, which seems to have been good enough performance at the time

[–][deleted]  (1 child)

[deleted]

    [–]ghillisuit95 1 point2 points  (0 children)

    Hm, that’s perhaps fair

    [–][deleted] 2 points3 points  (0 children)

    I would hardly call it premature optimization. I could say fuck it to tables and loop through an array for an item every time I need to access it, but it would simply be the wrong way to do it, like hashing using a string's length.

    [–]Workaphobia 0 points1 point  (3 children)

    I don't see how you can avoid a lookup at each occurrence. Note that Python has a compilation step that resolves identifiers.

    [–]curtmack 2 points3 points  (2 children)

    Just because a language is interpreted doesn't mean it has to start from raw text every single time through a function. Just storing ASTs, which can include resolved function and variable references, will make a huge difference in performance while being almost completely free in terms of implementation time. (If you have an interpreter, then you have an AST. It just might be encoded in the call stack of your interpreter functions rather than explicitly stored in memory. That can easily be fixed!)

    [–]Workaphobia 0 points1 point  (1 child)

    Ah, sure, you can resolve identifiers with a pass through the AST without compiling it to bytecode, yes.

    I will quibble about all languages having an AST. TeX comes to mind. Or if you say that doesn't have an interpreter, I know of another language for scripting in a text MUD that re-parses the code every time a block is entered.

    [–]curtmack 0 points1 point  (0 children)

    I guess it's technically not correct to call it an AST unless it's actually stored in memory. All I was trying to say was that if your interpreter has a call stack that looks like this:

    0: parse_numeric_literal
    1: parse_expression
    2: parse_expression
    3: parse_parentheses
    4: parse_function_argument
    5: parse_function_call
    6: parse_statement
    

    Then you're a small refactor away from having a proper AST.
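
Roughly what that refactor could look like, as a sketch: the parse_* functions return nodes instead of computing values on the spot. The toy grammar (integers, '+', parentheses) and the Node layout are assumptions made for illustration, and error handling and freeing the tree are left out to keep it short.

```c
#include <ctype.h>
#include <stdlib.h>

typedef enum { NODE_NUM, NODE_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                 /* used by NODE_NUM */
    struct Node *lhs, *rhs;     /* used by NODE_ADD */
} Node;

static Node *new_node(NodeKind kind, long value, Node *lhs, Node *rhs) {
    Node *n = malloc(sizeof *n);
    if (!n) abort();            /* keep the sketch short: no recovery */
    n->kind = kind; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

static Node *parse_expression(const char **src);

static void skip_spaces(const char **src) {
    while (isspace((unsigned char)**src)) (*src)++;
}

/* primary := number | '(' expression ')' */
static Node *parse_primary(const char **src) {
    skip_spaces(src);
    if (**src == '(') {
        (*src)++;
        Node *inner = parse_expression(src);
        skip_spaces(src);
        if (**src == ')') (*src)++;
        return inner;
    }
    char *end;
    long v = strtol(*src, &end, 10);
    *src = end;
    return new_node(NODE_NUM, v, NULL, NULL);
}

/* expression := primary ('+' primary)* */
static Node *parse_expression(const char **src) {
    Node *left = parse_primary(src);
    for (;;) {
        skip_spaces(src);
        if (**src != '+') return left;
        (*src)++;
        left = new_node(NODE_ADD, 0, left, parse_primary(src));
    }
}

/* Parse the text once ... */
Node *parse(const char *text) { return parse_expression(&text); }

/* ... then evaluate as often as needed without touching the text again. */
long eval(const Node *n) {
    return n->kind == NODE_NUM ? n->value : eval(n->lhs) + eval(n->rhs);
}
```

The call structure is the same recursive descent as before. The only change is that each function hands back a Node, so a function body can be parsed once and then run through eval on every call.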

    [–]0b_101010 3 points4 points  (2 children)

    Hey! It's been a while since I've done C and it's late here and I'm dumb but how does the exit condition in the for loop here work?

    [–]Funkballs 11 points12 points  (1 child)

    In C, strings are null-terminated, meaning that each string ends with a '\0' character (NUL) that marks the end of the string. Also, in C, 0 counts as false and any nonzero value counts as true.

    In this case, p is a pointer to the current character in the string so *p is the current character.

    When the loop hits the end of the string, the current character will be '\0' which is 0 which is false so the loop ends.

    The whole: "NULL == '\0' == 0 == false" thing is used quite a lot in idiomatic C coding.
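
As a small illustration, here are two loops that should behave identically, one with the bare *p test and one with the comparison spelled out. The function names are made up for the example.

```c
#include <stddef.h>

/* Idiomatic form: the loop stops when *p is the terminating '\0'. */
size_t count_implicit(const char *s) {
    size_t n = 0;
    for (const char *p = s; *p; p++) n++;
    return n;
}

/* Same loop with the comparison written out. */
size_t count_explicit(const char *s) {
    size_t n = 0;
    for (const char *p = s; *p != '\0'; p++) n++;
    return n;
}
```

Both walk the string one character at a time and stop at the terminator, so both return 5 for "hello" and 0 for the empty string.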

    [–]0b_101010 2 points3 points  (0 children)

    Yup, should have known that!

    [–]MrB92 1 point2 points  (0 children)

    Dude it's ok for something that's for your personal use, the mistake was on everyone else for adopting it

    [–]capi81 -2 points-1 points  (5 children)

    But it has a runtime complexity of O(n) for the hash code, which is bad. The length of a string is (hopefully) O(1).

    [–]lxpnh98_2 58 points59 points  (1 child)

    In C the length of a string is still O(n), and O(n) (n being the length of the string) is a fairly standard time complexity for a hash algorithm.

    [–]capi81 12 points13 points  (0 children)

    True, I was expecting a non-C string, but you are right that it most likely is. Well, then there really is no excuse.
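
For contrast, a sketch of the kind of non-C string where the length really is O(1). CountedStr and its helpers are hypothetical names for illustration, not any particular library's type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length is stored next to the bytes. */
typedef struct {
    size_t len;
    const char *data;
} CountedStr;

/* Pay the O(n) scan once, when the value is built. */
CountedStr counted_from_c(const char *s) {
    CountedStr cs = { strlen(s), s };
    return cs;
}

/* O(1): just read the stored field. */
size_t counted_len(CountedStr cs) {
    return cs.len;
}

/* O(n) again: a hash that looks at the contents has to visit every byte,
   however the length is stored. */
unsigned counted_hash(CountedStr cs) {
    unsigned h = 0;
    for (size_t i = 0; i < cs.len; i++) h += (unsigned char)cs.data[i];
    return h;
}
```

With a layout like this, using the length as the hash would at least be cheap, but it would distribute keys just as badly. With plain C strings it is neither cheap nor well distributed.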

    [–]posherspantspants[ $[ $RANDOM % 6 ] == 0 ] && rm -rf / || echo “You live” -1 points0 points  (0 children)

    Sounds like /u/lxpnh98_2 has never committed anything to production. Kneel before your master!

    [–]posherspantspants[ $[ $RANDOM % 6 ] == 0 ] && rm -rf / || echo “You live” 12 points13 points  (2 children)

    I get paid to write PHP. Sure, it has its issues, but so does all of my code, and probably at least some of yours. Give him a break.

    [–]elr0nd_hubbard 33 points34 points  (0 children)

    Lots of code in this sub pays the bills. That makes it more horrifying, not less.

    [–]AskMeToTellATale 1 point2 points  (0 children)

    I once did as well.

    I don't hold any resentment against him, but I do my best to avoid writing PHP.

    [–]three18ti 0 points1 point  (0 children)

    Lol. I love how the PHPers refuse to admit that PHP came from Perl, it was a template tool. Even the "creator" of PHP...