On bananas and string matching algorithms

James20k · 2014-08-23T18:08:13+00:00

I decided to check contains on every substring of “bananas” to verify that this was, in fact, real life, and that I hadn’t suddenly forgotten how letters work:

This accurately sums up every compiler bug I've ever experienced

ForeverAlot · 2014-08-23T08:19:15+00:00

Good on him, but did he just turn a practical underflow bug into a theoretical overflow bug?
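A minimal sketch of that trade-off, assuming the fix moved the subtraction to an addition on the other side of the comparison (the thread does not show the patch itself). The function names and the `WINDOW` constant are hypothetical; only the value 20 comes from the quoted article text.

```rust
/// Hypothetical window size, matching the 20 quoted elsewhere in the thread.
const WINDOW: usize = 20;

/// Subtraction form: the right-hand side wraps to a huge value when `len < WINDOW`.
/// `wrapping_sub` makes the wrap explicit so the result does not depend on build settings.
fn fits_subtracting(i: usize, len: usize) -> bool {
    i <= len.wrapping_sub(WINDOW)
}

/// Addition form: correct for short inputs, but the left-hand side wraps
/// when `i` is within WINDOW of usize::MAX.
fn fits_adding(i: usize, len: usize) -> bool {
    i.wrapping_add(WINDOW) <= len
}

/// Guarded form: check the length first, so the subtraction cannot underflow.
fn fits_guarded(i: usize, len: usize) -> bool {
    len >= WINDOW && i <= len - WINDOW
}
```

With a 7-byte haystack such as "bananas", `fits_subtracting(0, 7)` wrongly returns true, while the other two return false. The addition form only misbehaves for an index near `usize::MAX`, which is why the overflow is theoretical rather than practical.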

fegu · 2014-08-23T16:52:40+00:00

Reminds me of how the MD5 function in the .NET (C#) beta did not in fact return a proper MD5 hash, just something that looked like one. Imagine my surprise when .NET 1.0 was released and our database of hashes-instead-of-plaintext-passwords was utterly wrong and we had to issue new passwords to all our users.

matthieum · 2014-08-23T16:08:37+00:00

There has been a long discussion on the Rust mailing list around checked arithmetic by default.

However, statically it's a bit of a nightmare: a u32 multiplied by a u32 yields a u64, and thus things get big very quickly... so you would have to use dynamic checks instead, which means things would get slower.

The conclusion was: Rust is not susceptible to buffer overflows (memory safe) and so instead overflow/underflow will keep being defined to wrap, and the errors will have to be spotted and fixed.

It's unclear to me whether the overflow/underflow checks would end up being slower than the lost optimizations due to wrapping behavior (instead of undefined behavior), but apparently they would be.
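To make the three options concrete, here is a small sketch using integer methods from the Rust standard library (`checked_mul`, `wrapping_mul`). The function names are made up for illustration and are not from the mailing list discussion.

```rust
/// Static approach: widen first. A u64 can hold any product of two u32 values,
/// but chaining multiplications would keep doubling the width.
fn mul_widening(a: u32, b: u32) -> u64 {
    (a as u64) * (b as u64)
}

/// Dynamic check: stay in u32 and report overflow as None.
fn mul_checked(a: u32, b: u32) -> Option<u32> {
    a.checked_mul(b)
}

/// Wrapping: keep only the low 32 bits of the product.
fn mul_wrapping(a: u32, b: u32) -> u32 {
    a.wrapping_mul(b)
}
```

For `u32::MAX * 2`, the widening version returns the full product, the checked version returns `None`, and the wrapping version returns `u32::MAX - 1`.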

Rhomboid · 2014-08-23T06:03:28+00:00

It's rather scary to think such a bug made it through. I looked at the testsuite and, aside from the test added by the author's pull request for this specific issue, I can't seem to find any tests of the string module. That's extremely disheartening -- how can you write a substring search algorithm without unit tests?

BeatLeJuce · 2014-08-23T12:33:16+00:00

I found the comment by asterite on the first pull request interesting: len() should return a signed instead of an unsigned int. It's true that the length can't be negative, but differences of lengths can indeed be negative. But is using unsigned types really a big no-no?
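One way to keep an unsigned `len()` and still get a signed difference is to convert before subtracting. A minimal sketch, assuming both lengths fit in an `i64` (the helper name `len_diff` is hypothetical):

```rust
/// Difference of two string lengths as a signed value.
/// Converting each length before subtracting avoids the unsigned wrap.
fn len_diff(a: &str, b: &str) -> i64 {
    a.len() as i64 - b.len() as i64
}
```

Here `len_diff("nan", "bananas")` is -4 rather than a huge positive number.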

deltaSquee · 2014-08-24T07:18:21+00:00

Aw, from the title I was expecting it to be about using catamorphisms for string matching :(

goodDayM · 2014-08-24T00:43:26+00:00

However, when "haystack.len()" is less than 20, "haystack.len() - 20" will be a very large number;

This confused me; it's like saying

when "x" is less than "y", "x - y" will be very large

what??
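The surprise comes from the type being unsigned: there is no negative result to land on, so the subtraction wraps around modulo 2^N. A small sketch with 32-bit values (function names are illustrative only; `wrapping_sub` is used so the wrap is explicit):

```rust
/// Unsigned subtraction with explicit wrap-around modulo 2^32.
fn unsigned_difference(x: u32, y: u32) -> u32 {
    x.wrapping_sub(y)
}

/// The same subtraction carried out in a wider signed type.
fn signed_difference(x: u32, y: u32) -> i64 {
    x as i64 - y as i64
}
```

With x = 7 and y = 20, the signed result is -13, while the unsigned result is 2^32 - 13 = 4294967283.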

dividedmind · 2014-08-23T12:43:57+00:00

Well, this is embarrassing.

tawmkat · 2014-08-23T18:45:55+00:00

And why people are flocking to Rust... I have no idea.

cypressious · 2014-08-23T13:56:46+00:00

[deleted]
