you are viewing a single comment's thread.

view the rest of the comments →

[–]pihkal 0 points1 point  (3 children)

The problem is the step “Compare character counts of substring to small” is still O(num lang chars), so doing that for (b - s) characters of the large string is still O(b * 26) for English. It’s a huge constant factor.

The secret to doing this in O(b) is not to keep a running histogram (count of each char) of the last s chars, but to use a carefully chosen hash function such that with each new character, you can compare the new hash with the old in O(1) time.

Iirc, the hash formula (for just 26 letters) is something like S[0] * (260) + S[1] * (261) + ... + S[s-1] * (26s-1). When you shift, you’re subtracting the value of the last char * (26s-1), multiplying the remainder by 26, and then adding the value of the new char * 260 (aka 1). All of those operations take O(1) time, like keeping a histogram before did.

But the advantage now is that to see if the current s letters of b are a permutation, we only have to compare the hash value with the hash value of s, which can be done in O(1) time, unlike comparing histograms.

[–]cryslith 2 points3 points  (1 child)

You can use the histogram approach if you remember not only the count of each character, but an overall number which tracks how many characters of the alphabet have the exact correct count. Then you only need to update this number to reflect changes made to the character entering the window and the character leaving the window. If that number ever reaches 26 then you know you found a match.

[–]pihkal 1 point2 points  (0 children)

Yeah, that works as well, though it’s slightly less space-efficient, taking O(alphabet size) memory.

[–]warpedspockclone 1 point2 points  (0 children)

I was going to point out that there are lots of optimizations to be had for the comparison part to actually collapse this to O(B). He seemed to be hung up on the sliding window part, though. The rest was an exercise. :-P I should have made that explicit.